I moved to the US in September of 2005. I set a goal for myself to stay organized with regards to my finances and collect as much data as possible for data analysis. Since then, every single entry in my credit card statements is explicitly reviewed and categorized. I used Microsoft Money and then Quicken (after Money got discontinued) to collect and manage all the transactions.
A question that recently came into my mind was this…
Has the almost exclusive use of my motorcycle* for commuting had any positive impact on the consumption of gasoline over the years?
(* for reference: my BMW R1200GS :-)
I thought that I should be able to figure this out by doing some data analysis. Ok… it’s not a “big data” analysis problem but the exercise does incorporate the necessary steps one needs to undertake in order to get insight from some data, whether big or small.
Total spending per month
First step is to figure out my total spending on gasoline per month, which should be easy. Indeed, Quicken allowed me to sort all the transactions from the last 6 years. I copied the ones that were under the “gasoline” category to Excel and voila…
It was easy to calculate the monthly spending using Excel’s grouping function. I did have to play a bit with the dates so that I could make grouping work.*
There is definitely a trend towards less spending. Of course, the above doesn’t take into consideration the fluctuating price of the gasonline.
Finding historical gasoline prices
So, I had to go find the gasosline prices over time for the state of Washington. It took me a while on Bing to find a free source. The Department of Energy maintains historical data. Since I always use premium gas (better for the environment), I didn’t have to worry about averaging between the different types. I downloaded the Seattle data.
Grouping the data
Now that I had the data, I had to bring it into the same shape as my monthly-spending data. Again, some massaging of the dates, grouping, and I have the monthly average price of premium gasonline in the Seattle area.
Comparing the data
Unfortunately, I haven’t been collecting the actual miles that I did on per month basis. This makes it difficult to get an accurate view of the monthly spending in relation to the actual miles travelled and when compared to the prices of gasoline. Some months I travelled many more miles than others (e.g. road trips, visitors).
I do know that I have 63,000mi and 18,000mi on my car and motorcycle respectively. If I distribute these miles throughout the months, I find that on average I have been driving/riding 1,094.5mi per month. WOW!!!
With that information at hand, I can now calculate my monthly gallons consumption (again, given the even distribution of my total miles throughout the months**).
So, even though my car has been getting older, my miles/gallon efficiency has been going up. This is definitely a result of the heavier use of the motorcycle over the last two years. Also notice how the average price of gasonline has been going up in Seattle.
Here are few things I’ve learnt through this exercise…
- Excel is a great tool for playing with small amounts of data, even though some necessary features (for this scenario) were difficult to discover.
- The discovery of data that I didn’t have seemed to be the most difficult part.
- Filtering and massaging the data took most of the time.
- I felt that reporting and making sense of the data could be automated.
- Visualizing the numbers makes all the difference in the world :-)
* Perhaps the most difficult part of the entire process was my inability to copy-paste only the results of the grouping. Finally I discovered the “Go to special…” feature in Excel that allows one to copy only the visible parts of a selection.
** I can probably figure out the miles per month if I first calculate the miles/gallon efficiency of my car and motorcycle.