A few years back I got to work on a project for a client that was interested in relating weather data (rain precipitation, temperature, cloudiness, etc.) to their sales metrics in order to validate an assumption that the weather had an impact on how their sales performed in different parts of the country. During management meetings, regional managers needed evidence that there was a strong correlation between bad weather and weak sales, and between significant weather events (such as hurricanes) and customers stocking up on their product. So we built a weather data warehouse that pulled weather metrics from the National Oceanic and Atmospheric Administration (NOAA) and tied it to the company’s financial data. The common denominator between the two datasets was the geolocation of the weather stations and stores, and the date the information was captured (when sales were made, and when the weather readings were taken).
Back then, you needed a subscription to access NOAA’s FTP site in order to extract daily weather information across all the stations in the U.S. and use it for your own data analysis. Today you can still access the data via an FTP site, but you no longer need a subscription for it. So I thought this would make for an interesting dataset to play with again and do some weather analysis using Power BI.
To get started, I downloaded all historical weather information across the globe, along with the metadata files associated with it (stations, states, countries). Trying to parse through 100K+ files and over 50GB of data proved no easy task, and after a 10 hour unfinished ETL run, I decided to scale down my dataset. So I focused on weather stations in the continental United States. I loaded the data into a SQL database, did some data parsing, filtered out some of the more obscure weather details I wasn’t particularly interested in (who looks at “Thickness of ice on water” anyway?), and kept only the last 6 years’ worth of data. That brought my dataset down to a comfortable 10.5 million records.
Once I had a working dataset, I started mocking up a dashboard in Power BI and looking for some interesting insights to share. The preliminary end result is published below (I’m planning on adding 1-2 additional dashboards over the weekend). Feel free to interact with the dashboard and do your own analysis!
Sometimes leveraging external data and relating it to your company’s metrics can yield new insights. Having a good understanding of your business, along with some outside the box thinking can lead to new discoveries that improve how you manage your operations and increase your company’s performance over time. If you want to know more about how you can leverage this solution, feel free to contact me.