Attention conservation notice: This may mostly be interesting for Nordics.
Many of us in the Nordics are a bit obsessed with the weather. Especially during summer, we keep checking different weather apps or newspaper prognoses to find out whether we will be able to go to the beach or have a barbecue party tomorrow. In Sweden, the main source of predictions is the Swedish Meteorological and Hydrological Institute, but many also use for instance the Klart.se site/app, which uses predictions from the Finnish company Foreca. The Norwegian Meteorological Institute’s yr.no site is also popular.
Various kinds of folk lore exists around these prognoses, for instance one often hears that the ones from the Norwegian Meteorological Institute (at yr.no) are better than those from the Swedish equivalent (at smhi.se)
As a hobby project, we decided to test this claim, focusing on Stockholm as that is where we currently live. We started collecting data in May 2016, so we now (July 2017) have more than one year’s worth of data to check how well the two forecasts perform.
The main task we considered was to predict the temperature in Stockholm (Bromma, latitude 59.3, longitude 18.1) 24 hours in advance. As SMHI and YR usually don’t publish forecasts at exactly the same times, we can’t compare them directly data point by data point. However, we do have the measured temperature recorded hourly, so we can compare each forecast from either SMHI or YR to the actual temperature.
SMHI forecasts were downloaded through their API via this URL every fourth hour using crontab.
YR forecasts were downloaded through their API via this URL every fourth hour using crontab.
Measured temperatures were downloaded from here hourly using crontab.
First, some summary statistics. On the whole, there are no dramatic differences between the two forecasting agencies. It is clear that SMHI is not worse than YR on predicting the temperature in Stockholm 24h in advance (probably not significantly better either, judging from some preliminary statistical tests conducted on the absolute deviations of the forecasts from the actual temperatures).
Both institutes are doing well in terms of correlation (Pearson and Spearman correlation ~0.98 between forecast and actual temperature). The median absolute deviation is 1, meaning that the most typical error is to get the temperature wrong by one degree Celsius in either direction. The mean squared error is around 2.5 degrees for both.
|Forecaster||Correlation with measured temperature||Mean squared error||Median absolute deviation||Slope in linear model||Intercept in linear model|
Let’s take a look at how this looks visually. Here is a plot of SMHI predictions vs temperatures measured 24 hours later. There are about 2400 data points here (6 per day, and a bit more than a year’s worth of data). The color indicates the density of points in that part of the plot.
And here is the corresponding plot for YR forecasts.
Again, there are about 2400 data points here.
Unfortunately, those 2400 data points are not exactly for the same times in the SMHI and YR datasets, because the two agencies do not publish forecasts for exactly the same times (at least the way we collected the data). Therefore we only have 474 data points where both SMHI and YR had made forecasts for the same time point 24h into the future. Here is a plot of how those forecasts look.
This doesn’t really say that much about weather forecasting unless you are specifically interested in Stockholm weather. However, the code can of course be adapted and the exercise can be repeated for other locations. We just thought it was a fun mini-project to check the claim that there was a big difference between the two national weather forecasting services.
Code and data
If anyone is interested, I will put up code and data on GitHub. Leave a message here, on my Twitter or email.
Accuracy in predicting rain (probably more useful).
Accuracy as a function of how far ahead you look.