Sunday, February 3, 2013

Predicting Superbowl XLVII (That's 47 in Roman)


Are there ways to meaningfully predict the Superbowl winner using readily available, intuitive pieces of information? The two teams that compete in the Superbowl represent two conferences, the AFC and the NFC, which have played 64 interconference games in each of the last 12 NFL regular seasons (the number of interconference games has varied between 28 and 64 since the merger). These games may provide information about the relative strength of the two conferences and thus could be a useful predictor of the Superbowl outcome. Using historical data from the 42 complete seasons since the NFL merger in 1970, I tested whether the proportion of regular-season AFC-NFC games won by the AFC predicts the probability that the AFC team will win the Superbowl (Figure 1) and, conversely, whether interconference records predict the probability that the NFC team will win.

Aside: This may all seem like magic or mumbo-jumbo if you are not statistically inclined.  If you are highly statistically inclined, you might worry that I have misused a term or have not provided enough information for you to properly critique the technique and the inference.  Well, I guess you'll just have to deal with that because it's the internet.  But in the interests of transparency, I'll send you my code if you want it (unlike Nate Silver, who makes shit football predictions anyway).  This was all done in R, an outstanding open-source statistical computing environment.
Figure 1.  Proportion of AFC-NFC interconference games won by the AFC and AFC Superbowl wins, 1970-2012.  The 'NFC Dominance' period is clearly visible in the middle of the figure.

Bayesian logistic regression with minimally informative prior distributions suggested a positive relationship between the probability of the AFC winning the Superbowl and the proportion of regular season interconference games won by the AFC (posterior mean of the logit-scale slope, beta = 8.41; 95% credible interval = -0.04 to 17.7; Figure 2).  Note that the credible interval narrowly includes zero, so the evidence for the relationship is suggestive rather than conclusive.
Figure 2.  Relationship between predicted probability (with 95% credible intervals) of an AFC Superbowl win and the proportion of the regular-season AFC-NFC interconference games won by the AFC.
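For readers curious what a model like this involves, here is a minimal sketch in Python.  It is not the R script behind the numbers in this post: the data are synthetic and made up for illustration, the Normal(0, 50) priors are my assumption for "minimally informative", and the sampler is a plain random-walk Metropolis algorithm, which may differ from whatever R used under the hood.

```python
import math
import random

# SYNTHETIC, made-up seasons for illustration only (NOT the real NFL data).
# x = proportion of interconference games won by the AFC,
# y = 1 if the AFC team won the Superbowl that season, else 0.
XS = [0.31, 0.33, 0.35, 0.37, 0.39, 0.41, 0.43, 0.45, 0.47, 0.49,
      0.51, 0.53, 0.55, 0.57, 0.59, 0.61, 0.63, 0.65, 0.67, 0.69]
YS = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
      1, 1, 0, 1, 1, 1, 1, 1, 1, 1]


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def inv_logit(z):
    """Map a logit-scale value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_posterior(a, b, xs, ys, prior_sd):
    """Log posterior (up to a constant) for intercept a and slope b."""
    # Assumed priors: independent Normal(0, prior_sd) on a and b.
    lp = -(a * a + b * b) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = a + b * (x - 0.5)        # x is centered at an even 0.5 split
        lp += y * eta - log1pexp(eta)  # Bernoulli log-likelihood
    return lp


def metropolis(xs, ys, n_iter=20000, burn=5000, prior_sd=50.0, seed=1):
    """Random-walk Metropolis sampler; returns post-burn-in (a, b) draws."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    current = log_posterior(a, b, xs, ys, prior_sd)
    draws = []
    for i in range(n_iter):
        a_new = a + rng.gauss(0.0, 0.5)
        b_new = b + rng.gauss(0.0, 4.0)
        proposed = log_posterior(a_new, b_new, xs, ys, prior_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            a, b, current = a_new, b_new, proposed
        if i >= burn:
            draws.append((a, b))
    return draws


def summarize(values):
    """Mean and central 95% interval of a list of posterior draws."""
    s = sorted(values)
    n = len(s)
    return sum(s) / n, s[int(0.025 * n)], s[int(0.975 * n)]


draws = metropolis(XS, YS)
slope_mean, slope_lo, slope_hi = summarize([b for _, b in draws])

# Posterior prediction at a new interconference record, e.g. 25 wins in 64 games.
x_new = 25 / 64
pred_mean, pred_lo, pred_hi = summarize(
    [inv_logit(a + b * (x_new - 0.5)) for a, b in draws])

print("slope:", slope_mean, (slope_lo, slope_hi))
print("P(AFC win | x = 25/64):", pred_mean, (pred_lo, pred_hi))
```

Because the data here are invented, the printed numbers will not match the values reported in this post; the point is only the shape of the calculation: sample the intercept and slope from the posterior, then push each draw through the inverse logit to get a distribution of predicted probabilities.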
Given the weak AFC showing against the NFC during the 2012 regular season (25 wins, 39 losses, or about 39% of the 64 interconference games) and this posterior probability distribution, the Bayesian posterior prediction for an AFC (Baltimore Ravens) win was low (predicted probability = 0.242, 95% prediction interval = 0.063 - 0.490; Figure 3).

Figure 3.  Posterior prediction of a Ravens (AFC champion) win given the dreadful record of the AFC against the NFC (25 wins, 39 losses) during the 2012 regular season.  The mean predicted probability was 0.242 with a 95% prediction interval of 0.063-0.490.
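To put those numbers on more familiar scales, the short Python sketch below works through the arithmetic using only figures reported in this post.  One caveat: it treats the posterior mean slope (8.41) as a single point estimate, which ignores the wide credible interval around it, so the odds multiplier is a rough guide rather than a precise quantity.

```python
import math

# 2012 regular season: AFC record against the NFC, as reported in the post.
afc_wins, afc_losses = 25, 39
proportion = afc_wins / (afc_wins + afc_losses)

# Posterior mean of the logit-scale slope, as reported in the post.
beta_mean = 8.41

# On the logit scale, a 0.10 increase in the AFC's interconference win
# proportion multiplies the odds of an AFC Superbowl win by exp(0.10 * beta).
odds_multiplier = math.exp(beta_mean * 0.10)

# The predicted probability of a Ravens win, converted to odds and to a logit.
p_afc = 0.242
odds_afc = p_afc / (1 - p_afc)
logit_afc = math.log(odds_afc)

print(f"AFC interconference win proportion: {proportion:.3f}")
print(f"Odds multiplier per +0.10 in proportion: {odds_multiplier:.2f}")
print(f"Odds of an AFC win: {odds_afc:.3f}, logit: {logit_afc:.3f}")
```

In words: a predicted probability of 0.242 corresponds to odds of roughly 1 to 3 against the Ravens, and at the posterior mean slope each additional 10 percentage points of interconference wins would a bit more than double the AFC's odds.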
Viewed from the other side, given the correspondingly strong NFC showing against the AFC during the 2012 regular season (39 wins, 25 losses), the Bayesian posterior prediction for an NFC (San Francisco 49ers) win was high (predicted probability = 0.758, 95% prediction interval = 0.510-0.936; Figure 4).  Note that this is simply the complement of the posterior prediction for a Ravens win, since exactly one of the two teams must win.

Figure 4.  Posterior prediction of a 49ers (NFC champion) win given the strong record of the NFC against the AFC during the 2012 regular season.  The mean predicted probability was 0.758 with a 95% prediction interval of 0.510-0.936.
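The complement relationship between the two predictions is easy to check by hand.  A small Python sketch, using only the rounded numbers reported in this post (the helper name `complement` is mine):

```python
def complement(mean, lower, upper):
    """Flip a probability summary for an event into one for its complement.

    The mean becomes 1 - mean, and the interval endpoints swap roles:
    the new lower bound is 1 - upper and the new upper bound is 1 - lower.
    """
    return 1 - mean, 1 - upper, 1 - lower


# Reported summary for a Ravens (AFC) win.
nfc_mean, nfc_lower, nfc_upper = complement(0.242, 0.063, 0.490)

print(f"49ers win: {nfc_mean:.3f} ({nfc_lower:.3f} - {nfc_upper:.3f})")
```

Starting from the rounded Ravens figures, the upper bound comes out as 0.937 rather than the 0.936 reported for the 49ers; a difference of 0.001 like this is what you would expect from rounding, assuming the reported values were computed from unrounded posterior draws.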
Based upon the NFC's domination of the 2012 interconference games and the historical relationship between interconference regular season records and the conference of the Superbowl winner, the predicted probability that the NFC team, the 49ers, will win Superbowl XLVII is thus 0.758 (95% prediction interval 0.510-0.936).  Place your bets accordingly.



Several disclaimers:  This has not been peer reviewed.  I am not a statistician; I am a practicing scientist with a background in math and statistics, but I am dabbling here.  I did all of this in R using off-the-shelf, generally accepted methods of statistical inference, and I'm happy to share the script I used.