
Pitfalls of Trying to Make Your Own Forecasting Models

Every year I see different people attempting to make a statistical model of some kind.  In 2018 there were 8 of us providing D/ST rankings... and nearly all of us had a different approach.  What I'm writing here is a two- or three-part set of reflections about the approaches people try and the different pitfalls I've observed.

I'm going to present model development in "stages", which is meant to show (in a non-exact way) how a model-builder might progress.

Stage 1: The Eye Test

What it is:  While not technically a model, I think it's worth acknowledging that most opinions about football and the players in it are heavily colored by direct observation.

My positive take:  There are two key uses of this kind of informed opinion.  

  1. It is indispensable going into a new season, when we have no real data yet.

  2. It is the best kind of tie-breaker when deciding between options.  I am especially impressed by some of the detailed "film room" analysis, which goes beyond what I could ever imagine doing.

Where it can go wrong: Ultimately, it is really tough for the eye test alone to generate a high-accuracy (correctly ordered) list of fantasy options.  

  • One problem is that real-world football value is not the same as fantasy value.  There are too many factors on both sides of the ball.  

  • Another problem is that it is all too easy to overlook the opponent.  I know I will never stop reading opinions that a QB looked bottled up-- or that the defense looked so good-- when the most probable explanation was the quality of the opponent.  The eye test is a great start but can be deceiving.

Stage 2: "Scorecard" models

What it is: OK, I kind of made this one up by giving it a name, and I'm not sure it's a real "model".  But it's what I imagine a lot of people try to do.  For making D/ST projections, I suspect some rankers write out the list of all factors: "sacks, interceptions, yards-allowed"... etc.  Then they try to fill in some numbers based on... well, based on whatever they think of.  Probably they base the numbers on past/recent averages, if they're vaguely mathematical.  But it's almost unavoidable that they're adjusting by "gut feel".  As in "yeah, 3 sacks seems about right based on [watching a game / reading an article / liking that team a lot]".

My positive take:  I've got nothing good on this one.  I just think it's the wrong way.  I mean, it's one way of quantifying the eye test, maybe, but I think this one is doomed to fail.

Where it can go wrong: Obviously, the gut-feel element is going to lead to trouble.  But even if it's based on past averages, there are two serious disadvantages:

  1. Past averages are NOT usually predictive.  I remember reading that the Broncos D/ST weren't getting sacks one year, which was cause for concern, and they were ranked low.  I ranked them higher, and they got 5 sacks that week.  I called it because I knew past defensive sacks are not strongly predictive.  

  2. The scorecard method will weight the different factors exactly according to their fantasy scoring, e.g. defensive TDs multiplied by 6, etc.  Statistically, the best predictions are made with completely different weightings from what is used in fantasy.  Things like yards-allowed are much more predictive and need to count way more than fumble recoveries.
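
To make point 2 concrete, here's a minimal sketch of a scorecard projection.  All the stat values and averages below are made up purely for illustration -- the point is that the weighting is taken straight from the fantasy scoring format, which is exactly what the text argues against:

```python
# A "scorecard" projection as described above: take past averages for each
# stat, assume they repeat, and weight them exactly by fantasy scoring.
# All numbers here are hypothetical.

FANTASY_POINTS = {"sack": 1, "interception": 2, "fumble_recovery": 2, "defensive_td": 6}

# Made-up season-to-date averages for one D/ST.
past_averages = {"sack": 2.5, "interception": 1.0, "fumble_recovery": 0.5, "defensive_td": 0.2}

def scorecard_projection(averages):
    """Weight each projected stat by its fantasy point value."""
    return sum(FANTASY_POINTS[stat] * value for stat, value in averages.items())

print(round(scorecard_projection(past_averages), 2))  # 6.7
```

The flaw, per the text: these weights mirror the scoring format, not predictive value.  A genuinely predictive model would weight something like yards-allowed far more heavily than a fumble recovery, and defensive TDs far less than their 6-point payout suggests.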

Stage 3: Analysis of Trends and Correlations

What it is:  This is where most people start posting about their methods and results.  To get more serious (more numerical), anyone with basic stats exposure can start looking at linear trends or some variation.  It's pretty common to find posts giving quick analyses of trends, and they are typically based on seasonal results.  For example, I think every year there is a post showing that seasonal kicker fantasy points correlate with something about overall offensive strength: it can be TDs, it can be game O/U, it can be average team scores.   Sometimes there's analysis of weekly ("same-game") trends, which offers an alternative way of slicing the data.

My positive take:  This is the first good start, although the use is limited:

  1. Simple correlation analyses can be most useful for disproving some ideas.  For example, you might think touchdowns subtract from kicker scoring potential-- but when you observe that kicking actually trends positively with team TDs, then you can at least rule out those kinds of misconceptions.  

  2. The one other good thing-- and this is a weaker point-- is that if you build a model based on simple seasonal trends, you won't have the worst model out there.  It can never surpass "average", but it will beat the rankings of people who don't run any mathematical checks at all.

Where it can go wrong: Where to begin?  There are so many misuses of simple trend-fitting.  But I think it's worth spending some time on, because a lot of people get stuck at this level.  Here goes: 

  1. Seasonal trends are misleading, because they're not usually predictive.  Take the simple case above-- if you want to use the trend of TDs and FGs when you're only in week 3, you don't have a season's worth of data to base it on yet, and two weeks of pretty random data won't help prediction.  In the early weeks, you don't yet know which teams will have more TDs.  The seasonal averages will keep shifting all season, and the trend can mislead you until it's too late.  

    • Here's another stupid example to make it more obvious: if you want to find out which D/STs score the most, an easy-but-wrong move is to notice that seasonal fantasy scores correlate highly with seasonal defensive TDs.  That shows exactly why this doesn't work: it leads you to weight things that are heavily inflated by the fantasy scoring format (TDs are worth 6), when we "all know" that defensive TDs are very random and deserve less weight.  It also means you can't use early-season defensive TDs to predict later D/ST scoring.  The upshot: if you want to get even close to making predictions, you need to convert all your data to a time-series analysis.  

  2. Two points I'll come back to later, which a smarter analysis would address: simple trends overlook the opponent, and outlier scores can skew results a lot.   

  3. You can't add simple trends together.  And that means you're limited to working with a single dependency at a time.

  4. I remember seeing a table years ago, showing how every fantasy position correlated with all the others.  Maybe WRs correlated positively with QB, but negatively with RB-- you get the idea.  The problem is, these only mean something (and only weakly by the way) taken individually.  If you try to calculate "WR = QB - RB", you're in a world of trouble.  

    • With the kicker example above, you couldn't take the trend with TDs AND the trend with O/U and just add them.   
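
The outlier issue from point 2 above is easy to see with a toy example.  The weekly scores below are invented -- one fluke week is enough to drag a simple average (and any trend fit through it) well away from typical performance:

```python
# One outlier week (say, two defensive TDs) skews the simple average.
import statistics

weekly_points = [6, 8, 5, 7, 28, 6]  # hypothetical D/ST weekly fantasy scores

print(statistics.mean(weekly_points))    # 10.0 -- pulled up by the fluke week
print(statistics.median(weekly_points))  # 6.5  -- closer to a typical week
```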

Some key points that support this "non-additivity" of trends: 

  1. Sometimes, the best-fit solution will mean you add one trend but SUBTRACT (not add) another.  So you need a fit that includes all variables at once, which means figuring out the relative weights.  With the D/ST example, you can't just take the trends of Int/Sacks/FR/etc. and add them. 

  2. Most variables have no predictive value at all-- meaning the correlation will be very small (and hopefully you're computing it over enough seasons to bring that out).  Those parameters should be excluded, because they will cause too many fluctuations in your model and make it less accurate.  

  3. An extremely common problem is that a great many variables correlate with each other, so adding them is "double counting".   This will get more important in the next chapter, but basically it means many variables point in the same direction.  Take TDs and implied score-- those probably mean almost the same thing, right?  Or consider passing yards and completions per attempt-- they're different, but probably correlate with each other.  The usual situation is that one of the parameters is more predictive alone than in combination-- and adding the other variable just injects more randomness.

    The upshot: You need to take a multivariable approach.  If you're doing regressions like me, that means multivariable regression-- on the time series.  You can't just do this in Excel.
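
As a sketch of why single-variable trends don't add, here's a small simulation.  The data are synthetic (not real football numbers): two correlated predictors, where only one actually drives the outcome.  Fit one at a time, both look strongly predictive; a joint least-squares fit sorts out the real weights:

```python
# When two predictors correlate, the coefficients of a joint fit are NOT
# the two single-variable slopes added together.  Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)             # e.g. implied team score
x2 = x1 + 0.3 * rng.normal(size=200)  # e.g. team TDs -- mostly the same signal
y = 2.0 * x1 + rng.normal(size=200)   # outcome driven by x1 alone

# Single-variable slopes (what naive trend-adding would use).
b1 = np.polyfit(x1, y, 1)[0]
b2 = np.polyfit(x2, y, 1)[0]

# Joint multivariable fit via least squares.
X = np.column_stack([x1, x2, np.ones_like(x1)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"univariate slopes:  {b1:.2f}, {b2:.2f}")          # each near 2
print(f"joint coefficients: {coef[0]:.2f}, {coef[1]:.2f}")  # far from b1 + b2
```

Adding the two univariate slopes would roughly double-count the shared signal; the joint fit assigns nearly all the weight to the variable that actually matters.  That's the "double counting" and weighting problem in one picture.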

My follow-up article describes what's needed to embark on a more advanced approach-- which, at this point, is required.