What's In the Model?

Although I heavily guide the process, I don't choose what ends up in each model. Subvertadown is a framework for developing the predictive models. For this, I test hundreds of variables for significance, then cross-validate, add/remove/iterate, and cross-validate again. The method is multiple linear regression, sometimes including interaction terms, and the data is all painstakingly converted into weekly time series (no in-sample data and no look-ahead: each week's prediction uses only information available beforehand). Additionally, I have a data-processing engine that accounts for past opponent strength, treats outliers, captures meaningful trends, and applies the right ramp-down of previous-season data.
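For the curious, here is a minimal sketch of what that add/cross-validate/iterate loop can look like, assuming pandas and scikit-learn. The greedy forward pass, `weekly_df`, and the column handling are illustrative stand-ins, not my exact pipeline; the key idea is that TimeSeriesSplit keeps every validation fold strictly later than its training data, so no future weeks leak into a fit.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def forward_select(weekly_df: pd.DataFrame, target: str, candidates: list[str]) -> list[str]:
    """Greedily add variables while cross-validated error keeps improving.

    TimeSeriesSplit keeps each validation fold later than its training
    folds, so future weeks never leak into the fit.
    """
    tscv = TimeSeriesSplit(n_splits=5)
    selected, best_err = [], float("inf")
    improved = True
    while improved:
        improved = False
        for var in candidates:
            if var in selected:
                continue
            trial = selected + [var]
            err = -cross_val_score(
                LinearRegression(),
                weekly_df[trial], weekly_df[target],
                cv=tscv, scoring="neg_mean_squared_error",
            ).mean()
            if err < best_err:
                best_err, best_var, improved = err, var, True
        if improved:
            selected.append(best_var)
    return selected
```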

Factors analyzed include things like: previous game scores, current betting lines, total yards, rushing yards, passing yards, total TDs, passing TDs, home/away, dome/outdoors, turf, weather (wind, temperature, precipitation), day of the week, post-bye, win-loss record, sacks/FR/INT, completions, red zone efficiency, drive success rate, passing efficiency, points per play, conversions, positional fantasy points (QB/RB/WR, etc.), division, and plenty of sensible products and ratios between all of these. Data from both teams is tested, as are all the "factors allowed" to opposing teams (e.g., "points allowed").
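To show how those candidate products and ratios can be generated mechanically before any significance testing, here is a sketch assuming pandas; the naming scheme and helper are hypothetical, not part of my engine.

```python
import itertools
import pandas as pd

def add_products_and_ratios(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Generate candidate interaction features: pairwise products and ratios."""
    out = df.copy()
    for a, b in itertools.combinations(cols, 2):
        out[f"{a}_x_{b}"] = df[a] * df[b]
        # Guard against divide-by-zero weeks before testing the ratio.
        out[f"{a}_per_{b}"] = df[a] / df[b].replace(0, float("nan"))
    return out
```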

Of course, most variables get excluded; usually only 10-15 survive to the final model, in part because some are redundant (redundancy invites overfitting, hence the need for cross-validation). I have tested regularizing with Lasso regression, but the best lambda value is 0, which just means OLS is already optimal: there are many more samples than variables, so no extra bias is needed to rein in variance. I deal with team changes (entering a new season, or in-season injuries) by adjusting the given factor by +/- 1 standard deviation, based on reports of positive/negative expectations. I also account for secondary effects based on historical correlations.
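To make those two points concrete, here is a sketch on synthetic data, assuming NumPy and scikit-learn. The data, the factor index, and the direction are placeholders for a real report-driven adjustment, but the mechanics (a lambda sweep bottoming out at the smallest value, and a +/- 1 SD shift to one factor) are the ideas described above.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))          # stand-in for weekly samples x factors
y = X @ rng.normal(size=15) + rng.normal(size=600)

# Sweep the regularization strength; when the cross-validated optimum sits
# at the bottom of the grid, plain OLS is effectively the best choice.
lasso = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5).fit(X, y)
print("best alpha:", lasso.alpha_)

ols = LinearRegression().fit(X, y)

# Team-change adjustment: shift one factor by +/- 1 standard deviation to
# reflect reported upgrades/downgrades (the sign comes from the reports).
factor_idx, direction = 3, +1           # hypothetical factor and sign
x_new = X.mean(axis=0)
x_new[factor_idx] += direction * X[:, factor_idx].std()
print("adjusted projection:", ols.predict(x_new.reshape(1, -1))[0])
```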

What do my models overlook? Hm... With most of the major stuff included, I'd say the missing elements are the weekly details: the things that are difficult to account for precisely. For example, I might miss changes in the OL/DL, or the full effects of coaching changes.

Kicker

My Kicker model was created by testing every variable I’ve encountered for predictive value, plus more: special efforts to account for field goal production at different distances, and kicking-specific data like accuracy.

You can be sure I have accounted for weather, domes, accuracy, and all the offensive + defensive data of both teams.
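As one illustration of the distance handling, here is a sketch assuming pandas and a per-attempt field goal log; the data and bucket cutoffs are made up, but they show how production and accuracy can be treated separately for short, medium, and long kicks.

```python
import pandas as pd

# Hypothetical per-attempt field goal log (kicker, distance in yards, made).
fg = pd.DataFrame({
    "kicker":   ["A", "A", "A", "B", "B", "B"],
    "distance": [24, 47, 55, 33, 41, 52],
    "made":     [1, 1, 0, 1, 0, 1],
})

# Bucket attempts by distance, then measure accuracy within each bucket.
fg["bucket"] = pd.cut(fg["distance"], bins=[0, 39, 49, 70],
                      labels=["short", "mid", "long"])
accuracy = fg.groupby(["kicker", "bucket"], observed=True)["made"].mean().unstack()
print(accuracy)
```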

D/ST

D/ST fantasy scoring includes points allowed, yards allowed, etc., and many of these same parameters also have value in the predictive model.  Things like sacks are influenced more by the opponent (i.e., how many sacks the opposing QB/OL allows).  Passing habits and rushing yards contribute, as do weather, field, and home games.  If I have read information about DL lineup changes (injuries), I also have a tool to make minor adjustments.

My D/ST model is currently an amalgamation of sub-models, the result of successful experimentation over the years.  I have one model that uses betting lines, and another that uses my own game score projections.  I have a special model to sort only the top-most teams.  And I have a collection of sub-models for supplementary stats, such as a dedicated model to predict the number of sacks (from 10 other parameters).  A more comprehensive list of tested parameters appears in my “What’s in the Model?” article.
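Structurally, the amalgamation looks something like the sketch below. The function, weights, and sample values are illustrative placeholders, not my actual calibration; the point is simply how sub-model outputs can be blended into one projection.

```python
def dst_projection(p_lines: float, p_scores: float,
                   top_rank_adj: float, sacks_pred: float) -> float:
    """Blend D/ST sub-model outputs into one projection.

    p_lines      -- sub-model driven by betting lines
    p_scores     -- sub-model driven by my own game score projections
    top_rank_adj -- correction from the model that sorts the top-most teams
    sacks_pred   -- dedicated sack-count model (1 fantasy point per sack)
    """
    blend = 0.5 * p_lines + 0.5 * p_scores   # illustrative equal weights
    return blend + top_rank_adj + 1.0 * sacks_pred

print(dst_projection(p_lines=6.2, p_scores=5.4, top_rank_adj=0.8, sacks_pred=2.6))
```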

Finally, a good amount of adjustment comes from the data processing: identifying trends and accounting for prior opponents that the D/ST faced.
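A minimal sketch of that opponent adjustment, assuming a pandas frame with one row per team-week and an `opponent` column (the real engine is more involved): a big number against a soft defense gets discounted, while a decent number against a stingy defense gets credited.

```python
import pandas as pd

def opponent_adjusted(df: pd.DataFrame, stat: str) -> pd.Series:
    """Re-center a weekly stat by the strength of the opponents faced."""
    league_avg = df[stat].mean()
    # Average of what each opponent has allowed across all of its games.
    opp_allowed = df.groupby("opponent")[stat].transform("mean")
    return df[stat] - (opp_allowed - league_avg)
```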

QB

For QB, some of the main drivers are: overall offense, QB rushing yards, and an optimized combination of passing characteristics.  There is additional predictive value from various parameters I have previously commented on: opposing defense characteristics, competition with the own-team rushing game, and the opposing quarterback.