Pick 6x6: The Cowboys were incorrectly added this week and have been replaced by the Vikings

Accuracy Report -- First Third of the Season, Weeks 1-6

[Photo: Chargers cornerback Elijah Molden (22) runs the ball after an interception in the first quarter against the Broncos, October 13, 2024, at Empower Field at Mile High in Denver, CO. Kevin Langley/Icon Sportswire]

October 15, 2024

[This week’s D/ST Reddit post here]

[Last year’s year-end accuracy report was here]

Time to reflect on the predictability of the 2024 season so far!

If you’re new to these: examining predictive accuracy has been a long-standing tradition here, and it underpins a key purpose of Subvertadown: giving us an understanding of how predictable things are. We want to know "Was this season more or less predictable than normal?"

Compared to Other Seasons

So as usual, here’s a look at how each individual model is doing, compared to other seasons.

This does not tell us "how good the models are". It only tells us how predictable the current season is compared to the historical norm.

Most of the yellow bars come close to the expected level of the blue bars. That means most positions are almost as predictable as usual.

Whereas last season (2023) we saw above-average predictability for everything except kickers, the current 2024 season has been relatively less predictable, except for D/ST. Defenses have been quite well behaved, which is good for streaming!

There are 2 positions that stand out as significantly worse:

  • Kickers: The graph implies kickers have been tough to stream, but I believe this measurement (the correlation coefficient between projections and outcomes) doesn’t tell the full story. Kicker predictions already suffer from “heteroskedasticity” (the spread of outcomes around the projection isn’t constant), and that spread has increased over recent years.

  • Running Backs: This one is confusing. In weeks 3 and 5 of this year, running back correlations were negative! That means teams that were expected to run didn’t, and vice versa. I don’t know how to use this information, except to watch whether it rights itself. (The sketch below shows how these weekly correlations are computed.)
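
For anyone new to the measurement itself, here is a minimal sketch of how such a weekly correlation is computed. The player numbers below are invented to mimic a negative-correlation week; they are not real projections:

```python
import numpy as np

# A toy version of the weekly predictability check: the correlation
# coefficient between projections and actual fantasy points for one
# position in one week. All numbers are invented for illustration.
def weekly_correlation(projected, actual):
    """Pearson correlation between one week's projections and outcomes."""
    return float(np.corrcoef(projected, actual)[0, 1])

rb_projected = [18.2, 15.1, 13.4, 11.0, 9.7]
rb_actual    = [4.0, 7.5, 12.1, 16.8, 19.3]  # the expected runners flopped
print(weekly_correlation(rb_projected, rb_actual))  # strongly negative
```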


Comparative Accuracy Assessment

Reminder, for newbies: I’m not trying to be, and don’t expect to be, “#1”. That’s plainly unrealistic. My goal is rather to check that the models are still performing at a similar level to others, specifically sources that have been consistently good for at least a few years. Many sources are great one year but poor the next; my chosen experts are good in a more consistent way, and we naturally trade places at the top from year to year. Knowing that the models perform at least “similarly” to top sources lends confidence and gives us reason to trust the forecasts when we extrapolate the models to future weeks.

I’m going to keep my summaries briefer this year, starting with this first accuracy report covering the first third of the season.

Defensive Maneuvers

D/ST accuracy has been excellent to start this year, for all rankers! You’ve probably felt it, too. It therefore feels a little weird that, unlike other years, I’m not currently in the top 3. I finished at #1 last year, but as of week 6 I fall in the middle of 7 sources, at #4 and right below last year’s #2. We’ve been in a more predictable stretch recently, and I suspect other rankers are doing marginally better by assuming some D/STs are “matchup independent”, an assumption that normally doesn’t work so well in other years.

Here’s the Kicker

Conflicting results here. Let me put it this way: you’d probably feel much more clearly that my kicker model was top-notch (#1) if you simply avoided my #1 kicker each week and instead picked among the other kickers in my top 8.

For example, in week 6, my kickers ranked #2-#8 scored 19, 12, 12, 12, 9, 9, and 4 points, but my #1-ranked kicker (Little) scored only 4 points, in the London game. That feels extremely discouraging for those of us who make the effort to choose the #1 as we optimize our lineups. Over these first 6 weeks, you would have scored 4+ points higher by instead choosing my #2, #3, #5, or #6 kicker. There is no logic to explain this; it has to be randomness. (Remember, with Little in week 6, the Jaguars were expected to win, not to fall so far behind, so the game script reversed.)

But in summary, by the “Accuracy Gap” method, I come out at #1! That’s rewarding, at least, for all the effort put in. So despite this weird #1-kicker phenomenon, I remain optimistic the model will do its job going forward.

Two Cents for a Quarterback

My QB model has been on quite a successful roll this season. I’m #3 on accuracy, but even more encouraging is how well it has performed after week 1. With that week excluded, I’ve maintained #1 performance.

Since this accuracy report is about reflecting on where to improve, I’d say my main lesson is to pay more attention in week 1. I let the model sit way too high on Goff and Cousins that week, and I should have taken the time to adjust for it. I should also have given rookie QBs space to “warm up” over the season, whereas I had instead calibrated the model to season-long expectations.

Survivor

Based on probability alone, we should normally expect to lose slightly more than once per pathway over these first 6 weeks of Survivor. The average expectation for this duration is to lose about 4-5 times (summed across all 3 pathways tracked), each loss meaning I need to fall back on the displayed “backup” pathway. The actual number has been 7 replacements, which is worse than a 66% win probability each week would produce. The current results are worse than normal, meaning there have been more upsets than typical by this point. (A quick sanity check on these numbers follows the list of losses below.)

The losses were as follows:

  • Pathway 1: Bengals week 1 (Lions alternative), Ravens week 2 (Texans alternative), Seahawks week 5 (Chiefs alternative)

  • Pathway 2: Buccaneers week 3 (Jets alternative), 49ers week 5 (Commanders alternative)

  • Pathway 3: 49ers week 3 (Bills alternative), Jets week 4 (Cowboys alternative)
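
Here is that sanity check. The weekly win probabilities are round figures I picked for illustration, not the model’s actual estimates:

```python
# Survivor sanity check: expected losses over 6 weeks across 3 pathways,
# under a few assumed weekly win probabilities (illustrative round numbers).
weeks, pathways = 6, 3
picks = weeks * pathways  # 18 tracked picks so far

for p_win in (0.80, 0.75, 0.66):
    print(f"win prob {p_win:.0%}/week -> ~{picks * (1 - p_win):.1f} expected losses")
# 75-80% gives the "about 4-5 losses" baseline; 66% gives ~6.1.

actual = 7
print(f"{actual} losses imply a realized win rate of ~{1 - actual / picks:.0%}")  # ~61%
```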

Betting Lines

Our baseline expectation should normally be to lose about 30% of the weekly pot by this time (after 6 weeks). That means a normal “coin flip” betting process would lose us about 5% per week of the target weekly bet amount. That’s if we were just monkeys throwing darts.
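
To ground that baseline in arithmetic: assuming standard -110 lines (my simplification; actual odds vary by book and bet), a bettor who wins exactly half the time just bleeds the vig, roughly 4.5% per bet:

```python
# Coin-flip betting at standard -110 odds: risk 110 to win 100.
# (The -110 vig is an assumed simplification for illustration.)
p_win = 0.5
profit_if_win = 100 / 110  # profit per unit risked
ev_per_bet = p_win * profit_if_win - (1 - p_win) * 1.0
print(f"EV per bet:   {ev_per_bet:+.1%}")       # about -4.5%
print(f"Over 6 weeks: {6 * ev_per_bet:+.1%}")   # about -27%, i.e. roughly -30%
```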

I wish I could focus on the 4 acceptable weeks: ROIs of +18%, +11%, -3%, and +8%. Very easy-going, non-dramatic increments. But unfortunately, we were hit hard by 2 extremely rough weeks in a row: weeks 2 and 3 were -50% and -40% losses, which is super surprising. It would be hard to lose more than 20% intentionally with the method of spreading wagers over >10 different bets. It’s especially weird because week 2 usually turns out to be one of the best-ROI weeks of the season for the strategy I employ. As a result, our net loss has been worse than baseline this year: -65% instead of -30%.

The purpose of this accuracy assessment is supposed to be reflection on areas to improve. I can only say that I have worked heavily on upgrades in the weeks since the big loss. There were 3 things in particular I did to make the model more robust after week 3:

  1. One was a basic re-calibration to fit the model more closely to recent years, instead of weighting NFL trends from a decade ago equally (see the sketch after this list).

  2. Another change was to revise the calculation of bet amounts, because I suspect some losses came from wrongly recalculating bet amounts after my manual adjustments for team injuries, etc.

  3. A big one, hard to explain, was that I updated the bet selection logic, which changes for each segment of the season (early, middle, late).

  4. As an added benefit of #3, there’s now an increase in the number of different bets appearing each week. I feel much better about this, because spreading across more bets is essential to reducing weekly risk.
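
For the re-calibration in item 1, here is a minimal sketch of the general idea, weighting past seasons by recency. The exponential decay and the 3-year half-life are my own illustrative choices, not the model’s actual scheme:

```python
import numpy as np

def season_weights(seasons, current, half_life=3.0):
    """Weight past seasons by recency instead of equally.

    A season loses half its influence every `half_life` years.
    (Illustrative scheme; the actual re-calibration may differ.)
    """
    age = current - np.asarray(seasons)
    w = 0.5 ** (age / half_life)
    return w / w.sum()

print(season_weights(range(2014, 2024), current=2024).round(3))
# 2023 dominates; 2014 contributes almost nothing, unlike equal weighting.
```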

Now I just hope for a slow, steady climb back to break-even, and preferably beyond! I’m hopeful because, in re-simulations of past seasons, I can see 2 cases of a slump at the start of a season where a bounce-back eventually occurred.

[I will come back to this same post with a list of the actual recommendations made.]


Looking forward to the next 6 weeks!

/Subvertadown