The Validation Crisis in AGI Capability Forecasting
Kacper Saks
Forecasts of when artificial general intelligence will arrive increasingly shape capital allocation, regulation, and the placement of a generation of talent, yet the confidence attached to them exceeds what the methods producing them can support. This paper argues the gap is structural: it is the predictable consequence of fitting a model to a measured window and projecting it forward without the validation discipline other quantitative fields require. We import that discipline from quantitative finance (the deflated Sharpe ratio, the probability of backtest overfitting, and a walk-forward retrodiction protocol) and introduce the Deflated Capability Forecast (DCF), a method that widens a forecast's stated interval by the amount its underlying methodology warrants and returns a distribution with explicit treatment of the tails in place of a point estimate carrying unearned precision. Across the forecasts where the method could be fully computed, deflation factors cluster between 1.3× and 2.0×, meaning the stated intervals are systematically too narrow. We then turn the method on this work itself: a preregistered prediction that one landmark forecast's interval would widen by at least 2.3× failed, with an observed factor of 1.285×. We report the failure rather than revise the threshold, since a discipline of honest validation is supposed to surface exactly this.