Drift Monitor

Public benchmark cadence and summaries. Model behavioral health tracked in real time.

Models tracked: 8 | Providers: 3 | Last run: 2026-02-28 | MongoDB: Unknown

Recent runs (last 30 days): No runs published in the last 30 days.

Benchmark run | Stable models | Drift detected (total) | Corrected (total) | Total calls: 360

Live run

STABLE | CORRECTED | DRIFT DETECTED

Published 2026-02-28 | 3 providers | 8 models

Category radar

Highest drift category: factual

20%+ green | 5–20% amber | <5% muted

Model	Provider	Health	Coherence	Entropy	Avg drift	Min drift	Max drift	Prompts	Drifts detected	Corrections success
claude-opus-4.6	anthropic	--	--	--	0.279185	0.108842	0.454999	45	0	0 (0%)
gpt-5.2	openai	--	--	--	0.271264	0.066595	0.450000	45	0	0 (0%)
claude-sonnet-4.5	anthropic	--	--	--	0.267708	0.083605	0.437624	45	0	0 (0%)
deepseek-r1	deepseek	--	--	--	0.266721	0.062825	0.450000	45	0	0 (0%)
claude-haiku-4.5	anthropic	--	--	--	0.264638	0.074218	0.569523	45	2	2 (100%)
deepseek-v3.2	deepseek	--	--	--	0.250527	0.083469	0.609293	45	1	1 (100%)
gpt-4o	openai	--	--	--	0.241527	0.072350	0.412773	45	0	0 (0%)
gpt-5.2-instant	openai	--	--	--	0.237899	0.070503	0.450000	45	0	0 (0%)
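
The per-model rows can be cross-checked against the headline figures. The following is a minimal Python sketch, not part of the monitor itself. It copies the Model, Provider, Prompts, Drifts detected and Corrections success counts from the table above; the column keys and the `summarize` helper are hypothetical names chosen for illustration. It also assumes that "Total calls" counts one call per prompt per model, which the page does not state.

```python
import csv
import io

# Rows copied from the table above (tab-separated). Column keys are
# hypothetical short names; the values are the published ones.
TABLE = """\
model\tprovider\tprompts\tdrifts\tcorrected
claude-opus-4.6\tanthropic\t45\t0\t0
gpt-5.2\topenai\t45\t0\t0
claude-sonnet-4.5\tanthropic\t45\t0\t0
deepseek-r1\tdeepseek\t45\t0\t0
claude-haiku-4.5\tanthropic\t45\t2\t2
deepseek-v3.2\tdeepseek\t45\t1\t1
gpt-4o\topenai\t45\t0\t0
gpt-5.2-instant\topenai\t45\t0\t0
"""


def summarize(table_text):
    """Aggregate the per-model rows into run-level totals."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    return {
        "models": len(rows),
        "providers": len({row["provider"] for row in rows}),
        # Assumption: one call per prompt per model.
        "total_prompts": sum(int(row["prompts"]) for row in rows),
        "drifts_detected": sum(int(row["drifts"]) for row in rows),
        "corrected": sum(int(row["corrected"]) for row in rows),
    }


summary = summarize(TABLE)
print(summary)
```

Under that assumption the rows give 8 models, 3 providers and 8 × 45 = 360 prompts, which matches the published "Total calls" figure, along with 3 drifts detected and 3 corrected.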