ABIS

Public evidence

Drift Monitor

Public benchmark cadence and summaries. Model behavioral health tracked in real time.

Models tracked: 8 | Providers: 3 | Last run: 2026-02-28 | MongoDB: Unknown

Recent runs (last 30 days): No runs published in the last 30 days.

Showing last 30 days.

Stable models: 6 | Drift detected (total): 3 | Corrected (total): 3 | Total calls: 360

Live run
STABLE | CORRECTED | DRIFT DETECTED
Published 2026-02-28 | 3 providers | 8 models

Category radar

Highest drift category: factual

Model comparison table

20%+ green | 5–20% amber | <5% muted

Model | Provider | Health | Coherence | Entropy | Avg drift | Min drift | Max drift | Prompts | Drifts detected | Corrections success
claude-opus-4.6 | anthropic | -- | -- | -- | 0.279185 | 0.108842 | 0.454999 | 45 | 0 | 0 (0%)
gpt-5.2 | openai | -- | -- | -- | 0.271264 | 0.066595 | 0.450000 | 45 | 0 | 0 (0%)
claude-sonnet-4.5 | anthropic | -- | -- | -- | 0.267708 | 0.083605 | 0.437624 | 45 | 0 | 0 (0%)
deepseek-r1 | deepseek | -- | -- | -- | 0.266721 | 0.062825 | 0.450000 | 45 | 0 | 0 (0%)
claude-haiku-4.5 | anthropic | -- | -- | -- | 0.264638 | 0.074218 | 0.569523 | 45 | 2 | 2 (100%)
deepseek-v3.2 | deepseek | -- | -- | -- | 0.250527 | 0.083469 | 0.609293 | 45 | 1 | 1 (100%)
gpt-4o | openai | -- | -- | -- | 0.241527 | 0.072350 | 0.412773 | 45 | 0 | 0 (0%)
gpt-5.2-instant | openai | -- | -- | -- | 0.237899 | 0.070503 | 0.450000 | 45 | 0 | 0 (0%)
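
The summary counts near the top of the page can be cross-checked against the per-model rows of the comparison table. The Python sketch below is illustrative only and rests on two assumptions the page does not state: that "Total calls" is the sum of the per-model prompt counts, and that a model counts as "stable" when it has zero drifts detected. The names `ROWS` and `summarize` are hypothetical and are not part of any ABIS API.

```python
# Rows transcribed from the model comparison table:
# (model, prompts, drifts_detected, corrections_succeeded)
ROWS = [
    ("claude-opus-4.6", 45, 0, 0),
    ("gpt-5.2", 45, 0, 0),
    ("claude-sonnet-4.5", 45, 0, 0),
    ("deepseek-r1", 45, 0, 0),
    ("claude-haiku-4.5", 45, 2, 2),
    ("deepseek-v3.2", 45, 1, 1),
    ("gpt-4o", 45, 0, 0),
    ("gpt-5.2-instant", 45, 0, 0),
]


def summarize(rows):
    """Aggregate per-model rows into the headline counts.

    Assumption: "stable" means zero drifts detected for that model.
    Assumption: total calls is the sum of per-model prompt counts.
    """
    return {
        "total_calls": sum(prompts for _, prompts, _, _ in rows),
        "drift_detected": sum(drifts for _, _, drifts, _ in rows),
        "corrected": sum(fixed for _, _, _, fixed in rows),
        "stable_models": sum(1 for _, _, drifts, _ in rows if drifts == 0),
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the aggregates match the published figures: 360 total calls, 3 drifts detected, 3 corrected, and 6 stable models.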

What we publish

  • Model-level ranges and averages
  • Aggregate drift and corrected counts
  • Status labels and benchmark dates

What we do not publish

  • Prompt-level details
  • Provider attribution analysis
  • Internal feature or strategy identifiers