Fusion eval results

TrustedRouter is reproducing Fusion-style DRACO evals with exact criterion scoring before publishing a headline comparison.


2026-06-14

Source context: OpenRouter Fusion announcement.

Reproducing Fusion in the open. TrustedRouter is running the same class of routing experiment with public code, explicit model lists, and measurable cost/quality tradeoffs instead of a hidden benchmark harness.

Comparable full-run results are not published yet. The prior holistic-judge run is excluded from this post because it does not match OpenRouter's DRACO scoring method.

Reference Results

Run | OpenRouter score | TrustedRouter score | Status
Solo Gemini 3 Flash | 43.1 | 29.35 on 10-task smoke | Investigating
Solo Kimi K2.6 | 53.7 | Not enough completed rows | Investigating
Solo DeepSeek V4 Pro | 60.3 | Not run with exact scorer yet | Pending
Fusion budget panel | 64.7 | Not run with exact scorer yet | Pending
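
One way to read the table is as a per-baseline gap between the reference score and the replicated score. The sketch below is a minimal Python illustration, not the harness code: the names `baseline_gaps` and `ready_to_publish` and the 3.0-point `tolerance` are assumptions made for this example, and the 29.35 figure comes from a 10-task smoke run rather than a full pass.

```python
from typing import Optional

# Scores copied from the table above; None means no comparable score yet.
REFERENCE = {
    "Solo Gemini 3 Flash": 43.1,
    "Solo Kimi K2.6": 53.7,
    "Solo DeepSeek V4 Pro": 60.3,
}
REPLICATED: dict[str, Optional[float]] = {
    "Solo Gemini 3 Flash": 29.35,  # 10-task smoke only
    "Solo Kimi K2.6": None,
    "Solo DeepSeek V4 Pro": None,
}

def baseline_gaps(reference, replicated):
    """Return reference minus replicated per run, or None when unscored."""
    gaps = {}
    for run, ref in reference.items():
        got = replicated.get(run)
        gaps[run] = None if got is None else round(ref - got, 2)
    return gaps

def ready_to_publish(gaps, tolerance=3.0):
    """Hypothetical gate: every baseline is scored and within tolerance."""
    return all(g is not None and abs(g) <= tolerance for g in gaps.values())

gaps = baseline_gaps(REFERENCE, REPLICATED)
print(gaps["Solo Gemini 3 Flash"])  # 13.75
print(ready_to_publish(gaps))       # False
```

With the current numbers the gate stays closed, both because one baseline is far from its reference and because two baselines have no comparable score yet.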

Replication Rules

  • Mode: micro-hybrid, which runs the small public smoke test before any expensive full pass.
  • Judge model: google/gemini-3.1-pro-preview.
  • Scoring: DRACO criterion-level grading, three independent passes, normalized 0-100.
  • Search: Exa with DRACO/rubric hostnames excluded and result leakage checks enabled.
  • Publication rule: raw solo baselines must be close to the OpenRouter reference scores before any Fusion headline is published.
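
The scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the DRACO formula or the harness implementation: it assumes each criterion is graded as met or not met, carries a positive weight, and that the three passes are combined by a simple mean. The criterion names and weights are made up for the example.

```python
from statistics import mean

def pass_score(grades, weights):
    """Score one judge pass: weighted share of criteria met, scaled to 0-100."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(weights[c] for c, met in grades.items() if met)
    return 100.0 * earned / total

def task_score(passes, weights):
    """Average the independent judge passes for one task (assumed aggregation)."""
    return mean(pass_score(g, weights) for g in passes)

# Hypothetical rubric with three weighted criteria.
weights = {"cites_sources": 2.0, "answers_question": 3.0, "no_fabrication": 5.0}

# Three independent judge passes over the same answer.
passes = [
    {"cites_sources": True, "answers_question": True, "no_fabrication": True},
    {"cites_sources": False, "answers_question": True, "no_fabrication": True},
    {"cites_sources": False, "answers_question": True, "no_fabrication": False},
]

print(task_score(passes, weights))  # 70.0
```

Averaging across passes damps single-pass judge noise; a real scorer may also handle negative or penalty criteria, which this sketch does not.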

The exact scorer and leakage guard are implemented in the open-source harness. Full comparable results will replace this table when the raw baselines replicate.
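
For readers who want a feel for what a leakage guard does, here is a minimal sketch in Python using only the standard library. It is not the harness code and makes no Exa API calls: the blocklist hostnames, the function names, and the verbatim-match rule are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the real excluded hostnames are not listed in this post.
EXCLUDED_HOSTS = {"draco-benchmark.example", "rubrics.example.org"}

def is_excluded(url, excluded=EXCLUDED_HOSTS):
    """True when the URL's host is an excluded host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in excluded)

def leaks_rubric(text, criteria):
    """True when any non-empty rubric criterion appears verbatim in the text."""
    flat = " ".join(text.lower().split())
    return any(
        " ".join(c.lower().split()) in flat for c in criteria if c.strip()
    )

def filter_results(results, criteria):
    """Keep results that pass both the hostname filter and the leakage check."""
    return [
        r for r in results
        if not is_excluded(r["url"]) and not leaks_rubric(r["text"], criteria)
    ]

criteria = ["Mentions the 2019 recall notice"]
results = [
    {"url": "https://news.example.com/a", "text": "Background on the recall."},
    {"url": "https://sub.rubrics.example.org/x", "text": "Grading guide."},
    {"url": "https://blog.example.net/b",
     "text": "Rubric: mentions the 2019   recall notice."},
]
kept = filter_results(results, criteria)
print([r["url"] for r in kept])  # ['https://news.example.com/a']
```

The second result is dropped by hostname and the third by the verbatim check. A production guard would likely add fuzzy matching, since paraphrased rubric text slips past an exact substring test.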
