Chasing Mythos-level Fusion in the open

A live engineering note on the first frontier Fusion attempt: what ran, what failed, and why we are not claiming a benchmark win yet.

2026-06-14

Source context: Open Fusion methodology.

We tried to push TrustedRouter Fusion toward Mythos- and Fable-class DRACO performance. The target panel was GPT-5.5, Claude Opus 4.8, Kimi K2.7 Code, GLM 5.2, MiniMax M3, Gemini 3 Flash, and Gemini 3.1 Pro, with Opus 4.8 synthesizing the final answer and Gemini 3.1 Pro judging against DRACO criteria.

That exact run is not publishable yet. Two blockers showed up immediately: GPT-5.5 needs special long-reasoning handling on DRACO prompts, and our Z.AI account is not yet entitled to GLM 5.2. Z.AI returns a permission error for glm-5.2, and silently substituting another model in its place would be dishonest.
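
The "no silent substitution" rule can be enforced in code by failing closed when any requested model is unavailable. The following is a minimal sketch, not the TrustedRouter harness: `PanelBlocked`, `resolve_panel`, and the shape of the `unavailable` mapping are hypothetical names introduced for illustration.

```python
class PanelBlocked(RuntimeError):
    """Raised when a requested panel model cannot be used as specified."""


def resolve_panel(requested: list[str], unavailable: dict[str, str]) -> list[str]:
    """Return the requested panel unchanged, or raise if any model is blocked.

    `unavailable` maps a model id to the reason it cannot be called
    (hypothetical structure, e.g. filled in from provider error responses).
    """
    blocked = {m: unavailable[m] for m in requested if m in unavailable}
    if blocked:
        details = ", ".join(f"{m}: {why}" for m, why in blocked.items())
        # Fail closed: never swap in a different model behind the caller's back.
        raise PanelBlocked(f"panel not runnable as specified ({details})")
    return list(requested)
```

With this shape, running a fallback panel is a separate, explicitly named configuration rather than something that happens implicitly inside the target run.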

What actually ran

Run | Task slice | Result | Status
Exact 7-model target | Non-financial DRACO pilot | No score | Blocked by GPT-5.5 gateway handling and GLM 5.2 entitlement
Available 6-model fallback | First completed non-financial DRACO task | 19.85 | Completed, far below target

The fallback panel used Opus 4.8, Kimi K2.7 Code, GLM 5.1, MiniMax M3, Gemini 3 Flash, and Gemini 3.1 Pro. It completed one task before we stopped the pilot because of speed and reliability problems. A score of 19.85 is not close to the target, and we are not presenting it as a win.

What changed in the harness

  • GPT-5.5 eval calls now omit temperature and use max_completion_tokens.
  • Panel and final synthesis calls stream so long answers do not wait for full completion before parsing.
  • Analysis and judge calls stay non-streaming because they require structured JSON reliability.
  • The live runner now has explicit six-model and seven-model frontier Fusion configs behind a hard budget.
  • The recommended DRACO slice for this experiment is --task-filter non-financial.
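
The first three bullets amount to a per-call policy on request parameters. Below is a minimal sketch of that policy, assuming an OpenAI-compatible chat completions payload. The function name `build_request`, the role names, the `gpt-5.5` id prefix, and the default token limit and temperature are all hypothetical and not taken from the actual harness.

```python
from typing import Any

# Hypothetical role names; the harness's real identifiers may differ.
STREAMING_ROLES = {"panel", "synthesis"}
NON_STREAMING_ROLES = {"analysis", "judge"}


def build_request(model: str, role: str, messages: list[dict[str, str]],
                  max_tokens: int = 8192, temperature: float = 0.2) -> dict[str, Any]:
    """Build a chat completions payload for one eval call."""
    if role not in STREAMING_ROLES | NON_STREAMING_ROLES:
        raise ValueError(f"unknown role: {role}")

    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        # Panel and synthesis calls stream; analysis and judge calls do not,
        # so their structured JSON arrives as one complete response.
        "stream": role in STREAMING_ROLES,
    }
    if model.startswith("gpt-5.5"):  # assumed id prefix
        # Omit temperature and use max_completion_tokens for this model.
        payload["max_completion_tokens"] = max_tokens
    else:
        payload["max_tokens"] = max_tokens
        payload["temperature"] = temperature
    return payload
```

Keeping the policy in one function means a model-specific exception, such as the GPT-5.5 handling, is visible in one place instead of scattered across call sites.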

Next gates

The next clean run needs three fixes before any headline claim: enable GLM 5.2 on the Z.AI account, make GPT-5.5 long-reasoning responses produce useful content through the attested gateway, and finish a 10-task non-financial DRACO pilot without task-level hangs.

This is the point of doing the work in the open. If TrustedRouter clears a Mythos/Fable-class target, the result should be reproducible from code, model ids, task filters, budget limits, and artifacts. Until then, the honest result is: not there yet.
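
One way to make a run reproducible from those inputs is to record them in a single manifest and fingerprint it. The sketch below is illustrative only: `RunManifest`, its field names, and the unit of `budget_limit` are assumptions, not the format TrustedRouter publishes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of everything needed to rerun one evaluation."""
    code_revision: str            # e.g. a commit hash of the harness
    model_ids: tuple[str, ...]    # exact ids, in panel order
    task_filter: str              # e.g. "non-financial"
    budget_limit: float           # hard budget; unit is an assumption
    artifacts: tuple[str, ...] = ()  # paths or names of saved outputs

    def fingerprint(self) -> str:
        """Return a stable SHA-256 hex digest of the manifest contents."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs with the same fingerprint claim the same code, models, task slice, and budget, so a changed model id or filter shows up as a different digest rather than going unnoticed.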
