Frontier Smart, Cheap, Fast: Pick 3 with Open Source
Smart, cheap, fast used to be a cruel little triangle. You picked two and pretended you were happy.
I don't think that is true anymore. A model name used to mean one blob of weights behind one vendor endpoint. That was a bad abstraction. The right abstraction is a package: a fast worker, a judge, a synthesizer, a fallback list, maybe an advisor, maybe a panel, all sitting behind the same token API that every developer already knows how to call.
We have been publishing the numbers because this argument is worthless without numbers. Socrates-1.1 scored 72 on Terminal-Bench Hard. Synth hit 73.4 on DRACO. DeepSeek V4 Pro drew level with Opus on SimpleQA Verified in our run. A four-cent Gemma run did most of a small biology slice. OpenPatcher-S1 scored 7 out of 16 on a hard ExploitBench target where the listed open baselines were at 3, 2, and 2.
Those are very different benchmarks. That is why I like the pattern. The same gain is showing up in code, research, factuality, biology, and cyber. The common thing is structure. Open models are now good enough to be building blocks, and the router can decide which block to use.
| Result | What it shows | Why it matters |
|---|---|---|
| Socrates-1.1 scored 72 on Terminal-Bench Hard | A combo model beat the frontier baselines in that run. | Smart can come from structure, not only from one expensive model. |
| Synth reached 73.4 on DRACO | A panel plus judge plus synthesizer beat the strongest solo runs. | Open panels can recover strengths no single model owns. |
| DeepSeek V4 Pro drew level with Opus on SimpleQA Verified in our run | An open-weight model caught a frontier model on factuality. | Cheap open-weight models are no longer toy alternatives. |
| A four-cent Gemma run solved most of a small biology slice | For some tasks, small open models do nearly all the useful work. | The expensive model should be reserved for the parts that need it. |
| OpenPatcher-S1 hit 7 / 16 on ExploitBench CVE-2024-2887 | A specialized open cyber model more than doubled the strongest listed open baseline (7 versus 3). | Open specialized models can move faster than general-purpose vendor models. |
The obvious objection is cost. Ensembles sound expensive because bad ensembles are expensive. Asking seven huge models every time is dumb. Most prompts do not need that. A routine request should hit the fast cheap model and be done. A suspicious request should ask an advisor. A research request should use Synth. A security request should go to a specialist. The expensive path should be a conditional branch. Otherwise it becomes a tax on every token.
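The branching described above can be sketched in a few lines of Python. This is a minimal illustration, not TrustedRouter's actual routing logic: the `Route` names, the `Request` fields, and the order of the checks are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FAST = "fast"              # small open-weight worker, the default path
    ADVISOR = "advisor"        # fast worker plus a second opinion
    SYNTH = "synth"            # panel + judge + synthesizer for research
    SPECIALIST = "specialist"  # domain model, for example security


@dataclass
class Request:
    prompt: str
    kind: str = "routine"      # hypothetical label: "routine", "research", "security"
    suspicious: bool = False   # hypothetical flag set by some upstream check


def pick_route(req: Request) -> Route:
    """Pick a route; every expensive path is a conditional branch."""
    if req.kind == "security":
        return Route.SPECIALIST
    if req.kind == "research":
        return Route.SYNTH
    if req.suspicious:
        return Route.ADVISOR
    # Nothing flagged the request, so it stays on the cheap default path.
    return Route.FAST
```

The point of the shape is that `Route.FAST` is the fall-through case. A request has to earn its way onto a more expensive branch, so the extra cost is paid per hard request rather than per token.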
That is how the triangle breaks. You get frontier-level answers by spending extra only on the hard parts. You get cheap answers because the common path is open-weight and small. You get speed because the default path stays short and because fast providers can sit at the front. You get reliability because the route has fallbacks, so one vendor outage does not take the whole app down.
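The fallback idea is easy to show in code. The sketch below assumes each provider is just a named callable that takes a prompt and returns text; `call_with_fallbacks`, `AllProvidersFailed`, and the two stub providers are hypothetical names for illustration, not part of any real gateway API.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, function that answers a prompt)


class AllProvidersFailed(RuntimeError):
    """Raised only when every provider in the list has failed."""


def call_with_fallbacks(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real endpoints.
def flaky(prompt: str) -> str:
    raise TimeoutError("provider down")


def steady(prompt: str) -> str:
    return f"echo: {prompt}"


name, answer = call_with_fallbacks("ping", [("fast-a", flaky), ("fast-b", steady)])
```

Here the first provider is down, so the call lands on `fast-b` and the caller never sees the outage. Ordering the list with the fastest provider first is what keeps the default path short.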
The open source part is the part people underrate. The TrustedRouter software is open. The eval harnesses are open. The blog posts link the runs. The model pages show the routes and privacy classes. Open-weight models like DeepSeek, GLM, Kimi, MiniMax, Gemma, and Qwen can be first-class building blocks instead of "budget models" people apologize for using.
Privacy matters more when a model becomes a graph. Combo models create subcalls. If those subcalls go through a black-box proxy, you have multiplied your trust problem. The routing graph should be inspectable, the privacy class should be explicit, and the gateway should be something an agent can verify before it sends the prompt.
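One way to picture "verify before it sends the prompt" is a check that walks the routing graph and reports its weakest privacy class. Everything in this sketch is assumed for illustration: the three `PrivacyClass` levels, their ordering, and the `Node` shape are hypothetical, and the walk assumes the graph is a tree with no cycles.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class PrivacyClass(IntEnum):
    # Hypothetical levels; a higher value means a stronger guarantee.
    LOGGED = 0
    NO_RETENTION = 1
    ATTESTED = 2


@dataclass
class Node:
    name: str
    privacy: PrivacyClass
    subcalls: List["Node"] = field(default_factory=list)


def weakest_privacy(node: Node) -> PrivacyClass:
    """A route is only as private as its weakest subcall."""
    return min([node.privacy] + [weakest_privacy(child) for child in node.subcalls])


def safe_to_send(root: Node, required: PrivacyClass) -> bool:
    """True only if every node in the graph meets the required class."""
    return weakest_privacy(root) >= required


combo = Node("combo", PrivacyClass.ATTESTED, [
    Node("worker", PrivacyClass.ATTESTED),
    Node("judge", PrivacyClass.LOGGED),  # one weak subcall drags the route down
])
```

With this graph, `safe_to_send(combo, PrivacyClass.NO_RETENTION)` is false even though the top-level node is attested, which is exactly the failure a black-box proxy would hide.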
Smart, cheap, fast was a vendor tradeoff. Open source routing turns it into an engineering problem.
Pick 3.