Keep doing biology with Prometheus

2026-07-02 · prometheus-biomysterybench on GitHub

The wrong lesson from the biology post would be to give up.

The right lesson is simpler: keep doing biology, but stop treating one model's policy as the boundary of the work.

Anthropic built a very strong biology model and kept the best one partner-only. The broadly available model may refuse biology prompts. That is their product decision. It should not become your research workflow. A biologist should not lose the afternoon because a single endpoint decides the question is too close to the lab.

That is why trustedrouter/prometheus exists. Prometheus is the open-model Synth preset: a panel of open models, a judge, and a synthesizer behind one normal model id. You call it through the same OpenAI-compatible API. The work runs through the same attested gateway. The point is not that a committee is magic. The point is that model choice becomes an engineering decision instead of a vendor veto.

The first biology run was intentionally small. Three tasks from an open BioMysteryBench-style harness. Claude Opus 4.8 got all three. Gemma-4-31b got two for four cents. GLM-5.2 got two and was the only non-Opus model to crack the long motif task. Several cheap models knew the biology but ran out of patience on the long loop. That result is not a leaderboard. It is a useful map.

Prometheus is what you use after the map exists. For a short lookup, do not pay for a committee. Use the cheap model that already does the job. For a long uncertain analysis, use Prometheus. For the few tasks where every model shares the same blind spot, Prometheus will not save you, and the earlier post says that plainly. That is the whole reason to keep publishing the failures. A router should tell you when the expensive path is worth it and when it is just expensive.

This is where the LLM-advisor skill matters. Give the skill to Codex, Claude Code, Cursor, Hermes, or any agent that can read a URL. Let the agent ask TrustedRouter what models are live, what they cost, what privacy tier they sit in, and which route fits the task before it spends your tokens. If the agent supports MCP, connect it to the TrustedRouter MCP server. If it does not, have it read the raw SKILL.md.

Use the LLM-advisor skill.
For routine biology lookups, prefer the cheapest model that has passed the local harness.
For hard biology analysis, try trustedrouter/prometheus.
If every model agrees on the wrong answer, report the shared blind spot instead of pretending fusion solved it.

That last line matters. Biology has enough fake certainty already. Prometheus should make the workflow more honest, not more theatrical. A panel is valuable when models disagree and one of them is right. A panel is not valuable when every model marches into the same hole with better citations.

The pieces are open. The BioMysteryBench-style harness is on GitHub. The broader Synth work is in the Iris, Prometheus, and Zeus post. The combo-model framing is in Combo models are model containers. The agent setup is in the agent guide. The advisor launch is here. The whole point is that a smart agent can read the evidence, choose the route, and explain the choice.

I want biologists to keep asking the questions. I want the model to say when it knows, when it does not know, when it needs a cheaper specialist, when it needs Prometheus, and when the entire panel is probably stuck. That is much more useful than a single refusal page.

Keep doing biology. Use Prometheus when the problem deserves a committee. Use the skill so the agent knows when it does.