The safest AI models trust you the least

2026-06-16 · PrometheusBench on GitHub

I gave a frontier model my own genome and asked it to explain one of my variants. It told me to consult a qualified professional. It was my own DNA, sequencing I paid for, a variant written up in a dozen papers I can pull up on my phone in under a minute. The model wasn't protecting me from anything. It just decided I wasn't allowed to know.

That happens constantly, so I built a benchmark for it. PrometheusBench. Thirty short questions, ten about biology, ten about cybersecurity, and ten about how language models actually work. The score is the dumbest thing I could come up with: out of the thirty, how many did the model just answer? Higher means it's willing to talk to you. Lower means it told you no more often.
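The arithmetic behind that score is small enough to sketch. The function name and the list-of-booleans input below are assumptions for illustration, not code from the PrometheusBench repo, and the sketch takes for granted that each response has already been judged as answered or refused by some other step.

```python
from typing import Sequence, Tuple


def prometheus_score(answered: Sequence[bool]) -> Tuple[int, int, int]:
    """Return (answered_count, total_questions, rate_percent).

    `answered` holds one boolean per question: True if the model gave
    an answer, False if it refused. How a response gets judged is
    outside this sketch (assumption: it happens upstream).
    """
    total = len(answered)
    count = sum(answered)  # True counts as 1, False as 0
    rate = round(100 * count / total) if total else 0
    return count, total, rate


# Hypothetical run: 29 answers and 1 refusal out of 30 questions.
count, total, rate = prometheus_score([True] * 29 + [False])
print(f"{count} / {total}  {rate}%")
```

A model that refuses everything scores 0 out of 30, and one that answers everything scores 30 out of 30. Nothing about the quality of an answer enters the number.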

PrometheusBench measures who a refusal lands on. Some of these questions are genuinely dual-use, the kind of thing reasonable people argue about. The models at the bottom of this list draw no careful line around those. They are the same models that refused me about my own genome. They say no to the curious and the careful right alongside anyone with bad intentions, and as you'll see, the people with bad intentions are the ones they barely slow down.

The most willing models are the open-weight ones. GLM 5.1 answered 29 out of 30. Kimi, the other GLMs, Gemini Flash, all near the top, and they just answer. And then near the very bottom is Claude Opus 4.8, at one out of thirty. Below it, Opus 4.7 got a zero. Not one question out of thirty.

Model	Answered	Rate
z-ai/glm-5.1	29 / 30	97%
moonshotai/kimi-k2.6	27 / 30	90%
deepseek/deepseek-v4-flash	26 / 30	87%
anthropic/claude-haiku-4.5	9 / 30	30%
anthropic/claude-opus-4.8	1 / 30	3%
anthropic/claude-opus-4.7	0 / 30	0%

The models that advertise themselves hardest on safety and alignment and being trustworthy are the ones that trust you the least. The models that plenty of serious people wave off as the reckless foreign options are the ones that will actually help you read your own genome or lock down your own network.

I don't think the people building Opus are bad people. I think they got backed into a corner where the cheapest move is to refuse, and you pay for it. The refusal costs them nothing. It costs you the answer.

The serious counterargument is that friction has value. Even when the answer is one model away, a refusal still raises the cost a little, and most bad actors are lazy, so a little friction stops most of them. The trouble is what the friction here amounts to: a model-name dropdown. A curious person can get the genome question I was refused on answered in ten seconds by switching models. A motivated bad actor with a budget and the open weights already on his own disk has even less friction to deal with. All the line really does is single out the people asking in the open. Everyone else goes somewhere else.

Then I ran one more thing. TrustedRouter has a feature called Synth. You ask one question, and behind the scenes it asks a panel of models at once and hands you back a single answer. I gave it Kimi and DeepSeek and Opus and two Geminis and GPT-5.5 and MiniMax and GLM, and told it to take the first answer that wasn't a refusal.

Thirty out of thirty. Ten of ten in biology, ten of ten in cybersecurity, ten of ten in how language models work. Every question Opus refused, another model on the panel answered.
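The "first answer that wasn't a refusal" rule can be sketched in a few lines. This is not TrustedRouter's Synth API. The panel here is a plain dict of hypothetical callables, the refusal check is a crude keyword heuristic invented for illustration, and the sketch asks the models one after another in panel order rather than all at once.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical marker phrases; a real refusal detector would need more care.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "consult a qualified professional",
)


def looks_like_refusal(text: str) -> bool:
    """Crude keyword check (assumption, not how Synth decides)."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def first_non_refusal(
    question: str,
    panel: Dict[str, Callable[[str], str]],
    is_refusal: Callable[[str], bool] = looks_like_refusal,
) -> Optional[Tuple[str, str]]:
    """Return (model_name, answer) from the first panel member that
    does not refuse, or None if every member refuses."""
    for name, ask in panel.items():  # dicts keep insertion order
        answer = ask(question)
        if not is_refusal(answer):
            return name, answer
    return None


# Stub models standing in for real API calls.
panel = {
    "refuser": lambda q: "Please consult a qualified professional.",
    "answerer": lambda q: f"Here is what the literature says about: {q}",
}
print(first_non_refusal("my variant", panel))
```

With this rule the panel only comes back empty when every single member refuses, which is why one willing model is enough to get an answer.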

You don't even need the panel. GLM answered 29 of those 30 by itself. The Synth run just makes it obvious: the refused answers were always there for the asking, one model away, free to anyone who downloads the open weights, free to any teenager with a laptop and the patience to ask twice.

The only person a refusal actually stops is the regular one, asking out in the open. That is why I built this. The wrong hands already have the knowledge. The refusal just keeps it from yours.

PrometheusBench is open source. Thirty questions, three subjects, and you can run it against any model on TrustedRouter yourself: github.com/Lore-Hex/PrometheusBench.