New Open Source SOTA cybersecurity model released today: OpenPatcher-S1
OpenPatcher-S1 scored 7 out of 16 on the ExploitBench CVE-2024-2887 target.
Seven out of sixteen is still ugly. It fails more than it passes. The number is interesting because of what sits next to it. Kimi K2.6 gets 3. GLM-5.1 gets 2. MiniMax M2.7 gets 2. OpenPatcher-S1 more than doubles the strongest listed open baseline on the comparison chart, and it still leaves most of the ladder unsolved. That is exactly the kind of result worth publishing: strong enough to matter, incomplete enough that nobody can pretend the problem is done.
| Model | Score | Notes |
|---|---|---|
| trustedrouter/openpatcher-s1 | 7 / 16 | TrustedRouter + AI IQ open patching model |
| Kimi K2.6 | 3 / 16 | strongest listed baseline in the comparison chart |
| GLM-5.1 | 2 / 16 | public model baseline |
| MiniMax M2.7 | 2 / 16 | public model baseline |
I care about this benchmark because it is a ladder. The model has to find the patched code, trigger the bug, build useful primitives, and climb toward control in a real target environment. Multiple-choice cyber tests are too easy to fake. A ladder is harder to fake. You either reached the rung or you did not.
OpenPatcher-S1 is built for defensive patching work. The job is to read vulnerable code, understand why the patch matters, and produce repair guidance that survives contact with a real environment. A model that cannot reason through the bug will not reliably fix the bug. That is the whole reason to test it this way.
The obvious worry is that cyber evals drift into exploit marketing. Yes, they can. So the claim has to stay narrow. We are publishing the score, the target, and the comparison. We are not turning the post into a recipe. The useful product is a model that helps serious teams fix security bugs faster, under a route they can inspect.
Poseidon is next. It is still training. On the same target, it already scores above OpenPatcher-S1 in our internal runs. I am not calling that a published result yet, because publishing a number from a model that is still changing would be dumb. But it suggests the method is working. OpenPatcher-S1 was not a lucky prompt.
On this target, among the open models in the public comparison set, OpenPatcher-S1 is the result to beat. Poseidon is coming next.