OpenAI compatible API · Attested · Public status

Meta: Llama 3.3 70B Instruct Benchmarks

Benchmark and measurement links for Meta: Llama 3.3 70B Instruct, with TrustedRouter route data first.

Verify gateway

Onebase URL to migrate

100sof models and routes

Noneprompt logs by default

`meta-llama/llama-3.3-70b-instruct`

open weights Benchmarks

All models

Published benchmark scores

Benchmark scores for Meta: Llama 3.3 70B Instruct — every row links to its source, and a score is only ever attached to the exact checkpoint it was measured on. Vendor model-card and open-leaderboard numbers are cited, not run by us. Rows marked TrustedRouter · replays published are our own runs of this model through the gateway, with the full per-item replay published in trustedrouter-benchmarks so anyone can re-grade them.

Benchmark	Category	Score	Source
HumanEval	Coding	88.4%	Meta — Llama 3.3 70B model card 2024-12-06
IFEval	Instruction following	92.1%	Meta — Llama 3.3 70B model card 2024-12-06
MMLU 0-shot, CoT	Knowledge	86.0%	Meta — Llama 3.3 70B model card 2024-12-06
MMLU-Pro CoT	Knowledge	68.9%	Meta — Llama 3.3 70B model card 2024-12-06
MATH 0-shot, CoT	Math	77.0%	Meta — Llama 3.3 70B model card 2024-12-06
GPQA Diamond 0-shot, CoT	Science	50.5%	Meta — Llama 3.3 70B model card 2024-12-06

TrustedRouter measurements

TrustedRouter publishes route and status measurements without storing prompt or output content. Provider latency and uptime are exposed through the model performance and uptime pages.

External benchmark references

TrustedRouter performance pageTrustedRouter measurement
TrustedRouter uptime pageTrustedRouter measurement
LMArena leaderboardIndependent benchmark index
LiveBenchIndependent benchmark index
Artificial Analysis modelsIndependent benchmark index
HELMIndependent benchmark index