OpenAI compatible API · Attested · Public status

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway

Onebase URL to migrate

100sof models and routes

Noneprompt logs by default

`nebius`

No logs

All providers

Provider	Nebius Token Factory
Models	27 public models
Prepaid routes	25
BYOK routes	27
Zero data retention	yes
Confidential compute	not claimed
Provider E2EE	not claimed
Policy note	Marked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data. Policy source

Measured performance

178 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT	5803 ms
Throughput	—
Uptime	96.07%

Model	p50 TTFT	p50 TTFB	Throughput	Uptime	Config excluded	Samples
nvidia/Cosmos3-Super-Reasoner	1738 ms	1738 ms	—	100.00%	—	6
nvidia/Nemotron-3-Nano-Omni	2387 ms	2387 ms	—	100.00%	—	6
openbmb/MiniCPM-V-4_5	2909 ms	2909 ms	—	100.00%	—	5
NousResearch/Hermes-4-70B	3205 ms	3205 ms	—	100.00%	—	8
openai/gpt-oss-120b	3269 ms	3269 ms	—	100.00%	—	5
Qwen/Qwen3-32B	3483 ms	3483 ms	—	100.00%	—	5
deepseek-ai/DeepSeek-V4-Pro	3528 ms	3528 ms	—	90.91%	—	11
Qwen/Qwen3-30B-A3B-Instruct-2507	3845 ms	3844 ms	—	100.00%	—	10
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1	3888 ms	3888 ms	—	100.00%	—	11
Qwen/Qwen2.5-VL-72B-Instruct	4369 ms	4369 ms	—	100.00%	—	8
google/gemma-3-27b-it	5017 ms	5016 ms	—	100.00%	—	7
moonshotai/Kimi-K2.7-Code	5396 ms	5396 ms	—	100.00%	—	5
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B	5803 ms	5803 ms	—	100.00%	—	8
nvidia/nemotron-3-ultra-550b-a55b	6595 ms	6595 ms	—	100.00%	—	7
nvidia/nemotron-3-super-120b-a12b	7400 ms	7400 ms	—	100.00%	—	8
NousResearch/Hermes-4-405B	7845 ms	7845 ms	—	100.00%	—	8
Qwen/Qwen3-235B-A22B-Instruct-2507	8771 ms	8771 ms	—	100.00%	—	9
MiniMaxAI/MiniMax-M3	8800 ms	8800 ms	—	92.86%	—	14
Qwen/Qwen3-Next-80B-A3B-Thinking	9012 ms	9012 ms	—	100.00%	—	6
moonshotai/Kimi-K2.6	9177 ms	9177 ms	—	69.23%	—	13
zai-org/GLM-5.2	9403 ms	9403 ms	—	100.00%	—	8
zai-org/GLM-5.1	10106 ms	10105 ms	—	100.00%	—	5
meta-llama/Llama-3.3-70B-Instruct	11511 ms	11511 ms	—	80.00%	—	5

Nebius Token Factory performance history · Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model	AI IQ	Context	Endpoints	Prompt	Completion	Routes
`MiniMaxAI/MiniMax-M2.5` MiniMax M2.5 providers pricing	IQ 106#52	204,800	2	$0.315/1M	$1.26/1M	prepaid BYOK
`MiniMaxAI/MiniMax-M3` MiniMax-M3 providers pricing	IQ 112#37	1,048,576	2	$0.315/1M	$1.26/1M	prepaid BYOK
`NousResearch/Hermes-4-405B` Hermes 4 405B providers pricing	—	131,072	2	$1.05/1M	$3.15/1M	prepaid BYOK
`NousResearch/Hermes-4-70B` Hermes 4 70B providers pricing	—	131,072	2	$0.1365/1M	$0.42/1M	prepaid BYOK
`Qwen/Qwen2.5-VL-72B-Instruct` Qwen2.5 VL 72B Instruct providers pricing	—	32,768	2	$0.2625/1M	$0.7875/1M	prepaid BYOK
`Qwen/Qwen3-235B-A22B-Instruct-2507` Qwen3 235B A22B Instruct 2507 providers pricing	—	131,072	2	$0.21/1M	$0.63/1M	prepaid BYOK
`Qwen/Qwen3-30B-A3B-Instruct-2507` Qwen3 30B A3B Instruct 2507 providers pricing	—	131,072	2	$0.105/1M	$0.315/1M	prepaid BYOK
`Qwen/Qwen3-32B` Qwen3 32B providers pricing	—	131,072	2	$0.105/1M	$0.315/1M	prepaid BYOK
`Qwen/Qwen3-Next-80B-A3B-Thinking` Qwen3 Next 80B A3B Thinking providers pricing	—	131,072	2	$0.1575/1M	$1.26/1M	prepaid BYOK
`Qwen/Qwen3.5-397B-A17B` Qwen3.5 397B A17B providers pricing	—	262,144	2	$0.63/1M	$3.78/1M	prepaid BYOK
`deepseek-ai/DeepSeek-V4-Pro` DeepSeek V4 Pro providers pricing	IQ 114#27	1,048,576	2	$1.8375/1M	$3.675/1M	prepaid BYOK
`google/gemma-2-2b-it` gemma 2 2b it	—	8,192	1	$0.021/1M	$0.063/1M	BYOK
`google/gemma-3-27b-it` Google: Gemma 3 27B providers pricing	—	262,144	2	$0.105/1M	$0.315/1M	prepaid BYOK
`meta-llama/Llama-3.3-70B-Instruct` Llama 3.3 70B Instruct providers pricing	—	131,072	2	$0.1365/1M	$0.42/1M	prepaid BYOK
`meta-llama/Meta-Llama-3.1-8B-Instruct` Meta Llama 3.1 8B Instruct	—	128,000	1	$0.021/1M	$0.063/1M	BYOK
`moonshotai/Kimi-K2.6` Kimi-K2.6 providers pricing	IQ 119#19	8,000	2	$0.9975/1M	$4.2/1M	prepaid BYOK
`moonshotai/Kimi-K2.7-Code` Kimi-K2.7-Code providers pricing	IQ 117#20	262,144	2	$0.9975/1M	$4.2/1M	prepaid BYOK
`nvidia/Cosmos3-Super-Reasoner` Cosmos3-Super-Reasoner providers pricing	—	8,000	2	$0.105/1M	$0.315/1M	prepaid BYOK
`nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` Llama 3_1 Nemotron Ultra 253B v1 providers pricing	—	128,000	2	$0.63/1M	$1.89/1M	prepaid BYOK
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B` NVIDIA Nemotron 3 Nano 30B A3B providers pricing	—	131,072	2	$0.063/1M	$0.252/1M	prepaid BYOK
`nvidia/Nemotron-3-Nano-Omni` Nemotron 3 Nano Omni providers pricing	—	131,072	2	$0.063/1M	$0.252/1M	prepaid BYOK
`nvidia/nemotron-3-super-120b-a12b` NVIDIA: Nemotron 3 Super providers pricing	—	1,000,000	2	$0.315/1M	$0.945/1M	prepaid BYOK
`nvidia/nemotron-3-ultra-550b-a55b` NVIDIA: Nemotron 3 Ultra providers pricing	—	512,288	2	$1.05/1M	$3.15/1M	prepaid BYOK
`openai/gpt-oss-120b` OpenAI: gpt-oss-120b providers pricing	IQ 103#57	131,072	2	$0.1575/1M	$0.63/1M	prepaid BYOK
`openbmb/MiniCPM-V-4_5` openbmb/MiniCPM-V-4_5 providers pricing	—	8,000	2	$0.6909/1M	$1.1655/1M	prepaid BYOK
`zai-org/GLM-5.1` GLM 5.1 providers pricing	IQ 113#30	204,800	2	$1.47/1M	$4.62/1M	prepaid BYOK
`zai-org/GLM-5.2` GLM-5.2 providers pricing	IQ 120#16	1,048,576	2	$1.47/1M	$4.62/1M	prepaid BYOK