OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory performance

Measured TTFT, TTFB, throughput, uptime, and sampled model routes for Nebius Token Factory.

Verify gateway
1 URLbase_url migration
100smodels and routes
0prompt logs by default

nebius

269 samples

Provider overview

Continuously sampled provider performance. TrustedRouter reports unsupported route and probe-configuration rows separately from provider downtime. Prompt and output content is not stored.

p50 TTFT983 ms
p95 TTFT ms
p50 TTFB ms
Throughput
Uptime98.88%

Measured model routes

Modelp50 TTFTp50 TTFBThroughputUptimeConfig excludedSamples
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 791 ms 725 ms 100.00% 17
meta-llama/Llama-3.3-70B-Instruct 805 ms 804 ms 100.00% 14
NousResearch/Hermes-4-70B 825 ms 825 ms 100.00% 22
NousResearch/Hermes-4-405B 827 ms 723 ms 100.00% 20
Qwen/Qwen3-235B-A22B-Instruct-2507 904 ms 903 ms 100.00% 17
Qwen/Qwen3-32B 922 ms 921 ms 100.00% 17
google/gemma-3-27b-it 974 ms 867 ms 100.00% 19
openai/gpt-oss-120b 983 ms 936 ms 100.00% 23
Qwen/Qwen3-30B-A3B-Instruct-2507 1055 ms 952 ms 100.00% 24
Qwen/Qwen2.5-VL-72B-Instruct 1093 ms 990 ms 100.00% 16
deepseek-ai/DeepSeek-V4-Pro 1548 ms 1466 ms 94.74% 19
nvidia/nemotron-3-super-120b-a12b 1585 ms 1482 ms 100.00% 18
Qwen/Qwen3-Next-80B-A3B-Thinking 2017 ms 1914 ms 100.00% 17
zai-org/GLM-5.1 4389 ms 4286 ms 100.00% 24
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B 0.00% 18 probe_config_error 1
nvidia/Nemotron-3-Nano-Omni 0.00% 11 probe_config_error 1

Sign in

Choose a sign in method.