OpenAI-compatible API. Attested gateway. Public status.
Nebius Token Factory performance
Measured TTFT, TTFB, throughput, uptime, and sampled model routes for Nebius Token Factory.
1 URL: base_url migration
100s: models and routes
0: prompt logs by default
nebius
269 samples
Continuously sampled provider performance. TrustedRouter reports unsupported route and probe-configuration rows separately from provider downtime. Prompt and output content is not stored.
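As a rough illustration of what a TTFT sample captures, the sketch below times how long a streamed response takes to yield its first non-empty chunk. It is not TrustedRouter's probe. The function name `time_to_first_token`, the injectable `clock` parameter, and the choice to ignore empty chunks are all assumptions made for this example.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (TTFT in milliseconds, count of non-empty chunks).

    `stream` is any iterable of text chunks, for example the deltas of a
    streaming chat completion. TTFT is None if no non-empty chunk arrives.
    """
    start = clock()  # hypothetical probe start: just before reading the stream
    ttft_ms: Optional[float] = None
    chunks = 0
    for chunk in stream:
        if not chunk:
            continue  # assumption: empty keep-alive chunks do not count as a token
        if ttft_ms is None:
            ttft_ms = (clock() - start) * 1000.0
        chunks += 1
    return ttft_ms, chunks
```

Passing a fake `clock` makes the function deterministic in tests; against a live endpoint the default `time.perf_counter` would be used and the request should be issued as close to `start` as possible.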
| Metric | Value |
|---|---|
| p50 TTFT | 983 ms |
| p95 TTFT | — |
| p50 TTFB | — |
| Throughput | — |
| Uptime | 98.88% |
Measured model routes
| Model | p50 TTFT | p50 TTFB | Throughput | Uptime | Config excluded | Samples |
|---|---|---|---|---|---|---|
| nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 | 791 ms | 725 ms | — | 100.00% | — | 17 |
| meta-llama/Llama-3.3-70B-Instruct | 805 ms | 804 ms | — | 100.00% | — | 14 |
| NousResearch/Hermes-4-70B | 825 ms | 825 ms | — | 100.00% | — | 22 |
| NousResearch/Hermes-4-405B | 827 ms | 723 ms | — | 100.00% | — | 20 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | 904 ms | 903 ms | — | 100.00% | — | 17 |
| Qwen/Qwen3-32B | 922 ms | 921 ms | — | 100.00% | — | 17 |
| google/gemma-3-27b-it | 974 ms | 867 ms | — | 100.00% | — | 19 |
| openai/gpt-oss-120b | 983 ms | 936 ms | — | 100.00% | — | 23 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | 1055 ms | 952 ms | — | 100.00% | — | 24 |
| Qwen/Qwen2.5-VL-72B-Instruct | 1093 ms | 990 ms | — | 100.00% | — | 16 |
| deepseek-ai/DeepSeek-V4-Pro | 1548 ms | 1466 ms | — | 94.74% | — | 19 |
| nvidia/nemotron-3-super-120b-a12b | 1585 ms | 1482 ms | — | 100.00% | — | 18 |
| Qwen/Qwen3-Next-80B-A3B-Thinking | 2017 ms | 1914 ms | — | 100.00% | — | 17 |
| zai-org/GLM-5.1 | 4389 ms | 4286 ms | — | 100.00% | — | 24 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B | — | — | — | 0.00% | 18 probe_config_error | 1 |
| nvidia/Nemotron-3-Nano-Omni | — | — | — | 0.00% | 11 probe_config_error | 1 |
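The figures above can be cross-checked with a little arithmetic. The sketch below sums the per-route sample counts and shows how an uptime percentage could be derived from a success count. The helper `uptime_percent` is hypothetical, and the success counts passed to it are inferences from the published percentages, not values reported on this page.

```python
from typing import Optional


def uptime_percent(successes: int, samples: int) -> Optional[float]:
    """Share of successful samples as a percentage, rounded to two decimals."""
    if samples == 0:
        return None  # no samples, so uptime is undefined
    return round(100.0 * successes / samples, 2)


# Samples column of the route table, in row order.
route_samples = [17, 14, 22, 20, 17, 17, 19, 23, 24, 16, 19, 18, 17, 24, 1, 1]
total_samples = sum(route_samples)
print(total_samples)  # 269, matching the headline sample count

# Assumption: one failed sample out of 19 for deepseek-ai/DeepSeek-V4-Pro.
print(uptime_percent(18, 19))  # 94.74

# Assumption: three failed samples overall (the one above plus the two 0.00% routes).
print(uptime_percent(total_samples - 3, total_samples))  # 98.88
```

Under these assumptions the per-route rows reproduce both the 269-sample total and the 98.88% headline uptime, but the page does not state how the aggregate is actually computed.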