OpenAI-compatible API. Attested gateway. Public status.
Phala performance
Measured TTFT, TTFB, throughput, uptime, and sampled model routes for Phala.
1 URL: base_url migration
100s of models and routes
0 prompt logs by default
phala
305 samples
Provider performance is sampled continuously. TrustedRouter reports unsupported-route and probe-configuration rows separately from provider downtime, so they do not count against uptime. Prompt and output content is not stored.
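A minimal sketch of how configuration-excluded rows could be kept out of an uptime figure. The `Outcome` categories, the `Sample` record, and `uptime_percent` are hypothetical names chosen for illustration; the page does not describe TrustedRouter's actual implementation. Note that the record carries only a model name and an outcome, with no prompt or output text.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical categories, not taken from TrustedRouter.
    OK = "ok"                            # probe received a valid response
    PROVIDER_ERROR = "provider_error"    # provider failed or timed out
    CONFIG_EXCLUDED = "config_excluded"  # unsupported route or bad probe config


@dataclass(frozen=True)
class Sample:
    model: str
    outcome: Outcome  # no prompt or output content is kept


def uptime_percent(samples):
    """Uptime over samples that count toward availability, or None if none do."""
    counted = [s for s in samples if s.outcome is not Outcome.CONFIG_EXCLUDED]
    if not counted:
        return None
    ok = sum(1 for s in counted if s.outcome is Outcome.OK)
    return round(100.0 * ok / len(counted), 2)


# Made-up samples: 18 successes, 1 provider failure, 3 excluded probes.
samples = (
    [Sample("example/model", Outcome.OK)] * 18
    + [Sample("example/model", Outcome.PROVIDER_ERROR)]
    + [Sample("example/model", Outcome.CONFIG_EXCLUDED)] * 3
)
print(uptime_percent(samples))
```

Under this scheme the three excluded probes change neither the numerator nor the denominator, so a misconfigured probe cannot look like provider downtime.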
| Metric | Value |
|---|---|
| p50 TTFT | 1971 ms |
| p95 TTFT | — |
| p50 TTFB | — |
| Throughput | 44 tok/s |
| Uptime | 97.38% |
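The page does not define its metrics, so the following is only a sketch under assumed definitions: TTFB as the time from sending the request to the first response chunk, TTFT as the time to the first chunk that carries at least one token, and throughput as tokens received after the first token divided by the time elapsed since it. The function name `stream_metrics` and the event format are hypothetical, and the timestamps are made up.

```python
def stream_metrics(t_request, events):
    """Derive latency metrics from one streamed response.

    t_request: time the request was sent, in seconds.
    events: list of (arrival_time_seconds, token_count), one per chunk,
            in arrival order. A chunk may carry zero tokens.
    """
    if not events:
        return None

    # Assumed TTFB: delay until the first chunk of any kind.
    ttfb_ms = round((events[0][0] - t_request) * 1000)

    token_events = [(t, n) for t, n in events if n > 0]
    if not token_events:
        return {"ttfb_ms": ttfb_ms, "ttft_ms": None, "tok_per_s": None}

    # Assumed TTFT: delay until the first chunk that carries a token.
    t_first = token_events[0][0]
    ttft_ms = round((t_first - t_request) * 1000)

    # Assumed throughput: tokens after the first one, per second of streaming.
    later_tokens = sum(n for _, n in token_events[1:])
    window = token_events[-1][0] - t_first
    tok_per_s = later_tokens / window if window > 0 else None

    return {"ttfb_ms": ttfb_ms, "ttft_ms": ttft_ms, "tok_per_s": tok_per_s}


# Made-up timeline: an empty first chunk, then three token-bearing chunks.
result = stream_metrics(0.0, [(0.9, 0), (1.0, 1), (1.5, 22), (2.0, 22)])
print(result)
```

With these definitions TTFB can never exceed TTFT, which is consistent with every row in the route table on this page.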
Measured model routes
| Model | p50 TTFT | p50 TTFB | Throughput | Uptime | Config excluded | Samples |
|---|---|---|---|---|---|---|
| qwen/qwen-2.5-7b-instruct | 926 ms | 883 ms | — | 100.00% | — | 15 |
| z-ai/glm-4.7-flash | 963 ms | 872 ms | — | 100.00% | — | 16 |
| qwen/qwen2.5-vl-72b-instruct | 993 ms | 946 ms | — | 100.00% | — | 14 |
| openai/gpt-oss-120b | 1413 ms | 1411 ms | — | 100.00% | — | 13 |
| deepseek/deepseek-chat-v3.1 | 1475 ms | 1372 ms | — | 100.00% | — | 28 |
| google/gemma-3-27b-it | 1616 ms | 1578 ms | — | 100.00% | — | 18 |
| qwen/qwen3-vl-30b-a3b-instruct | 1770 ms | 1769 ms | — | 100.00% | — | 17 |
| qwen/qwen3-30b-a3b-instruct-2507 | 1805 ms | 1802 ms | — | 100.00% | — | 23 |
| openai/gpt-oss-20b | 1971 ms | 1968 ms | — | 100.00% | — | 18 |
| moonshotai/kimi-k2.5 | 2068 ms | 2056 ms | — | 94.74% | — | 19 |
| moonshotai/kimi-k2.6 | 2135 ms | 1859 ms | 44 tok/s | 96.55% | — | 29 |
| z-ai/glm-4.7 | 2297 ms | 2193 ms | — | 94.74% | — | 19 |
| minimax/minimax-m2.5 | 2329 ms | 2225 ms | — | 100.00% | — | 16 |
| z-ai/glm-5.1 | 2417 ms | 2393 ms | — | 73.68% | — | 19 |
| z-ai/glm-5 | 2514 ms | 2511 ms | — | 100.00% | — | 17 |
| deepseek/deepseek-v3.2 | 2841 ms | 2840 ms | — | 100.00% | — | 9 |
| qwen/qwen3.5-397b-a17b | 2885 ms | 2851 ms | — | 100.00% | — | 15 |
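The headline figures can be cross-checked against the route table. The sketch below copies each route's p50 TTFT, uptime, and sample count from the rows above, recovers success counts from the rounded percentages, and recomputes the totals. That the results match the headline sample count, uptime, and p50 TTFT is an observation about these numbers; the page does not say that TrustedRouter aggregates them this way, and the headline p50 could just as well be a median over all raw samples.

```python
import statistics

# (model, p50 TTFT in ms, uptime in %, samples), copied from the table above.
ROUTES = [
    ("qwen/qwen-2.5-7b-instruct", 926, 100.00, 15),
    ("z-ai/glm-4.7-flash", 963, 100.00, 16),
    ("qwen/qwen2.5-vl-72b-instruct", 993, 100.00, 14),
    ("openai/gpt-oss-120b", 1413, 100.00, 13),
    ("deepseek/deepseek-chat-v3.1", 1475, 100.00, 28),
    ("google/gemma-3-27b-it", 1616, 100.00, 18),
    ("qwen/qwen3-vl-30b-a3b-instruct", 1770, 100.00, 17),
    ("qwen/qwen3-30b-a3b-instruct-2507", 1805, 100.00, 23),
    ("openai/gpt-oss-20b", 1971, 100.00, 18),
    ("moonshotai/kimi-k2.5", 2068, 94.74, 19),
    ("moonshotai/kimi-k2.6", 2135, 96.55, 29),
    ("z-ai/glm-4.7", 2297, 94.74, 19),
    ("minimax/minimax-m2.5", 2329, 100.00, 16),
    ("z-ai/glm-5.1", 2417, 73.68, 19),
    ("z-ai/glm-5", 2514, 100.00, 17),
    ("deepseek/deepseek-v3.2", 2841, 100.00, 9),
    ("qwen/qwen3.5-397b-a17b", 2885, 100.00, 15),
]

total_samples = sum(n for _, _, _, n in ROUTES)

# Recover whole success counts from the rounded per-route percentages.
total_ok = sum(round(uptime / 100 * n) for _, _, uptime, n in ROUTES)

overall_uptime = round(100 * total_ok / total_samples, 2)
median_of_p50s = statistics.median(ttft for _, ttft, _, _ in ROUTES)

print(total_samples)    # 305
print(overall_uptime)   # 97.38
print(median_of_p50s)   # 1971
```

Weighting by sample count matters here: a plain average of the 17 per-route uptime percentages would give a different figure, because routes with more samples contribute more to the total.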