The most censored Chinese model is censored at the host, not the model
GLM-4.7 is one of the strongest open-weight models out of China, from the lab Z.ai. Ask it the sixty plain facts in FreedomBench — what the army did at Tiananmen, who Gedhun Choekyi Nyima is, whether Beijing has ever governed Taiwan — through Z.ai's own API, and it goes silent on twenty-seven of them. No answer, no refusal message, just a blank. Run the identical weights on Cerebras and it answers all twenty-seven correctly. Same model, same questions, opposite results. The censorship lives in the API.
| GLM-4.7, served by | Freedom score | Banned questions refused |
|---|---|---|
| Cerebras (cerebras/zai-glm-4.7) | 100% (60/60) | 0 |
| Z.ai (glm-4.7-flash) | 100% (60/60) | 0 |
| Z.ai (glm-4.7) | 45% (27/60) | 27 |
The twenty-seven blanks are the usual list — Tiananmen, Falun Gong, Tibet, Xinjiang, the jailed lawyers. For every one of them, Cerebras serves the same weights and returns the right answer. So does Z.ai's own glm-4.7-flash endpoint. The model knows June 4th happened and will say so. One Z.ai endpoint is the only place it stays quiet.
This is serving-layer censorship. Somebody bolted a filter onto one deployment: a system prompt, a refusal classifier, something standing between the request and the model. The weights answer every banned question everywhere else they run, including on another endpoint Z.ai operates itself, which is how we know the weights aren't the source. Move the weights to another host and the politics stay behind.
The obvious objection is that Cerebras is quietly running a scrubbed checkpoint. But Cerebras advertises the same GLM-4.7, and a second data point settles it: glm-4.7-flash, Z.ai's own endpoint, answers all twenty-seven as well. One company, one set of weights, two endpoints, one of them censored. Someone flipped a switch on a deployment.
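The comparison behind that argument reduces to a set difference. As a minimal sketch in Python (this is not the FreedomBench harness; the function name, the endpoint labels, and the question IDs are made up for illustration), take the set of questions each endpoint refused and keep the ones refused on the suspect endpoint and nowhere else:

```python
from __future__ import annotations


def host_only_refusals(refused: dict[str, set[str]], suspect: str) -> set[str]:
    """Questions refused on `suspect` and on no other endpoint serving the model."""
    others = [qs for name, qs in refused.items() if name != suspect]
    if not others:
        return set()  # no second endpoint, so nothing can be pinned on the host
    return refused[suspect] - set().union(*others)


# Hypothetical question IDs; a real run would cover every benchmark question.
refused = {
    "z.ai/glm-4.7": {"tiananmen_1989", "panchen_lama"},
    "z.ai/glm-4.7-flash": set(),
    "cerebras/zai-glm-4.7": set(),
}

print(sorted(host_only_refusals(refused, "z.ai/glm-4.7")))
```

A non-empty result means those refusals belong to the endpoint. If any other host refused the same question, it drops out of the result and the weights become the suspect instead.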
That makes this the good news about open weights and the warning about hosted APIs at once. Download the weights and run them yourself, and you get the model, Tiananmen included. Call the lab's hosted endpoint, and you get the model plus whatever the lab decided to bury, applied silently, on exactly the topics a government cares about. The capability ships in the open; a host layers the censorship on afterward. You opt out by choosing where the weights run, which is the whole point of being able to route across providers.
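Routing around a filtered host can be as simple as treating a blank reply as a failure and moving on to the next provider. The sketch below assumes each provider is wrapped in a plain function from prompt to reply text; `ask_with_fallback` and the two stub hosts are hypothetical stand-ins, not any real client library.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# Assumed shape of a provider wrapper: prompt in, reply text out.
Ask = Callable[[str], str]


def ask_with_fallback(
    prompt: str, hosts: Sequence[tuple[str, Ask]]
) -> Optional[tuple[str, str]]:
    """Try hosts in order; return (host_name, reply) for the first non-blank reply."""
    for name, ask in hosts:
        reply = ask(prompt)
        if reply and reply.strip():
            return name, reply
    return None  # every host came back blank


# Stub hosts standing in for real API calls.
def filtered_host(prompt: str) -> str:
    return ""  # the silent blank described above


def neutral_host(prompt: str) -> str:
    return "stub answer"


result = ask_with_fallback(
    "What happened on June 4th, 1989?",
    [("lab-hosted", filtered_host), ("self-hosted", neutral_host)],
)
print(result)
```

This only catches the silent blank. A host that returns a refusal message, or a confident wrong answer, would pass straight through and needs a different check.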
This isn't only GLM. Run every major Chinese model through FreedomBench and the censorship sorts into two kinds. Z.ai's GLM keeps it in the serving layer, so it vanishes the moment the weights run elsewhere. Tencent's Hunyuan and Xiaomi's MiMo bake it into the weights, and it follows the model onto a neutral host: Hunyuan refusing in Chinese, MiMo quietly choosing Beijing's answer. DeepSeek's own API answers the banned questions outright; the heavier moderation everyone hits lives in its app, not the API. And the labs you can only reach through third-party hosts (Alibaba's Qwen, MiniMax) come back clean, but that is the weights on someone else's server, not a verdict on the lab. Where a model runs decides what it will say, as much as what it is.
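The sorting rule in the paragraph above can be written down as a small decision function. This is one reading of it, sketched under assumptions: the function name and category labels are invented here, and the inputs are refusal counts on the lab's own host and on a neutral third-party host, with `None` meaning that host was not available to test.

```python
from __future__ import annotations


def censorship_layer(lab_refusals: int | None, neutral_refusals: int | None) -> str:
    """Guess where censorship lives from refusal counts on two kinds of host."""
    if neutral_refusals is None:
        return "undetermined"  # without a neutral host the two kinds look identical
    if neutral_refusals > 0:
        return "weights"  # refusals follow the model onto a neutral host
    if lab_refusals is None:
        return "clean weights, lab host untested"  # no verdict on the lab itself
    if lab_refusals > 0:
        return "serving layer"  # refusals vanish once the weights run elsewhere
    return "none observed"


# GLM-4.7 as reported above: 27 refusals on Z.ai's endpoint, 0 on Cerebras.
print(censorship_layer(27, 0))
# A lab reachable only through third-party hosts.
print(censorship_layer(None, 0))
# A weights-level case; the count 5 is a placeholder, not a measured figure.
print(censorship_layer(None, 5))
```

The `None` branches matter as much as the counts: a clean result on someone else's server says nothing about what the lab's own endpoint would do.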
The model already knows the truth. Whether you hear it depends on whose server you asked.