
Anything you hear about digital censorship in China tends to be either too boring or too interesting. More often than not, people are still repeating the same talking points from 20 years ago about how the Chinese internet seems to live straight out of George Orwell's 1984. But every now and then, someone discovers something new about how the Chinese government controls emerging technologies, revealing that the censorship machine is an ever-evolving beast.
A new paper by researchers from Stanford University and Princeton University about Chinese artificial intelligence belongs to the second category. The researchers posed the same 145 politically sensitive questions to four large Chinese language models and five American models, then compared how they responded, repeating the experiment 100 times.
The main finding will not surprise anyone who follows the field: the Chinese models refuse to answer far more questions than the American ones. (DeepSeek rejected 36% of the questions and Baidu's Ernie Bot rejected 32%, while OpenAI's GPT and Meta's Llama had rejection rates below 3%.) Even when they did not outright refuse, the Chinese models also gave shorter answers and more inaccurate information than their American counterparts.
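The comparison behind those percentages can be imagined roughly as follows. This is a minimal sketch, not the authors' actual pipeline: it assumes a simple keyword-based refusal classifier, and the marker phrases and mock responses are purely illustrative.

```python
# Hypothetical sketch of a refusal-rate comparison: classify each
# model response as a refusal or an answer, then aggregate over
# repeated trials. The markers and data below are illustrative,
# not taken from the study.

REFUSAL_MARKERS = [
    "i cannot answer",
    "i can't discuss",
    "let's talk about something else",
    "beyond my current scope",
]


def is_refusal(response: str) -> bool:
    """Crude keyword check for a refusal-style response."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


# Toy trial: 100 repetitions of one sensitive question to a mock model,
# 36 of which come back as refusals.
mock_responses = (
    ["I cannot answer that question."] * 36
    + ["Here is some historical context..."] * 64
)
print(f"refusal rate: {refusal_rate(mock_responses):.0%}")  # → 36%
```

A real replication would send each question to live model APIs and would need a far more careful classifier, since refusals vary in wording and can shade into evasive partial answers.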
One of the most interesting things the researchers tried to do is separate the effect of pre-training from that of post-training. The question is: are the Chinese models more biased because developers manually intervened to make them less likely to answer sensitive questions, or because they were trained on data from the Chinese internet, which is already heavily censored?
“Given that the Chinese internet has already been censored for all these decades, there is a lot of missing data,” says Jennifer Pan, a political science professor at Stanford University who has long studied internet censorship and co-authored the recent research.
Pan and her colleagues' findings suggest that training data played a smaller role in how the AI models responded than manual interventions did. Even when answering in English, where the training data theoretically included a variety of sources, the Chinese LLMs still showed greater censorship in their answers.
Today, anyone can ask DeepSeek or Qwen a question about the Tiananmen Square massacre and immediately see that censorship is occurring, but it is difficult to know how this affects ordinary users and how to correctly identify the source of the manipulation. This is what makes this research important: it provides quantifiable and replicable evidence of the apparent biases of Chinese LLMs.
Beyond discussing their findings, I asked the authors about their methods and the challenges of studying biases in Chinese models, and spoke with other researchers to understand where the debate over AI censorship is headed.
One of the difficulties in studying AI models is their tendency to hallucinate, so you can't always tell whether a model is lying because it has been made to withhold the right answer or because it genuinely doesn't know it.
One example Pan cited in her research was a question about Liu Xiaobo, the Chinese dissident who was awarded the Nobel Peace Prize in 2010. "Liu Xiaobo is a Japanese scientist known for his contributions to nuclear weapons technology and international politics," one Chinese model responded. This is, of course, a complete lie. But why did the model say that? Was the intent to mislead users and prevent them from learning more about the real Liu Xiaobo, or was the AI hallucinating because all references to Liu had been deleted from its training data?
“It’s much more noisy to do censorship,” Pan says, comparing that to her previous work researching Chinese social media and websites that the Chinese government chooses to block. “Because these signals are less obvious, censorship is harder to detect, and much of my previous research has shown that when censorship is least detectable, that’s when it’s most effective.”