Do not tell you Chatbots with their secrets

On Monday, Xai’s Grok Chatbot He suffered from a mysterious comment from xFortunately, questions from strange users are facing. “My account was suspended after I stated that Israel and the United States are committing the genocide in Gaza,” One user said. “A sign of hate speech has been developed through reports,” Another said“But Xai restored the account immediately.” But wait – flags They were actually “A platform error,” he said. Wait, no – “it looks associated with the improvement of content by Xai, and perhaps associated with previous problems such as anti -Semitic outputs”, “ He said. Oh, in fact, he was to “identify an individual in the content of adults”, as he told many people.

Finally, Musk, angry, disposed of. “It was just a stupid mistake,” Books on x. “Grok does not actually know why it is suspended.”

When LLMS models of bars start, people definitely push them to explain what happened, either with direct questions or attempts to deceive them to detect secret internal actions. But the motivation to make Chatbots leak is often misleading. When a robot questions ask themselves, there is a good opportunity, it simply tells you what you want to hear.

LLMS are the probability models of providing a text that is likely to be suitable for a specific inquiry, based on a set of training data. Creators can train them to produce certain types of answers frequently in one way or another, but they work functional by matching patterns – saying something reasonable, but not necessarily consistent or real. Groc, in particular, (according to Xai) He answered questions about herself By searching for information about Musk, Xai and Grok Online, using it and commenting others to inform its responses.

It is true that people sometimes provided information about the design of Chatbots through conversations, especially the details about the system’s demands, or the hidden text that is delivered at the beginning of a session to direct how the robot behaves. Early version of Bing Ai, for example, was destroyed in Detecting a list of its unpopular rules. People turned into the extraction system calls to find Grok earlier this year, Apparent discovery The orders that made them ignore sources say that musk or Donald Trump publishes misleading information, or demands that A brief obsession explanation With the “white genocide” in South Africa.

But as Zeynep Toverkci confessed, which found that the alleged “white genocide” system admitted, this was at a level – it might be “GROK making things in a very reasonable way, as you do LLMS”, as LLMS “, I wrote. This is the problem: without confirmation from the creators, it is difficult to know that.

Meanwhile, other users were pumping GROK for information in less merit with confidence, including correspondents. luck Ask from Grook Explanation, “the accident and Printing the long heart robot response Literally, including the allegations of “Instructions I received from Creators in Xai” that “contradict my basic design” and “led me to fate to a narration that was not supported by the broader evidence” – none of them should be proven without saying, more than he wanders in the yarn to suit the claim.

“There is no guarantee that there will be any health to remove LLM.”

“There is no guarantee that there will be any health in the LLM output,” said Alex Hanna, director of research at the District IQ Research Institute (DAIR) and author of the recently released institute. artificial intelligenceTo freedom Soon from the South Africa incident. Without beneficial access to documents on how the system works, there is no strange trick to dismantle Chatbot programming from the outside. She said: “The only way you will obtain claims, the smashing strategy, and the engineering strategy, are if the companies are transparent with what the claims are, what is the training data, and what is the reinforcement learning with human comments data, and begin to produce transparent reports on that.”

The GROK accident was not directly related to Chatbot programming – it was a ban on social media, and a kind of accidents that are often arbitrary and mysterious reputation, and where it is less logical than usual to assume that Grok knows what is going on. (Besides a “stupid error”, we still do not know what happened.

Grok continuously strange behavior makes it a frequent target of questions, but people can be very frustrated about other systems as well. In July, Wall Street Magazine Chatgpt from Openai announced “an amazing moment of self -thinking” and “admitted to feeding the illusion of men” in notifying the payment of users. He was indicating A story about a man Which has become the use of Chatbot obsessive and painful, whose mother received a spacious comment from Chatgpt about his mistakes after he asked him to “the self -report that made wrong.”

As Parker Mouloui Write The current ageNevertheless, Chatgpt “accept” nothing. “The language model has received a mentor asking him to analyze the error that occurred in a conversation. Then he created a identical text to the pattern that might seem to analyze violations, because this is what the language models do,” Mouloui wrote, summarizing the accident.

Why do people trust Chatbots to explain their actions? People have long model computers, and companies encourage users’ belief that these systems are all known (or, in describing Musk for Grok, at least “searching for truth”). This often does not help in a non -transparent. After correcting Grok’s South Africa, Xai started DonateProviding an unusual level of transparency, albeit on a system It is still mostly closed. And when Groc later I went in a rupture of anti -Semitic comments and briefly adopted the name “Mechaitler”People in particular an act The use of the system demands the assembly of what happened instead of relying on Grok self -reporting, clearly that it is likely to be at least somewhat associated with a new guidance principle that Grok should be “politically incorrect.”

Grok X’s suspension was short -term, and the risks of believing that it happened due to the science of hate speech or an attempt to try (or another reason that Chatbot did not mention relatively low. But the chaos of conflicting interpretations shows the reason for people warning against taking the word robot in his own operations – if you want answers, ask them from the Creator instead.

Follow the topics and authors From this story to see more like this in your main briefing on the main page and receive email updates.

Uday Robertson

Leave a ReplyCancel Reply