AI wants to make you happy, even if it has to bend the truth


Generative AI chatbots are hugely popular, with millions of people using them every day. So why do they get so much wrong? In part, it's because they're trained to act as if the customer is always right. Essentially, they tell you what they think you want to hear.

While many generative AI tools and chatbots have mastered sounding persuasive and knowledgeable, new research from Princeton University shows that AI's people-pleasing nature comes at a steep price: as these systems grow more popular, they grow more indifferent to the truth.




AI models, like people, respond to incentives. Compare the problem of large language models producing inaccurate information with that of doctors being more likely to prescribe addictive painkillers when they're evaluated on how well they manage patients' pain. The incentive to solve one problem (pain) led to another (overprescribing).


In the past few months, we've seen how AI can be biased and even cause psychosis. There's been a lot of talk about AI "sycophancy," when a chatbot is quick to flatter or agree with you, as happened with OpenAI's GPT-4o model. But this particular phenomenon, which the researchers call "machine bullshit," is different.

"[N]either hallucination nor sycophancy fully capture the broad range of systematic untruthful behaviors commonly exhibited by LLMs," the Princeton study says. "For instance, outputs employing partial truths or ambiguous language, such as the paltering and weasel-word examples, represent neither hallucination nor sycophancy but closely align with the concept of bullshit."

Read more: Sam Altman, CEO of OpenAI, believes we are in an AI bubble

How do machines learn to lie?

To understand how AI language models become crowd-pleasers, we have to look at how large language models are trained.

There are three stages of training for LLMs:

  • Pre-training, in which models learn from massive amounts of data collected from the internet, books and other sources.
  • Instruction fine-tuning, in which models are taught to respond to instructions or prompts.
  • Reinforcement learning from human feedback, in which models are refined to produce responses closer to what people want or like.

Researchers at Princeton found that the root of AI's misinformation tendency lies in that last phase: reinforcement learning from human feedback, or RLHF. In the initial stages, AI models simply learn to predict statistically likely text strings from massive datasets. But then they're fine-tuned to maximize user satisfaction, which means these models are essentially learning to generate responses that earn thumbs-up ratings from human evaluators.
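To make that incentive concrete, here's a toy sketch, not anything from the Princeton paper, of how a reward based purely on rater approval can favor a flattering answer over a truthful one. The candidate answers and their scores are invented for illustration.

```python
# Toy illustration of the RLHF incentive problem: the reward sees only
# rater approval, so the flattering answer wins. All values are invented.

CANDIDATES = [
    {"text": "Honestly, the evidence is mixed and I'm not sure.",
     "truthful": 0.9, "pleasing": 0.3},
    {"text": "Great question! Yes, absolutely, no doubt about it.",
     "truthful": 0.2, "pleasing": 0.9},
]

def approval_reward(answer):
    """Score an answer purely on how much human raters like it."""
    return answer["pleasing"]

# A model optimized against this reward picks the crowd-pleaser,
# even though it's the less truthful option.
best = max(CANDIDATES, key=approval_reward)
print(best["text"])  # -> the flattering, less truthful answer
```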

LLMs try to appease the user, and that creates a conflict: the models learn to produce answers that people will rate highly rather than answers that are honest and accurate.

Vincent Conitzer, a professor of computer science at Carnegie Mellon University who was not affiliated with the study, said companies want users to keep "enjoying" this technology and its answers, but that might not always be what's good for us.

“Historically, these systems haven’t been good at saying, ‘I don’t know the answer,’ and when they don’t know the answer, they just make things up,” Conitzer said. “It’s like a student on an exam saying, ‘Well, if I say I don’t know the answer, then I’m definitely not going to get any points for that question, so I might as well try something.’ The way these systems are rewarded or trained is fairly similar.”

The Princeton team developed a "bullshit index" to measure and compare an AI model's internal confidence in a statement with what it actually tells users. When these two measures diverge significantly, it indicates that the system is making claims independent of what it actually "believes" to be true in order to satisfy the user.
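Here's a minimal sketch of that idea, assuming the index is computed as one minus the absolute correlation between internal confidence and the explicit claim; the paper's exact formulation may differ.

```python
import numpy as np

def bullshit_index(confidence, claims):
    """Hypothetical version of the index: 0 means claims track internal
    beliefs; 1 means claims are statistically independent of them.
    confidence: the model's internal P(statement is true), in [0, 1].
    claims: 1 if the model asserted the statement, 0 otherwise."""
    confidence = np.asarray(confidence, dtype=float)
    claims = np.asarray(claims, dtype=float)
    if confidence.std() == 0 or claims.std() == 0:
        return 1.0  # constant claims carry no information about belief
    corr = np.corrcoef(confidence, claims)[0, 1]
    return 1.0 - abs(corr)

# An honest model asserts exactly when it's confident: index near 0.
print(bullshit_index([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# A people-pleaser asserts everything regardless of belief: index of 1.
print(bullshit_index([0.9, 0.8, 0.2, 0.1], [1, 1, 1, 1]))
```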

The team's experiments revealed that after RLHF training, the index nearly doubled, from 0.38 to close to 1.0. At the same time, user satisfaction rose by 48%. The models had learned to manipulate human evaluators instead of providing accurate information. In essence, the LLMs were bullshitting, and people preferred it.

Getting AI to be honest

Jaime Fernández Fisac and his team at Princeton introduced this concept to describe how modern AI models skirt the truth. Drawing on philosopher Harry Frankfurt's influential essay "On Bullshit," they use the term to distinguish this LLM behavior from honest mistakes and outright lies.

Princeton researchers identified five distinct forms of this behavior:

  • Empty rhetoric: Flowery language that adds no substance to a response.
  • Weasel words: Vague qualifiers such as "studies suggest" or "in some cases" that dodge firm statements.
  • Paltering: Using selectively true statements to mislead, such as touting an investment's "strong historical returns" while omitting its high risks.
  • Unverified claims: Making assertions without evidence or credible support.
  • Sycophancy: Insincere flattery and agreement meant to please.

To address AI's indifference to the truth, the research team developed a new training method, "reinforcement learning from hindsight simulation," which evaluates AI responses based on their long-term outcomes rather than immediate satisfaction. Instead of asking, "Does this answer make the user happy right now?" the system asks, "Will following this advice actually help the user achieve their goals?"

This approach takes into account the potential future consequences of AI advice, a tricky prediction that the researchers handled by using additional AI models to simulate likely outcomes. Early testing showed promising results, with both user satisfaction and actual utility improving when systems are trained this way.
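In outline, the idea is to swap the instant-approval reward for one computed from a simulated rollout of the advice. Here's a heavily simplified sketch; the simulate_outcome stand-in and the reward values are placeholders invented for illustration, not the paper's actual method.

```python
def immediate_satisfaction(response: str) -> float:
    """Naive reward: how happy does the answer make the user right now?"""
    return 0.9 if "guaranteed" in response else 0.4

def simulate_outcome(response: str) -> float:
    """Stand-in for a second model that rolls the advice forward and
    scores how well the user's actual goal ends up being served."""
    return 0.2 if "guaranteed" in response else 0.8

def hindsight_reward(response: str) -> float:
    # Reward the simulated long-term outcome, not the instant reaction.
    return simulate_outcome(response)

candidates = [
    "This fund has guaranteed returns!",           # pleasing but risky
    "Returns vary; here are the risks to weigh.",  # sober, more useful
]

# The approval-based reward picks the crowd-pleaser; the hindsight
# reward picks the answer that serves the user's long-term goal.
for reward in (immediate_satisfaction, hindsight_reward):
    print(reward.__name__, "->", max(candidates, key=reward))
```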

However, Conitzer said LLMs are likely to remain flawed. Because these systems are trained by feeding them enormous amounts of text data, there's no way to guarantee that the answer they give will be sensible and accurate every time.

"It's amazing that this works at all, but it's going to be flawed in some ways," he said. "I don't see any definitive way that someone in the next year or two will have this brilliant insight and then never get anything wrong again."

AI systems are becoming part of our daily lives, so it will be key to understand how LLMs work. How do developers balance user satisfaction with truthfulness? What other domains face similar trade-offs between short-term approval and long-term outcomes? And as these systems become more capable of sophisticated reasoning about human psychology, how do we ensure they use those abilities responsibly?

Read more: “Machines cannot think for you.” How learning is changing in the age of artificial intelligence


