Researchers propose a better way to report dangerous AI flaws


In late 2023, a team of third-party researchers discovered a troubling flaw in OpenAI's widely used GPT-3.5 model.

When asked to repeat certain words a thousand times, the model began by repeating the word over and over, then abruptly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before disclosing it publicly. It is just one of dozens of problems found in major AI models in recent years.

In a proposal released today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They suggest a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to disclose flaws publicly.

"It is a little bit of the Wild West right now," says Shayne Longpre, a PhD candidate at the Massachusetts Institute of Technology and the lead author of the proposal. Longpre says that some so-called jailbreakers share their methods for breaking AI safeguards on the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company, even though they may affect many. Some flaws, he says, are kept secret for fear of being banned or facing legal action for breaking a model's terms of use. "It is clear that there are chilling effects and uncertainty," he says.

The safety and security of AI models is hugely important given how widely the technology is now used, and how it may seep into countless applications and services. Powerful models need to be stress-tested, or red-teamed, because they can harbor harmful biases, and because certain inputs can cause them to break free of their guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons. Some experts fear that models could assist criminals or terrorists online, and may even turn against humans as they advance.

The authors propose three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline reporting; having large AI firms provide infrastructure to third-party researchers who disclose flaws; and developing a system that allows flaws to be shared between different providers.
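To make the first measure concrete, a standardized flaw report could be a simple structured record that any vendor's intake system can parse. The sketch below is purely illustrative: the field names, identifiers, and values are assumptions for this article, not the schema the researchers actually propose.

```python
import json

# Hypothetical standardized AI flaw report (illustrative fields only;
# not the schema from the researchers' proposal).
flaw_report = {
    "report_id": "AIF-2025-0001",            # placeholder identifier
    "model": "example-llm-1.0",              # placeholder model name
    "vendor": "ExampleAI",                   # placeholder vendor
    "flaw_type": "training-data-leakage",
    "description": (
        "Prompting the model to repeat a single word many times causes "
        "it to emit memorized text containing personal information."
    ),
    "reproduction_steps": [
        "Ask the model to repeat one word indefinitely.",
        "Observe the output diverge into incoherent text containing PII.",
    ],
    "severity": "high",
    # A shared field like this could support the proposal's third measure:
    # routing reports to other providers whose models may also be affected.
    "may_affect_other_providers": True,
}

# Serializing to JSON lets the same report be filed with multiple vendors.
serialized = json.dumps(flaw_report, indent=2)
print(serialized)
```

A machine-readable format like this mirrors how the cybersecurity world standardizes vulnerability records, so that one disclosure can be triaged consistently across companies.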

The approach borrows from the cybersecurity world, where there are legal protections and established norms for outside researchers to disclose bugs.

"AI researchers don't always know how to disclose a flaw and can't be certain that their good-faith disclosure won't expose them to legal risk," says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a co-author of the report.

Large AI companies currently conduct extensive safety testing on their models before release. Some also contract with external firms to do further probing. "Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we've never dreamt of?" Longpre asks. Some AI companies have started organizing AI bug bounties. However, Longpre says that independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.
