
Generative AI models like those behind ChatGPT and Gemini are trained on troves of real-world data, but even all the content on the internet is not enough to prepare a model for every possible situation.

To keep growing, these models need to be trained on simulated, or synthetic, data: scenarios that are plausible but not real. And developers need to do that responsibly, experts said on a panel at SXSW, or things could go haywire quickly.
The use of simulated data in training AI models has gained new attention this year since the launch of DeepSeek AI, a new model produced in China that was trained using more synthetic data than other models, saving money and processing power.
But experts say it is about more than saving on the collection and processing of data. Synthetic data, which is computer-generated and often created by AI itself, can teach a model about scenarios that do not exist in the real-world information it was given but that it could face in the future. That one-in-a-million possibility does not have to come as a surprise to an AI model if it has seen a simulation of it.
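To make the idea concrete, here is a minimal sketch, not from the panel, of how a pipeline might synthesize plausible variants of a rare real-world example by jittering its numeric features. The function name and parameters are invented for illustration:

```python
import random

def synthesize_edge_cases(real_samples, n_variants=3, noise=0.1, seed=0):
    """Create plausible-but-unreal variants of rare real samples by
    jittering their numeric features. Purely illustrative."""
    rng = random.Random(seed)
    synthetic = []
    for sample in real_samples:
        for _ in range(n_variants):
            synthetic.append([x + rng.gauss(0, noise) for x in sample])
    return synthetic

# One scarce real observation yields several simulated neighbors
# the model can learn from before it ever meets them in the wild.
rare = [[0.9, 0.1]]
print(len(synthesize_edge_cases(rare)))  # 3
```

Real systems use far richer generators, such as physics simulators or generative models, but the principle is the same: cover scenarios the real data lacks.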
“With simulated data, you can get rid of the idea of edge cases, assuming you can trust it,” said Oji Udezue, who has led product teams at Twitter, Atlassian, Microsoft and other companies. He and the other panelists were speaking Sunday at the SXSW conference in Austin, Texas. “We can build a product that works for 8 billion people, in theory, as long as we can trust it.”
The hard part is making sure you can trust it.
Simulated data comes with plenty of benefits. For one, it costs less to produce. You can crash-test thousands of simulated cars using software, but to get the same results in real life you would have to actually wreck cars, which costs a lot of money, Udezue said.
If you are training a self-driving car, for example, you need to capture some of the less common scenarios that a vehicle might encounter on the road, even if they are not in the training data, said Tahir Ekin, a professor of business analytics at Texas State University. He used the case of the bats that make a spectacular emergence from Austin's Congress Avenue Bridge. That may not show up in training data, but a self-driving car will need some sense of how to respond to a swarm of bats.
The risks come from how a machine trained on synthetic data responds to real-world changes. A system that exists in an alternate reality becomes less useful, or even dangerous, Ekin said. “How would you feel getting into a self-driving car that wasn't trained on the road, that was only trained on simulated data?” he asked. Any system that uses simulated data needs to “be grounded in the real world,” he said, including feedback on how its simulated reasoning aligns with what is actually happening.
Udezue compared the problem to the creation of social media, which began as a way to expand communication around the world, a goal it achieved. But social media has also been misused, he said, noting that “autocrats now use it to control people, and people use it to tell jokes at the same time.”
As AI tools grow in size and popularity, a scenario made easier by synthetic training data, the potential real-world consequences of untrustworthy training and of models drifting away from reality become more significant. The burden is on builders and scientists to make sure the system is reliable, Udezue said. “It's not a fantasy.”
One way to ensure trust is to make models' training transparent, so users can choose which model to use based on their evaluation of that information. The panelists repeatedly reached for the analogy of a nutrition label, which is easy for a user to understand.
Some of that transparency already exists, such as the model cards available through the developer platform Hugging Face, which break down the details of the different systems. That information needs to be as clear and transparent as possible. “Those kinds of things need to be in place,” Hollinger said.
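Hugging Face model cards are README files with a YAML metadata header. A minimal, entirely hypothetical card disclosing data provenance might look like this (the dataset names and wording are invented for illustration):

```yaml
---
license: apache-2.0
datasets:
  - example-org/real-world-corpus      # hypothetical real-data source
  - example-org/simulated-edge-cases   # hypothetical synthetic-data source
tags:
  - synthetic-data
---
# example-model

Trained on a mix of real-world data and simulated edge cases;
the card states what share of the training set was synthetic.
```

The point of such a card is the nutrition-label idea from the panel: a user can see at a glance what went into the model before deciding to rely on it.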
Ultimately, Hollinger said, it will be not just AI developers but also AI users who define the industry's best practices.
The industry also needs to keep ethics and risks in mind, Udezue said. “Synthetic data will make a lot of things easier to do,” he said. “It will bring down the cost of building things. But some of those things will change society.”
Observability, transparency and trustworthiness have to be built into models to ensure they are reliable, Udezue said. That includes updating the training models with accurate data and not amplifying the errors in synthetic data. One worry is model collapse, in which an AI model trained on data produced by other AI models drifts further and further from reality, to the point that it becomes useless.
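The model-collapse dynamic can be caricatured in a few lines of Python (an illustration invented here, not the panel's example): a toy "model" that can only resample what it saw in training loses diversity with every generation trained on its own output.

```python
import random

def next_generation(samples, rng):
    """A toy 'model' that can only reproduce its training data:
    it resamples, with replacement, from the previous generation."""
    return [rng.choice(samples) for _ in samples]

rng = random.Random(7)
real_data = list(range(50))   # generation 0: 50 distinct real values
data = real_data
for _ in range(300):          # each generation trains on the last one's output
    data = next_generation(data, rng)

# Resampling never introduces new values, so diversity can only shrink,
# and random drift steadily loses the rare ones.
print(len(set(real_data)), "->", len(set(data)))
```

In real training runs, collapse shows up as narrowing output distributions, which is why the grounding and error-correction steps the panelists describe matter.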
“Whenever you move away from capturing the diversity of the real world, the responses may be unhealthy,” Udezue said. The solution, he said, is error correction. “These don't feel like unsolvable problems if you combine the idea of trust, transparency and error correction into them.”