Artificial intelligence has learned almost everything: a lack of data could play a cruel joke on it, scientists say

Experts wonder how artificial intelligence models will develop in the future, when there is nowhere left to get new text or images. They have already come up with a few options.

Scientists say that if the current pace and methods of training large language models (LLMs) continue, by 2026 AI will have nothing left to learn from because of a lack of data, which will slow down or even change the course of AI development. The Conversation writes about this.

In their published work, they note that ChatGPT, for example, was trained on 570 gigabytes of text data, or approximately 300 billion words. Similarly, the Stable Diffusion algorithm (which powers many AI imaging applications such as DALL-E, Lensa, and Midjourney) was trained on the LAION-5B dataset consisting of 5.8 billion image-text pairs. If an algorithm is trained with insufficient data, it will produce inaccurate or low-quality results.

Experts also point out that the quality of the content on which large language models are trained is critical to their development. In this respect, social networks are not very suitable for training, since the information on them is often manipulative, which can lead artificial intelligence to incorrect conclusions. Texts retrieved from social media platforms may be biased or contain false information or illegal content that the model may reproduce.

According to the scientists, AI developers are now looking for high-quality content such as text from books, online articles, scientific papers, Wikipedia and certain filtered web content. For example, Google Assistant was made more conversational by training it on 11,000 romance novels from the self-publishing site Smashwords.

At the same time, researchers predict that we will run out of high-quality text data before 2026 if current AI training trends continue. They predict that low-quality language data will run out between 2030 and 2050, and low-quality image data will run out between 2030 and 2060.

But the situation may not be as bad as it seems. There are many unknowns about how AI models will evolve in the future, and there are several ways to address the risk of a data shortage. One opportunity for AI developers is to improve algorithms so they can use the data they already have more efficiently. In the coming years, they will be able to train high-performance AI systems using less data and possibly less computing power.

Another option is to use AI to create synthetic data for training systems. In other words, developers can simply generate the data they need, tailored to their particular AI model. Many projects already use synthetic content, largely sourced from data generation services such as Mostly AI. Researchers say this will become more common in the future.

Focus previously reported that Germany, France and Italy had reached an agreement on artificial intelligence regulation. According to German Digital Minister Volker Wissing, if Europe wants to play in the world’s top artificial intelligence league, it is applications, not technologies, that need to be regulated.

Source: Focus
