It has been two years since generative AI exploded onto the scene, and recent progress can feel underwhelming: few breakthrough innovations in large models, no killer application at the product level, and a capital market that keeps debating the "bubble theory" and overvaluation... People seem to have become "disenchanted" with AI. Has AI development really slowed down?
Amid the doubts and expectations, OpenAI, the "AI leader", released a benchmark called MLE-bench on Friday, designed specifically to test the machine learning engineering capabilities of AI agents and to establish an industry standard for measuring the machine learning abilities of large models.
The benchmark follows the debut of o1: last month OpenAI shipped a major update, launching the o1 series of models with PhD-level reasoning ability and realizing a leap in the reasoning capabilities of large models.
The results showed that under the MLE-bench benchmark, o1-preview won medals in 16.9% of the competitions, nearly twice the second-place GPT-4o (8.7%), five times Meta's Llama 3.1 405B, and roughly twice Claude 3.5.
It is worth noting that beyond the leap in reasoning ability, o1's most critical breakthrough is that it opens up a new Scaling Law and at the same time forms a so-called "data flywheel", giving it the ability to "self-evolve".
Nvidia CEO Jensen Huang has said that AI is now designing the next generation of AI, and the pace of progress has reached the square of Moore's Law, meaning that in the next one to two years we will see astonishing, unexpected progress. OpenAI founder Sam Altman has said bluntly that the progress curve of the new AI paradigm has become steeper, and that once models can evolve themselves, the jump to the next level may come even faster.
The ability to "self-evolve" suggests that the "singularity" of AI development is accelerating. As some analysts have pointed out, OpenAI now treats the singularity not as mere theory but as a very real phenomenon that may come to pass, especially through AI agents.
As for whether AI development has really slowed down: judging from the latest progress in the industry and the views of its technology leaders, the market has underestimated the slope of AI's trajectory.
Self-evolution towards the singularity
In its latest paper, OpenAI noted:
If AI agents are able to conduct machine learning research autonomously, they could have many positive impacts: accelerating scientific progress in areas such as healthcare and climate science, accelerating safety and alignment research on the models themselves, and promoting economic growth through the development of new products. Agents capable of high-quality research could mark a turning point in the economy.
One reading of this:
OpenAI no longer sees the singularity as just a theory, but as a very real phenomenon that could become a reality, especially through agents.
This is also reflected in OpenAI's naming of o1, which reset the counter to 1 to mark the beginning of a new era of AI. o1's biggest breakthrough is not just improved reasoning: beyond opening up a new Scaling Law, it gains the ability to "self-learn".
The most crucial breakthrough is that o1 can "self-evolve", a big step on the road to AGI.
As mentioned above, o1 generates intermediate steps during inference, and those intermediate steps contain a large amount of high-quality training data that can be fed back to further improve the model's performance, forming a virtuous cycle of continuous "self-reinforcement".
Much like the development of human science, new knowledge is continually generated by distilling existing knowledge and mining it for new insights.
Jim Fan, a senior scientist at NVIDIA, likened o1's future development to a flywheel, in the way AlphaGo improved its play through self-play:
Strawberry easily forms a "data flywheel": if the answer is correct, the entire search trajectory becomes a mini dataset of training examples containing both positive and negative feedback.
This, in turn, will improve the reasoning core of future versions of GPT, much as AlphaGo's value network, used to assess the quality of each board position, improved as MCTS (Monte Carlo Tree Search) generated increasingly refined training data.
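The flywheel idea can be sketched in a few lines of Python. This is a toy illustration, not OpenAI's actual pipeline: the model call is faked with random correctness, and the names (`solve_with_cot`, `data_flywheel`) are invented for the example. The point is the loop structure: sample reasoning trajectories, verify the final answers, and keep the labeled traces as new training examples.

```python
import random

def solve_with_cot(question, answer, n_samples=8):
    """Toy stand-in for a model sampling reasoning trajectories.

    Returns (trajectory, proposed_answer) pairs; correctness is random
    here, simulating a model that only sometimes reasons its way to
    the right answer.
    """
    samples = []
    for i in range(n_samples):
        correct = random.random() < 0.3
        trajectory = f"step-by-step reasoning #{i} for {question!r}"
        samples.append((trajectory, answer if correct else answer + 1))
    return samples

def data_flywheel(problems, rounds=3):
    """Each round: sample trajectories, verify the final answers, and
    keep the verified traces as labeled training examples."""
    training_set = []
    for _ in range(rounds):
        for question, answer in problems:
            for trajectory, proposed in solve_with_cot(question, answer):
                label = "positive" if proposed == answer else "negative"
                training_set.append({"q": question, "trace": trajectory, "label": label})
        # In a real system the model would be fine-tuned on the positive
        # traces here, so the next round's samples improve: the flywheel.
    return training_set

random.seed(0)  # reproducible toy run
problems = [("2 + 2", 4), ("3 * 5", 15)]
dataset = data_flywheel(problems)
print(len(dataset))  # 48 examples: 3 rounds * 2 problems * 8 trajectories
```

The missing piece, of course, is the fine-tuning step inside the loop; the sketch only shows how verified trajectories become self-generated training data.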
The o1 model also represents a new paradigm for large models: it opens up a new Scaling Law at the inference stage.
The scaling law in AI generally refers to the continuous improvement of large-model performance as parameters, data, and compute increase. But data is ultimately finite, simply training more no longer makes models proportionally smarter, and the marginal returns of scaling up pre-training have begun to diminish.
o1 largely breaks through this bottleneck: through post-training it adds an explicit reasoning process and thinking time, which significantly improves model performance.
Compared with the traditional scaling law of the pre-training stage, o1 opens a new scaling law at the inference stage: the longer the model thinks, the better its reasoning. As o1's paradigm innovation takes hold, the focus of AI research will shift, with the industry moving from competing on parameter counts to competing on inference-time compute, a shift in measurement standards that the MLE-bench benchmark reflects.
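The inference-time trade-off can be illustrated with a toy best-of-N calculation, a hypothetical stand-in for "thinking longer" rather than o1's actual mechanism: if each independently sampled solution is correct with some fixed probability and a verifier can recognize a correct one, then spending compute on more samples pushes accuracy toward 1.

```python
import random

def sample_answer(rng):
    """Toy model: each independent solution attempt is right with
    probability 0.3 (an arbitrary number for illustration)."""
    return rng.random() < 0.3

def best_of_n_accuracy(n, trials=10_000, seed=0):
    """Estimate the chance that at least one of n sampled solutions is
    correct, assuming a perfect verifier picks it out when present."""
    rng = random.Random(seed)
    hits = sum(any(sample_answer(rng) for _ in range(n)) for _ in range(trials))
    return hits / trials

# More inference-time compute (larger n) means higher accuracy.
for n in (1, 4, 16):
    print(n, round(best_of_n_accuracy(n), 3))
```

The assumption of a reliable verifier is doing real work here, which is one reason verifiable domains (math, code, ML competitions like MLE-bench) are where inference-time scaling shows up first.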
At a T-Mobile conference in September, Huang announced a 50-fold increase in inference performance, cutting the o1 model's response time from minutes to seconds:
Recently, Sam made the point that the reasoning capabilities of these AIs will get smarter, but that will require more computing power. Today, every prompt in ChatGPT follows a single path; in the future there will be hundreds of paths internally. It will do reasoning, it will do reinforcement learning, and it will try to create better answers for you.
That's why we improve inference performance by a factor of 50 in our Blackwell architecture. With a 50x improvement, a reasoning model that now may take minutes to answer a particular prompt can respond in seconds. It's going to be a whole new world, and I'm excited about that.
Accelerating forward means "the singularity is coming". As Altman wrote in an earlier long essay: in medicine, superintelligence can help doctors diagnose diseases more accurately and formulate personalized treatment plans; in transportation, it can optimize traffic flow to reduce congestion and accidents; in education, every child can be paired with an AI learning companion, making educational resources more equitable.
The market may be underestimating the slope of AI development
Addressing the market's concerns about AI, industry leaders counter that the pace of the AI story is accelerating.
Speaking at an event hosted by Salesforce, Huang said:
Technology has entered a positive feedback loop: AI is designing the next generation of AI, and the rate of progress is reaching the square of Moore's Law. This means that in the next one to two years, we will see amazing and unexpected progress.
At the T-Mobile conference last month, Altman said bluntly that the progress curve of the new AI paradigm has become steeper, and the leap to the next level will come more quickly.
The curve of the new paradigm steepens over time: problems the model cannot solve today may be solved within a few months. I think today's new reasoning models are roughly where we were in the GPT-2 era, and you'll see them evolve to a level comparable to GPT-4 over the next few years. Over the next few months you'll also see significant progress as we upgrade from o1-preview to the o1 release. The way o1 interacts will change too; it's no longer just chatting.
Against OpenAI's five-level AGI roadmap, we are now at level 2. Altman said it took a while to go from L1 to L2, but one of the most exciting things about L2 is that it enables L3 relatively quickly, and the agents this technology eventually brings are expected to be very powerful.
L1: Chatbot, AI with conversational ability;
L2: Reasoner, the level we have just reached, AI that can solve problems like humans;
L3: Agent, an AI system that can not only think but also act;
L4: Innovator, AI that can assist in invention and creation;
L5: Organization, AI that can do the work of an organization.
Microsoft CTO Kevin Scott said at a Goldman Sachs conference that the AI revolution is moving faster than the internet revolution:
I don't think we're seeing diminishing returns; we're making progress, and the rise of AI is still in its early stages. I encourage people not to get carried away by the hype, but AI is getting more powerful, and all of us working at the frontier can see there is still a great deal of capability yet to be unleashed.
While the AI revolution has similarities to previous technological breakthroughs such as the internet and the smartphone, this time is different, at least in how it is being built, and all of it is likely to happen faster than in previous revolutions.
How does the o1 model "self-evolve"?
Specifically, the o1 model impresses because the AI has learned, via reinforcement learning (RL), to work through problems using Chain of Thought (CoT) techniques.
Chain of thought mimics the human reasoning process: unlike the rapid responses of earlier large models, o1 takes time to think before answering, generating a long internal chain of thought, reasoning step by step and refining each step.
Some analysts have compared it to System 2 in "Thinking, Fast and Slow":
System 1: fast, unconscious thinking that relies on intuition and experience, like brushing your teeth or washing your face.
System 2: deliberate, logical slow thinking, like solving a math problem or planning long-term goals.
The o1 model is like System 2: it reasons before answering, generating a chain of thought, while earlier large models are more like System 1.
By decomposing a problem into a chain of thought, the model can continuously verify intermediate results, correct errors, and try new strategies while solving complex problems, significantly improving its reasoning ability.
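That decompose-verify-retry loop can be sketched as follows. Everything here is hypothetical, the helper names are invented and one step is hard-coded to fail once, purely to show the structure; nothing reflects o1's internals.

```python
def solve_step(step, attempt):
    """Toy solver: fails on the first attempt at the 'carry' step, to
    exercise the verify-and-retry path. Returns None on failure."""
    if step == "add the carry" and attempt == 0:
        return None  # simulated wrong intermediate result
    return f"done: {step}"

def chain_of_thought(steps, max_retries=2):
    """Work through a problem step by step, checking each intermediate
    result and retrying a step when verification fails."""
    trace = []
    for step in steps:
        for attempt in range(max_retries + 1):
            result = solve_step(step, attempt)
            if result is not None:              # verification passed
                trace.append(result)
                break
            trace.append(f"retry: {step}")      # correct course, try again
        else:
            raise RuntimeError(f"could not solve step: {step}")
    return trace

trace = chain_of_thought(["split 47+38 into tens and units",
                          "add the units", "add the carry", "add the tens"])
print(len(trace))  # 5 entries: 4 solved steps plus 1 retry
```

The key property is that errors are caught and fixed at the step where they occur, rather than surfacing only in a wrong final answer.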
o1's other core feature is reinforcement learning, which enables autonomous exploration and sequential decision-making. It is through reinforcement learning that the model learns to refine its thinking process and generate chains of thought.
Reinforcement learning in large models means the agent learns by taking actions in an environment and receiving feedback on the outcomes (trial and error plus a reward mechanism), continuously optimizing its policy. By contrast, previous large-model pre-training used a self-supervised paradigm, typically designing a prediction task and training the model on the data alone.
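The contrast can be made concrete with the simplest possible RL setup, a multi-armed bandit (purely illustrative and unrelated to o1's actual training): the learner is never shown the "right answer" the way a supervised model is, only a scalar reward after each action, and it improves its policy by trial and error.

```python
import random

def reward(action):
    """Environment: action 2 is best. Feedback arrives only as a scalar
    reward after acting, never as a labeled correct answer."""
    return 1.0 if action == 2 else 0.0

def train_bandit(n_actions=4, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error: act, observe the reward, update
    a running value estimate for the chosen action."""
    rng = random.Random(seed)
    values = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:                 # explore
            a = rng.randrange(n_actions)
        else:                                      # exploit best estimate
            a = max(range(n_actions), key=lambda i: values[i])
        r = reward(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]   # incremental mean
    return values

values = train_bandit()
print(max(range(4), key=lambda i: values[i]))  # the learned best action
```

A self-supervised pre-training objective would instead minimize prediction error on fixed data; here there is no dataset at all, only interaction and reward.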
In short, earlier large models learned from data; o1 is more like learning how to think.
Through reinforcement learning and chain of thought, o1 has significantly improved not only its scores on quantitative reasoning benchmarks but also the interpretability of its reasoning.
However, o1's breakthrough is limited to specific tasks: it holds no particular advantage in language tasks such as open-ended text generation, and it displays the outward form of human reasoning without yet possessing genuinely human-like thinking.
This article is from Wall Street News.