In the new era of large models, will small companies be left on the sidelines?

Meta's third-generation large model, Llama 3, was finally unveiled this week: its largest version exceeds 400 billion parameters, it was trained on more than 15 trillion tokens, and it beats GPT-3.5 in more than 60% of human evaluations, earning it the billing of "the strongest open-source model on the planet". Amid the intense competition among the major technology giants, large models have reached a critical turning point.

Morgan Stanley pointed out that the world is entering a new era of rapid growth in large-model capabilities, driven by both hardware and software, in which models' capacity for creativity, strategic thinking, and complex, multi-dimensional tasks will improve significantly. The report highlights that training future large models will require unprecedented computing power, which will drive a sharp increase in development costs.

The report, released this week by a Morgan Stanley analyst team led by Stephen C. Byrd, predicts that the high cost of the supercomputers needed to train the next generation of large models is a huge challenge even for tech giants, let alone small companies. It further points out that, beyond high capital expenditures, barriers around chips, power supply, and AI technology are also rising. Together, these factors form a significant barrier to entry into the large-model space, making it difficult for smaller companies to compete with the powerful giants.

As a result, Morgan Stanley has given an overweight rating to big tech companies such as Google, Meta, Amazon, and Microsoft, which it expects to lead large-model development by virtue of their advantages in technology, capital, and market position. At the same time, while small companies may be marginalized in the world of big models, smaller, lower-cost models will create new opportunities for them.

Morgan Stanley pointed out that in the near future, the computing power required to develop large models will grow exponentially, and that this growth is closely tied to progress in chip technology; NVIDIA's "strongest chip in history", Blackwell, is one of the key technologies driving it.

Take OpenAI's GPT models as an example. Morgan Stanley notes that GPT-4 takes about 100 days to train, uses 25,000 NVIDIA A100 GPUs, processes 13 trillion tokens, and involves about 1.76 trillion parameters. The total computing power of those A100s (measured in FP8 teraFLOPs) is about 16 million; a teraFLOP is a unit of floating-point performance equal to one trillion floating-point operations per second. Running that fleet for roughly 100 days, the total compute required to train GPT-4 comes to about 137 trillion teraFLOPs. For the upcoming GPT-5, Morgan Stanley expects training to require 200,000-300,000 H100 GPUs and to take 130-200 days.
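
As a rough sanity check, these figures are internally consistent: a fleet delivering about 16 million teraFLOP/s, run for about 100 days, works out to roughly 137 trillion teraFLOPs of total work. A minimal sketch of that arithmetic follows; the per-GPU throughput is inferred from the report's fleet total rather than taken from an official spec sheet.

```python
# Back-of-envelope check of the GPT-4 training figures cited above.
# All inputs are the report's estimates (or values inferred from them), not official data.

NUM_GPUS = 25_000          # A100 GPUs reportedly used for GPT-4
TFLOPS_PER_GPU = 640       # teraFLOP/s per A100 implied by the ~16 million fleet total (assumption)
TRAINING_DAYS = 100        # reported training duration

fleet_tflops = NUM_GPUS * TFLOPS_PER_GPU            # ~16 million teraFLOP/s in aggregate
seconds = TRAINING_DAYS * 24 * 60 * 60
total_teraflops = fleet_tflops * seconds            # total work, in teraFLOPs

print(f"Aggregate throughput: {fleet_tflops / 1e6:.1f} million teraFLOP/s")
print(f"Total training compute: ~{total_teraflops / 1e12:.0f} trillion teraFLOPs")
```

With these inputs the total comes out to roughly 138 trillion teraFLOPs, in line with the figure quoted above.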

Supercomputers will make those expectations of exponential growth much easier to achieve. Morgan Stanley's models show that later this decade, supercomputers will provide more than 1,000 times the computing power available for developing large models today. A Blackwell-based supercomputer could develop an entirely new large model in only 150-200 days of training while delivering 1,400-1,900 times the computing power used for current models such as GPT-4.

The report also notes that the annual computing power required by GPT-6 will account for a considerable share of NVIDIA's annual chip sales. The estimated cost of a 100-megawatt data center using B100 or H100 GPUs could be $1.5 billion.

Morgan Stanley sees NVIDIA as the key driver of computing power growth. It forecasts that NVIDIA's computing power will grow at a compound annual growth rate of 70% from 2024 to 2026, calculated on the basis of the FP8 Tensor Core performance of NVIDIA's SXM (data-center module form factor) GPUs.
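
To illustrate how that rate compounds, a minimal sketch of the 70% CAGR forecast; the rate is Morgan Stanley's, while the year-by-year factors are simple arithmetic rather than figures from the report.

```python
# How a 70% compound annual growth rate in computing power compounds over the forecast window.

CAGR = 0.70
BASE_YEAR = 2024

for year in range(BASE_YEAR, 2027):
    factor = (1 + CAGR) ** (year - BASE_YEAR)
    print(f"{year}: {factor:.2f}x the {BASE_YEAR} level")
```

On that basis, available compute would nearly triple between 2024 and 2026.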

However, developing ultra-powerful models and the supercomputers needed to train them involves a complex set of challenges, including capital investment, chip supply, power demand, and software development capability. These factors constitute a major barrier to entry into this space, which will concentrate the opportunity in the hands of tech giants with deep capital and leading technology. On capital investment, Morgan Stanley compared the 2024 data center capital expenditures of Google, Meta, Amazon, and Microsoft against the cost of supercomputers of various sizes: a 1-gigawatt supercomputer facility is estimated at around $30 billion, while larger supercomputers could cost as much as $100 billion.
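
For a rough sense of how facility size maps to cost, the following sketch simply extrapolates linearly from the report's ~$30 billion per gigawatt figure; real projects are unlikely to scale exactly linearly, and the facility sizes below are illustrative rather than taken from the report.

```python
# Illustrative capex scaling for supercomputer facilities, assuming cost scales linearly
# with power capacity from the report's ~$30 billion estimate for a 1-gigawatt site.

COST_PER_GW_BN_USD = 30.0  # Morgan Stanley's estimate for a 1 GW facility

def facility_capex_bn(gigawatts: float) -> float:
    """Estimated capital expenditure, in billions of USD, for a facility of the given size."""
    return gigawatts * COST_PER_GW_BN_USD

for gw in (0.1, 1.0, 3.0):
    print(f"{gw:>3.1f} GW facility: ~${facility_capex_bn(gw):.0f}B")
```

On that crude basis, a multi-gigawatt site lands in the same region as the report's $100 billion upper bound.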

Morgan Stanley expects the four U.S. hyperscalers to spend about $155 billion on data center capital expenditure in 2024 and more than $175 billion in 2025. Numbers of that scale put such projects far beyond the reach of small companies. The bank also believes Google, Meta, Amazon, and Microsoft will be the direct beneficiaries of the growth in computing power, and gives all four an overweight rating.

While small companies may be marginalized in the development of ever-larger and more complex models, the rise of small models will create new opportunities for them. Morgan Stanley said that the low development cost of small models could deliver significant benefits in specific industry sectors and drive the rapid adoption of general-purpose AI technology.

Our latest general AI model includes a tool that calculates the data center costs associated with training small models, which we believe is a useful starting point for assessing the return on invested capital (ROIC) as domain-specific small models proliferate.

We believe that the declining cost and increasing capability of small models strengthen our assessment that general-purpose AI technology will be adopted across many areas.
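
The report does not publish the mechanics of that cost tool, but a highly simplified sketch of this kind of calculation might look as follows; every parameter below is a hypothetical placeholder, not a figure from Morgan Stanley.

```python
# A toy version of a small-model cost / ROIC calculator. All inputs are hypothetical
# placeholders for illustration only; they are not figures from the report.

def training_cost_usd(num_gpus: int, days: float, gpu_hour_rate_usd: float) -> float:
    """Compute cost of the training run: GPU count x hours x hourly rate."""
    return num_gpus * days * 24 * gpu_hour_rate_usd

def simple_roic(annual_operating_profit_usd: float, invested_capital_usd: float) -> float:
    """Return on invested capital, expressed as a ratio."""
    return annual_operating_profit_usd / invested_capital_usd

# Hypothetical domain-specific small model: 256 GPUs for 20 days at $2.50 per GPU-hour.
capital = training_cost_usd(num_gpus=256, days=20, gpu_hour_rate_usd=2.50)
roic = simple_roic(annual_operating_profit_usd=1_500_000, invested_capital_usd=capital)

print(f"Estimated training cost: ${capital:,.0f}")
print(f"Illustrative ROIC: {roic:.0%}")
```

The broader point of such a tool is that training a small, domain-specific model costs hundreds of thousands of dollars rather than billions, which changes the return calculus entirely for smaller players.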

It is worth noting that, in addition to advances in hardware such as chips, innovation in software architecture will also play a key role in improving large-model capabilities, especially the Tree of Thoughts architecture. Proposed by researchers from Princeton University and Google DeepMind and presented at NeurIPS in December 2023, the architecture is inspired by the way human cognition works, specifically so-called "System 2" thinking. "System 2" is a slow, highly deliberate cognitive process, as opposed to the rapid, unconscious "System 1" thinking that is closer to how today's large models operate.
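
In practice, Tree of Thoughts replaces single-pass generation with an explicit search over intermediate "thoughts" that are proposed, scored, and pruned. Below is a minimal, generic sketch of that control flow; the generator and evaluator are stand-in functions (in a real system they would be calls to a language model), so this illustrates the idea rather than reproducing the authors' reference implementation.

```python
# Minimal sketch of a Tree-of-Thoughts style search loop (after Yao et al., 2023).
# generate() and evaluate() are placeholders standing in for language-model calls.

from typing import Callable, List

def tree_of_thoughts(
    root: str,
    generate: Callable[[str], List[str]],   # propose candidate next "thoughts" for a partial solution
    evaluate: Callable[[str], float],        # score how promising a partial solution looks
    depth: int = 3,
    beam_width: int = 2,
) -> str:
    """Breadth-first search over partial solutions, keeping the best `beam_width` at each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [f"{state} -> {thought}" for state in frontier for thought in generate(state)]
        if not candidates:
            break
        # The deliberate, "System 2"-style step: evaluate alternatives before committing.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=evaluate)

# Toy usage with stand-in generator/evaluator (no language model involved).
best = tree_of_thoughts(
    root="problem",
    generate=lambda state: [f"idea{i}" for i in range(3)],
    evaluate=lambda state: len(state),   # placeholder heuristic: longer partial solutions score higher
)
print(best)
```

The evaluate-before-committing step is what gives the approach its "System 2" flavor: the model deliberates over several candidate continuations instead of greedily emitting the first one.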

This shift will allow large models to work in a way that more closely resembles the human thought process, strengthening AI's capacity for creativity, strategic thinking, and complex, multi-dimensional tasks.

Computing costs are dropping sharply

Morgan Stanley's proprietary data center model predicts that the rapid rise in the computing power available to large models means computing costs will fall rapidly. Over a single chip generation (from NVIDIA Hopper to Blackwell), the cost of computing drops by about 50%.

OpenAI CEO Sam Altman has previously emphasized the importance of falling computing costs, arguing that compute will be a key resource of the future and could become the world's most valuable commodity, comparable in importance to currency. In addition, the report predicts that a handful of very large supercomputers will be built, most likely in the vicinity of existing nuclear power plants. In the U.S., Morgan Stanley expects Pennsylvania and Illinois to be the best locations for supercomputers, because these regions have multiple nuclear power plants capable of supporting the energy needs of multi-gigawatt facilities.
