
Jia Yangqing: Large-model sizes are retracing CNN's old path; Elon Musk: It's the same at Tesla

Hengyu, reporting from Aofei Temple

QbitAI | WeChat official account QbitAI

Transformer model sizes are shifting, and they are retracing the old path of CNNs!

Watching everyone's attention get swept up by LLaMA 3.1, Jia Yangqing offered this observation.


Compare the evolution of large-model sizes with the evolution of CNNs, and an obvious pattern emerges:

In the ImageNet era, researchers and technology practitioners witnessed rapid growth in parameter sizes, and then began to move to smaller, more efficient models.

Doesn't that sound just like the GPT story: parameter counts kept ballooning, the industry broadly embraced the Scaling Law, and then GPT-4o mini, Apple's DCLM-7B, and Google's Gemma 2B appeared?

Jia Yangqing quipped, "That was the pre-large-model era, which many people may not remember :)."


And Jia Yangqing is not the only one who sees it this way; AI heavyweight Andrej Karpathy thinks so too:

The race over large-model sizes is intensifying... but in the opposite direction!

Models have to get "bigger" before they can get "smaller", because we need that process to restructure the training data into an ideal, synthetic format.

He even went so far as to bet that we will see models that are both capable and able to think reliably.

And with very small parameter counts at that.


Even Musk chimed in repeatedly under Karpathy's post to say it's the same story at Tesla.


Call it a case of great minds thinking alike.

Let's unpack this

Jia Yangqing's musing starts with LLaMA 3.1, which held the "strongest model" throne only briefly.

It was the first time "strongest open-source model = strongest model" had actually come true, so the excitement was no surprise.

At this point, however, Jia Yangqing raised a point of his own:

"But I think the industry is going to really thrive because of the small vertical models."

As for what counts as a small vertical model, Jia Yangqing was specific: excellent small and mid-sized models such as Patronus AI's Lynx (the company's hallucination-detection model, which surpasses GPT-4o on hallucination tasks).


In terms of personal taste, Jia Yangqing said he is quite fond of hundred-billion-parameter models.

But in practice, he has observed that models in the 7B-70B parameter range are easier for everyone to work with:

  • They are easier to host and don't require huge traffic to be profitable;
  • As long as you ask a clear question, you get a decent output – contrary to some previous beliefs.

He has also heard that OpenAI's newest, fastest-moving models are starting to come in smaller than the "most advanced" large models.


"If my understanding is correct, then this is definitely indicative of industry trends." Jia Yangqing made his point of view straightforward, "that is, in the real world, use a model that is applicable, cost-effective, and still robust." ”

Jia Yangqing then briefly walked through how CNNs developed.

First came the era of CNN's rise.

With AlexNet (2012) as a starting point, a period of about three years of model size growth began.

VGGNet, which appeared in 2014, was a model of formidable performance and size.

Second came the downsizing period.

In 2015, GoogLeNet shrank model size from the "GB" range down to the "MB" range, roughly 100x smaller, while performance held up well rather than collapsing.

SqueezeNet, released in 2016, followed a similar trajectory.
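
To make the downsizing concrete, here is a quick sketch of my own (not from Jia Yangqing's post) that counts the parameters of the classic torchvision implementations of these networks; exact figures vary a bit across torchvision versions, and GoogLeNet is counted here with its two auxiliary heads.

```python
# Count parameters of classic CNNs using torchvision's reference implementations.
# Instantiating without weights downloads nothing; numbers are approximate.
from torchvision import models

def n_params(model):
    return sum(p.numel() for p in model.parameters())

for name, ctor in [
    ("AlexNet (2012)", models.alexnet),
    ("VGG-16 (2014)", models.vgg16),
    ("GoogLeNet", models.googlenet),
    ("SqueezeNet 1.0", models.squeezenet1_0),
]:
    print(f"{name}: ~{n_params(ctor()) / 1e6:.1f}M parameters")

# Roughly: AlexNet ~61M, VGG-16 ~138M, SqueezeNet ~1.2M;
# GoogLeNet is ~6.6M at inference (a bit more as built here, with aux heads).
```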

Then for a while, the focus was on the pursuit of balance.

Later work, such as ResNet (2015) and ResNeXt (2016), kept model sizes moderate.

Importantly, moderate model size did not mean less computation; in fact, everyone was willing to spend more compute to be "more efficient at the same parameter count".

Then came the period when CNNs took off on the edge, running on-device.

MobileNet, which Google launched in 2017, is a case in point and an interesting piece of work.

It's interesting because it uses very few resources yet delivers excellent performance.

Just last week, someone told Jia Yangqing: "Wow~ we're still using MobileNet because it runs on-device and its feature embeddings generalize so well."
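
The trick that makes MobileNet so light is the depthwise-separable convolution: a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution in place of one standard convolution. Here is a minimal PyTorch sketch of my own, with arbitrary channel counts, showing the parameter savings:

```python
import torch.nn as nn

c_in, c_out, k = 64, 128, 3

# Standard 3x3 convolution: every output channel mixes all input channels.
standard = nn.Conv2d(c_in, c_out, k, padding=1)

# Depthwise-separable version: depthwise 3x3 (one filter per channel),
# then a 1x1 pointwise convolution to mix channels.
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in),
    nn.Conv2d(c_in, c_out, 1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # ~74K vs ~9K parameters
```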

Finally, Jia Yangqing borrowed a chart from "A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration" by Ghimire et al.


And posed his question once more:

Will large-model sizes follow the same trend we saw in the CNN era?

What do netizens think?

In fact, GPT-4o mini is hardly the only example of large-model development taking the "smaller, not bigger" route.

When the names above voiced this view, plenty of people nodded along vigorously and offered similar examples to show they see the same trend.

Someone immediately followed up:

Here's a fresh example in favor: Gemma 2 distills the knowledge of the 27B model into smaller versions.
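
"Distillation" here means training the small student to imitate the large teacher's softened output distribution. Below is a minimal sketch of that loss with toy stand-in networks; it illustrates the generic technique (Hinton-style logit distillation), not Gemma 2's actual recipe.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, temperature = 100, 2.0

teacher = torch.nn.Linear(32, vocab_size)  # stand-in for the large teacher
student = torch.nn.Linear(32, vocab_size)  # stand-in for the small student
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 32)                     # a fake batch of hidden features
with torch.no_grad():
    teacher_logits = teacher(x)

# KL divergence between temperature-softened distributions, scaled by T^2.
student_logits = student(x)
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

opt.zero_grad()
loss.backward()
opt.step()
```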

Some netizens noted that building a larger model first gives a real boost to the training of subsequent generations of smaller, more vertical models.

This iterative process culminates in what is known as the "perfect training set".

That way, smaller "large models" can be just as smart within a given domain as today's huge-parameter models, if not smarter.

In a nutshell, the model has to be bigger before it can be smaller.
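
One hedged sketch of what "bigger first, then smaller" can look like in practice: a large teacher model rewrites raw examples into a cleaner, synthetic training set that a small vertical model is later fine-tuned on. The model name, prompt, and file path below are placeholders of my own, not anything described in the thread.

```python
import json
from transformers import pipeline

# Stand-in teacher; in practice this would be a much larger model.
teacher = pipeline("text-generation", model="gpt2")

raw_examples = [
    "customer asks: how do i reset my password",
    "customer asks: why was my card declined",
]

# Have the teacher restructure raw data into synthetic training targets
# for a smaller, domain-specific student model.
with open("synthetic_train.jsonl", "w") as f:
    for raw in raw_examples:
        prompt = f"Rewrite as a clear Q&A pair for a support assistant:\n{raw}\n"
        out = teacher(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
        f.write(json.dumps({"source": raw, "synthetic_target": out}) + "\n")
```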


Most of those discussing the topic agreed with the trend, with some saying bluntly that "this is a good thing, more practical and useful than the 'my model is bigger than yours' parameter contest."

But of course!

Scroll through the comment sections and you will also find dissenting voices.

For example, one user left this comment under Jia Yangqing's tweet:

The companies with the most competitive models, Mistral AI (Mistral Large), Meta (LLaMA 3.1), and OpenAI, are most likely training even larger models right now.

I don't see a trend of "smaller models make technological breakthroughs".


Jia Yangqing replied promptly to this pushback.

Here's what he said: "That's right! When I say large-model sizes may be retracing the old path of CNNs, I certainly don't mean we're calling on everyone to stop training larger models."

He further explained that his original point was that as a technology (CNNs and large models alike) gets deployed more and more widely, people start to pay more and more attention to cost-effective models.


So perhaps more efficient "small" large models can redefine what "intelligence" means in AI and challenge the assumption that bigger is better.

Do you agree with this view?

Reference Links:

[1]https://x.com/jiayq/status/1818703217263624385

[2]https://x.com/fun000001/status/1818791560697594310

[3]https://www.patronus.ai/

[4]https://twitter.com/karpathy/status/1814038096218083497

— END —

