How crazy is ChatGPT? A single training run is equivalent to 12,000 people working for a year

Author: Andy

Lead

The emergence of GPT-3 set off a storm in the technology world, and everyone has been discussing it: its model is close to the largest ever trained, and with a model of that scale, handling human-computer dialogue becomes a trivial matter, which has drawn the attention of a huge number of people.

By all accounts, ChatGPT is itself a very good pre-trained model, and in human-computer conversation its answers to users are remarkably consistent.

What is truly surprising is the training itself: ChatGPT's language model only needs to be trained once, and the scale of that single training run is astonishing.

The birth of ChatGPT

ChatGPT is another epoch-making pre-trained model after GPT-3, developed by OpenAI, a company that has been working steadily toward general artificial intelligence.

By the first half of 2019, the company had already released GPT, GPT-2 and other strong pre-trained models in succession, and in the current environment this new pre-trained model, ChatGPT, has drawn sweeping attention from the mainland Chinese research community.

ChatGPT is a general-purpose pre-trained model officially released by OpenAI. It draws on the GPT, GPT-2 and GPT-3 models together, so it not only works as a general language model but is also heavily optimized for human-computer dialogue.

In addition, ChatGPT's underlying model is very large, with a total of 175 billion parameters, and training a GPT model of this size calls for an enormous amount of data: the parameters of this pre-trained model alone were fitted on roughly 570 GB of text.
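
To put the 175 billion parameters and the 570 GB of text into perspective, here is a minimal back-of-envelope sketch. The parameter count and data size are the figures quoted above; the 16-bit storage, the roughly-four-bytes-per-token estimate and the ~6 FLOPs per parameter per token rule of thumb are assumptions added for illustration, not numbers from this article.

```python
# Back-of-envelope sketch of what "175 billion parameters" means in practice.
# The parameter count and the 570 GB text figure come from the article; the
# bytes-per-parameter and bytes-per-token values below are rough assumptions.

N_PARAMS = 175e9          # parameters (figure from the article)
BYTES_PER_PARAM = 2       # assuming 16-bit (fp16/bf16) storage
TEXT_BYTES = 570e9        # ~570 GB of training text (figure from the article)
BYTES_PER_TOKEN = 4       # rough assumption: ~4 bytes of English text per token

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
tokens = TEXT_BYTES / BYTES_PER_TOKEN

# A widely used approximation for dense transformers is ~6 FLOPs per
# parameter per training token (forward plus backward pass).
train_flops = 6 * N_PARAMS * tokens

print(f"weights alone:   ~{weights_gb:.0f} GB at 16-bit precision")
print(f"training tokens: ~{tokens / 1e9:.0f} billion (under the 4-bytes/token assumption)")
print(f"training FLOPs:  ~{train_flops:.1e}")
```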

At the same time, ChatGPT's pre-training corpus is also very rich: about 45 TB of text, which is like watching 135,000 Douyin videos every day. That is a staggering number, and it covers only the pre-training stage.
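
If the 45 TB is read as the raw corpus and the 570 GB mentioned earlier as the cleaned subset actually used for training (an assumed pairing, since the article lists the two numbers separately), the implied filtering looks like this:

```python
# Relationship between the two data figures mentioned in the article,
# treating 45 TB as the raw corpus and 570 GB as the cleaned subset.
# This pairing is an assumption, not something the article states.

RAW_TB = 45          # raw text, figure from the article
CLEAN_GB = 570       # text actually used for training, figure from the article

kept_fraction = CLEAN_GB / (RAW_TB * 1000)
print(f"kept after filtering/deduplication: ~{kept_fraction:.1%} of the raw corpus")
# -> roughly 1.3%, i.e. the overwhelming majority of the raw data is discarded.
```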

With training data on this scale, coupled with steadily improving training methods, ChatGPT is able to achieve strong generalization ability.

ChatGPT's largest pre-trained language model also has 175 billion parameters, the same order of size as GPT-3, which is why ChatGPT is often referred to as GPT-3.5.

ChatGPT's reasoning ability is also very strong, in some respects even stronger than human reasoning, because that ability is learned from data, and the model is capable of chaining inference over a dozen steps.

GPT-3 itself has 175 billion parameters, so any pre-trained model derived from it is bound to be very large. How many parameters does ChatGPT's language model have? Counting only the parameters that are actually trained, ChatGPT's language model comes to 22.9 billion.

How crazy is ChatGPT's training

For a capable team, the minimum necessary computing power is enough to train a great many models, and with pre-training methods improving continuously, it now takes only a matter of days to train ChatGPT's language model successfully.

In fact, ChatGPT's training took about 12 days, which is indeed a very short time, though the training resources behind it, both people and hardware, matter a great deal.

Those 12 days of training are equivalent to 12,000 people working for a year, and throughout the run computing resources equivalent to about 1,114 TPUs were constantly crunching numbers.
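
Taking the quoted figures at face value, that is 12 days, roughly 1,114 TPUs and a claimed 12,000 person-years, the raw arithmetic works out as follows:

```python
# Quick arithmetic on the figures quoted above (12 days, ~1,114 TPUs,
# "12,000 person-years"). These are the article's numbers, not verified specs.

TRAIN_DAYS = 12
TPU_COUNT = 1114
PERSON_YEARS_CLAIMED = 12_000

device_days = TRAIN_DAYS * TPU_COUNT
device_years = device_days / 365

# How much human work each day of the run would represent under the claim.
person_years_per_day = PERSON_YEARS_CLAIMED / TRAIN_DAYS

print(f"accelerator time: {device_days:,} TPU-days (~{device_years:.0f} TPU-years)")
print(f"claimed human equivalent: {person_years_per_day:,.0f} person-years per day of training")
```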

Within the run, a single stretch of training could last up to 12 hours, and those hours went either into the training step itself or into preparing the data for the optimization process.

Although ChatGPT only needs to be trained once, the resources and data that single run consumes are enormous, and because that data is constantly being processed and optimized, the footprint of the training run grows very large.

During training, ChatGPT uses some 80,000 nodes; over roughly 20 days the data passing through those nodes grows enormous, and the number of TPU cores involved reaches about 34 million.
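
These node and core figures are hard to verify, but taken at face value, and assuming the 34 million refers to cores active at once (an assumption), they imply the following:

```python
# The article's node and core figures, taken at face value.
NODES = 80_000
TPU_CORES = 34_000_000   # assumed to mean cores active simultaneously
DAYS = 20

cores_per_node = TPU_CORES / NODES
core_days = TPU_CORES * DAYS
print(f"implied cores per node:        ~{cores_per_node:.0f}")
print(f"total core-days over the run:  {core_days:,}")
```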

Training itself generates about 1,500 TB of data, and once compressed this corresponds to roughly 280 TB of stored data.
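
A quick check of the compression implied by those two figures, both taken from the paragraph above:

```python
# Compression ratio implied by the article's figures (1,500 TB raw vs 280 TB stored).
RAW_TB = 1500
COMPRESSED_TB = 280

ratio = RAW_TB / COMPRESSED_TB
savings = 1 - COMPRESSED_TB / RAW_TB
print(f"compression ratio: ~{ratio:.1f}x, i.e. ~{savings:.0%} less storage")
# -> roughly 5.4x, or about 81% less storage than the raw training-time output.
```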

That data, however, is produced by serializing records from the chat logs, and at training time a large share of the extra data is invalid, so much of it is of no real use.

Even though the training process churns out a large amount of invalid data, it still consumes plenty of resources; for example, many useless intermediate models are generated along the way, and even though they serve no purpose, they are still dragged along into the prediction stage.

ChatGPT's training does apply further optimizations here; they are not dramatic, and they make model selection rather troublesome, but since those intermediate models are useless, they remain useless at prediction time as well.

Useless as they are, these models are simply skipped over later, which makes them one of the wasteful aspects of training: a lot of "garbage" models accumulate during the run, and the question is how to cut down the amount of garbage generated.

ChatGPT's training has been tuned to produce as little of this garbage as possible, but some amount is unavoidable, and the exact number of such "garbage" models is unknown.

The stability of ChatGPT's training

For a pre-trained model, a very large amount of training data tends to make the model very stable; if the data is not that large while the model itself is very large, keeping it stable is much harder.

In ChatGPT's pre-training, on the other hand, the steps that handle the data have been optimized to make training more stable, which is what lets the trained model come out as well as it does.

Training time also has to be used effectively: compared with the earlier GPT-3 model, the run is about 10 days shorter, which shows that the stability of the ChatGPT model is guaranteed to a large extent.

Because the data keeps changing during training, the model must be continuously optimized to stay as stable as possible, and this stability is extremely important.

It determines whether the trained model will ultimately live up to expectations; if stability breaks down, the training process runs into all kinds of problems and the outcome becomes hard to predict.

Epilogue

In ChatGPT's training, the combination of data-volume optimization, model optimization and other factors is what makes the trained model more stable, and this is the part that deserves particular attention when training a system like ChatGPT.
