How crazy is ChatGPT? A single training run is equivalent to 12,000 people working for a year

Author: Andy

Lead

The emergence of GPT-3 set off a storm in the technology world, and everyone has been discussing it: its model is close to the largest ever trained, and with a model of that scale, handling human-computer dialogue becomes a trivial matter, which has drawn the attention of a huge number of people.

By all accounts, ChatGPT is itself a very good pre-trained model, and in human-computer conversation its answers to users are remarkably consistent.

What is truly surprising is the training itself: ChatGPT's language model only needs to be trained once, and the scale of that single training run is astonishing.

The birth of ChatGPT

ChatGPT is another epoch-making pre-trained model after GPT-3, developed by OpenAI, a company that has been working steadily toward general artificial intelligence.

By the first half of 2019, the company had already released GPT, GPT-2 and other strong pre-trained models in succession, and in the current environment this new pre-trained model, ChatGPT, has drawn sweeping attention from the mainland Chinese research community.

ChatGPT is a general-purpose pre-trained model officially released by OpenAI. It draws on the GPT, GPT-2 and GPT-3 models together, so it not only works as a general language model but is also heavily optimized for human-computer dialogue.

In addition, ChatGPT's underlying model is very large, with a total of 175 billion parameters, and training a GPT model of this size calls for an enormous amount of data: the parameters of this pre-trained model alone were fitted on roughly 570 GB of text.
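
To put the 175 billion parameters and the 570 GB of text into perspective, here is a minimal back-of-envelope sketch. The parameter count and data size are the figures quoted above; the 16-bit storage, the roughly-four-bytes-per-token estimate and the ~6 FLOPs per parameter per token rule of thumb are assumptions added for illustration, not numbers from this article.

```python
# Back-of-envelope sketch of what "175 billion parameters" means in practice.
# The parameter count and the 570 GB text figure come from the article; the
# bytes-per-parameter and bytes-per-token values below are rough assumptions.

N_PARAMS = 175e9          # parameters (figure from the article)
BYTES_PER_PARAM = 2       # assuming 16-bit (fp16/bf16) storage
TEXT_BYTES = 570e9        # ~570 GB of training text (figure from the article)
BYTES_PER_TOKEN = 4       # rough assumption: ~4 bytes of English text per token

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
tokens = TEXT_BYTES / BYTES_PER_TOKEN

# A widely used approximation for dense transformers is ~6 FLOPs per
# parameter per training token (forward plus backward pass).
train_flops = 6 * N_PARAMS * tokens

print(f"weights alone:   ~{weights_gb:.0f} GB at 16-bit precision")
print(f"training tokens: ~{tokens / 1e9:.0f} billion (under the 4-bytes/token assumption)")
print(f"training FLOPs:  ~{train_flops:.1e}")
```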

At the same time, ChatGPT's pre-training corpus is also very rich: about 45 TB of text, which is like watching 135,000 Douyin videos every day. That is a staggering number, and it covers only the pre-training stage.
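
If the 45 TB is read as the raw corpus and the 570 GB mentioned earlier as the cleaned subset actually used for training (an assumed pairing, since the article lists the two numbers separately), the implied filtering looks like this:

```python
# Relationship between the two data figures mentioned in the article,
# treating 45 TB as the raw corpus and 570 GB as the cleaned subset.
# This pairing is an assumption, not something the article states.

RAW_TB = 45          # raw text, figure from the article
CLEAN_GB = 570       # text actually used for training, figure from the article

kept_fraction = CLEAN_GB / (RAW_TB * 1000)
print(f"kept after filtering/deduplication: ~{kept_fraction:.1%} of the raw corpus")
# -> roughly 1.3%, i.e. the overwhelming majority of the raw data is discarded.
```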

With training data on this scale, coupled with steadily improving training methods, ChatGPT is able to achieve strong generalization ability.

ChatGPT's largest pre-trained language model also has 175 billion parameters, the same order of size as GPT-3, which is why ChatGPT is often referred to as GPT-3.5.

ChatGPT's reasoning ability is also very strong, in some respects even stronger than human reasoning, because that ability is learned from data, and the model is capable of chaining inference over a dozen steps.

GPT-3 itself has 175 billion parameters, so any pre-trained model derived from it is bound to be very large. How many parameters does ChatGPT's language model have? Counting only the parameters that are actually trained, ChatGPT's language model comes to 22.9 billion.

How crazy is ChatGPT's training

For a capable team, the minimum necessary computing power is enough to train a great many models, and with pre-training methods improving continuously, it now takes only a matter of days to train ChatGPT's language model successfully.

In fact, ChatGPT's training took about 12 days, which is indeed a very short time, though the training resources behind it, both people and hardware, matter a great deal.

Those 12 days of training are equivalent to 12,000 people working for a year, and throughout the run computing resources equivalent to about 1,114 TPUs were constantly crunching numbers.
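
Taking the quoted figures at face value, that is 12 days, roughly 1,114 TPUs and a claimed 12,000 person-years, the raw arithmetic works out as follows:

```python
# Quick arithmetic on the figures quoted above (12 days, ~1,114 TPUs,
# "12,000 person-years"). These are the article's numbers, not verified specs.

TRAIN_DAYS = 12
TPU_COUNT = 1114
PERSON_YEARS_CLAIMED = 12_000

device_days = TRAIN_DAYS * TPU_COUNT
device_years = device_days / 365

# How much human work each day of the run would represent under the claim.
person_years_per_day = PERSON_YEARS_CLAIMED / TRAIN_DAYS

print(f"accelerator time: {device_days:,} TPU-days (~{device_years:.0f} TPU-years)")
print(f"claimed human equivalent: {person_years_per_day:,.0f} person-years per day of training")
```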

Within the run, a single stretch of training could last up to 12 hours, and those hours went either into the training step itself or into preparing the data for the optimization process.

Although ChatGPT only needs to be trained once, the resources and data that single run consumes are enormous, and because that data is constantly being processed and optimized, the footprint of the training run grows very large.

During training, ChatGPT uses some 80,000 nodes; over roughly 20 days the data passing through those nodes grows enormous, and the number of TPU cores involved reaches about 34 million.
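
These node and core figures are hard to verify, but taken at face value, and assuming the 34 million refers to cores active at once (an assumption), they imply the following:

```python
# The article's node and core figures, taken at face value.
NODES = 80_000
TPU_CORES = 34_000_000   # assumed to mean cores active simultaneously
DAYS = 20

cores_per_node = TPU_CORES / NODES
core_days = TPU_CORES * DAYS
print(f"implied cores per node:        ~{cores_per_node:.0f}")
print(f"total core-days over the run:  {core_days:,}")
```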

Training itself generates about 1,500 TB of data, and once compressed this corresponds to roughly 280 TB of stored data.
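
A quick check of the compression implied by those two figures, both taken from the paragraph above:

```python
# Compression ratio implied by the article's figures (1,500 TB raw vs 280 TB stored).
RAW_TB = 1500
COMPRESSED_TB = 280

ratio = RAW_TB / COMPRESSED_TB
savings = 1 - COMPRESSED_TB / RAW_TB
print(f"compression ratio: ~{ratio:.1f}x, i.e. ~{savings:.0%} less storage")
# -> roughly 5.4x, or about 81% less storage than the raw training-time output.
```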

That data, however, is produced by serializing records from the chat logs, and at training time a large share of the extra data is invalid, so much of it is of no real use.

Even though the training process churns out a large amount of invalid data, it still consumes plenty of resources; for example, many useless intermediate models are generated along the way, and even though they serve no purpose, they are still dragged along into the prediction stage.

ChatGPT's training does apply further optimizations here; they are not dramatic, and they make model selection rather troublesome, but since those intermediate models are useless, they remain useless at prediction time as well.

Useless as they are, these models are simply skipped over later, which makes them one of the wasteful aspects of training: a lot of "garbage" models accumulate during the run, and the question is how to cut down the amount of garbage generated.

ChatGPT's training has been tuned to produce as little of this garbage as possible, but some amount is unavoidable, and the exact number of such "garbage" models is unknown.

The stability of ChatGPT's training

For a pre-trained model, a very large amount of training data tends to make the model very stable; if the data is not that large while the model itself is very large, keeping it stable is much harder.

In ChatGPT's pre-training, on the other hand, the steps that handle the data have been optimized to make training more stable, which is what lets the trained model come out as well as it does.

Training time also has to be used effectively: compared with the earlier GPT-3 model, the run is about 10 days shorter, which shows that the stability of the ChatGPT model is guaranteed to a large extent.

Because the data keeps changing during training, the model must be continuously optimized to stay as stable as possible, and this stability is extremely important.

It determines whether the trained model will ultimately live up to expectations; if stability breaks down, the training process runs into all kinds of problems and the outcome becomes hard to predict.

Epilogue

In ChatGPT's training, the combination of data-volume optimization, model optimization and other factors is what makes the trained model more stable, and this is the part that deserves particular attention when training a system like ChatGPT.
