After more than a year away, AI guru Li Mu has finally returned to "fill the pit" in his classic paper intensive-reading series on Bilibili!
That's right, the latest protagonist is Llama-3.1:
In this 18-minute video, Li Mu walks us through the Llama-3.1 technical report hands-on, and even drops a bit of gossip (around the 7:50 mark).
He mentioned that when the Llama team trained the largest model, they initially trained both an MoE model and a dense model, but the former later failed, so in the end only the simpler dense version was released.
For more details, go watch the original video. A quiet aside: as soon as Li Mu's video went live, views instantly topped 10,000.
Grad students flocked over at the news, and one look at the top comments tells you everyone's state of mind:
(The video address will be picked up at the end of the article)
At the same time, the large-model arena (Chatbot Arena) leaderboard was updated: Llama-3.1-405B marks the first time an open-source model has cracked the arena's top three, right behind GPT-4o and Claude-3.5-Sonnet.
Of course, this result may not come as a surprise; Meta itself had quietly benchmarked against these two long ago.
In human evaluations, too, Llama-3.1-405B is evenly matched with the other two.
Beyond that, Llama-3.1-405B not only holds its own overall but also sits firmly in the top three on the individual categories (coding, math, instruction following, hard prompts).
It is worth mentioning that Llama-3.1-70B also climbed to No. 9 on the overall list, and its ranking now comes with much higher confidence than before.
Meanwhile, foreign netizens sent congratulations on 405B's new achievement, and some offered a "thoughtful" reminder:
405B was only trained to be "compute-optimal". They could have kept going, and the next iteration is going to be amazing.
OK, OK, we get it: Llama-3.1-405B is awesome!
In fact, it's been out for only a week, and netizens have already found plenty of ways to play with it......
Putting it to work
The first step in putting it to work is trying it locally~
The Technical Community Manager of Open Interpreter (a project that lets LLMs run locally) showed off his result:
Running Llama-3.1-8B on a Raspberry Pi, using only the CPU.
[Video omitted; view it on the QubitAI official account~]
To do this, just download the llamafile from GitHub or Hugging Face and configure the parameters.
According to him, the setup used a Raspberry Pi 5 (8GB RAM), an M.2 HAT, and a Hailo AI module, with 4-bit quantization.
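Once the llamafile is up, it serves an OpenAI-compatible API locally, so you can script against the Pi from Python. A minimal sketch, assuming the server is listening on llamafile's default port 8080 (the prompt and model label below are just placeholders):

```python
# Minimal sketch: query a llamafile running on the Pi through its
# OpenAI-compatible endpoint. Assumes the default port 8080; adjust
# the host/port to match how the llamafile was started.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "llama-3.1-8b",  # label only; the server uses its loaded model
        "messages": [{"role": "user", "content": "Hello from the Raspberry Pi!"}],
        "max_tokens": 64,
    },
    timeout=300,  # CPU-only inference on a Pi is slow
)
print(resp.json()["choices"][0]["message"]["content"])
```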
That said, he also quipped that just a few words of output can genuinely cook the CPU on this thing.
Next, egged on by netizens, he has been sharpening his knives for 405B~
Beyond the example above, some users have started using Llama-3.1-405B to build chatbots over any GitHub repository.
[Video omitted; view it on the QubitAI official account~]
And it costs nothing: Hugging Face offers the ability to create new assistants for free.
However, Groq engineer Rick Lamers raised doubts after trying it:
The current RAG pipeline may be problematic and prone to hallucinations.
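For context, a repo chatbot like this is usually a retrieval-augmented generation (RAG) pipeline: retrieve relevant chunks of the repository, stuff them into the prompt, and generate. Below is a deliberately naive sketch of that idea (all names are illustrative; TF-IDF stands in for a real embedding index, and the final model call is left out) showing where hallucinations can creep in:

```python
# Deliberately naive RAG sketch over a repo's Python files. Illustrative
# only: TF-IDF stands in for a real embedding index, and "./some-repo"
# is a placeholder path.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(question: str, repo_dir: str, k: int = 3) -> list[str]:
    """Rank the repo's files against the question; return the top-k snippets."""
    docs = [p.read_text(errors="ignore") for p in Path(repo_dir).rglob("*.py")]
    vec = TfidfVectorizer().fit(docs + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(docs))[0]
    return [docs[i][:1000] for i in sims.argsort()[::-1][:k]]

question = "How does this repo handle authentication?"
context = "\n---\n".join(retrieve(question, "./some-repo"))

# Weak spot: if these top-k snippets are irrelevant (bad chunking, weak
# ranking), the model answers from its priors instead of the repo --
# plausible-sounding but unsupported, i.e. a hallucination.
prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
# The prompt would then be sent to Llama-3.1-405B via a hosted API.
```

That retrieval step is exactly where Lamers' concern bites: garbage context in, confident-sounding garbage out.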
Still, that hasn't stopped netizens from wanting to give it a go~
Besides shipping real things, some netizens have also used Llama-3.1-405B to drop a teaser.
Just now, netizen Hassan announced:
Using Llama-3.1-405B to generate a complete React app.
Good grief, wouldn't that make building an app so much easier!
Although it hasn't officially opened up yet, netizens have already started lining up.
More ways to play are yours to unlock~
— END —
QubitAI · Signed author on Toutiao
Follow us and be the first to know about cutting-edge technology trends