
Turing Award winner LeCun joins AI chip dark horse Groq, valued at $2.8 billion to challenge Nvidia!

Editor: Editorial Department

Groq has put pressure on Nvidia again! Not only has it demonstrated a record-breaking output speed of 1,256 tokens per second, but its latest $640 million funding round has given it the confidence to challenge Nvidia in the AI chip field.

Nvidia is facing a strong challenger once again.

Founded in 2016, Groq raised $640 million in its latest funding round, led by a BlackRock fund and backed by the investment arms of Cisco and Samsung.

Currently, Groq is valued at $2.8 billion.

The company's founder, Jonathan Ross, previously worked on TPU chips at Google, and Groq's flagship product, the LPU, is dedicated to accelerating AI foundation models, especially LLMs.


Ross said that LLM usage will grow even further once people see how convenient it is to run large language models on Groq's fast engine.

At a lower price and with lower energy consumption, Groq's chips can match or even exceed the speed of Nvidia's, which gives Groq the confidence to take on Nvidia.

It is also worth mentioning that Groq announced that Turing Award winner LeCun will serve as a technical advisor.

LeCun's official addition gives Groq a strong ally in the highly competitive chip space.


1,256.54 tokens per second, lightning fast

As the martial arts saying goes, speed is the one thing that cannot be beaten.

And the only thing that can beat Groq's 800-tokens-per-second response is the next generation of Groq itself.

From 500 tokens/s to 800 tokens/s and now 1,256.54 tokens/s, Groq is so fast that many GPUs simply can't keep up.


With a quiet rollout of new features in early July, Groq now delivers results far faster and smarter than previously demonstrated, and accepts not only typed queries but also voice commands.


By default, Groq's website engine uses Meta's open-source Llama3-8b-8192 large language model.

Users can also choose the larger Llama3-70b, as well as Google's Gemma and Mistral's models, with more to be supported soon.
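For developers who want to try this themselves, here is a minimal sketch of switching between these models, assuming GroqCloud's Python SDK and its OpenAI-style chat interface; the model IDs follow the naming used above, and the helper function and prompt are purely illustrative.

```python
# Minimal sketch: switching between the models mentioned above on GroqCloud,
# assuming Groq's Python SDK with its OpenAI-style chat interface.
import os
from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def ask(prompt: str, model: str = "llama3-8b-8192") -> str:
    """Send a single-turn prompt to the chosen model and return the reply."""
    response = client.chat.completions.create(
        model=model,  # e.g. "llama3-70b-8192" for the larger Llama 3
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("Summarize why low-latency inference matters.", model="llama3-70b-8192"))
```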


This fast and flexible experience matters a great deal for developers. With traditional AI processing, waiting is commonplace: you watch the characters trickle out one by one before you can move on to the next step.

In the latest version of Groq, almost all of these tasks are answered instantaneously, as fast as lightning.

Here's an example: ask Groq to comment on what could be improved in the VB Transform agenda.


At 1,225.15 tokens/s, the answer pops up almost instantly.

The content is also detailed and clear, including suggestions for clearer categorization, more detailed session descriptions, and better speaker bios, for a total of 10 suggested revisions.


When asked by voice to suggest more great speakers to diversify the lineup, it instantly generates a list of names, organizations, and topics to choose from, presented in a clear table.


Ask it to add a column of contact information, and it instantly fills in email addresses and Twitter accounts, no problem.

Another example: in the demo video, the presenter rambles on for more than a minute, asking Groq to create a schedule for next week's speaking sessions.

Groq not only patiently understood and created the requested table, but also allowed quick and easy revisions, including spelling corrections.


You can also change your mind midway and ask it to add columns for things you forgot to request; it stays patient, efficient, and meticulous, the kind of contractor any client would consider perfect.

It can also translate the table into different languages. Sometimes a correction takes several requests, but such errors usually come from the LLM itself, not from the processing layer.

It is fair to say that going from 500 tokens/s to 800 tokens/s and now straight to a four-digit generation rate leaves GPT-4 and Nvidia even further behind.

Of course, beyond raw speed, the other highlight of this update is that users can query not only by typing into the engine but also by voice command.

Groq uses OpenAI's latest open-source automatic speech recognition and translation model, Whisper Large v3, to convert speech to text, which then serves as the prompt for the LLM.
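Below is a sketch of this voice-query pipeline: transcribe speech with Whisper Large v3, then pass the text to the LLM as a prompt. It assumes Groq's Python SDK exposes an OpenAI-style transcription endpoint; the audio file name and model IDs are illustrative.

```python
# Sketch of the voice-query pipeline: speech -> text with Whisper Large v3,
# then text -> answer from the LLM. Assumes an OpenAI-style transcription
# endpoint in Groq's Python SDK; the file name is illustrative.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# 1) Speech -> text
with open("voice_query.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio,
    )

# 2) Text -> LLM answer
answer = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": transcript.text}],
)
print(answer.choices[0].message.content)
```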

Faster and more efficient, with multimodal input that needs no typing and adds no lag, this new way of working is a real convenience for users.

Groq + Llama 3: a powerful combination

On July 17, Groq research scientist Rick Lamers officially announced a "secret project" on Twitter: fine-tuned Llama3 Groq Synth Tool Use models in 8B and 70B sizes, aimed at improving AI tool use and function-calling capabilities.

The team combined full fine-tuning with Direct Preference Optimization (DPO), using only ethically generated data and no user data.


The Berkeley Function-Calling Leaderboard (BFCL) is built entirely from real-world data and is designed specifically to evaluate an LLM's ability to call tools and functions.

Groq's fine-tuned Llama3 70B and 8B both achieved impressive results on BFCL, with overall accuracy of 90.76% and 89.06%, respectively.

The 70B version's score surpassed proprietary models such as Claude 3.5 Sonnet, GPT-4 Turbo, GPT-4o, and Gemini 1.5 Pro, taking first place on the BFCL leaderboard.


Both versions of the model are open source, and users can download weights from HuggingFace or access them via GroqCloud.


Hugging Face address: https://huggingface.co/Groq
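For local experimentation, here is a sketch of loading one of the tool-use fine-tunes with the transformers library. The repo ID below is an assumption based on the Groq organization page linked above; check that page for the exact model names.

```python
# Sketch: loading a tool-use fine-tune from Hugging Face with transformers.
# The repo ID is an assumption based on huggingface.co/Groq; verify the name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Groq/Llama-3-Groq-8B-Tool-Use"  # assumed repo name under the Groq org

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "What is the weather in San Francisco? Use a weather tool if needed."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```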

In addition, Groq has also taken Llama 3 to the next level by launching an app called Groqbook, which can generate an entire book in less than 1 minute.


GitHub address: https://github.com/Bklieger/groqbook

According to the project's GitHub page, Groqbook uses a mix of Llama3-8B and 70B models, with the larger model generating the structure and the smaller model creating the specific content.

Currently, this app is only available for non-fiction books, and requires the user to enter the title of each chapter as context.
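The following is not Groqbook's actual code, just a minimal sketch of the two-stage approach described above, with the 70B model drafting the chapter structure and the 8B model filling in content from each chapter title. It assumes the same Groq SDK interface used earlier; the book title and prompts are illustrative.

```python
# Sketch of the two-stage book-generation idea: the larger model proposes the
# structure, the smaller model writes each chapter from its title alone.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def complete(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

title = "A Practical Introduction to AI Inference Hardware"  # illustrative

# Stage 1: the 70B model proposes the chapter structure.
structure = complete(
    "llama3-70b-8192",
    f"List 8 chapter titles for a non-fiction book called '{title}', one per line.",
)

# Stage 2: the 8B model writes each chapter using only its title as context.
for chapter_title in structure.splitlines():
    if chapter_title.strip():
        chapter = complete(
            "llama3-8b-8192",
            f"Write the chapter '{chapter_title.strip()}' for the book '{title}'.",
        )
        print(chapter[:200], "...\n")
```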


Groq says that in the future Groqbook will be able to generate the full content of a book and expand to fiction, creating high-quality novels.

The number of developers exceeded 280,000 in 4 months

Because it solves users' core pain points, Groq is naturally popular with them.

In the four months since it launched a free service for handling LLM workloads, Groq has attracted more than 282,000 developers.

Groq provides a platform for developers to build their applications, similar to other inference service providers.

However, what makes Groq special is that it lets developers who built their applications on OpenAI migrate them to Groq in seconds with a few simple steps.
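In practice, this migration path is just repointing the existing OpenAI client at Groq's OpenAI-compatible endpoint, as in the sketch below. The base URL is an assumption to be confirmed against Groq's documentation; the model ID follows the naming used earlier in this article.

```python
# Sketch of the migration path: an app written against the OpenAI Python
# client can point at Groq's OpenAI-compatible endpoint instead.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],         # swap in a Groq API key
    base_url="https://api.groq.com/openai/v1",  # assumed GroqCloud endpoint; confirm in docs
)

response = client.chat.completions.create(
    model="llama3-8b-8192",                     # choose a Groq-hosted model
    messages=[{"role": "user", "content": "Hello from a migrated app!"}],
)
print(response.choices[0].message.content)
```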

Ross said he will soon focus on the high-demand enterprise market. Large companies are deploying AI applications at scale, and they therefore need more efficient processing power for their workloads.

Groq says its technology consumes about one third of a GPU's power in the worst case, and for most workloads only one tenth.

Against the backdrop of expanding LLM workloads and growing energy demands, Groq's efficient performance poses a challenge to the GPU-dominated computing landscape.

Although Nvidia excels at AI training, it has limitations in inference. Groq's chips offer several-fold advantages in inference speed and cost, and Ross expects inference to grow from about 5% of the AI compute market today to 90-95% in the future.

Ross confidently claims that 1.5 million LPUs will be deployed by the end of next year, accounting for half of the world's inference demand.

LPU: Fast, really fast

GPUs are currently the preferred choice for model training, but higher efficiency and lower latency are also extremely important when deploying AI applications.

Just as Groq first broke into the public eye on the strength of a single word, "fast", it is once again doubling down on speed.

Groq promises to get things done faster and more cost-effectively than its competitors, thanks in part to its language processing units (LPUs).

Compared to GPUs, LPUs reduce the overhead of managing multiple threads and avoid underutilization of cores. In addition, Groq's chip design allows multiple dedicated cores to be connected without the traditional bottlenecks found in GPU clusters.

The LPU operates on a fundamentally different principle from the GPU: it uses a Temporal Instruction Set Computer architecture, so it does not need to reload data from memory as frequently as GPUs that rely on high-bandwidth memory (HBM).


The LPU does not rely on external memory; its weights, key-value (KV) cache, and activations all stay on the chip during processing, which not only sidesteps the problems caused by the HBM shortage but also effectively reduces costs.

Unlike Nvidia GPUs, which depend on high-speed data transfers from HBM, Groq's LPUs use SRAM instead of HBM in their system architecture.

With only 230 MB of SRAM per chip, no large model can fit on a single chip. It is worth noting, though, that SRAM is about 20 times faster than the memory used by GPUs.
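As a rough back-of-the-envelope illustration (assuming 8-bit weights, a common deployment choice): a 70B-parameter model needs on the order of 70 GB for its weights alone, and 70 GB divided by 0.23 GB per chip works out to roughly 300 chips before even counting the KV cache, which is why large models are served across many interconnected LPUs rather than on one.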


Given that AI inference requires far less data movement than model training, Groq's LPUs deliver much better energy efficiency.

When performing inference, the LPU reads far less data from external memory and consumes significantly less power than a GPU.

The one caveat: Nvidia's GPUs can be used for both training and inference, while LPUs are designed only for model inference.

Resources:

https://venturebeat.com/ai/groq-releases-blazing-fast-llm-engine-passes-270000-user-mark/

https://the-decoder.com/ai-startup-groq-raises-640-million-to-challenge-nvidias-dominance-in-ai-chips/
