
Case Study: Agents Are the Endgame of RAG

Author: ChatGPT sweeper

AI Evolution: From RAG to Agent, How Agents Will Reshape the Future World

Introduction

With the release of ChatGPT, GPT-4, and their successors, we have been won over by the power of large language models (LLMs), and more and more companies have invested in developing and deploying large-model technology, which has brought great convenience to daily life. However, large models still face problems such as timeliness and accuracy. How do we make LLMs better? How do we address the challenges they face? How do we build advanced LLM applications? These questions have become important research topics in the field of AI.

To address some of these problems, RAG (Retrieval-Augmented Generation) was born. By combining information retrieval with text generation, RAG enables machines to understand and respond to human language more accurately, and it has brought significant breakthroughs to natural language processing. But as RAG has been put into practice, its limitations have gradually become apparent. So what are RAG's pain points, and how do we solve them? Let's find out.

RAG Pain Points


RAG is useful in common natural language processing tasks such as question answering, intelligent assistants and virtual agents, information retrieval, and knowledge graph completion. A typical RAG system builds a large knowledge base; when a user issues a query, it retrieves relevant text fragments or real-time data from that knowledge base, then filters, ranks, and weights the retrieved information, and finally feeds the integrated context into the generative model as input. This improves answer accuracy, reduces fabricated information, and greatly enhances the usability of large models. A minimal sketch of this retrieve-then-generate flow is shown below.
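
To make the flow concrete, here is a minimal, self-contained Python sketch of that retrieve, rank, and generate pipeline. The toy knowledge base, the word-overlap scorer, and the call_llm placeholder are illustrative assumptions for this article, not the API of any particular framework:

# Minimal sketch of the retrieve -> filter/rank -> generate RAG flow.
knowledge_base = [
    "The bicycle was invented in the 19th century.",
    "Tesla's 10-K lists supply-chain disruption as a risk factor.",
    "RAG combines information retrieval with text generation.",
]

def score(query: str, doc: str) -> float:
    # Toy relevance score: fraction of query words that appear in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rag_prompt(query: str, top_k: int = 2) -> str:
    # Retrieve, rank, and keep the top-k fragments as context.
    ranked = sorted(knowledge_base, key=lambda doc: score(query, doc), reverse=True)
    context = "\n".join(ranked[:top_k])
    # The integrated context becomes the generator's input; in a real
    # system this prompt would be passed to call_llm(prompt).
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("When was the bicycle invented?"))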

But RAG was originally designed for simple questions over small document sets, where it is fast, efficient, and accurate, for example:

What are the main risk factors for Tesla? (over the Tesla 2021 10-K)

What did the author do during his time at YC? (over a Paul Graham essay)

Given a well-scoped knowledge base, LLMs can answer such simple questions well. However, RAG may not produce accurate or satisfactory results for other types of question, such as:

Summarization questions: "Give me a summary of Company XXX's annual report"

Comparative questions: "Compare the open source contributions of Developer A and Developer B"

Structured analysis + semantic search: "Tell me about the risk factors of the top-performing rideshare companies in the U.S."

Comprehensive multi-part questions: "Summarize the arguments for X in Article A, then the arguments for Y in Article B, make a table following our internal style guide, and then draw your own conclusions from these facts"

Naive RAG is often just a search system: it gives users good answers to simple questions and queries, but there are many complex questions and tasks it cannot handle. So what can we do when we encounter them?

From RAG to Agent


Conventional RAG applications augment a large model with a dedicated knowledge base to produce more accurate, timely, and domain-rich or personalized results, but they are still limited to content generation. If you need AI to behave like an efficient employee who "begins with the end in mind", choosing tools and communicating and collaborating with different systems until the final result is delivered, then you need to move from RAG to Agent.


The transition from RAG to Agent does not mean abandoning RAG; it means adding the following layers of functionality on top of it:

● Multi-turn conversation: engage in deeper exchanges with the user to identify intent

● Query/task planning: understand and plan complex queries and tasks

● External tool interfaces: use external tools to complete tasks

● Reflection: reflect on, summarize, and evaluate execution results

● Memory: maintain the history of user interactions to provide personalized service

With these additions, an agent can not only adapt to complex tasks but also respond flexibly in changing environments. In contrast to RAG, agents are oriented toward concrete tasks and focus more on integration with existing systems: an agent not only understands language but also takes action in real-world or digital systems, performs complex multi-step tasks such as retrieving and processing information, and connects seamlessly to various systems and APIs, accessing user data and interacting with databases.

One of humanity's most distinctive traits is the use of tools, and an agent can likewise rely on external tools to handle more complex tasks: it can, for example, call a chart generator to produce online charts or a weather service to look up the weather (a minimal tool-dispatch sketch follows). Agents are the key to truly unleashing the potential of LLMs, so LLM applications will ultimately shift from RAG to Agent; the agent is, without doubt, the endgame of RAG.
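
To illustrate tool use, here is a minimal, hypothetical tool-dispatch loop in Python. The two stub tools and the JSON tool-call format are assumptions made for this sketch, not the conventions of any specific agent framework:

import json

# Hypothetical stand-ins for the tools mentioned above.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # a real implementation would call a weather API

def make_chart(data: list) -> str:
    return f"<chart with {len(data)} points>"  # stand-in for a chart generator

TOOLS = {"get_weather": get_weather, "make_chart": make_chart}

def handle_model_output(model_output: str) -> str:
    # If the model emits a JSON tool call (an assumed convention),
    # execute the tool; otherwise treat the output as the final answer.
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output
    return TOOLS[call["name"]](**call["arguments"])

# Example: the model decides to query the weather for Beijing.
print(handle_model_output('{"name": "get_weather", "arguments": {"city": "Beijing"}}'))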

Case Study

Recently, Alibaba's Qwen team built an agent that can understand million-word documents by combining RAG with multi-level agentic reasoning. Although it uses only the 8k context of the Qwen2 model, it outperforms both plain RAG and native long-context models.

1. Building the agent

The agent is built at three levels of complexity, each building on the previous one:

Level 1: Search

The question at this level: how to find the chunks most relevant to the user's query. Keyword extraction and retrieval are divided into three main steps:

● Step 1: Separate the instruction part of the query from the informational part.

User input: "Please elaborate in 2,000 words when answering; my question is, when was the bicycle invented? Please reply in English."

Decomposition result: {"information": ["When was the bicycle invented"], "instruction": ["answer in 2000 words", "elaborate in detail", "reply in English"]}

● Step 2: Use the chat model to derive multilingual keywords from the informational part.

Input: "When was the bicycle invented"

Keyword extraction: {"keywords_English": ["bicycle", "invented", "when"], "keywords_Chinese": ["自行车", "发明", "时间"]}

● Step 3: Retrieve the most relevant chunks with the BM25 keyword search method, as in the sketch below.

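Here is a sketch of how Steps 1-3 might fit together. The decomposition that the chat model would normally produce is hard-coded for illustration, and rank_bm25 is used as one common BM25 implementation, not necessarily the one Qwen-Agent itself uses:

from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy stand-ins for the 512-token document chunks.
chunks = [
    "The bicycle was invented in the 19th century in Europe.",
    "Beethoven composed his Fifth Symphony in the early 19th century.",
    "Electric scooters became popular in the 21st century.",
]
bm25 = BM25Okapi([chunk.lower().split() for chunk in chunks])

# Steps 1-2: in the real agent the chat model produces this structure.
parsed = {
    "information": ["When was the bicycle invented"],
    "keywords": ["bicycle", "invented", "when"],
}

# Step 3: BM25 keyword search over the chunks.
scores = bm25.get_scores([kw.lower() for kw in parsed["keywords"]])
best = max(range(len(chunks)), key=lambda i: scores[i])
print(chunks[best])  # -> the chunk about the bicycle's invention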

Level 2: Reading in chunks

The question at this level: how to handle the failure case in which relevant chunks share too few keywords with the user's query, so they are never retrieved and never reach the model. The following strategy is employed:

● Step 1: Have the chat model assess the relevance of each 512-token chunk to the user's query; it outputs "None" if the chunk is irrelevant and quotes the relevant sentences otherwise.

● Step 2: Take the relevant sentences from Step 1, use them as search queries, and retrieve the most relevant chunks via BM25.

● Step 3: Generate the final answer from the retrieved context, as sketched below.

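The three steps can be sketched as a two-pass loop. Here ask_model stands in for a real chat-model call, and the prompt wording is an assumption based on the description above, not the exact Qwen-Agent prompt:

def ask_model(prompt: str) -> str:
    # Placeholder for a real chat-model call (e.g., an 8k-context model).
    raise NotImplementedError

def chunked_read(query: str, chunks: list, bm25_search) -> str:
    # Step 1: let the model judge each 512-token chunk independently.
    relevant_sentences = []
    for chunk in chunks:
        reply = ask_model(
            f"Query: {query}\nChunk: {chunk}\n"
            "Reply 'None' if the chunk is irrelevant; otherwise quote "
            "the sentences relevant to the query."
        )
        if reply.strip().lower() != "none":
            relevant_sentences.append(reply.strip())
    # Step 2: use those sentences as new BM25 queries to recover chunks
    # that share too few words with the original query.
    context = []
    for sentence in relevant_sentences:
        context.extend(bm25_search(sentence))
    # Step 3: answer from the re-retrieved context.
    joined = "\n".join(context)
    return ask_model(f"Context:\n{joined}\n\nAnswer the query: {query}")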

Level 3: Step-by-step reasoning

The question at this level: how to perform multi-hop reasoning. Suppose the user asks, "Which means of transport was invented in the same century as the Fifth Symphony?" The model must first decompose this into sub-questions: "In what century was the Fifth Symphony composed?" (the 19th century), and then "What means of transport was invented in the 19th century?" (the bicycle).

This can be implemented with a tool-calling (also known as function-calling) agent or a ReAct agent; a runnable Python version of the loop follows the pseudocode:

Ask a question to the Lv3-agent.
while (the Lv3-agent cannot answer the question from its memory) {
    The Lv3-agent proposes a new sub-question to answer.
    The Lv3-agent asks this sub-question to the Lv2-agent.
    The Lv2-agent's response is added to the Lv3-agent's memory.
}
The Lv3-agent provides the final answer to the original question.

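Translated into runnable Python, the loop might look like the sketch below; lv2_agent is the Level-2 reader from the previous section, and the ask_model prompts are hypothetical:

def multi_hop_answer(question: str, lv2_agent, ask_model, max_hops: int = 5) -> str:
    # Lv3-agent: decompose the question into sub-questions until its
    # memory contains enough facts to answer.
    memory = []
    for _ in range(max_hops):
        state = f"Question: {question}\nKnown facts:\n" + "\n".join(memory)
        # Can the question be answered from memory yet?
        if ask_model(state + "\nCan you answer now? yes/no").strip().lower().startswith("yes"):
            break
        # Propose the next sub-question and delegate it to the Lv2-agent.
        sub_question = ask_model(state + "\nPropose the next sub-question to research.")
        memory.append(f"{sub_question} -> {lv2_agent(sub_question)}")
    state = f"Question: {question}\nKnown facts:\n" + "\n".join(memory)
    return ask_model(state + "\nGive the final answer to the original question.")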

2. Experimental comparison

For validation, the following three setups were compared experimentally:

● 32k-model: a 7B chat model fine-tuned primarily on 8k-context samples, supplemented with a small number of 32k-context samples

● 4k-RAG: the same model as the 32k-model, but using the RAG strategy of the Lv1-agent

● 4k-agent: the same model as the 32k-model, but using the more complex agent strategy described above


The experiments show that the 4k-agent consistently outperforms both the 32k-model and the 4k-RAG setup: combined with RAG and tool calls, the agent is more efficient and more accurate. But the advantages of agents go far beyond this. As a key component of AI systems, the agent has gradually become an important bridge between humans and machines, and as agents mature they will offer solutions to many more problems.

Future Outlook

Looking ahead, the development of agent applications will face many challenges, but also opportunities, and every challenge will drive the integration of new technologies. Robin Li once said that there will be no such profession as "programmer" in the future, because anyone who can speak will be able to program. The author believes that although agents are powerful, agent applications still have a long way to go. Even so, I firmly believe that in the near future there will be many more agent applications, that they will incorporate more technologies, and that they will ultimately be woven into every industry, bringing greater convenience to people.

Epilogue

The potential of these technologies lies in combining RAG and agents into more powerful and agile AI applications: the deep language understanding and generation of large models, the domain-specific, real-time retrieval of RAG, and the decision-making and execution of agents. Agents can improve their execution through self-reflection and feedback while providing observability, so developers can trace and understand their behavior; combined with various tools and RAG techniques, they can handle more complex business logic. Moreover, multiple agents can interact synchronously or asynchronously to carry out even more complex tasks and to build more sophisticated LLM applications.

References

https://qwenlm.github.io/zh/blog/qwen-agent-2405/

https://docs.google.com/presentation/d/1IWjo8bhoatWccCfGLYw_QhUI4zfF-MujN3ORIDCBIbc/edit#slide=id.g2bac367b3d6_0_0