Build reliable RAG agents with LangGraph

Original: Build a Reliable RAG Agent using LangGraph, by Plaban Nayak

[Image: RAG agent workflow]

Introduction

Here, we'll build a reliable RAG agent using LangGraph, Groq-Llama-3, and Chroma. We'll combine the following concepts to build a RAG agent.

  • Adaptive RAG (paper). We implement the idea from this paper by building a router that routes questions to different retrieval approaches.
  • Corrective RAG (paper). We implement the idea from this paper by developing a fallback mechanism that continues with a web search when the retrieved context is not relevant to the question asked.
  • Self-RAG (paper). We implement the idea from this paper by developing a hallucination grader, i.e., correcting answers that hallucinate or do not answer the question asked.

What is an agent?

The basic idea behind an agent is to use a language model to choose a sequence of actions. In a chain, that sequence is hardcoded. An agent instead uses the language model as a reasoning engine to decide which actions to take and in what order.

It consists of 3 components:

  • Planning: Break down tasks into smaller sub-goals
  • Memory: Short-term (chat history) / Long-term (vector storage)
  • Tool usage: Different tools can be leveraged to extend its functionality.

Agents can be implemented using LangChain's ReAct concept or using LangGraph.

Trade-offs between LangChain agents and LangGraph:

*Reliability*

  • ReAct / LangChain agents: less reliable, since the LLM has to make the correct decision at every step
  • LangGraph: more reliable, since the control flow is set up in advance and the LLM has a specific task at each node

*Flexibility*

  • ReAct / LangChain agents: more flexible, since the LLM can choose any sequence of actions
  • LangGraph: less flexible, since actions are constrained by the control flow configured at each node

*Compatibility with smaller LLMs*

  • ReAct / LangChain agents: poorer compatibility
  • LangGraph: good compatibility

Here, we create an agent using LangGraph.

What is LangChain?

LangChain is a framework for developing applications powered by language models. It enables applications that:

  • Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response, etc.).
  • Reason: rely on a language model to reason (how to answer based on the provided context, what actions to take, etc.).

What is LangGraph?

LangGraph is a library that extends LangChain by adding cyclic computation to LLM applications. While LangChain supports defining chains of computation (directed acyclic graphs, or DAGs), LangGraph allows cycles to be included. This enables more complex, agent-like behavior, where an LLM can be called in a loop to decide what action to take next.

Key Concepts:

  • Stateful graph: LangGraph revolves around the concept of a stateful graph, where each node represents a step in our computation and the graph maintains a state that is passed along and updated as the computation progresses.
  • Nodes: Nodes are the building blocks of LangGraph. Each node represents a function or a computation step. We define nodes to perform specific tasks, such as processing inputs, making decisions, or interacting with external APIs.
  • Edges: Edges connect the nodes in the graph and define the flow of computation. LangGraph supports conditional edges, which let you dynamically determine the next node to execute based on the current state of the graph.

Steps to create a graph using LangGraph (a minimal sketch follows the list):

  1. Define the graph state: represents the state of the graph.
  2. Create the graph.
  3. Define the nodes: the functions associated with each workflow step.
  4. Add the nodes to the graph, and define the flow using edges and conditional edges.
  5. Set the entry and end points of the graph.
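
To make these steps concrete, here is a minimal, self-contained sketch of the five steps. The state key, node names, and the trivial string-appending logic are purely illustrative, not part of the agent we build below:

from typing_extensions import TypedDict
from langgraph.graph import END, StateGraph

# 1. Define the graph state
class State(TypedDict):
    text: str

# 3. Define the nodes: plain functions that read the state and return updates
def step_a(state):
    return {"text": state["text"] + " -> a"}

def step_b(state):
    return {"text": state["text"] + " -> b"}

# 2. Create the graph, 4. add the nodes and wire them with edges
workflow = StateGraph(State)
workflow.add_node("a", step_a)
workflow.add_node("b", step_b)
workflow.add_edge("a", "b")

# 5. Set the entry point and the end point
workflow.set_entry_point("a")
workflow.add_edge("b", END)

app = workflow.compile()
print(app.invoke({"text": "start"}))  # {'text': 'start -> a -> b'}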

What is the Tavily Search API?

The Tavily Search API is an LLM-optimized search engine designed to achieve efficient, fast, and persistent search results. Unlike other search APIs, such as Serp or Google, Tavily focuses on optimizing search to meet the needs of AI developers and autonomous AI agents.

What is Groq?

Groq provides access to high-performance AI models and APIs for developers, with faster inference and lower costs than the competition.

Supported Models

[Image: models supported by Groq]

Supported embedding models

[Image: supported embedding models]

Workflow for RAG Agent

  1. Based on the question, the router decides whether to retrieve context from the vector store or do a web search.
  2. If the router routes the question to the vector store, matching documents are retrieved from the vector store; otherwise, a web search is done with tavily-api.
  3. The document grader then grades each document as relevant or irrelevant.
  4. If the retrieved context is graded as relevant, the hallucination grader checks the generation for hallucinations. If the grader decides the response is free of hallucinations, the response is shown to the user.
  5. If the context is graded as irrelevant, a web search is done to retrieve content.
  6. After retrieval, the document grader grades the content generated from the web search. If it is relevant, the answer is synthesized with the LLM and the response is presented.

The technology stack used

  • Embedding model: BAAI/bge-base-en-v1.5
  • LLM:Llama-3-8B
  • Vector storage: Chroma
  • Graph/agent: LangGraph

Code implementation

Install the required libraries

! pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python gpt4all fastembed langchain-groq 
           

Import the required libraries

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
           

Instantiate the embedding model

embed_model = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")
           

Instantiate the LLM

from groq import Groq
from langchain_groq import ChatGroq
from google.colab import userdata
llm = ChatGroq(temperature=0,
                      model_name="Llama3-8b-8192",
                      api_key=userdata.get("GROQ_API_KEY"),)
           

Download the data

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
print(f"len of documents :{len(docs_list)}")
           

Chunk the documents to fit the context window of the LLM

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
print(f"length of document chunks generated :{len(doc_splits)}")
           

Load the documents into a vector store

vectorstore = Chroma.from_documents(documents=doc_splits,
                                    embedding=embed_model,
                                    collection_name="local-rag")
           

Instantiate the Retriever

retriever = vectorstore.as_retriever(search_kwargs={"k":2})
           

Implement a router

import time
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.output_parsers import StrOutputParser
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at routing a 
    user question to a vectorstore or web search. Use the vectorstore for questions on LLM agents, 
    prompt engineering, and adversarial attacks. You do not need to be stringent with the keywords 
    in the question related to these topics. Otherwise, use web-search. Give a binary choice 'web_search' 
    or 'vectorstore' based on the question. Return a JSON with a single key 'datasource' and 
    no preamble or explanation. Question to route: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question"],
)
start = time.time()
question_router = prompt | llm | JsonOutputParser()
question = "llm agent memory"
print(question_router.invoke({"question": question}))
end = time.time()
print(f"The time required to generate response by Router Chain in seconds:{end - start}")
#############################RESPONSE ###############################
{'datasource': 'vectorstore'}
The time required to generate response by Router Chain in seconds:0.34175705909729004
           

Implement the generation chain

prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
# Chain
start = time.time()
rag_chain = prompt | llm | StrOutputParser()
question = "agent memory"
docs = retriever.invoke(question)
generation = rag_chain.invoke({"context": docs, "question": question})
end = time.time()
print(f"The time required to generate response by the generation chain in seconds:{end - start}")
print(generation)
#############################RESPONSE##############################
The time required to generate response by the generation chain in seconds:1.0384225845336914
The agent memory in the context of LLM-powered autonomous agents refers to the ability of the agent to learn from its past experiences and adapt to new situations.
           

Implement a retrieval grader

# Prompt
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing the relevance of a retrieved document to a user question. If the document contains keywords related to the user question, grade it as relevant. This does not need to be a stringent test; the goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation. \n
     <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here is the retrieved document: \n\n {document} \n\n
    Here is the user question: {question} \n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
    """,
    input_variables=["question", "document"],
)
start = time.time()
retrieval_grader = prompt | llm | JsonOutputParser()
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))
end = time.time()
print(f"The time required to generate response by the retrieval grader in seconds:{end - start}")
############################RESPONSE###############################
{'score': 'yes'}
The time required to generate response by the retrieval grader in seconds:0.8115921020507812
           

Implement a hallucination grader

# Prompt
prompt = PromptTemplate(
    template=""" <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether an answer is grounded in a set of facts. Give a binary 'yes' or 'no' score to indicate whether the answer is grounded in the facts. Provide the binary score as a JSON with a single key 'score' and no preamble or explanation. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here are the facts:
    \n ------- \n
    {documents} 
    \n ------- \n
    Here is the answer: {generation}  <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "documents"],
)
start = time.time()
hallucination_grader = prompt | llm | JsonOutputParser()
hallucination_grader_response = hallucination_grader.invoke({"documents": docs, "generation": generation})
end = time.time()
print(f"The time required to generate response by the hallucination grader in seconds:{end - start}")
print(hallucination_grader_response)
####################################RESPONSE#################################
The time required to generate response by the hallucination grader in seconds:1.020448923110962
{'score': 'yes'}
           

Implement an answer grader

# Prompt
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether an answer is useful to resolve a question. Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve the question. Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.
     <|eot_id|><|start_header_id|>user<|end_header_id|> Here is the answer:
    \n ------- \n
    {generation} 
    \n ------- \n
    Here is the question: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "question"],
)
start = time.time()
answer_grader = prompt | llm | JsonOutputParser()
answer_grader_response = answer_grader.invoke({"question": question,"generation": generation})
end = time.time()
print(f"The time required to generate response by the answer grader in seconds:{end - start}")
print(answer_grader_response)
##############################RESPONSE###############################
The time required to generate response by the answer grader in seconds:0.2455885410308838
{'score': 'yes'}
           

Implement a web search tool

import os
from langchain_community.tools.tavily_search import TavilySearchResults
os.environ['TAVILY_API_KEY'] = "YOUR API KEY"
web_search_tool = TavilySearchResults(max_results=3)  # limit to the top 3 results
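
Called directly, the tool returns a list of result dicts whose 'content' fields the web_search node below joins into a single document. A quick smoke test (the query string is illustrative):

results = web_search_tool.invoke({"query": "Who won the last Super Bowl?"})
print(len(results), results[0]["content"][:200])  # each result dict carries a 'content' snippet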
           

Define the graph state: represents the state of the graph.

Define the following properties:

  • question: the user's question
  • generation: the LLM generation
  • web_search: whether to add a web search
  • documents: a list of documents

from typing_extensions import TypedDict
from typing import List
### State
class GraphState(TypedDict):
    question : str
    generation : str
    web_search : str
    documents : List[str]
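
Note that LangGraph merges whatever dict a node returns into this state: each node below returns only the keys it changed, and the rest of the state is carried forward. A minimal illustration (the node name is hypothetical, not part of the agent):

def normalize_question(state: GraphState):
    # Only the returned key is written back into the state;
    # generation, web_search and documents are left untouched.
    return {"question": state["question"].strip()}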
           

Define the nodes

from langchain.schema import Document
def retrieve(state):
    """
    Retrieve documents from the vector store
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): New key added to state, documents, containing the retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]
    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}
           
def generate(state):
    """
    Generate an answer using RAG on the retrieved documents
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): New key added to state, generation, containing the LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]
    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}
           
def grade_documents(state):
    """
    Determine whether the retrieved documents are relevant to the question.
    If any document is not relevant, we set a flag to run a web search.
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): Filters out irrelevant documents and updates the web_search state
    """
    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]
    # Score each document
    filtered_docs = []
    web_search = "No"
    for d in documents:
        score = retrieval_grader.invoke({"question": question, "document": d.page_content})
        grade = score['score']
        # Document is relevant
        if grade.lower() == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        # Document is not relevant
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            # We do not include the document in filtered_docs
            # We set a flag to indicate that we want to run a web search
            web_search = "Yes"
            continue
    return {"documents": filtered_docs, "question": question, "web_search": web_search}
           
def web_search(state):
    """
    Do a web search based on the question
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): Appends the web search results to documents
    """
    print("---WEB SEARCH---")
    question = state["question"]
    documents = state["documents"]
    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    if documents is not None:
        documents.append(web_results)
    else:
        documents = [web_results]
    return {"documents": documents, "question": question}
           

Define the conditional edges


def route_question(state):
    """
    Route the question to web search or RAG.
    Args:
        state (dict): The current graph state
    Returns:
        str: The next node to call
    """
    print("---ROUTE QUESTION---")
    question = state["question"]
    print(question)
    source = question_router.invoke({"question": question})
    print(source)
    print(source['datasource'])
    if source['datasource'] == 'web_search':
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "websearch"
    elif source['datasource'] == 'vectorstore':
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"

def decide_to_generate(state):
    """
    Decide whether to generate an answer or add a web search
    Args:
        state (dict): The current graph state
    Returns:
        str: Binary decision for the next node to call
    """
    print("---ASSESS GRADED DOCUMENTS---")
    question = state["question"]
    web_search = state["web_search"]
    filtered_documents = state["documents"]
    if web_search == "Yes":
        # All documents have been filtered out by the relevance check
        # We will supplement them with a web search
        print("---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---")
        return "websearch"
    else:
        # We have relevant documents, so generate the answer
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):
    """
    Determine whether the generation is grounded in the documents and answers the question.
    Args:
        state (dict): The current graph state
    Returns:
        str: Decision for the next node to call
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]
    score = hallucination_grader.invoke({"documents": documents, "generation": generation})
    grade = score['score']
    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question,"generation": generation})
        grade = score['score']
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
           

Add the nodes

from langgraph.graph import END, StateGraph
workflow = StateGraph(GraphState)
# Define the nodes
workflow.add_node("websearch", web_search) # web search
workflow.add_node("retrieve", retrieve) # retrieve
workflow.add_node("grade_documents", grade_documents) # grade documents
workflow.add_node("generate", generate) # generate
           

Set the entry point and end point

workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "websearch",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "websearch": "websearch",
        "generate": "generate",
    },
)
workflow.add_edge("websearch", "generate")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "websearch",
    },
)
           

Compile the workflow

app = workflow.compile()
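
The compiled graph is itself a LangChain runnable, so besides streaming (used in the tests below) you can call it synchronously and read the final state directly. A small sketch (the question string is illustrative):

# invoke() runs the graph to completion and returns the final state dict
final_state = app.invoke({"question": "What is agent memory?"})
print(final_state["generation"])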
           

Test the workflow

from pprint import pprint
inputs = {"question": "What is prompt engineering?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])
########################RESPONSE##############################
---ROUTE QUESTION---
What is prompt engineering?
{'datasource': 'vectorstore'}
vectorstore
---ROUTE QUESTION TO RAG---
---RETRIEVE---
'Finished running: retrieve:'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
'Finished running: grade_documents:'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
'Finished running: generate:'
('Prompt engineering refers to communicating with large language models to steer their behavior toward desired outcomes without updating the model weights. It is an empirical science that requires heavy experimentation and heuristics.')
           

Test the workflow with a different question

app = workflow.compile()
# Test
from pprint import pprint
inputs = {"question": "Who are the Bears expected to take first in the NFL draft?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])
#############################RESPONSE##############################
---ROUTE QUESTION---
Who are the Bears expected to take first in the NFL draft?
{'datasource': 'web_search'}
web_search
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
'Finished running: websearch:'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
'Finished running: generate:'
('Based on the provided context, the Chicago Bears are expected to take USC quarterback Caleb Williams with the first overall pick in the NFL draft.')
           

Test the workflow with another question

app = workflow.compile()
# Test
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])
###########################RESPONSE############################
---ROUTE QUESTION---
What are the types of agent memory?
{'datasource': 'vectorstore'}
vectorstore
---ROUTE QUESTION TO RAG---
---RETRIEVE---
'Finished running: retrieve:'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
'Finished running: grade_documents:'
---WEB SEARCH---
'Finished running: websearch:'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
'Finished running: generate:'
('The text mentions the following types of agent memory:\n'
 '\n'
 '1. Short-term memory (STM) or working memory: it stores information the agent is currently aware of and needs to carry out complex cognitive tasks.\n'
 '2. Long-term memory (LTM): it can store information for anywhere from a few days to decades, with essentially unlimited storage capacity.')
           

Visualize the agent/graph

!apt-get install python3-dev graphviz libgraphviz-dev pkg-config
!pip install pygraphviz
from IPython.display import Image
Image(app.get_graph().draw_png())
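
If pygraphviz is difficult to install, recent langgraph/langchain-core releases can also render the same graph via Mermaid; treat this as an assumption about your installed version:

# Alternative rendering via Mermaid (assumes a recent langchain-core)
Image(app.get_graph().draw_mermaid_png())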
           
[Image: the rendered agent graph]

Conclusion

LangGraph is a flexible tool for building complex, stateful applications with LLMs. Beginners can take advantage of its capabilities by mastering its fundamentals and working through basic examples. The key is to focus on managing state, handling conditional edges, and ensuring the graph contains no dead-end nodes.

In my opinion, this approach is more dependable than a ReAct agent, because we keep complete control over the workflow instead of leaving the decisions to the agent.

PreviousArtificial intelligence agents take over the beginning of computer tasks completed by humansNextA comprehensive guide to fine-tuning language models (LLMs): mimicking the writing style of a researcher