
My exploration, practice and thinking in the direction of RAG in the application of large models

Author: JD Cloud developer

Opening

I am Sun Lin, a 2021 JD.com Ph.D. management trainee, with a Ph.D. from the School of Software at Tsinghua University. During my time here I have filed 5 patents and was named a Beijing Yiqilin Outstanding Talent. I currently work as a data development engineer in the R&D department of the algorithm middle platform, focusing on applications of retrieval-augmented generation.

This article covers the background of this work, the core technical work, business practices and feedback, and the future outlook.

Background

Large language models (LLMs) have made major breakthroughs in natural language processing and natural language understanding. Combining large models with application scenarios can reduce costs while increasing efficiency. However, when implementing a specific scenario, a general-domain large model lacks domain-specific knowledge and needs to be fine-tuned, which consumes a lot of computing resources.

At present, Retrieval-Augmented Generation (RAG), as a pattern for building large-model applications, combines the strong comprehension ability of large language models with domain knowledge, which improves both the accuracy and the efficiency of the model. The main RAG process has two steps: 1. retrieve question-related content from the knowledge base; 2. splice the relevant knowledge into the prompt and let the LLM answer based on that knowledge and the user's question. Here is an example of a RAG prompt:

You are a senior JD merchant assistant, focused on answering questions that users encounter while running their stores. Please answer the user's question based on the relevant reference content between '---'.

Relevant reference content:
---
1. The company qualification requirements for opening a store on the JD Wanshang platform are as follows:
Business license: a business license bearing the "Unified Social Credit Code" (the company must not appear in the list of enterprises with abnormal operations, and the goods sold must fall within the business scope of the license).
Legal representative's ID card: front and back of the company legal representative's ID card, valid for more than 60 days.
2. To open a store on the JD Wanshang platform under the first-level category (Jingxi Supply Chain Center) shoes, bags and apparel, brand qualifications must be submitted.
---
User question: What qualifications does a company need to open a store on the JD Wanshang platform?
Note the following requirements:
1. When answering, refer to the original text as much as possible.
2. If you cannot provide an answer, reply that the user should contact human customer service.
           

In the above example, because the relevant knowledge is added, the large model can give an accurate answer to the question about company qualifications; RAG makes better use of vertical-domain knowledge than asking the LLM to answer directly. In general, the main RAG process is as follows:

(Figure: the overall RAG process)
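
To make the two steps concrete, the following minimal sketch illustrates the retrieve-then-generate flow in plain Python. It is an illustration of the general pattern, not the platform's implementation: the knowledge base and the word-overlap similarity are toy stand-ins for a real embedding model and vector store, and the final LLM call is left out (the built prompt is simply printed).

# Minimal, self-contained sketch of the two RAG steps described above.
KNOWLEDGE_BASE = [
    "To open a store on the JD Wanshang platform, a business license bearing "
    "the Unified Social Credit Code and the legal representative's ID card are required.",
    "For the shoes, bags and apparel category, brand qualifications must be submitted.",
]

PROMPT_TEMPLATE = (
    "Answer the user's question based only on the reference content between '---'.\n"
    "---\n{context}\n---\n"
    "User question: {question}\n"
    "If no answer can be found, reply that the user should contact human support."
)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Step 1: recall question-related content (word overlap as a toy similarity)."""
    words = set(question.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(words & set(doc.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str) -> str:
    """Step 2: splice the retrieved knowledge into the prompt for the LLM."""
    context = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(retrieve(question)))
    return PROMPT_TEMPLATE.format(context=context, question=question)

if __name__ == "__main__":
    # The resulting prompt would then be sent to a large language model.
    print(build_prompt("What qualifications does a company need to open a store?"))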

Because of RAG's advantages, such as interpretability, no dependence on model fine-tuning, and the ability to adapt to diverse application requirements, there are many solutions on the market with RAG at their core, mainly frameworks and applications:

  • Frameworks: SDKs for developers. Users need to connect different model resources themselves and build their own application processes. The degree of customization is high, but they are harder to get started with. Examples include LangChain, LlamaIndex, promptflow, etc.
  • Applications: out-of-the-box, mostly consumer-facing (2C) knowledge-assistant applications. The general flow is that users upload documents (the knowledge base) and then conduct end-to-end Q&A over it (the built-in Q&A processes of different applications usually differ in key links, such as the recall strategy and whether an Agent is used). Examples include Dify, Youdao QAnything, ByteDance Coze, etc.

In working with business parties, we found that they often have a high degree of customization requirements, and existing frameworks and application solutions cannot be quickly reused to meet these requirements at scale, for example:

  • Small or niche business teams: no algorithm developers, caring only about business logic; they hope the platform provides storage, computing power, and strategies, and builds highly available services on top of their application data.
  • Multiple input/output (MIMO): some scenarios require multiple inputs and multiple outputs, which the mainstream single-query RAG pipeline does not support.
  • Rapid manual intervention: return specific results when specific user inputs are received, to guarantee the reliability of the model.
  • Closed-loop data link: besides data management, there is a need for input/output management pages for post-hoc effect evaluation, bad-case analysis, and effect optimization.
  • High-quality data export: used to fine-tune the model to achieve higher accuracy.
  • Isolation of development and production: models, data, and interface services need to be separated between the development and production environments.
  • Other needs...

Against this background, we built a RAG platform from scratch, hoping to provide full-link, end-to-end Q&A capabilities based on large models through the platform:

  • For users who do not need a customized process: a knowledge-assistant application answers questions through the platform's built-in default RAG logic;
  • For users with customization requirements: the platform provides resource management and process orchestration capabilities, so that users can more easily combine their business logic for secondary development.

Core work on technical breakthroughs

The main framework of the RAG platform is shown in the figure below.

(Figure: main framework of the RAG platform)

Connecting service resources

From the platform's perspective, service resources include data storage services, model invocation services, and model deployment services. From the user's point of view, none of this matters; the user only cares about one thing: "use the large model to answer questions over my data." To meet this need, the platform has to connect the different service resources inside JD:

  • Storage resources: integrate JD's Vearch vector database to provide similar-text retrieval, data filtering, and other capabilities (a generic retrieval sketch follows this list);
  • Large language model / embedding model: integrate the group's large model gateway, provide built-in large language models on the platform, and support calling self-deployed models through EA;
  • Service deployment: after building a custom pipeline, users can publish it for use in the production environment with one click;
  • Computing resources: users can fine-tune models and seamlessly replace the original models through the platform.
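
As a generic illustration of what "similar text retrieval" means here (this is not Vearch's actual API), the sketch below turns texts into vectors and ranks documents by cosine similarity. The embed() function is a hypothetical placeholder implemented with seeded random vectors only so the snippet runs; a real system would call an embedding model.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic random vector per text, unit-normalized.
    seed = sum(ord(ch) for ch in text) % (2 ** 32)
    v = np.random.default_rng(seed).normal(size=64)
    return v / np.linalg.norm(v)

DOCS = [
    "qualification requirements for opening a store on the JD Wanshang platform",
    "brand qualifications for the shoes, bags and apparel category",
]
DOC_VECS = np.stack([embed(d) for d in DOCS])  # pre-computed document vectors

def search(query: str, top_k: int = 1) -> list[str]:
    q = embed(query)
    scores = DOC_VECS @ q                      # cosine similarity (vectors are unit-normalized)
    return [DOCS[i] for i in np.argsort(-scores)[:top_k]]

print(search("What qualifications does a company need to open a store?"))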

Large language model pipeline construction

(Figure: large language model pipeline construction)

For example, the following code snippet shows how a custom RAG process can be built through componentization:

# Assemble a RAG pipeline from components: input -> retrieval -> prompt -> LLM -> output
rag = Pipeline()
rag.add_component(Input("in", input_keys=["query"]))    # user query enters here
rag.add_component(VectorStore("vectorstore"))           # recalls similar content from the knowledge base
rag.add_component(Prompt("prompt", preset="PlainRAG"))  # built-in RAG prompt template
rag.add_component(ChatModel("llm"))                     # large language model call
rag.add_component(Output("output"))

# Wire the components: the query goes to both retrieval and the prompt,
# the retrieved context fills the prompt, and the prompt feeds the model
rag.connect("in.query", "vectorstore")
rag.connect("in.query", "prompt.question")
rag.connect("vectorstore", "prompt.context")
rag.connect("prompt", "llm")
rag.connect("llm", "output")

# Publish the pipeline as a service with one click
rag.deploy()
           

To build a pipeline in this component-based manner, you only need to define the connections between blocks. Compared with building pipelines directly on open-source frameworks, this lets users focus on their business process and greatly lowers the threshold for building custom processes. The platform currently has built-in support for the following components:

  • Input and output components: support custom multi-input / multi-output;
  • Knowledge base component: supports fuzzy matching and keyword matching to recall similar content;
  • Large model component: provides an interface for large model access;
  • Prompt component: provides default prompt templates and custom prompt capabilities;
  • Python function component: users can build arbitrary custom function blocks through Python functions;
  • Branch component: supports running a specific sub-process for a specific output;
  • Agent component: provides agent capabilities (such as ReAct);
  • One-click deployment: pipelines can be run locally or deployed with one click, exposing an access API (see the call sketch after this list).
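
For illustration, calling a deployed pipeline might look roughly like the sketch below. The endpoint URL, authentication header, and response field are assumptions made for the example, not the platform's actual API.

import requests

# Hypothetical sketch of calling a deployed pipeline over HTTP.
ENDPOINT = "https://example.internal/rag-pipeline/invoke"   # placeholder URL

def ask(query: str) -> str:
    resp = requests.post(
        ENDPOINT,
        json={"query": query},                        # matches the Input("in", input_keys=["query"]) component
        headers={"Authorization": "Bearer <token>"},  # placeholder credential
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("output", "")              # assumed response field

if __name__ == "__main__":
    print(ask("What qualifications does a company need to open a store?"))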

Dashboards & Effect Optimization

At present, one of users' pain points is that after building a RAG process, they have no way to improve its effectiveness. Effect tuning on the platform mainly covers the following aspects:

  • End-to-end data backflow: B-end users usually collect service history to check service quality. For each request, the platform saves the intermediate state of the runtime pipeline, and the user can trace back the result of every step for further analysis. Through this runtime tracking of intermediate data, data across the whole link can be collected;
  • Data engineering: "garbage in, garbage out" applies here as well, and data engineering is a major direction. In terms of data types, the platform supports TXT, DOCX, PDF, OSS files, and more; in terms of segmentation strategies, it supports recursive segmentation and fixed-length segmentation (a minimal splitting sketch follows this list); in terms of data enhancement, it supports QA extraction and semantic understanding;
  • Optimization of key components and capabilities: a variety of strategies exist to improve RAG quality, and the platform packages these optimization strategies into basic components that users can call quickly, such as semantic understanding and step decomposition before retrieval, conversational retrieval and self-query during retrieval, and tag filtering and re-ranking after retrieval;
  • Routing: a cached routing module allows quick manual intervention with pre-configured Q&A pairs;
  • Evaluation system & model iteration: one of the main reasons effects cannot be improved in traditional setups is that, after an end-to-end Q&A service goes live, nobody knows what is good and what is bad. With full-link data backflow and the evaluation system, the platform can automatically trigger fine-tuning of key models such as the embedding model and the LLM, making effect optimization automatic.
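
As a concrete illustration of the two segmentation strategies mentioned above, here is a minimal sketch of fixed-length splitting with overlap and recursive splitting by separators. The chunk sizes and separator lists are illustrative defaults, not the platform's actual parameters.

def split_fixed_length(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Fixed-length segmentation: slide a chunk_size window over the text with overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step) if text[i:i + chunk_size]]

def split_recursive(text: str, max_len: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", "。", " ")) -> list[str]:
    """Recursive segmentation: split on the coarsest separator first,
    then recurse into pieces that are still too long.
    Separators are dropped for brevity; pieces that cannot be split further are kept as-is."""
    if len(text) <= max_len or not separators:
        return [text]
    head, *rest = separators
    chunks: list[str] = []
    for piece in text.split(head):
        if len(piece) <= max_len:
            if piece:
                chunks.append(piece)
        else:
            chunks.extend(split_recursive(piece, max_len, tuple(rest)))
    return chunks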

Business Practices & Feedback

At present, the RAG platform has served a number of projects, some of which are listed below:

  • AI assistant application for merchants in Mall B (a 2023 Dark Horse second-prize project): it solves business, data, and process problems between platform merchants and front-line staff, and has been put into use in multiple business lines, serving thousands of stores. I provided the back-end RAG services for this project and received letters of appreciation from the project partners. A core link of the project:
      • Product model-number normalization: model numbers cleaned by outsourcing are matched against the standard model library, avoiding disputes caused by inconsistent product model numbers. Normalization efficiency rose from 400 SKUs/person-day to 750 SKUs/person-day, an increase of 87%, and the work earned a letter of thanks from the project partner.
  • Knowledge assistant application: serves C-end users with an out-of-the-box product page. This quarter, existing Knowledge Assistant users were migrated over, supporting about 7,000 daily active users and 20,000-30,000 daily visits.
  • Other projects: omitted.

Future outlook

Large models can be applied in many business scenarios, and RAG has been validated in a variety of applications because it gives LLMs access to richer knowledge. On top of this, agents add a certain ability to "observe and think" and to call tools, which will further enrich LLM capabilities in the future. Going forward, I will focus on improving the business results of RAG deployments and exploring the value of single- and multi-agent systems in the business. On this basis, combined with JD's internal application scenarios, we will build more easy-to-use platform capabilities and quickly reuse basic capabilities across different businesses, improving developer efficiency and building the ability to serve end users quickly.
