
Harrison Chase: An Original "Cognitive Architecture" for AI Agents, Driven by Both Customization and Minimalist Addition and Subtraction

Build a customized cognitive architecture and sell the best "beer".

Author | Liu Jie

Editor | Cen Feng

In early July, OpenAI published a blog post that fanned the worldwide craze for AI agents, lifting the industry's interest in them to a new level.

Agents are seen as the next hot topic after large models. In an earlier exclusive interview for AI Technology Review's "Ten Talks on Embodied Intelligence" column, Rui Yong, CTO of Lenovo, argued that AI is developing in a trilogy: from small models to large models and then to agents. This is also why Lenovo paid attention to OpenAI early on and quickly followed up with its own agent research.

First of all, let's understand the question: What is the difference between AI agents and the AI assistants we are familiar with?

On the surface, they all seem to be tools that help us get things done. But Harrison Chase, the founder of LangChain, tells us that the difference is actually quite significant.

An AI agent is an autonomous entity that can observe its surroundings and take actions to achieve its goals. In layman's terms, it is an entity with AI capabilities; it can be hardware or software, though it is generally a software program, such as one built with LangChain.

LangChain is an open-source framework whose special appeal is that you can build AI applications with just a few lines of code, making the creation of complex agents as easy as assembling building blocks.
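As a rough illustration of that "few lines of code" claim, here is a minimal sketch using LangChain's expression language. It assumes the `langchain-openai` package and an OpenAI API key; the prompt and model name are just examples, not anything from the interview.

```python
# Minimal LangChain sketch: compose a prompt, a model, and an output parser
# into one runnable chain. Assumes `pip install langchain-openai` and an
# OPENAI_API_KEY environment variable; the model name is just an example.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Summarize the difference between {a} and {b} in one sentence."
)
model = ChatOpenAI(model="gpt-4o-mini")

# The | operator pipes each component's output into the next.
chain = prompt | model | StrOutputParser()
print(chain.invoke({"a": "AI assistants", "b": "AI agents"}))
```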

After founding the company of the same name around LangChain, Harrison Chase also launched LangGraph and LangSmith to solve more complex problems.

The question arises: Are these agents really more powerful than AI assistants?

Harrison Chase's point is clear. He believes that the core of AI assistants is to assist humans in making decisions, while the core of agents is to act autonomously and make decisions independently.

An AI assistant is like a helper in the passenger seat, looking up directions and handing you things; an AI agent is the driver, deciding the route and speed on its own, working independently, and handling a series of tasks without human guidance at every step.

Early AI agents, such as BabyAGI and AutoGPT, were criticized as overhyped AI assistants in new packaging: their tasks were too general and lacked clear rules, whereas what companies really need are agents tailored to their specific requirements.

Another concept currently in vogue is "embodied intelligence", which is essentially an agent with a body that supports physical interaction. LangChain agents, by contrast, are software components designed to augment LLM capabilities, enabling models to make decisions and take actions, and thus to reach more advanced forms of intelligence.

Harrison Chase calls the entire flow of information through an application, from user input to output, its "cognitive architecture". A customized cognitive architecture, he says, lets AI agents repeat the same tasks on demand, automate large amounts of tedious work, and radically simplify what the user has to do.

Of course, AI agents can do more than assembly-line work. While agents do the "subtraction" of removing tedious work for users, Harrison Chase also pays special attention to user experience, doing "addition" through customization.

On the one hand, AI agents can interact with users and provide more attentive, personalized service; on the other, they can keep improving from user feedback, becoming smarter the more they are used, until users can hand matters over to them entirely.

However, enterprises aiming at a generic cognitive architecture have no need to labor over raising its sophistication. Only companies focused on customized needs must, like the brewers of the last century who built their own power generation systems, invest heavily in their own architecture in order to make their "beer" taste better.

Research on AI agents is still in its infancy: a Princeton study showed its agent solving 12.5% of GitHub issues, compared with 3.8% when relying on retrieval-augmented generation (RAG).

But Harrison Chase is very bullish on the potential of AI agents in customer support and coding, especially coding.

With the help of mature AI agents, anyone can become a software development engineer.

A designer who can't write code can tell an AI agent that they want an application with a specific function, and the agent can automatically generate code based on the needs and turn the idea into reality. This will revolutionize the way we work and create.

Harrison Chase believes that in the future, people will no longer be bogged down by daily chores at work; AI agents will take on the heavy lifting, leaving people free to focus on creating and enjoying life.

In Sequoia Capital's podcast, Harrison Chase combines technology and product perspectives to share more of his insights on the training, evolution, and future prospects of AI agents.

The full podcast can be heard at the link below; AI Technology Review has also condensed it, without changing the original meaning, into the following transcript:

https://www.sequoiacap.com/podcast/training-data-harrison-chase/

1

The development of AI agents

Sonya Huang: Agents are a topic that everyone is paying a lot of attention to at the moment. Since the rise of LLMs (large language models), you've been at the forefront of agent building. Can you tell us a little bit about the definition of an agent?

Harrison Chase: It's tricky to define an agent. People may have a different understanding of it, and that's normal because we're still in the early stages of LLM and agent-related development.

My personal understanding is that an agent is a system in which the LLM determines the control flow of the application.

For example, in a traditional retrieval-augmented generation (RAG) chain, the process is preset: generate a search query, retrieve documents, generate an answer, and finally return it to the user.

The agent, on the other hand, puts the LLM at the center and lets it decide what to do next. Sometimes it initiates a search, sometimes it responds directly to the user, and may even query multiple times until it comes up with an answer. LLMs dynamically determine the entire process.

The use of tools is also an important feature of an agent. When an LLM decides to act, it usually calls a different tool to do so. In addition, memory is key, as the LLM needs to remember previous actions when it determines the next step.

At its core, an agent lets the LLM determine the control flow of the application.
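To make "the LLM determines the control flow" concrete, here is a schematic sketch of the idea in plain Python. The `llm_decide` function and the tools are hypothetical stand-ins, not LangChain APIs; the point is only the loop structure, in which the model, not the programmer, picks each next step.

```python
# Schematic agent loop: the LLM, not a preset pipeline, chooses each step.
# `llm_decide` is a hypothetical function that asks the model what to do
# next, given the goal and the memory of previous actions.

def run_agent(goal: str, tools: dict, llm_decide, max_steps: int = 10):
    memory = []  # the agent must remember earlier actions when deciding
    for _ in range(max_steps):
        decision = llm_decide(goal=goal, memory=memory)
        if decision["action"] == "finish":
            return decision["answer"]          # respond directly to the user
        tool = tools[decision["action"]]       # e.g. "search", "calculator"
        result = tool(decision["input"])       # act, then feed the result back
        memory.append((decision["action"], decision["input"], result))
    return "Gave up after max_steps."          # guardrail against endless loops
```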

Pat Grady: A lot of what you mentioned has to do with decision-making. Is an agent also a way of acting? Do the two complement each other, or does an agent lean more toward one side?

Harrison Chase: I think they really go hand in hand. Many of the agent's behaviors are essentially deciding how to act, and the difficulty in this process is finding the right action. Therefore, solving the problem of "decision-making" usually also solves the problem of "action". Once the decision is made, the LLM system performs the appropriate action and feeds back the results.

Sonya Huang: The main difference between agents and chains is that LLMs autonomously decide the next step, rather than pre-set steps. Is this distinction accurate?

Harrison Chase: Yes, that's a good description. There are different levels, though. For example, a simple router might only pick a path within a chain; the LLM is still making a decision, but that is the most basic application. Fully autonomous agents sit at the other extreme. Overall, there are nuances and grey areas in between.

Sonya Huang: I see. It's interesting that agents range from partial control to fully autonomous decision-making. What role do you think LangChain plays in the agent ecosystem?

Harrison Chase: Our focus right now is to make it easier for people to create agents in between. We found that the most effective agents are often located in this middle ground. While fully autonomous agents are attractive and have prototypes, they often deviate from expectations. Therefore, our work is focused on the "orchestration layer" in order to build agents that are flexible but still have some constraints. If you want to dig deeper, we can talk about it again. But overall, LangChain's vision is to be an orchestration framework.

Sonya Huang: I remember around March 2023, autonomous agents like BabyAGI and AutoGPT got a lot of attention, but their first iterations didn't seem to live up to expectations. What do you think is the reason? What is the stage of the agent hype cycle now?

Harrison Chase: It's true that the advent of AutoGPT kicked off a hype cycle for agents, especially on GitHub. The craze lasted from spring 2023 into the summer, after which it cooled slightly. In 2024 we started to see genuinely useful applications, such as LangChain's partnership with Elastic to launch production-grade agents like Elastic Assistant and Elastic Agent. Applications such as Klarna's customer-support bot have sparked even more discussion. Companies such as Devin and Sierra are also experimenting in the agent field.

As for why AutoGPT didn't fully succeed, I think the main reason is that it was too general, lacking clear tasks and rules. Businesses expect agents to do specific work, not to be vague autonomous generalists. As a result, the agents we see now look more like custom cognitive architectures: flexible, but demanding more engineering investment and development time, which is why such systems didn't exist a year ago.

2

Customized cognitive architecture

Sonya Huang: You mentioned "cognitive architecture" earlier, and I like the way you think about it. Can you explain what a cognitive architecture is? How should we understand it? Is there a right mental model for it?

Harrison Chase: Sure. As I understand it, a cognitive architecture basically refers to what your system architecture looks like when you're using a large language model (LLM).

If you're building an app with multiple algorithmic steps, how do you take advantage of those algorithms? Do you use them to generate final answers? Or use them to choose between different tasks? Are there very complex branches, or even multiple loops?

These are all different manifestations of cognitive architecture. In other words, a cognitive architecture describes how information is processed and flows through your application's LLM calls, from user input to output.

Especially when putting an agent into production, we find that the process is often tailored to the specific application needs.

For example, an application may require some specific checks and then several steps, each of which may contain loops or branches. It's like you're drawing a flowchart, and this kind of customized process is becoming more and more common because people want agents to be more controllable in their applications.

I call it "cognitive architecture" because the core strength of an LLM is its reasoning power, and you can code this cognitive mental model into some kind of architecture in a software system.
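As a sketch of what such a flowchart-like, customized architecture can look like in code, here is a small LangGraph example with one conditional branch. It assumes the `langgraph` package; the node functions and the routing condition are hypothetical placeholders, not an architecture from the interview.

```python
# A custom cognitive architecture as a graph: nodes are steps, edges
# (including a conditional branch) encode the flowchart. Assumes the
# `langgraph` package; all node logic is a hypothetical placeholder.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    docs: list
    answer: str

def check_input(state: State) -> State:
    return state  # e.g. validate or classify the user question

def retrieve(state: State) -> State:
    return {**state, "docs": ["...retrieved documents..."]}

def answer(state: State) -> State:
    return {**state, "answer": f"Answer based on {len(state['docs'])} docs"}

def needs_retrieval(state: State) -> str:
    # The branch: route simple inputs straight to the answer node.
    return "retrieve" if "?" in state["question"] else "answer"

graph = StateGraph(State)
graph.add_node("check_input", check_input)
graph.add_node("retrieve", retrieve)
graph.add_node("answer", answer)
graph.set_entry_point("check_input")
graph.add_conditional_edges("check_input", needs_retrieval)
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", END)

app = graph.compile()
print(app.invoke({"question": "What is a cognitive architecture?",
                  "docs": [], "answer": ""}))
```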

Pat Grady: Do you think that's the way forward? I heard two things: one, that it's very customized; and two, that it sounds more like hardcoding. Is this our current direction, or a temporary solution? Will there be more elegant architectures in the future, or a set of standardized reference architectures?

Harrison Chase: That's a great question, and I've spent a lot of time thinking about it. In the extreme case, if the model is very robust and reliable at planning, you might just need a simple for loop: repeatedly call the LLM to decide what to do next, perform the action, and loop again.

All the constraints you want your model to follow can be communicated through prompts, and the model will perform the way you expect. While I believe that models will get better and better at reasoning and planning, I don't think they will completely replace manually built architectures.

The first is the issue of efficiency. If you know that a certain step always needs to be executed after another, then you can simply arrange them in order.

The second is reliability, especially in an enterprise environment, where people need some assurance that critical steps are performed as intended.

So I think while it might be easier to build these architectures, they're still going to have some complexity.

From an architectural perspective, you can think of "running LLMs in loops" as a very simple but generic cognitive architecture. What we see in actual production is more of a customized, complex architecture.

I feel that over time, the general planning and reflection functions will be trained directly into the model, but the planning, reflection, and control functions that require a high degree of customization will still not be replaced.

Sonya Huang: It can be understood this way: LLMs can do general-purpose agent reasoning, but in specific domains, you also need customized reasoning capabilities. These are not fully built into a generic model.

Harrison Chase: Exactly. The core idea of a custom cognitive architecture is that you let humans take responsibility for planning, rather than relying solely on LLMs.

While some planning features may get closer to models and prompts, the planning process for many tasks is still complex and cannot be fully automated. It will take time to develop a highly reliable, plug-and-play solution.

3

User experience design

Sonya Huang: I believe that agents will become the new trend of artificial intelligence, and we are moving from AI assistants to AI agents. Do you agree? Why?

Harrison Chase: I basically agree. The potential of agents lies in the fact that traditional AI assistants rely on human input and have limited task capabilities. Agents, on the other hand, are able to act more independently, interacting with the user occasionally, which allows them to handle more tasks autonomously.

But giving them more autonomy also comes with risks, such as the possibility of bias or errors. Therefore, finding a balance between autonomy and reliability will be an important challenge.

Pat Grady: You mentioned user experience at AI Ascent. We often think of it as sitting at the opposite end of the spectrum from architecture: architecture works behind the scenes, and user experience is the front-end showcase.

But now things seem different: user experience can actually affect the effectiveness of the architecture. For example, when something goes wrong, you can rewind to see where the planning process went astray, as Devin allows.

Can you talk about the importance of user experience in agents or LLMs? Also, what developments in this area do you find interesting?

Harrison Chase: User experience is very important today because LLMs are not perfect and often go wrong. The chat mode is particularly effective, allowing users to see the model's reactions in real-time and correct mistakes or ask for details in a timely manner. Although this model has become mainstream, its limitation is that it still requires continuous feedback from users, and is more of an "assistant" experience.

Reducing user involvement and allowing AI to automate more tasks would be a game-changer.

However, finding the right balance between automation and user engagement is a challenge. Some interesting ideas are emerging to solve this problem. For example, an agent "transparency list" gives users a clear view of each step the AI performs; if a step goes wrong, the user can backtrack to it and adjust the instruction.

Another innovative idea is to introduce an "inbox" experience where agents run in parallel in the background, alerting the user when human help is needed, just like sending an email, so that the user can step in at the right time without having to monitor the whole process.
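A minimal sketch of that "inbox" idea in plain Python: the agent works through steps on its own, keeps a transparent log the user can inspect and backtrack, and files a message in an inbox whenever it needs a human. All names here are hypothetical illustrations, not an existing API.

```python
# Sketch of the "inbox" UX: the agent runs in the background, keeps a
# transparent log of everything it did, and posts to an inbox when it
# needs human help. All names are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    steps: list = field(default_factory=list)   # transparency list
    inbox: list = field(default_factory=list)   # items awaiting a human

    def record(self, step: str, result: str):
        self.steps.append((step, result))

    def ask_human(self, question: str):
        self.inbox.append(question)              # like sending an email

    def backtrack(self, to_step: int):
        # User spotted a bad step: drop everything after it and retry.
        self.steps = self.steps[:to_step]

run = AgentRun()
run.record("draft_document", "v1 drafted")
run.ask_human("Tone check: formal or casual for this client?")
print(run.inbox)   # the user steps in here, without watching every step
```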

In terms of collaboration, the agent can draft the document first, and the user can provide feedback as a reviewer. The real-time interactive experience is also engaging.

For example, when a user is commenting, the agent is able to fix the problem immediately, just like in Google Docs. This interaction enhances the user experience and makes AI a truly effective work partner.

Pat Grady: It's really interesting that you mentioned about how agents learn from interactions. If I have to give the same feedback over and over again every time, the experience becomes terrible, right? How can the system improve this feedback mechanism?

Harrison Chase: Definitely! If we keep giving the agent the same feedback, and it doesn't improve, it will undoubtedly be frustrating. Therefore, the architecture of the system needs to be able to learn from this feedback, not only to fix the current problem, but also to gain experience and avoid future recurrence.

While it's still early days, we've spent a lot of time thinking about these issues and believe that as the technology advances, agents will become smarter and smarter, resulting in a smoother user experience.

Make beer even better

Sonya Huang: In the last six months, there has been significant progress in the field of agents. Princeton's research shows its agent solving 12.5% of GitHub issues, compared to 3.8% when relying on retrieval-augmented generation (RAG).

Despite the progress, 12.5% is still not enough to replace an intern. What stage do you think the development of agents has reached? Can they be reliably deployed in a customer-facing environment?

Harrison Chase: Yes, SWE-agent is relatively general-purpose and can handle a wide range of GitHub issues. The reliability of custom agents, while not "99.999%", is sufficient for production use. For example, Elastic's agents are already applied in several projects. I don't have specific reliability data, but they are reliable enough to go live. General-purpose agents face bigger challenges and will need longer context windows and better reasoning capabilities before widespread application.

Sonya Huang: You mentioned technologies such as Chain of Thought, can you share the impact of cognitive architecture on agent performance? What do you think is the most promising cognitive architecture?

Harrison Chase: One reason projects like AutoGPT didn't succeed is that early LLMs couldn't explicitly reason about what to do as a first step. Techniques such as chain of thought give the model a better space in which to reason.

Shunyu Yao's ReAct paper describes one of the first cognitive architectures dedicated specifically to agents. ReAct combines reasoning and acting, allowing the model not only to take actions but also to reason about them, improving its capabilities. Now, as models improve through training, the explicit reasoning step is becoming less necessary.

The main challenge is long-term planning and execution, where models are not performing well and require a cognitive architecture to help generate plans and execute them incrementally. Reflection helps determine whether the task has been completed or not.

In general, planning and reasoning are currently the most important general cognitive architectures, and these problems will be better solved in the future as training improves.
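A schematic of the plan-then-reflect pattern described above, in plain Python; `llm_plan`, `execute_step`, and `llm_reflect` are hypothetical stand-ins for real model and tool calls, not an API Chase names.

```python
# Plan-and-execute with reflection, schematically: the architecture asks
# the model for a plan, executes it step by step, and uses a reflection
# call to decide whether the task is actually done. The llm_* and
# execute_step functions are hypothetical stand-ins for model/tool calls.

def plan_and_reflect(task: str, llm_plan, execute_step, llm_reflect,
                     max_rounds: int = 3):
    for _ in range(max_rounds):
        plan = llm_plan(task)                    # e.g. a list of steps
        results = [execute_step(step) for step in plan]
        verdict = llm_reflect(task, plan, results)
        if verdict["done"]:
            return verdict["answer"]
        task = verdict["revised_task"]           # fold feedback into a retry
    return "Unresolved after max_rounds."
```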

Sonya Huang: You mentioned that Jeff Bezos said to "focus on making your beer better." This recalls the early days when many breweries generated their own electricity. Many companies today face a similar question: Do they need to control the cognitive architecture to improve their business? Does building and optimizing these architectures really "make your beer better", or should they give up that control and focus on user interface and product development?

Harrison Chase: It depends on the type of cognitive architecture you're building. If it's a generic architecture, it may not directly improve the business. In the future, model providers will focus on common planning and cognitive architectures that can be used directly by enterprises to solve problems. But if it's a highly customized architecture that reflects specific business processes or best practices, it can really improve the business, especially in areas that rely on these applications.

Customized business logic and cognitive models can significantly improve system performance, making personalization more accurate and efficient. While user experience and interface design are still important, it's clear that custom agents are an important advantage for businesses. I think there's a big difference between generic and custom.

4

Orchestration and observability

LangSmith and LangGraph

Sonya Huang: Can we talk about LangSmith and LangGraph? What problems did you solve? Especially in terms of agent management, how can your products help people better manage state and improve agent controllability?

Harrison Chase: Absolutely. LangChain's launch solved a key problem: standardizing the interfaces of individual components. That allowed us to integrate broadly with a wide range of models, vector stores, tools, and databases, which is a big reason LangChain became so popular.

LangChain also offers a range of higher-level interfaces that make features such as retrieval-augmented generation (RAG) and SQL Q&A easy to use, and lets users dynamically construct short-running chains. Importantly, we think of these "chains" as directed acyclic graphs (DAGs).

LangGraph addresses the problems that come with customizable, controllable loops. Loops introduce new challenges, such as designing a persistence layer so state can be restored and loops can run asynchronously in the background. So we focus on how to effectively deploy long-running, cyclical, human-in-the-loop applications.
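A sketch of that persistence layer, assuming the `langgraph` package: compiling a graph with a checkpointer lets a long-running application save state and resume by thread id. The one-node graph here is a placeholder, not anything from the interview.

```python
# Sketch of a LangGraph persistence layer: compiling with a checkpointer
# lets a long-running, looping app save state and resume by thread id.
# Assumes the `langgraph` package; the one-node graph is a placeholder.
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    question: str

def step(state: State) -> State:
    return state  # stand-in for real work

graph = StateGraph(State)
graph.add_node("step", step)
graph.set_entry_point("step")
graph.add_edge("step", END)

# In-memory checkpointer; swap for a durable store in production.
app = graph.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "user-42"}}
app.invoke({"question": "hello"}, config=config)
# With a durable checkpointer, the same thread_id later restores the
# saved state, so the loop picks up where it left off.
print(app.get_state(config).values)
```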

About LangSmith, we've been working on it since the company's inception, focusing on observability and testing of LLM applications.

We've found that the inherent uncertainty of an LLM at the core of an application makes observability and testing especially important for shipping to production with confidence. LangSmith is designed to work seamlessly with LangChain.

In addition, LangSmith provides a prompt hub to help users manage and manually review prompts. This matters throughout the process, because we need to be clear about exactly what goes into and comes out of the LLM.

Observability for LLMs is distinctive, and the complexity of testing keeps growing. As a result, we want people to review content more often, going beyond traditional software testing. LangSmith provides the tooling to address these challenges.
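For reference, a minimal sketch of hooking an application up to LangSmith tracing. It assumes the `langsmith` package and an API key; the traced function is a hypothetical example, and the exact environment variable names may vary by version.

```python
# Minimal LangSmith tracing sketch. Assumes `pip install langsmith` and a
# LangSmith API key; the decorated function is a hypothetical example.
import os
from langsmith import traceable

os.environ["LANGCHAIN_TRACING_V2"] = "true"     # enable tracing
os.environ["LANGCHAIN_API_KEY"] = "..."         # your LangSmith key

@traceable  # each call is logged as a trace you can inspect step by step
def answer_question(question: str) -> str:
    return f"(model output for: {question})"    # stand-in for an LLM call

answer_question("Why did the agent pick this tool?")
```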

Observability

Pat Grady: Do you have a heuristic for evaluating how well existing observability, testing, and similar tools work for LLMs? What makes LLMs so different from previous models that you need to develop new products, architectures, or approaches?

Harrison Chase: Yes, that's a question worth real thought. Especially in observability and testing, the complexity of LLMs forces us to innovate. A tool like Datadog is great for monitoring, but for in-depth analysis of multi-step applications, LangSmith provides more granular trace analysis to help debug and handle the uncertainty of LLMs.

The testing aspect is also interesting. Traditional software testing commonly focuses only on whether results pass, without pairwise comparisons. In LLM evaluation, however, tools like LMSYS allow side-by-side comparison of two models, which is especially critical in LLM testing.

Another challenge is that you won't always have a 100% pass rate in LLM tests, so it's important to track your progress and make sure you're constantly improving, not regressing. Compared with the pass/fail judgment of traditional tests, LLM tests require more detailed tracking and analysis.
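A schematic of that kind of tracking in plain Python: run an eval suite, compute a pass rate, and compare it against a stored baseline so regressions are visible. The `judge` function is a hypothetical stand-in for an LLM-as-judge or human review.

```python
# Sketch: LLM tests rarely hit 100%, so track the pass rate over time and
# flag regressions against a baseline instead of demanding all-green.
# `judge` is a hypothetical stand-in for an LLM-as-judge or human review.

def run_eval(app, dataset, judge, baseline_rate: float):
    passed = sum(judge(example, app(example["input"])) for example in dataset)
    rate = passed / len(dataset)
    if rate < baseline_rate:
        print(f"REGRESSION: {rate:.1%} < baseline {baseline_rate:.1%}")
    else:
        print(f"OK: {rate:.1%} (baseline {baseline_rate:.1%})")
    return rate
```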

Finally, human involvement is crucial. As much as we want the system to run automatically, human intervention tends to be more reliable. This is very different from the simple equation verification in software testing, where we need to introduce human judgment to make the test more accurate and flexible.

5

The future of software development

Pat Grady: Before diving into the details of agent building, I'd like to ask a question. Our founder, Don Valentine, famously asked, "So what?" So: what if autonomous agents work perfectly? What impact would that have on the world? How would our lives be different?

Harrison Chase: At a high level, that means that we humans will be able to focus on different things.

At this stage, many industries rely on repetitive, mechanical tasks, and the point of agents is to automate much of that, letting us focus on higher-level problems. We can put agents' output toward more creative, higher-leverage work; many functions in a company's operations could be outsourced to agents.

You can imagine yourself in the role of CEO, while agents take care of other functions like marketing, sales, etc., automating a lot of repetitive tasks and giving you more time for strategic thinking or product development. This will give us the freedom to do what we are good at and what we are interested in, and get rid of the mechanical work that we are not very willing to do.

Pat Grady: Have you seen any real-world examples or any interesting projects in development?

Harrison Chase: The two areas agents are most focused on right now are customer support and coding.

Customer support is a great example: many companies already outsource this type of service, so an agent that can efficiently take over this work is very powerful.

As for coding, it's more complex and involves a lot of creative and product-positioning thinking. Still, some coding tasks constrain a person's creativity. If agents can automate those tasks, then someone like my mom, who has an idea for a website but can't program, could focus on the idea and scope of the site while the code is generated automatically.

Customer support agents are already starting to come into play, and in coding there are many new developments; although the field isn't fully mature, many people are working on interesting projects.

Pat Grady: The coding issue you mentioned is interesting, because it's one of the reasons we're optimistic about AI. AI has the potential to shorten the distance from idea to execution, making it easier for creative ideas to become reality. Figma's Dylan Field, for example, talks a lot about this.

Harrison Chase: Yes, automation can remove those things that get in the way of creation, and this "idea to reality" transition is very appealing. In the era of generative AI and the age of agents, the definition of "builder" will change.

Most of today's software builders are engineers, or need to hire engineers. In the future, with agents and generative AI, builders will be able to build more, because they can cheaply draw on agents for the knowledge and capabilities they need. That amounts to agents commoditizing intelligence, which means more people can become builders.

Pat Grady: I'm curious, are there any issues that you haven't addressed directly at the moment for developers trying to build products or AI using LLMs, but that you might consider in the future?

Harrison Chase: Yes, there are really two main areas: the model layer and the database layer.

For example, we don't plan to build a vector database, but it's a very interesting question about how to store data. That's not the focus right now, though. We also don't build a base model, nor do we focus on fine-tuning.

We're more interested in helping developers streamline their workflows in data management, but we're not going to build infrastructure for fine-tuning.

There are a lot of companies, like Fireworks, that are doing these things specifically, and that's really interesting. For developers, these issues are at the bottom of the tech stack.

At the same time, another question worth pondering is: if agents do become as common as we envision, what new fundamental problems will arise? Honestly, it's too early to say what we will or won't do in the future, because we are still some way from a fully reliable agent economy.

However, some concepts are already fascinating, such as the infrastructure of authentication, authorization, and payment for agents.

Imagine that someday in the future, agents pay humans for their services, not the other way around! It's a really exciting scene. If agents do become as popular as we think, what kind of tools and infrastructure do we need to support all this?

These questions are a bit different from the needs of the developer community for building LLM applications. LLM applications are already here, and agents are maturing, but the entire ecosystem of agents is not yet fully formed. It's going to be a very interesting way to go.

Sonya Huang: You mentioned fine-tuning and said you don't plan to go into that area for now. It seems prompt engineering and fine-tuning are often thought of as interchangeable tools. What do you think of the current use of prompting and fine-tuning? What does the future hold?

Harrison Chase: Actually, I don't think fine-tuning and cognitive architecture are interchangeable. On the contrary, I feel that they are complementary in many ways.

When you have a more customized cognitive architecture, the responsibilities of each part or node of the agent become more specific and clear. In this case, fine-tuning is particularly useful. Because when you have a clear scope of work for each module, fine-tuning can further optimize the performance of those modules.

So I think the relationship between fine-tuning and architecture is not in competition with each other, but rather in their own roles and reinforcing each other.

