
Behind the two acquisitions: OpenAI wants to be a "large language model operating system"

2024-06-27 15:46 | Tencent Technology | AI Future Finger North Official Account

This article looks at OpenAI's two recent acquisitions and, from them, tries to piece together OpenAI's blueprint for a "future operating system", the LLMOS.

Tencent Technology | Authors: Hao Boyang, Li Anqi

Edited by Guo Xiaojing

There has been no news of GPT-5 for a long time, but OpenAI has been busy making acquisitions.

Following last week's purchase of the data query company Rockset, the acquisition of Multi is OpenAI's second in a short span of time. OpenAI's intent in acquiring Rockset is clear: to strengthen its RAG retrieval capabilities and better serve its ToB business.

The main business of Multi, however, is remote desktop control, providing collaboration support for development work.

Remote desktop control is nothing new; the technology was already quite mature ten years ago.

Collaborative office software emerged later, but that market is already fairly saturated. The world's fastest-growing collaboration tools, such as Slack and DingTalk, also support remote control, and they are running their own internal AI experiments. Seizing share in SaaS, a relatively fixed market with high switching costs, is very difficult.

Looked at as standalone businesses, neither acquisition makes OpenAI's intentions obvious. But if we treat them as parts of a larger plan, the picture starts to come into focus.

What kind of company is Multi?

Multi was founded in 2019 and is headquartered in San Francisco. It has two key figures. One is former Dropbox product manager Alexander Embiricos, a Stanford graduate who majored in machine learning; his work at Dropbox was also related to content collaboration.

The other is Charley Ho, a former Google software engineer who also graduated from Stanford, with a degree in computer science. At Google, Ho worked on Bebop, a company acquired by Google in 2015 that focused on enterprise application platforms and was later merged into Google's cloud development team.

Including the two founders, Multi has only five employees.

Multi set out to build a Zoom-based remote team collaboration platform that lets teams work together over video chat. In a 2023 podcast, Embiricos talked about the origins of the company: he and Ho had noticed that most video chat and remote control tools were designed around presentations rather than joint problem-solving. Multi was built to fix that.

In reality, though, it may have taken them some time to arrive at this idea. When Multi first launched in 2019 (it was then called Remotion), Fast Company described it as a "lightweight" video conferencing tool that could serve as an alternative to Zoom or Microsoft Teams. Its main feature was a sidebar of contacts, making it feel more like an instant messenger. But video conferencing cooled off quickly after the pandemic, and Multi shifted toward deeper remote collaboration.

The change came in 2021. Mainstream collaboration products such as Slack were already in full swing and could handle simple tasks like text editing. At the same time, the founder of Pop, another remote collaboration tool with a very similar positioning, pointed out that Slack had no interest in supporting programming.

The thinking of the CEO of competing product Pop

At the time, collaboration tools in other functional domains were maturing, but collaboration tools for programming were not keeping pace.

"Other functional areas have acquired new multi-person collaboration tools that allow for faster collaboration – like Figma used by designers – but engineers don't have such powerful tools. Sure, we have great pull requests and repository tools like GitHub, but that's similar to the days when the design world was stuck in Dropbox + Sketch. It worked, but could have been faster. ”

Programming is a typical multi-person collaboration scenario. Modern software systems are complex and typically involve multiple subsystems and components: a large application usually contains front-end and back-end development, user interface design, database management, algorithm implementation, security mechanisms, and more. Each module requires specialized knowledge and skills, so multi-person collaboration is the norm.

Therefore, Multi has set its sights on this segment to support more complex programming collaboration scenarios.

They advocate transparency in the collaboration process: "Liberate work from its containers (apps, tabs, screen sharing, etc.) so that you can interact directly with the things you are working on and with your partners."

Specifically, the core capability Multi provides for code collaboration is application sharing. Multi allows up to 10 people to collaborate across screens at the same time, each with an independent cursor, so participants can draw on and annotate open apps, or even merge different app views into a single shared view. The UI makes it clear at a glance who is doing what.

In addition, Multi added AI features during this AI boom, such as AI summarization, help with drafting action items, and one-click creation of Linear issues.

Why did OpenAI acquire a remote desktop control company?

Even after learning about Multi's main business, OpenAI's acquisition intent is still hard to read.

OpenAI's AI agent playground

But a closer look at the Multi acquisition announcement may give us some clues. In it, Multi describes its plans and outlook: "Lately, we've been asking ourselves more and more often how we should work with computers. Not operating or using a computer, but truly working with a computer, with artificial intelligence. We believe this is one of the most important product questions of our time."

From this, we can see that Multi's first task after joining OpenAI should be to develop a tool for humans and multiple AI agents to work together.

This statement aligns almost exactly with OpenAI's own thinking. A few weeks ago, OpenAI CTO Mira Murati was asked in an interview at Dartmouth's School of Engineering: "If GPT were to become incredibly intelligent in three years, would it decide on its own to connect to the internet and start acting?" Mira replied that the team at OpenAI has thought a great deal about this scenario: as long as AI continues to develop, systems with strong agent capabilities will certainly emerge. These AIs may even form communities, connect to the internet, and communicate with one another to accomplish tasks together, or work seamlessly with humans. In the future, humans and AI will work together just as we work with each other today.

It is therefore reasonable to conclude that this is the direction OpenAI has planned for AI. Choosing to buy Multi at this moment may mean that OpenAI's multi-agent capability has reached the stage of productization.

(Twitter users have made the same observation)

Over the past year, AI agent systems, especially systems in which agents work together to complete complex tasks, have been the core products the industry has wanted to crack. Out of this push came agent frameworks such as LangChain and AutoGen, as well as a series of dedicated workflow tools for building complex agent systems, such as ComfyUI, Dify, and Coze.

For example, Perplexity's recently launched Pages is a multi-agent collaboration product: it is hard for a single model to complete such complex layout and image selection without calling on other agents.

Perplexity's Pages

Pages was apparently influenced by GPT-Newspaper, an open-source multi-agent project launched in March this year. In that project, the team used a full seven agents to break down a newspaper's content-production workflow, ultimately having the AI assemble a professional, content-rich newspaper with multi-source review.

GPT-Newspaper's flowchart

A newspaper generated by GPT-Newspaper
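
To make this division of labor concrete, here is a minimal sketch of how such a staged agent pipeline might be wired. The agent roles and the call_llm placeholder are assumptions made for illustration, not GPT-Newspaper's actual code:

```python
# Minimal sketch of a staged multi-agent pipeline (hypothetical roles,
# not GPT-Newspaper's actual implementation). Each "agent" is just an
# LLM call with its own role prompt; stages run in a fixed order.

def call_llm(role_prompt: str, working_state: str) -> str:
    """Placeholder for a real chat-completion call."""
    raise NotImplementedError

AGENT_PROMPTS = {
    "search":   "Collect raw source material on the topic.",
    "curator":  "Select the most relevant and trustworthy sources.",
    "writer":   "Draft an article from the curated sources.",
    "critic":   "Point out factual gaps and weak arguments in the draft.",
    "editor":   "Revise the draft according to the critique.",
    "designer": "Propose a layout and image choices for the final page.",
}

def run_pipeline(topic: str) -> dict:
    state = {"topic": topic}
    for role in ["search", "curator", "writer", "critic", "editor", "designer"]:
        # Each agent sees the accumulated output of all earlier agents.
        state[role] = call_llm(AGENT_PROMPTS[role], str(state))
    return state
```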

Now that mature multi-agent products exist in the industry, there is no reason for OpenAI not to make its own moves here.

The concept of AI-human collaboration also emerged early. Just six months after ChatGPT was born, OpenAI's main partner Microsoft unveiled the Copilot concept at its Build conference: AI acts as a co-pilot that assists humans in their daily work. This form of human-computer collaboration appears across Microsoft's tools, with large language models at the core solving the specific problems each tool involves; GitHub Copilot focuses on programming, Microsoft 365 Copilot focuses on document work, but there is not much linkage between the individual Copilots.

At the same time, Microsoft also announced Windows Copilot, a system-level collaborative AI, but to this day the feature is still largely limited to conversational invocation, and its systematic connection with the other Copilots remains superficial.

Windows Copilot

What OpenAI wants to build may be an upgraded version of this collaboration model: taking Copilots that are each optimized for a single application, like GitHub Copilot and Microsoft Copilot, and uniting them into something that can handle more complex work.

That is powerful, but it has a small problem. Imagine that this combination of AI agents becomes capable enough to do essentially everything a human can do. In that process, the human is cut out of the loop except for stating requirements, which makes for a very poor user experience.

To smooth out this experience, OpenAI's multi-agent playground needs to create a sense of "multi-person collaboration", as if the AI were working side by side with you. You can see what the AI is doing and how the work is progressing.

This step is even more necessary when the agents are not yet that capable, because the AI may need your help, or further clarification and confirmation of the requirements, at any moment. This kind of interaction is also the most comfortable way of communicating between teammates working side by side.

Based on the introduction above, what Multi built before was exactly this: a collaboration system in which multiple participants work on complex tasks simultaneously, and what it does best is creating an intuitive sense of collaboration through its rich UI.

Multi's role is thus roughly imaginable, but one question remains unresolved: in what scenario, one that requires mobilizing multiple AI agents, does OpenAI plan to use it?

The birth of LLMOS

The first place it will land should be a Mac-only application.

Multi co-founder and CEO Alexander Embiricos posted on his X account yesterday that he (and presumably the entire Multi team) has joined OpenAI's "ChatGPT desktop team", which is tasked with continuing to develop the ChatGPT desktop app for Mac released this spring.

At its spring launch event this year, OpenAI showed the app's ability to record the screen and analyze your current work by recognizing speech and on-screen content.

More than a month later, this feature has still not shipped

The fact that all of Multi's members joined the desktop app's development shows that OpenAI does not see this app as merely a product that simply calls ChatGPT.

They want the app itself to become a new operating system: an LLMOS (Large Language Model Operating System).

In March this year, Andrej Karpathy, a former core scientist at OpenAI, said at an event that "OpenAI is currently working hard to build something similar to an operating system - LLMOS."

The most popular comments on X about the Multi acquisition

LLMOS is a system Karpathy proposed last December: a powerful large language model serves as the kernel process and becomes the operating system that orchestrates every other tool in the system. He believes it will, like Windows and macOS, become a new mode of human-computer interaction in the future.

This is what Karpathy, last December, thought an LLMOS needed to achieve:

- the ability to browse the internet;
- use of existing software infrastructure (calculator, Python, mouse/keyboard);
- understanding of language, images, and video;
- the ability to reason about complex problems;
- the ability to self-improve in domains where a reward function can be provided;
- the ability to download new capabilities from an "app store";
- its own file system, plus the ability to call up and search external files;
- the ability to be customized and fine-tuned for specific tasks, and to communicate with other large language models.

At the time, ChatGPT had only solved internet access and calls to external programming tools.
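
As a rough illustration of the "LLM as kernel" idea, the core loop can be pictured as a model that repeatedly decides which tool to invoke, feeds the result back into its context, and stops when the task is done. The tool names and the decide() placeholder below are assumptions for this sketch, not Karpathy's or OpenAI's actual design:

```python
# Sketch of an "LLM as kernel" loop: the model picks a tool, the runtime
# runs it, and the observation is appended back into the model's context.
# Tool names and decide() are illustrative assumptions.

TOOLS = {
    "browser":    lambda query: f"(web results for {query!r})",
    "calculator": lambda expr: f"(result of evaluating {expr})",
    "filesystem": lambda path: f"(contents of {path})",
}

def decide(context: str) -> dict:
    """Placeholder for an LLM call that returns a dict such as
    {"tool": "browser", "arg": "...", "done": False, "answer": None}."""
    raise NotImplementedError

def kernel_loop(task: str, max_steps: int = 8) -> str:
    context = f"TASK: {task}\n"
    for _ in range(max_steps):
        step = decide(context)
        if step["done"]:
            return step["answer"]
        observation = TOOLS[step["tool"]](step["arg"])
        context += f"{step['tool']}({step['arg']!r}) -> {observation}\n"
    return "(stopped: step limit reached)"
```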

The LLMOS architecture Karpathy envisioned

By June of this year, we could see that, over the past few months, OpenAI had indeed been gradually completing the LLMOS puzzle Karpathy envisioned. When he proposed the idea, his hard requirements for the core model were that it should reach GPT-4 level, emit more than 20 tokens per second, and have a "storage" of 128k tokens, all of which GPT-4o has achieved. The softer capabilities are also fairly well developed: the browsing function is more complete, and the Code Interpreter has evolved to the point where it can do data analysis.

OpenAI's current level of completion

This means that the basic capabilities of LLMOS are almost ready.

As far as OpenAI itself is concerned, the only pieces of the LLMOS framework still missing are the RAG system and multi-agent invocation and interaction, two areas that have hardly been mentioned in OpenAI's past upgrades.

But of the two companies OpenAI just bought, Rockset covers information retrieval, while Multi covers multi-agent (LLM) interaction. Together they fill exactly these two gaps in OpenAI's LLMOS.

With these two companies on board, the puzzle is finally complete, and OpenAI's version of the LLMOS will most likely take the form of its latest desktop app.

And Multi's role in this may be far more important than we think.

Karpathy also said in a March interview: "LLMOS will provide customized applications for different companies and verticals. Just as the Windows operating system comes with some default applications, LLMOS will also have default applications, and at the same time it will support a rich third-party application ecosystem for different areas of economic activity."

However, these apps may not take the same form as traditional apps. AIOS, the hottest LLMOS project on GitHub, launched in March this year, describes an application in the LLMOS system as an agent mobilized by the LLM.

The top layer of the AIOS architecture is a variety of agent applications

The logic behind this is not hard to follow. At present, virtually every third-party application ecosystem is chasing AI, doing its best to combine large-language-model capabilities with its products. That process effectively turns third-party applications into agents. Once they have taken this step, OpenAI no longer needs to adapt to each application's calling API one by one, with GPT making bespoke calls into every app. It only needs to build agent invocation into its own API and offer an "agent mode" that lets developers embed agents into any application or website.

GPT plays the mastermind, working out the order of calls and letting these agents, each with deeper expertise in its own application, jointly complete specific tasks.
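
A crude sketch of that mastermind pattern might look like the following. The registration decorator, the agent names, and the plan() helper are all invented for illustration; none of this is an existing OpenAI interface:

```python
# Illustrative sketch of a central "mastermind" model routing subtasks to
# application-specific agents. The registry and plan() are invented for
# this example and are not an existing OpenAI API.

from typing import Callable

AGENT_REGISTRY: dict[str, Callable[[str], str]] = {}

def register_agent(name: str):
    """Lets a third-party application expose one capability as an agent."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        AGENT_REGISTRY[name] = fn
        return fn
    return decorator

@register_agent("spreadsheet")
def spreadsheet_agent(subtask: str) -> str:
    return f"(spreadsheet result for: {subtask})"

@register_agent("code_editor")
def code_editor_agent(subtask: str) -> str:
    return f"(code change for: {subtask})"

def plan(request: str) -> list[tuple[str, str]]:
    """Placeholder for the orchestrating LLM: split a request into
    (agent_name, subtask) pairs, in execution order."""
    raise NotImplementedError

def orchestrate(request: str) -> list[str]:
    # The mastermind decides the order of calls; each registered agent
    # handles the part it knows best.
    return [AGENT_REGISTRY[name](subtask) for name, subtask in plan(request)]
```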

Therefore, the new LLMOS itself may be a home for multiple agents to collaborate, and the underlying product design of this process may determine the experience of the entire system.

That's why Multi is important.

Why Multi?

Multi-person code collaboration isn't unique to Multi; plenty of products on the market offer the same capability, such as Zed, a powerful editor that supports multi-person collaboration, and Pop and Tuple, which focus on the collaboration experience. The latter two are even highly similar to Multi in interface and functionality. To be chosen by OpenAI, Multi must have its own advantages.

A demo of Pop's code collaboration feature, which is essentially the same

Excellent AI genes

Compared with other products offering similar functionality, Multi has shown a deeper understanding of AI and stronger design around it.

In March of this year, Multi released an update that added AI features.

In the blog post accompanying the update, they explain their AI philosophy: like GitHub's cutting-edge researcher Wittenburger, they believe chatbots are not the best home for large language models.

Multi directly cited Wittenburger's thinking in its blog post

They believe that, compared with AI question-and-answer interfaces that "lack context and impose a cognitive burden", users need collaborative intelligence that works quietly and unobtrusively in the background. This is in line with the simple, natural product philosophy OpenAI has always pursued.

Behind the two acquisitions: OpenAI wants to be a "large language model operating system"

AI update for Multi

They also summarized three principles for AI products:

- Let LLMs do what they do best. Multi believes LLMs are not yet suited to distilling information that must be precise, such as the decisions, rationales, or plans involved in multi-person collaboration, because they cannot reliably discern details and may distort the facts. But LLMs are good at turning records into skimmable fragments, so Multi has the LLM distill the key points and build an index; collaborators can quickly locate what they need from the summary and then return to the concrete, contextual scene to solve the problem.

- Internalize the AI as a participant. "Can we fold AI into familiar multi-person collaboration features instead of creating a new system?" With this in mind, the AI is treated as a teammate rather than a separate note-taking service. While a summary is being generated, the AI's and the users' actions are smoothly blended into the same interface, and the summary can be added to and edited together with the user.

- Open-ended input. Multi abandons the input-output interaction of a chat interface and preserves the autonomy of both the user and the AI in an open way. After a video meeting ends, Multi only automatically triggers an AI summary; if users want to go deeper on something, they can add it by creating an action item. Multi also ships a note copilot that predicts the user's notes based on context.

Beyond the conceptual level, Multi is also quite good at putting AI into practice. It treats AI as a collaborating agent rather than a reactive production tool: it automatically summarizes after each meeting, anything more is just a click away, and it only summarizes the information you need most, such as the focus of the discussion and the next steps.

If you want to enrich the summary further, you can expand it below, where Multi also provides a guided Q&A flow. The structure fits the habits and needs of ordinary users very well.

The segmentation is clear
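
As a rough sketch of that "quiet summary first, details on demand" flow, the interaction can be reduced to two steps: an automatic post-meeting digest and an optional expansion of a chosen item. The function names and the summarize_llm() placeholder are assumptions for this illustration, not Multi's actual implementation:

```python
# Sketch of a "summarize quietly, expand on demand" flow, loosely modeled
# on the behavior described above. summarize_llm() is a placeholder and
# none of this is Multi's real API.

def summarize_llm(instruction: str, transcript: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def post_meeting_digest(transcript: str) -> dict:
    # Triggered automatically when the meeting ends; no chat UI involved.
    return {
        "key_points":   summarize_llm("List the main discussion points.", transcript),
        "action_items": summarize_llm("List concrete next steps with owners.", transcript),
    }

def expand_item(transcript: str, item: str) -> str:
    # Runs only if the user clicks an item and asks for more detail.
    return summarize_llm(f"Give the context and reasoning behind: {item}", transcript)
```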

Through these principles and practices, Multi does seem to be capable of serving as the helper OpenAI needs when building a multi-agent collaboration system.

Investment Insider

In addition to the technical advantages, the investors behind Multi are also inextricably linked to OpenAI.

Public information shows that Multi received nearly $13 million in investment from venture capital firms such as Greylock and First Round Capital earlier this year.

Greylock is a top Silicon Valley venture capital firm, an early investor in Facebook, LinkedIn, Instagram, Dropbox and other star companies. In the new wave of AI investment, Greylock has even been called "the VC closest to OpenAI and DeepMind, the world's two top AI labs".

That reputation owes much to Greylock partner Reid Hoffman. In 2015, Hoffman co-founded OpenAI together with Musk, Altman and others, and his many years as a Greylock partner gave the firm a head start in the AI wave.

Greylock was also among the investors in Rockset, OpenAI's other recent acquisition.

Both acquisitions are tied to Reid Hoffman, and that connection is likely a key factor behind OpenAI's choice of Multi.
