
With intelligence as the helmsman, we will lead the new direction of modern computer system architecture

Author: Microsoft Research Asia

Editor's brief: The services and algorithmic logic of today's computer systems are becoming increasingly complex, and understanding, designing, and improving these systems has become a central challenge. Faced with the exponential growth of system complexity and scale, as well as the new forms of distributed systems emerging in large-model-driven scenarios, innovative methods and technologies are urgently needed. In this new chapter of computer system development, modern systems should be the product of continuous self-evolution. The rise of machine learning and large models has opened a new intelligent opportunity for modern computer systems: learning-augmented systems. Microsoft Research Asia is rethinking how systems should continuously learn and evolve themselves from two core directions: "modular" machine learning models, and "systematized" large-model reasoning. The goal is to align models with complex and changing system environments and requirements, and to align reasoning with the behavior of computer systems in time and space. The related paper, Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices, received the NSDI 2024 Outstanding Paper Award.

With the continuous advancement of technology, computer systems not only shoulder many of the services in people's lives but also embody a great deal of complex algorithmic logic. Diversifying user needs and proliferating scenarios have driven continuous growth in the complexity and scale of computer systems. From search, shopping, and chat to news recommendation, streaming, and AI services, the complexity of these systems lies not just in the sheer volume of code, but in the work required to design, develop, and maintain them. Meanwhile, new scenarios such as large-model-driven copilots and AI agents have brought about new forms of distributed systems. Understanding, designing, and improving these systems is a central challenge. However, the exponential growth in system complexity and scale has made it impossible to meet this challenge with human intuition and experience alone.


Fortunately, the continuing evolution of computer science has brought new opportunities for computer systems. Among them, learning-augmented systems are gradually becoming a new trend for reshaping computer systems with intelligence. Learning-augmented systems typically follow one of three implementation paths: first, using machine learning to enhance the performance of the heuristic algorithms and decision rules already in a system; second, using machine learning to optimize and redesign those heuristics and rules; and third, replacing the original heuristics and decision rules with machine learning models outright, driving a comprehensive intelligent upgrade of the system.
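To make the three paths concrete, here is a minimal, purely illustrative sketch in Python built around a toy cache-admission decision; the function names, threshold values, and StubModel are hypothetical and not taken from the papers discussed here.

```python
# Illustrative contrast of the three paths, using a toy cache-admission rule.

class StubModel:
    """Stand-in for a trained predictor (e.g., a logistic regression)."""
    def predict(self, features: list[float]) -> float:
        return sum(features) / (len(features) or 1)  # toy score in [0, 1]

HAND_TUNED_THRESHOLD = 3  # set by operators from experience

def admit_heuristic(access_count: int) -> bool:
    # The original hand-written rule.
    return access_count >= HAND_TUNED_THRESHOLD

def admit_tuned(access_count: int, learned_threshold: float) -> bool:
    # Paths 1 and 2: keep the rule's shape, but let ML enhance or redesign
    # it, e.g., by fitting the threshold offline to access traces.
    return access_count >= learned_threshold

def admit_learned(features: list[float], model: StubModel) -> bool:
    # Path 3: replace the rule entirely with a trained model's decision.
    return model.predict(features) > 0.5

print(admit_learned([0.4, 0.8], StubModel()))  # True (toy score 0.6 > 0.5)
```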

To this end, researchers at Microsoft Research Asia have carried out a series of studies on learning-augmented systems, focusing on two key areas: first, "modular" machine learning models that align with the behavior of computer systems; and second, "systematized" large-model reasoning that gives computer systems the ability to evolve themselves.

"Modular" machine learning models that align with the behavior of computer systems

Machine learning excels at extracting patterns and regularities from data and using them for modeling and numerical optimization to drive prediction and decision-making. Modern computer systems generally have mature behavior- and performance-monitoring mechanisms, which can serve as data sources for model training. In earlier studies (Metis [1] and AutoSys [2]), researchers explored how machine learning techniques can be used to optimize system parameters. Practical experience shows, however, that building a learning-augmented system is not merely a matter of applying existing machine learning algorithms; it raises key research challenges in co-designing modern computer systems and machine learning.

Specifically, given the scale of modern computer systems (e.g., clusters of hundreds of distributed microservices) and their dynamic nature (e.g., the microservices in a cluster can be developed, deployed, and scaled independently), is learning the entire system with one powerful model still sustainable? When the system's deployment or environment changes (for example, when the cluster grows as the system scales out), some of the assumptions the machine learning model makes about its task may no longer hold. If the model is not retrained, the correctness of model-driven decisions is compromised. Yet the high dynamics and complexity of modern computer systems mean that continuously relearning such complex tasks still carries expensive data-acquisition and resource costs.

"Modularity" is a key to integrating machine learning into the foundation of computer systems. Although modern computer systems are large and complex, they are in fact composed of multiple subcomponents or services, and their dynamics follow patterns. For example, if one microservice in a cloud system composed of many microservices is updated, the end-to-end performance of the entire system may be affected; from the system architecture's point of view, however, the update only changes the configuration of that one service. The same holds for system scaling, where a single service is independently replicated and deployed in multiple copies. Therefore, if the machine learning model only needs to be updated where the system has changed, the maintenance cost of a learning-augmented system falls greatly compared with continuously retraining an entire monolithic model.

Fluxion [3], a framework proposed by the researchers to model end-to-end system latency with modular learning, is the first step in applying modularized learning to learning-augmented systems. In the task of predicting microservice system latency, Fluxion significantly reduces the maintenance cost of the latency-prediction model as individual services are continually scaled and redeployed. By introducing a new learning abstraction, Fluxion allows each system subcomponent to be modeled independently; the models of multiple subcomponents can then be composed into a single inference graph, whose output is the system's end-to-end latency. The inference graph can also be adjusted dynamically to stay aligned with the actual deployment of the computer system. This approach differs significantly from directly modeling the end-to-end latency of the entire system. The related paper, On Modular Learning of Distributed Systems for Predicting End-to-End Latency, was published at NSDI 2023.
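The following minimal sketch conveys the flavor of composing per-service latency models into an inference graph. The class, the lambda models, and the additive composition along a sequential call path are illustrative assumptions made for this article, not Fluxion's actual abstractions or API.

```python
# Sketch: per-service models composed into an inference graph.
from typing import Callable, Dict, List

ServiceModel = Callable[[dict], float]  # features -> predicted latency (ms)

class InferenceGraph:
    def __init__(self) -> None:
        self.models: Dict[str, ServiceModel] = {}
        self.call_path: List[str] = []

    def register(self, name: str, model: ServiceModel) -> None:
        # A service update only requires swapping this one entry,
        # not retraining a monolithic end-to-end model.
        self.models[name] = model

    def set_call_path(self, path: List[str]) -> None:
        # Re-wire the graph when the deployment changes (updates, scale-out).
        self.call_path = path

    def end_to_end_latency(self, features: dict) -> float:
        # Toy composition: sum latencies along a sequential call path.
        return sum(self.models[s](features) for s in self.call_path)

graph = InferenceGraph()
graph.register("frontend", lambda f: 2.0 + 0.1 * f["rps"])
graph.register("cart", lambda f: 1.0 + 0.05 * f["rps"])
graph.register("checkout", lambda f: 3.0 + 0.2 * f["rps"])
graph.set_call_path(["frontend", "cart", "checkout"])
print(graph.end_to_end_latency({"rps": 100.0}))  # -> 41.0 ms (toy numbers)
```

The point of the design is that a service update or scale-out touches only one registered model or the call path, leaving the rest of the graph intact.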


Figure 1: Fluxion introduces a modular learning abstraction that allows individual system subcomponents to be modeled independently. This approach differs significantly from directly modeling the end-to-end latency of the entire system.

On the basis of the Fluxion framework, the researchers proposed Autothrottle [4], a bi-level resource management framework for microservices with system latency targets, which brings the concept of "modularity" into system resource management, in particular the important task of auto-scaling. Auto-scaling automatically allocates appropriate resources to each microservice so as to meet the service-level objective (SLO) set by the user. In simple terms, when requests per second rise, system resources should automatically grow to keep meeting latency targets; conversely, when requests per second fall, resources should automatically shrink. This mechanism balances resource allocation against system performance. The common industry practice today is to use heuristic algorithms, such as Kubernetes' HPA and VPA, but these require operators to manually set thresholds and tune them continually, as the simplified sketch below suggests.
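For intuition, here is a rough caricature of threshold-based scaling in Python. The proportional rule follows the formula documented for Kubernetes' HPA, but the numbers are made up, and real HPA adds stabilization windows, tolerances, and metric aggregation on top.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_cpu_utilization: float,
                         target_cpu_utilization: float) -> int:
    # Proportional rule roughly following the documented HPA formula:
    #   desired = ceil(current * currentMetric / targetMetric)
    # target_cpu_utilization is the operator-chosen threshold that must be
    # hand-tuned and revisited as the workload evolves -- the pain point.
    return math.ceil(current_replicas *
                     current_cpu_utilization / target_cpu_utilization)

print(hpa_desired_replicas(4, 0.80, 0.50))  # -> 7 replicas
```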

Given this pain point, machine learning offers a new way to drive auto-scaling. Existing work combines deep learning models (e.g., convolutional neural networks and graph neural networks) and methods (e.g., reinforcement learning) to model the relationship between the resources and the performance of the entire system. Although such complex models can learn system-wide relationships, training them still requires expensive data collection and resource overhead.

Under the modular design concept, and in a spirit similar to Fluxion, Autothrottle decomposes auto-scaling into a series of simple sub-learning problems, each corresponding to one microservice in the system. Although each microservice is allocated resources independently, Autothrottle's design accounts for the fact that the microservices' local latencies collectively determine the system's global latency. So when the system's global latency is too high (or too low), Autothrottle can predict how much each microservice's local latency target should be tightened (or relaxed). Based on these targets, each microservice then autonomously predicts its required resource allocation, such as CPU, from its current load.

The researchers found that the CPU throttle metric (the number of times a process exhausts its CPU quota within a given period) serves as a good proxy target for local latency. Accordingly, when a microservice is under heavy load, its CPU allocation should be increased so that it still meets the specified CPU-throttle target; conversely, when the load is light, CPU resources should be reduced while still meeting the same target.
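As a sketch of what such a per-service feedback loop could look like, the snippet below reads the cgroup v2 cpu.stat counters (nr_throttled is a real kernel interface) and nudges the CPU quota in cpu.max. The cgroup path, step sizes, and multiplicative control law are illustrative assumptions, not Autothrottle's actual controller; writing cpu.max also requires appropriate privileges.

```python
# Sketch of a throttle-driven per-service CPU controller (cgroup v2).

CGROUP = "/sys/fs/cgroup/my-service"  # hypothetical service cgroup
PERIOD_US = 100_000                   # default CFS period

def read_nr_throttled() -> int:
    with open(f"{CGROUP}/cpu.stat") as f:
        stats = dict(line.split() for line in f)
    return int(stats["nr_throttled"])

def set_cpu_quota(quota_us: int) -> None:
    with open(f"{CGROUP}/cpu.max", "w") as f:
        f.write(f"{quota_us} {PERIOD_US}")

def control_step(quota_us: int, prev_throttled: int,
                 target_throttles: int) -> tuple[int, int]:
    # Called periodically. If the service was throttled more than its
    # target over the last interval, give it more CPU; otherwise reclaim.
    throttled = read_nr_throttled()
    delta = throttled - prev_throttled
    if delta > target_throttles:
        quota_us = int(quota_us * 1.1)                # grow under load
    elif delta < target_throttles:
        quota_us = max(10_000, int(quota_us * 0.95))  # shrink when idle
    set_cpu_quota(quota_us)
    return quota_us, throttled
```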

Based on the system's global latency history, Autothrottle's Tower component uses a contextual bandit algorithm to compute the local latency targets, while Autothrottle's Captain component runs a feedback control loop on each microservice to rapidly adjust its CPU allocation. This modular design provides a more efficient and precise solution for system resource management. The related paper, Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices, received the NSDI 2024 Outstanding Paper Award.
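To give a feel for the Tower side, here is a toy epsilon-greedy contextual bandit that picks a per-period throttle target from recent load and rewards SLO attainment. The arms, context bucketing, and reward shaping are all assumptions made for illustration; the paper's bandit formulation is more sophisticated (for example, it should also reward resource savings, not just latency).

```python
import random
from collections import defaultdict

ARMS = [0, 5, 20, 50]   # hypothetical throttle-count targets per period
EPSILON = 0.1

value = defaultdict(float)   # (context, arm) -> running reward estimate
count = defaultdict(int)

def bucket(rps: float) -> str:
    return "high" if rps > 500 else "low"   # crude load context

def choose_target(rps: float) -> int:
    if random.random() < EPSILON:
        return random.choice(ARMS)                    # explore
    ctx = bucket(rps)
    return max(ARMS, key=lambda a: value[(ctx, a)])   # exploit

def update(rps: float, arm: int, p99_ms: float, slo_ms: float) -> None:
    # Reward favors meeting the SLO; violations are penalized in
    # proportion to the overshoot. A real reward would also account
    # for the CPU saved by allowing more throttling.
    key = (bucket(rps), arm)
    reward = 1.0 if p99_ms <= slo_ms else -(p99_ms - slo_ms) / slo_ms
    count[key] += 1
    value[key] += (reward - value[key]) / count[key]
```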


Figure 2: Autothrottle applies modular learning to auto-scaling tasks.

"Systematized" large-model reasoning that gives computer systems the ability to self-evolve

The rise of large models has brought new intelligent opportunities to learning-augmented systems. In academia and industry, large language models are already being used to understand and analyze the long documents, logs, code, and other artifacts of computer systems. At the same time, much research is helping engineers generate program code and operational instructions. Together, these studies demonstrate the potential of large models in the interaction between humans and computer systems.

Researchers at Microsoft Research Asia believe that the greater value of large models lies in giving modern computer systems the ability to evolve themselves. Just as the numerical optimization capabilities of traditional machine learning are compelling, so is the reasoning capability of large models. If a computer system can reason about whether its behavior (in time and space) is appropriate, and use a chain of thought to infer how that behavior should change, then the system can evolve itself. The researchers believe that self-evolution would mark a major paradigm shift in the development of computer systems.

Looking back at the development of computing, from calculating tools such as the abacus and mathematical tables, to modern computer systems such as big data and cloud computing, to emerging distributed systems such as AI agents and embodied robots, the bottleneck in system iteration has mainly been human brainpower and productivity. The reasoning capability of large models is expected to break through this bottleneck and accelerate the iteration of computer systems.

So how can the reasoning of large models be systematized so that it can reason about the behavior of computer systems? Researchers at Microsoft Research Asia are actively working in three directions:

(1) building up large models' foundational knowledge of computer systems;

(2) aligning a large model's chain of thought with the behavior of computer systems (in time and space); and

(3) putting large-model-driven learning-augmented systems into practical use.

In the future, Microsoft Research Asia will continue to devote itself to the research and application of learning-augmented systems, and it looks forward to working with like-minded researchers to tackle these challenges.

Related Paper Links:

[1] Metis: Robustly Optimizing Tail Latencies of Cloud Systems. Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, Guangzhong Sun. USENIX ATC '18.

Link: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/

[2] AutoSys: The Design and Operation of Learning-Augmented Systems. Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, Wenjun Dai. USENIX ATC '20.

Link: https://www.microsoft.com/en-us/research/publication/autosys-the-design-and-operation-of-learning-augmented-systems/

[3] On Modular Learning of Distributed Systems for Predicting End-to-End Latency. Chieh-Jan Mike Liang, Zilin Fang, Yuqing Xie, Fan Yang, Zhao Lucis Li, Li Lyna Zhang, Mao Yang, and Lidong Zhou. USENIX NSDI '23.

Link: https://www.microsoft.com/en-us/research/publication/on-modular-learning-of-distributed-systems-for-predicting-end-to-end-latency/

[4] Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices. Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, Francis Y. Yan. USENIX NSDI '24 (Outstanding Paper Award).

Link: https://www.microsoft.com/en-us/research/publication/autothrottle/
