
With intelligence as the helmsman, we will lead the new direction of modern computer system architecture

Author: Microsoft Research Asia

Editor's brief: The services and algorithmic logic of today's computer systems are becoming increasingly complex, and understanding, designing, and improving these systems has become a central challenge. Faced with the exponential growth of system complexity and scale, as well as the new forms of distributed systems emerging in large-model-driven scenarios, innovative methods and technologies are urgently needed. In this new chapter of computer system development, modern systems should be the product of continuous self-evolution. The rise of machine learning and large models has opened a new intelligent opportunity for modern computer systems: learning-augmented systems. Microsoft Research Asia is rethinking how systems should continuously learn and evolve themselves from two core directions: "modular" machine learning models, and "systematized" large-model reasoning. The goal is to align models with complex and changing system environments and requirements, and to align reasoning with the behavior of computer systems in time and space. The related paper, Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices, received the NSDI 2024 Outstanding Paper Award.

With the continuous advancement of technology, computer systems not only shoulder many of the services in people's lives but also embody a great deal of complex algorithmic logic. Diversifying user needs and proliferating scenarios have driven continuous growth in the complexity and scale of computer systems. From search, shopping, and chat to news recommendation, streaming, and AI services, the complexity of these systems lies not just in the sheer volume of code, but in the work required to design, develop, and maintain them. Meanwhile, new scenarios such as large-model-driven copilots and AI agents have brought about new forms of distributed systems. Understanding, designing, and improving these systems is a central challenge. However, the exponential growth in system complexity and scale has made it impossible to meet this challenge with human intuition and experience alone.


Fortunately, the continuing evolution of computer science has brought new opportunities for computer systems. Among them, learning-augmented systems are gradually becoming a new trend for reshaping computer systems with intelligence. Learning-augmented systems typically follow one of three implementation paths: first, using machine learning to enhance the performance of the heuristic algorithms and decision rules already in a system; second, using machine learning to optimize and redesign those heuristics and rules; and third, replacing the original heuristics and decision rules with machine learning models outright, driving a comprehensive intelligent upgrade of the system.
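To make the three paths concrete, here is a minimal, purely illustrative sketch in Python built around a toy cache-admission decision; the function names, threshold values, and StubModel are hypothetical and not taken from the papers discussed here.

```python
# Illustrative contrast of the three paths, using a toy cache-admission rule.

class StubModel:
    """Stand-in for a trained predictor (e.g., a logistic regression)."""
    def predict(self, features: list[float]) -> float:
        return sum(features) / (len(features) or 1)  # toy score in [0, 1]

HAND_TUNED_THRESHOLD = 3  # set by operators from experience

def admit_heuristic(access_count: int) -> bool:
    # The original hand-written rule.
    return access_count >= HAND_TUNED_THRESHOLD

def admit_tuned(access_count: int, learned_threshold: float) -> bool:
    # Paths 1 and 2: keep the rule's shape, but let ML enhance or redesign
    # it, e.g., by fitting the threshold offline to access traces.
    return access_count >= learned_threshold

def admit_learned(features: list[float], model: StubModel) -> bool:
    # Path 3: replace the rule entirely with a trained model's decision.
    return model.predict(features) > 0.5

print(admit_learned([0.4, 0.8], StubModel()))  # True (toy score 0.6 > 0.5)
```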

To this end, researchers at Microsoft Research Asia have carried out a series of studies on learning-augmented systems, focusing on two key areas: first, "modular" machine learning models that align with the behavior of computer systems; and second, "systematized" large-model reasoning that gives computer systems the ability to evolve themselves.

"Modular" machine learning models that align with the behavior of computer systems

Machine learning excels at extracting patterns and regularities from data and using them for modeling and numerical optimization to drive prediction and decision-making. Modern computer systems generally have mature behavior- and performance-monitoring mechanisms, which can serve as data sources for model training. In earlier studies (Metis [1] and AutoSys [2]), researchers explored how machine learning techniques can be used to optimize system parameters. Practical experience shows, however, that building a learning-augmented system is not merely a matter of applying existing machine learning algorithms; it raises key research challenges in co-designing modern computer systems and machine learning.

Specifically, given the scale of modern computer systems (e.g., clusters of hundreds of distributed microservices) and their dynamic nature (e.g., the microservices in a cluster can be developed, deployed, and scaled independently), is learning the entire system with one powerful model still sustainable? When the system's deployment or environment changes (for example, when the cluster grows as the system scales out), some of the assumptions the machine learning model makes about its task may no longer hold. If the model is not retrained, the correctness of model-driven decisions is compromised. Yet the high dynamics and complexity of modern computer systems mean that continuously relearning such complex tasks still carries expensive data-acquisition and resource costs.

"Modularity" is a key to integrating machine learning into the foundation of computer systems. Although modern computer systems are large and complex, they are in fact composed of multiple subcomponents or services, and their dynamics follow patterns. For example, if one microservice in a cloud system composed of many microservices is updated, the end-to-end performance of the entire system may be affected; from the system architecture's point of view, however, the update only changes the configuration of that one service. The same holds for system scaling, where a single service is independently replicated and deployed in multiple copies. Therefore, if the machine learning model only needs to be updated where the system has changed, the maintenance cost of a learning-augmented system falls greatly compared with continuously retraining an entire monolithic model.

Fluxion [3], a framework proposed by the researchers to model end-to-end system latency with modular learning, is the first step in applying modularized learning to learning-augmented systems. In the task of predicting microservice system latency, Fluxion significantly reduces the maintenance cost of the latency-prediction model as individual services are continually scaled and redeployed. By introducing a new learning abstraction, Fluxion allows each system subcomponent to be modeled independently; the models of multiple subcomponents can then be composed into a single inference graph, whose output is the system's end-to-end latency. The inference graph can also be adjusted dynamically to stay aligned with the actual deployment of the computer system. This approach differs significantly from directly modeling the end-to-end latency of the entire system. The related paper, On Modular Learning of Distributed Systems for Predicting End-to-End Latency, was published at NSDI 2023.
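The following minimal sketch conveys the flavor of composing per-service latency models into an inference graph. The class, the lambda models, and the additive composition along a sequential call path are illustrative assumptions made for this article, not Fluxion's actual abstractions or API.

```python
# Sketch: per-service models composed into an inference graph.
from typing import Callable, Dict, List

ServiceModel = Callable[[dict], float]  # features -> predicted latency (ms)

class InferenceGraph:
    def __init__(self) -> None:
        self.models: Dict[str, ServiceModel] = {}
        self.call_path: List[str] = []

    def register(self, name: str, model: ServiceModel) -> None:
        # A service update only requires swapping this one entry,
        # not retraining a monolithic end-to-end model.
        self.models[name] = model

    def set_call_path(self, path: List[str]) -> None:
        # Re-wire the graph when the deployment changes (updates, scale-out).
        self.call_path = path

    def end_to_end_latency(self, features: dict) -> float:
        # Toy composition: sum latencies along a sequential call path.
        return sum(self.models[s](features) for s in self.call_path)

graph = InferenceGraph()
graph.register("frontend", lambda f: 2.0 + 0.1 * f["rps"])
graph.register("cart", lambda f: 1.0 + 0.05 * f["rps"])
graph.register("checkout", lambda f: 3.0 + 0.2 * f["rps"])
graph.set_call_path(["frontend", "cart", "checkout"])
print(graph.end_to_end_latency({"rps": 100.0}))  # -> 41.0 ms (toy numbers)
```

The point of the design is that a service update or scale-out touches only one registered model or the call path, leaving the rest of the graph intact.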


Figure 1: Fluxion introduces a modular learning abstraction that allows individual system subcomponents to be modeled independently. This approach differs significantly from directly modeling the end-to-end latency of the entire system.

On the basis of the Fluxion framework, the researchers proposed Autothrottle [4], a bi-level resource management framework for microservices with system latency targets, which brings the concept of "modularity" into system resource management, in particular the important task of auto-scaling. Auto-scaling automatically allocates appropriate resources to each microservice so as to meet the service-level objective (SLO) set by the user. In simple terms, when requests per second rise, system resources should automatically grow to keep meeting latency targets; conversely, when requests per second fall, resources should automatically shrink. This mechanism balances resource allocation against system performance. The common industry practice today is to use heuristic algorithms, such as Kubernetes' HPA and VPA, but these require operators to manually set thresholds and tune them continually, as the simplified sketch below suggests.
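For intuition, here is a rough caricature of threshold-based scaling in Python. The proportional rule follows the formula documented for Kubernetes' HPA, but the numbers are made up, and real HPA adds stabilization windows, tolerances, and metric aggregation on top.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_cpu_utilization: float,
                         target_cpu_utilization: float) -> int:
    # Proportional rule roughly following the documented HPA formula:
    #   desired = ceil(current * currentMetric / targetMetric)
    # target_cpu_utilization is the operator-chosen threshold that must be
    # hand-tuned and revisited as the workload evolves -- the pain point.
    return math.ceil(current_replicas *
                     current_cpu_utilization / target_cpu_utilization)

print(hpa_desired_replicas(4, 0.80, 0.50))  # -> 7 replicas
```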

Given this pain point, machine learning offers a new way to drive auto-scaling. Existing work combines deep learning models (e.g., convolutional neural networks and graph neural networks) and methods (e.g., reinforcement learning) to model the relationship between the resources and the performance of the entire system. Although such complex models can learn system-wide relationships, training them still requires expensive data collection and resource overhead.

Under the modular design concept, and in a spirit similar to Fluxion, Autothrottle decomposes auto-scaling into a series of simple sub-learning problems, each corresponding to one microservice in the system. Although each microservice is allocated resources independently, Autothrottle's design accounts for the fact that the microservices' local latencies collectively determine the system's global latency. So when the system's global latency is too high (or too low), Autothrottle can predict how much each microservice's local latency target should be tightened (or relaxed). Based on these targets, each microservice then autonomously predicts its required resource allocation, such as CPU, from its current load.

The researchers found that the CPU throttle metric (the number of times a process exhausts its CPU quota within a given period) serves as a good proxy target for local latency. Accordingly, when a microservice is under heavy load, its CPU allocation should be increased so that it still meets the specified CPU-throttle target; conversely, when the load is light, CPU resources should be reduced while still meeting the same target.
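As a sketch of what such a per-service feedback loop could look like, the snippet below reads the cgroup v2 cpu.stat counters (nr_throttled is a real kernel interface) and nudges the CPU quota in cpu.max. The cgroup path, step sizes, and multiplicative control law are illustrative assumptions, not Autothrottle's actual controller; writing cpu.max also requires appropriate privileges.

```python
# Sketch of a throttle-driven per-service CPU controller (cgroup v2).

CGROUP = "/sys/fs/cgroup/my-service"  # hypothetical service cgroup
PERIOD_US = 100_000                   # default CFS period

def read_nr_throttled() -> int:
    with open(f"{CGROUP}/cpu.stat") as f:
        stats = dict(line.split() for line in f)
    return int(stats["nr_throttled"])

def set_cpu_quota(quota_us: int) -> None:
    with open(f"{CGROUP}/cpu.max", "w") as f:
        f.write(f"{quota_us} {PERIOD_US}")

def control_step(quota_us: int, prev_throttled: int,
                 target_throttles: int) -> tuple[int, int]:
    # Called periodically. If the service was throttled more than its
    # target over the last interval, give it more CPU; otherwise reclaim.
    throttled = read_nr_throttled()
    delta = throttled - prev_throttled
    if delta > target_throttles:
        quota_us = int(quota_us * 1.1)                # grow under load
    elif delta < target_throttles:
        quota_us = max(10_000, int(quota_us * 0.95))  # shrink when idle
    set_cpu_quota(quota_us)
    return quota_us, throttled
```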

Based on the system's global latency history, Autothrottle's Tower component uses a contextual bandit algorithm to compute the local latency targets, while Autothrottle's Captain component runs a feedback control loop on each microservice to rapidly adjust its CPU allocation. This modular design provides a more efficient and precise solution for system resource management. The related paper, Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices, received the NSDI 2024 Outstanding Paper Award.
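To give a feel for the Tower side, here is a toy epsilon-greedy contextual bandit that picks a per-period throttle target from recent load and rewards SLO attainment. The arms, context bucketing, and reward shaping are all assumptions made for illustration; the paper's bandit formulation is more sophisticated (for example, it should also reward resource savings, not just latency).

```python
import random
from collections import defaultdict

ARMS = [0, 5, 20, 50]   # hypothetical throttle-count targets per period
EPSILON = 0.1

value = defaultdict(float)   # (context, arm) -> running reward estimate
count = defaultdict(int)

def bucket(rps: float) -> str:
    return "high" if rps > 500 else "low"   # crude load context

def choose_target(rps: float) -> int:
    if random.random() < EPSILON:
        return random.choice(ARMS)                    # explore
    ctx = bucket(rps)
    return max(ARMS, key=lambda a: value[(ctx, a)])   # exploit

def update(rps: float, arm: int, p99_ms: float, slo_ms: float) -> None:
    # Reward favors meeting the SLO; violations are penalized in
    # proportion to the overshoot. A real reward would also account
    # for the CPU saved by allowing more throttling.
    key = (bucket(rps), arm)
    reward = 1.0 if p99_ms <= slo_ms else -(p99_ms - slo_ms) / slo_ms
    count[key] += 1
    value[key] += (reward - value[key]) / count[key]
```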


Figure 2: Autothrottle applies modular learning to auto-scaling tasks.

"Systematized" large-model reasoning that gives computer systems the ability to self-evolve

The rise of large models has brought new intelligent opportunities to learning-augmented systems. In academia and industry, large language models are already being used to understand and analyze the long documents, logs, code, and other artifacts of computer systems. At the same time, much research is helping engineers generate program code and operational instructions. Together, these studies demonstrate the potential of large models in the interaction between humans and computer systems.

Researchers at Microsoft Research Asia believe that the greater value of large models lies in giving modern computer systems the ability to evolve themselves. Just as the numerical optimization capabilities of traditional machine learning are compelling, so is the reasoning capability of large models. If a computer system can reason about whether its behavior (in time and space) is appropriate, and use a chain of thought to infer how that behavior should change, then the system can evolve itself. The researchers believe that self-evolution would mark a major paradigm shift in the development of computer systems.

Looking back at the development of computing, from calculating tools such as the abacus and mathematical tables, to modern computer systems such as big data and cloud computing, to emerging distributed systems such as AI agents and embodied robots, the bottleneck in system iteration has mainly been human brainpower and productivity. The reasoning capability of large models is expected to break through this bottleneck and accelerate the iteration of computer systems.

So how can the reasoning of large models be systematized so that it can reason about the behavior of computer systems? Researchers at Microsoft Research Asia are actively working in three directions:

(1) building up large models' foundational knowledge of computer systems;

(2) aligning a large model's chain of thought with the behavior of computer systems (in time and space); and

(3) putting large-model-driven learning-augmented systems into practical use.

In the future, Microsoft Research Asia will continue to devote itself to the research and application of learning-augmented systems, and it looks forward to working with like-minded researchers to tackle these challenges.

Related Paper Links:

[1] Metis: Robustly Optimizing Tail Latencies of Cloud Systems. Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, Guangzhong Sun. USENIX ATC '18.

Link: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/

[2] AutoSys: The Design and Operation of Learning-Augmented Systems. Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, Wenjun Dai. USENIX ATC '20.

Link: https://www.microsoft.com/en-us/research/publication/autosys-the-design-and-operation-of-learning-augmented-systems/

[3] On Modular Learning of Distributed Systems for Predicting End-to-End Latency. Chieh-Jan Mike Liang, Zilin Fang, Yuqing Xie, Fan Yang, Zhao Lucis Li, Li Lyna Zhang, Mao Yang, and Lidong Zhou. USENIX NSDI '23.

Link: https://www.microsoft.com/en-us/research/publication/on-modular-learning-of-distributed-systems-for-predicting-end-to-end-latency/

[4] Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices. Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, Francis Y. Yan. USENIX NSDI '24 (Outstanding Paper Award).

Link: https://www.microsoft.com/en-us/research/publication/autothrottle/
