
This article takes you through the ten-year evolution of Alibaba Cloud network

Author: Technical Alliance Forum

Jiushan, Alibaba Cloud Developer, 2023-09-28 08:31, posted in Zhejiang


Ali Mei's introduction

With large and ultra-large enterprises migrating to the cloud, richer industry scenarios and more diverse services now run on Alibaba Cloud, placing ever more demanding requirements on the scale, performance, and elasticity of the cloud network and driving its continuous optimization: from the classic network to VPC, a control plane evolving from 1.0 to 3.0, a data plane shifting from internal services to gateways, hardware-based border gateways, a full embrace of smart NICs, and finally virtualized service network elements, a "spiral" upward route.

1. Business needs drive network transformation

The growth of the cloud computing industry has brought rich user network requirements that traditional network equipment architectures cannot meet. In the early days, Alibaba Cloud's users came mainly from the Internet and related industries, and their core needs were independent planning, secure isolation, and simplified functionality, which amount to large-scale, multi-tenant isolated networks (VPCs). At this stage, Alibaba Cloud's network was upgraded from the large layer-2 network of the classic equipment architecture to dedicated VPC networks that tenants plan independently.

As large and ultra-large enterprises successively migrated to the cloud, richer industry scenarios and more diverse services came to run on Alibaba Cloud, placing more demanding requirements on the scale, performance, and elasticity of cloud networks and driving their continuous optimization.

2. From classic network to VPC

Alibaba Cloud began building cloud computing in 2009. At the time, classic network technology was more mature and simpler than VPC technology, so Alibaba Cloud initially chose it. The classic network's most distinctive feature is its large layer 2: the whole network is fully interoperable at layer 2, packets are forwarded via neighbor tables, and the virtual network is strongly coupled to the physical network. When a user purchases a classic-network server, its IP address is already assigned; customers cannot plan the network themselves, and because servers are fully reachable from one another, customers must configure security rules to protect themselves. As a rough metaphor, classic-network servers are like rooms in a shared house: the IP address is the room number, assigned when the room is rented, and the security group is the door lock that the tenant must set to keep out intruders.

As users on the cloud grew and more customers embraced the public cloud, stronger requirements for elasticity and security emerged, and the drawbacks of the classic network solution began to show. For example:

  • Insufficient security isolation: although the default security-group policy on the large layer-2 network forbids mutual access, an overly broad security policy configured by a customer can lead to unexpected security incidents.
  • Strong coupling to the physical network: classic-network machines depend on physical switches to obtain ARP (Address Resolution Protocol) information, which limits flexibility.
  • Insufficient address space: classic-network servers are allocated from a single address space, consuming a large number of private IP addresses; as the number of virtual machines grows, addresses are exhausted and the network cannot expand.
  • Limited virtual machine migration domain: elasticity and scalability are key measures of a cloud vendor's products and services, and both rely on live migration. The migration domain is the scope within which a virtual machine can be migrated while keeping its private and public IP addresses unchanged. In the classic network, the strong coupling with the physical network means private- and public-network configuration depends on physical device configuration, so virtual machines cannot be migrated flexibly across clusters or recover quickly from failures.
  • No independent planning: because the IP address is assigned at purchase time, customers cannot choose addresses or plan the network according to their actual business requirements, which makes it hard to support large enterprises migrating to the cloud.

Precisely because of these shortcomings, Alibaba Cloud moved fully into VPC research and development and released the Alibaba Cloud VPC product in 2014. A VPC is an isolated overlay network environment built on technologies such as VXLAN; VPCs are logically isolated from one another and decoupled from the physical network. In the same metaphor, each household now buys a detached villa: the buyer only selects the foundation (the private network's CIDR block) at purchase, then freely decorates and partitions the rooms (VSW virtual switches), and each villa is completely independent of the others.

From 2014 to 2018, Alibaba Cloud steadily advanced the migration from the classic network to VPC: in July 2016, VPC became the default recommendation for new users, and in August 2017, Alibaba Cloud released ClassicLink, marking the maturity of the migration solution.

ClassicLink is a transitional solution for interconnecting classic-network ECS instances with VPCs during migration. The classic network and the VPC network are two separate network planes, and ClassicLink bridges them to make them interoperable. After ClassicLink is enabled, a classic-network ECS instance can access resources in the VPC; ECS instances in the VPC can access only those classic-network ECS instances that have been linked to the VPC, not unlinked classic instances or other cloud resources in the classic network. Technically, a VXLAN tunnel is established between the classic-network ECS instance and the VPC so that the classic network's forwarding plane and the VPC's gateway flow tables hold each other's information, achieving interoperability.
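The tunnel mechanics can be illustrated with the VXLAN header format itself (RFC 7348), where a 24-bit VNI identifies the virtual network, playing the role of the tunnel ID above. A minimal sketch in Python, illustrative only and not Alibaba Cloud's implementation:

```python
import struct

VXLAN_FLAGS = 0x08  # "I" flag set: the VNI field is valid (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that fronts an encapsulated L2 frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    # byte 0: flags; bytes 1-3: reserved; bytes 4-6: VNI; byte 7: reserved
    return struct.pack("!B3xI", VXLAN_FLAGS, vni << 8)

def parse_vni(header: bytes) -> int:
    """Recover the VNI from a VXLAN header."""
    (word,) = struct.unpack("!4xI", header)
    return word >> 8

hdr = vxlan_header(vni=12345)
assert len(hdr) == 8 and parse_vni(hdr) == 12345
```

Each tunneled frame is the original Ethernet frame prefixed by this header inside a UDP/IP packet, which is what lets the overlay stay decoupled from physical addressing.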

ClassicLink solves intranet interconnection between the classic network and the VPC during migration, but customers also had strong requirements to retain their public IP addresses: changing them would mean redoing ICP filing, rebinding domain names to IP addresses, updating IP whitelists hard-coded into external interfaces, and invalidating licenses of purchased commercial software. The answer was to move the public network into the overlay: the virtual network gateway moved to the overlay, and public IP addresses came under centralized management by the SDN controller's control plane. Before the move, a public IP address was tied to a physical device cluster and could not be allocated across clusters; centralized management by the overlay gateway also improved public IP address utilization, which matters greatly as IPv4 addresses approach exhaustion. The public network migration was completed in March 2018, and from 2018 Alibaba Cloud fully entered the VPC era.

3. The service model drives continuous evolution of the VPC's underlying layers

The underlying components of a VPC are the VPC controller, the virtual network gateways, and the virtual switches; because their roles and traffic volumes differ, the three have evolved along different routes.


The VPC controller is the control engine of Alibaba Cloud's VPC network. Upward, it receives the user console's control interface: creations and routing configurations made in the console are delivered to the VPC controller through a control layer. Downward, it controls data plane units such as virtual network gateways and virtual switches.

In a hyperscale cloud network, the virtual network control plane faces three main challenges:

  • Larger flow tables, in two respects:
    • 1. A virtual network gateway cluster in the cloud network typically carries millions of flow-table entries, spanning routing tables, VM-to-physical-machine mapping tables, forwarding tables, address mapping tables, QoS rate-limit tables, and more. If each VPC needs just the three most basic tables, each with three entries, then a million VPCs already imply at least 3 million tables and 9 million entries, whereas a large physical router typically holds only a few hundred thousand routes. The cloud network's control plane flow tables are therefore far larger than the underlying physical network's.
    • 2. As a customer's business grows, the number of virtual machines in a single VPC keeps rising, sometimes past 5,000 VMs per VPC, so each flow table must support more entries.
  • Wider flow tables: the cloud network's control plane does not live only on gateway devices; every virtual machine has corresponding flow-table information on its physical machine. With 10 VMs per physical machine, 5,000 machines per compute cluster, and up to 6 clusters per data center, the control plane must manage on the order of 10 × 5,000 × 6 units.
  • Faster effectiveness: the virtual network control plane must support virtual networks with more than 100,000 VMs, with batched changes taking effect within 200 ms (one RTO). One user's changes to its own VPC must not affect other virtual networks, and customer-level operations arrive unordered and unpredictably.
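The scale figures quoted in the list above can be reproduced with back-of-the-envelope arithmetic:

```python
# "Larger flow tables": 1M VPCs, 3 basic tables each, 3 entries per table
vpcs = 1_000_000
tables = vpcs * 3            # 3,000,000 tables
entries = tables * 3         # 9,000,000 entries
assert tables == 3_000_000 and entries == 9_000_000

# "Wider flow tables": 10 VMs/host * 5,000 hosts/cluster * 6 clusters
hosts = 5_000 * 6            # 30,000 physical machines to push config to
vm_entries = 10 * hosts      # 300,000 per-VM flow-table locations
assert hosts == 30_000 and vm_entries == 300_000
```

The contrast with a physical router's few hundred thousand routes is roughly an order of magnitude on entry count alone, before counting the 30,000 delivery targets.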

Initially, Alibaba Cloud VPC Controller 1.0 handled relatively simple services: data volumes were small, few devices were under management, and the architecture was a simple "stovepipe". A request from an upstream business party was converted into the system's object model, and each request mapped to one processing flow that validated parameters, generated the corresponding flow-table configuration, and waited for the data plane unit's acknowledgment to finish. This architecture is simple and efficient, but its concurrency and scalability are poor, and it cannot effectively manage a large fleet of forwarding devices. As service volume grew, and especially as the number of virtual switches grew exponentially, the pressure of delivering configuration to virtual switches mounted, so Controller 2.0 introduced middle-layer services for asynchronous processing, cutting the end-to-end latency of each interface.
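The shift from 1.0's synchronous stovepipe to 2.0's asynchronous middle layer can be sketched as toy classes (hypothetical names and shapes, not the actual controller code):

```python
from collections import deque

class Controller1:
    """1.0-style: each request is validated, translated, and delivered
    synchronously to every device before the caller gets a response."""
    def __init__(self, devices):
        self.devices = devices          # each device is a config sink
    def handle(self, request):
        flow = {"vpc": request["vpc"], "route": request["route"]}
        for dev in self.devices:        # the caller waits on every device ack
            dev.append(flow)
        return "ok"

class Controller2:
    """2.0-style: the API path only validates and enqueues; a middle-layer
    worker drains the queue and pushes config to devices asynchronously."""
    def __init__(self, devices):
        self.devices = devices
        self.pending = deque()
    def handle(self, request):
        self.pending.append({"vpc": request["vpc"], "route": request["route"]})
        return "accepted"               # API returns before config lands
    def drain(self):                    # the middle-layer service
        while self.pending:
            flow = self.pending.popleft()
            for dev in self.devices:
                dev.append(flow)

devices = [[], []]
c2 = Controller2(devices)
assert c2.handle({"vpc": "vpc-1", "route": "10.0.0.0/16"}) == "accepted"
assert devices[0] == []                 # nothing delivered yet
c2.drain()
assert devices[0][0]["vpc"] == "vpc-1"  # delivered by the worker
```

The interface latency no longer includes the fan-out to thousands of virtual switches, which is exactly the cost that exploded as switch counts grew.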


After the full shift to the VPC era, the number of virtual machines on the cloud grew explosively, and the number of machines managed by VPC controllers rose geometrically. To match this change in the business model, VPC Controller 3.0 is split horizontally into four layers: API processing, business-logic orchestration, task processing, and configuration delivery. Because virtual switches are numerous but carry little per-machine configuration, while virtual gateways are few but carry a lot, separate delivery engines were designed for each. Each business unit is split out as an independent microservice, and the system continues to evolve in that direction.

4. From internal services through the gateway to de-gatewayization: north-south traffic sinks to east-west

In a data center's physical network, traffic is usually divided into two types: traffic between users outside the data center and internal servers, called north-south or vertical traffic; and traffic between servers inside the data center, called east-west or horizontal traffic.

In a VPC, the overlay network abstracts away the frame of the underlying physical network and defines the entire forwarding plane through two components, the virtual switch and the virtual gateway. Traffic can thus be simplified: anything that communicates through the gateway is north-south, and direct interaction that bypasses the gateway is east-west. This forwarding model created a new problem for direct VM-to-VM communication: because all VMs in the cloud network are virtual, a remote VM has no real MAC address, and all MAC addresses are answered by the ARP proxy on the physical machine. VM-to-VM communication is therefore encapsulated at layer 3 in the physical network, so almost no traffic flowed directly between physical machines; instead, all of it was forwarded through the gateway hosting the virtual router, which acted as the transit point.

As customers' cloud businesses developed from a model dominated by mutual access with public-network clients to one with heavy interaction and computation inside the data center, and as peering connections between VPCs (VPC peering) had to be opened up, inter-VPC traffic also had to pass through the centralized gateway containing the virtual router, and the VGW became a bottleneck for cloud network expansion.

[Figure: traffic sinking from the centralized gateway (red lines) to direct host-to-host paths (green lines)]

As the figure above shows, the answer to a centralization bottleneck is decentralization: inter-VPC and intra-VPC VM traffic is sunk from the red-line paths to the green-line paths, so that north-south traffic that previously had to traverse the gateway sinks to east-west, bypassing the centralized gateway bottleneck and further expanding the cloud network's horizontal capacity.

To sink VM-to-VM traffic, Alibaba Cloud pushes all routing tables down, delivering the VM-to-physical-machine mapping table at the virtual-switch level. Since it is impractical for the controller to fully synchronize direct routes between all VMs to all physical machines, Alibaba Cloud developed RSP (Route Synchronization Protocol), which uses RCM (Remote Control Message) packets to refresh the mapping between massive numbers of VM routes and physical machines.

Take a TCP request as an example. When a newly created virtual machine starts to contact another VM in the VPC, its physical machine has no flow-table entry for the destination at first, so the TCP SYN packet is forwarded to the VGW (vRouter Gateway). At the same time, an RCM request packet is sent, whose payload carries the source physical machine's IP, the VPC's tunnel ID, and the destination VM's IP. The VGW forwards the SYN packet on to the remote physical machine, and on receiving the RCM request it returns the destination physical machine's IP to the source, which updates its own flow table. The return path works the same way: when the destination physical machine receives the SYN, the [SYN, ACK] responded by the destination VM is sent to the VGW while an RCM request is sent to learn the peer's location; when the VGW's RCM reply arrives, the source physical machine's IP is stored in the destination's flow table. Subsequent business traffic no longer involves the VGW: the two physical machines communicate directly, completing the sinking of the traffic.
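The first-packet learning sequence can be condensed into a toy simulation. The `Host` and `VGW` classes and their interfaces are illustrative; the real protocol carries tunnel IDs in encapsulated packets and runs in the data plane:

```python
class VGW:
    """Centralized gateway: knows which physical host each VM lives on."""
    def __init__(self, vm_to_host):
        self.vm_to_host = vm_to_host
    def forward(self, pkt):             # forwards the first packet itself
        return self.vm_to_host[pkt["dst_vm"]]
    def rcm_reply(self, dst_vm):        # answers an RCM request
        return self.vm_to_host[dst_vm]

class Host:
    """Physical machine running the virtual switch."""
    def __init__(self, vgw):
        self.vgw = vgw
        self.flow_table = {}            # dst VM -> dst physical host
    def send(self, pkt):
        dst_vm = pkt["dst_vm"]
        if dst_vm in self.flow_table:   # fast path: direct host-to-host
            return ("direct", self.flow_table[dst_vm])
        # slow path: hand the packet to the VGW and learn the peer via RCM
        via = self.vgw.forward(pkt)
        self.flow_table[dst_vm] = self.vgw.rcm_reply(dst_vm)
        return ("via_vgw", via)

vgw = VGW({"vm-b": "10.0.0.2"})
host_a = Host(vgw)
first = host_a.send({"dst_vm": "vm-b"})   # SYN goes through the VGW
second = host_a.send({"dst_vm": "vm-b"})  # later packets bypass it
assert first == ("via_vgw", "10.0.0.2")
assert second == ("direct", "10.0.0.2")
```

Only the first packet of a flow pays the gateway detour; everything after it is east-west.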

The interactions are shown in the following figures: first packet, first-packet return, then subsequent service packets.

[Figures: first packet / first-packet return / subsequent service packets]

5. Border gateway hardwareization: a hardware breakthrough for the elephant-flow problem under the 80/20 rule

The evolution of the cloud gateway tracks both its role in the cloud network and the development of the physical network. The cloud gateway mainly handles north-south traffic: public-network traffic, VPC interconnection traffic, and cross-data-center leased-line traffic. Alibaba Cloud's gateway therefore initially consisted of three parts: IGW (Internet gateway, for public-network traffic), VGW (vRouter gateway, for private-network traffic), and CGW (customer gateway, for leased-line traffic).

To centralize virtual network processing, in 2013 Alibaba Cloud began developing its own DPDK-based high-performance forwarding suite on the x86 platform, moving from hardware gateways to independently deployed virtual network gateways on general-purpose x86 servers, and from a 10G physical network to a 4×40G x86 server gateway architecture. As services kept growing, however, deploying multiple gateway clusters drove up construction and O&M costs, and because traffic peaks differed, one cluster often sat idle while another was overwhelmed. This motivated merging the cloud gateways: IGW, VGW, and CGW were combined into XGW, with X standing for Any. After the merger, XGW's single-machine performance improved greatly: CPU cores grew from 16 to 32, single-machine bandwidth from 40G to 160G, and single-machine PPS from 12M to 26M, while the online equipment footprint shrank from the original 3×4 split to 4 units, simplifying gateway traffic and reducing O&M complexity.

The merged x86 cloud gateway clusters ran stably on Alibaba Cloud for five years, but from 2018, with the rapid growth of online business including Alibaba Group's Double 11 promotion, the XGW cluster model met new challenges. Under the DPDK architecture there is a single-core PPS bottleneck: when an elephant flow reaches the cloud gateway, its fixed five-tuple hashes to a single CPU core, saturating that core and affecting all other customers and overall stability. XGW's single-machine forwarding performance also became a bottleneck: at the end of 2018, some head customers asked for 1.6T of traffic in a single cluster. With 40G per XGW machine and a 50 percent watermark, that would take at least 80 x86 machines, a cluster too large to maintain and manage effectively.

Against this background, and after surveying industry solutions, Alibaba Cloud chose an integrated software-and-hardware approach for its cloud gateway. Analysis of business traffic on Alibaba Cloud showed that 20 percent of customers generate 80 percent of the traffic, and those customers' traffic is essentially elephant flow, well suited to a hardware gateway. The hardware gateway needs only about 5 percent of the table entries to carry 95 percent of the traffic, while the software gateway holds the remaining 95 percent of entries and carries 5 percent of the traffic.
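The 5-percent-of-entries, 95-percent-of-traffic split follows from the heavy-tailed distribution: greedily offloading the hottest flows covers almost all traffic with very few table entries. A sketch of that greedy split, illustrative rather than the production placement algorithm:

```python
def split_entries(flows, hw_traffic_share=0.95):
    """Greedy split: offload the hottest entries to the hardware gateway
    until they cover the target share of total traffic; the long tail of
    entries stays on the software gateway."""
    ordered = sorted(flows.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(flows.values())
    hw, acc = [], 0
    for flow, bps in ordered:
        if acc >= hw_traffic_share * total:
            break
        hw.append(flow)
        acc += bps
    sw = [f for f in flows if f not in hw]
    return hw, sw

# A toy distribution: one elephant flow and many mice.
flows = {"elephant": 9_500} | {f"mouse{i}": 5 for i in range(100)}
hw, sw = split_entries(flows)
assert hw == ["elephant"]   # ~1% of entries carry 95% of the traffic
assert len(sw) == 100       # the long tail stays in software
```

On a real traffic mix the hot set is larger than one flow, but the shape is the same: a small hardware table absorbs the elephants, and the software gateway keeps the flexibility for the tail.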

Alibaba Cloud's new-generation hardware gateway combines strong compute with high-bandwidth switching: fast-path services are offloaded to the Tofino chip for forwarding, while service logic that needs flexibility is handled on the CPU, all within a compact, deployment-friendly form factor:

  • Switching capacity: 3.2T programmable switch chip, 32 × 100GE QSFP28 network interfaces
  • Computing resources: up to 2 CPUs, 26 cores per CPU, and 128GB DRAM
  • Expansion: 6 PCIe slots, with support for FPGA expansion

Hardwareization of the gateway solved the single-core performance problem of DPDK x86 clusters and the single-machine and cluster capacity problem, and effectively reduced the cost of border gateway deployment.

6. Fully embracing smart NICs: bandwidth climbs further, from software offloading to hardware offloading

Hardwareization of the virtual network gateway solved the north-south elephant-flow problem, and with de-gatewayization, north-south traffic ceased to be the near-term bottleneck. On the east-west side, Alibaba Cloud never stopped evolving either. East-west capability traces back to the capability of a single virtual machine, which is determined by the virtual switch module on the physical machine. From the first generation of the Apsara virtual switch (AVS) in 2013, built on bridge and netfilter, through AVSv3 in 2015, rebuilt around fast-path/slow-path separation, to the user-mode DPDK AVS in 2017, Alibaba Cloud has continuously pursued extreme single-machine performance. But as virtual machine scale kept growing, the shortcomings of user-mode DPDK AVS became prominent.

First is resource cost: AVS runs as software on the physical machine and needs its own CPU and memory, which reduces the machine's sellable resources, the often-mentioned shared cost of cloud resources. Second is virtualization cost: when a VM receives or sends packets, the CPU must perform memory copies, and under large bandwidth these copies consume significant compute. Third is NUMA overhead: if AVS's CPUs and the VM's vCPUs sit on different NUMA nodes, massive LLC misses degrade forwarding performance. In addition, because physical machine models differ, AVS requires a great deal of adaptation work for NIC models and CPU architectures, adding deployment and maintenance burden.

Driven by users' extreme performance demands and the spread of 100G server NICs, pure software solutions could no longer meet cloud service needs, so the industry, Alibaba Cloud included, turned to the smart network interface card (SmartNIC) to raise single-VM capability: offload the virtual switch's functions to the NIC and use the SmartNIC's independent CPU and hardware to improve network performance.

What is a SmartNIC? A network interface card (NIC) is the hardware that connects a server to the network for data transmission and communication. A SmartNIC is a flexibly programmable NIC that adds an onboard CPU to the basic NIC and works alongside the server. With its own compute resources, it frees up the host's CPU: data-processing functions ill-suited to the host CPU in networking, security, and storage are offloaded to programmable hardware, and the virtualization hypervisor in the cloud network is offloaded as well, so the server can run business programs more effectively and overall data-processing efficiency improves.


Alibaba Cloud's MOC card is a SmartNIC built on an SoC with its own CPU. AVS moved from the host CPU of the physical machine to the NIC's independent CPU, and under the KVM virtualization architecture the virtio driver was swapped in. Packets received from the physical network are no longer copied into the VM by the CPU but are DMAed into it by hardware. This software-offloading architecture first solved the resource problem: more memory and CPU on the physical machine could be sold, lowering its cost, since a NIC's CPU is usually cheaper than a host's CPU; and hardware DMA relieves the CPU more effectively, letting the SmartNIC scenario better support large bandwidth.

[Figure: fast and slow forwarding paths]

Building on software offloading, AVS exploited the advantages of hardware forwarding and the fast-path/slow-path separation model, adding hardware offloading on top of the software-offloading model and greatly improving forwarding performance. The latest generation, MOC 2.5, supports 200G of network bandwidth and 50M PPS, and adds features such as traffic mirroring, eRDMA, VPC encryption and decryption, and jumbo frames.

Through smart NICs, Alibaba Cloud's latest generation of "Network Plus" instances supports 160G of network bandwidth, 30M PPS, and 16M connections, laying the foundation for NFV-based service network elements.


7. Service NE virtualization: efficiency and cost, NEs fully embrace cloud native

In traditional networks, both the underlying IT infrastructure and the upper-layer applications run on dedicated equipment, which is costly, rigid in capacity and placement, and slow to meet new services' needs for rapid, flexible network deployment. With the rapid development of cloud computing, the fast iteration of cloud service providers and Internet companies demands even quicker deployment, more flexibility, and lower cost for network functions. The European Telecommunications Standards Institute (ETSI) first proposed NFV (Network Functions Virtualization), which uses standard x86 servers, virtualization, and related technologies to decouple network services from dedicated hardware. Virtualized resources can then be shared flexibly, enabling rapid development and deployment of new services, with automatic deployment, auto scaling, fault isolation, and self-healing driven by actual business needs.

[Figure: ETSI NFV reference framework]

The figure above shows the NFV reference framework released by ETSI. The left side divides into three layers from bottom to top: the infrastructure layer at the bottom, providing physical resources and virtualization support; the virtual network functions and their EMS systems in the middle, where the actual service processing of network services happens; and the operations support layer, the operator's OSS/BSS, on top. The right side is the core of NFV, responsible for orchestration and management, and consists of three modules: NFVO, VNFM, and VIM. The NFVO handles network service orchestration and the management of VNFMs and VIMs; the VNFM manages VNF lifecycles and monitors VNFs; the VIM manages the virtualized infrastructure.
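The division of labor among the three MANO modules can be sketched with toy classes (hypothetical interfaces, for illustration only, not an ETSI-conformant implementation):

```python
class VIM:
    """Manages the virtualized infrastructure (compute/storage/network)."""
    def __init__(self):
        self.instances = []
    def allocate(self, flavor):
        vm = {"id": len(self.instances), "flavor": flavor}
        self.instances.append(vm)
        return vm

class VNFM:
    """Manages the lifecycle of individual VNFs on top of the VIM."""
    def __init__(self, vim):
        self.vim = vim
        self.vnfs = {}
    def instantiate(self, name, flavor):
        self.vnfs[name] = self.vim.allocate(flavor)
        return self.vnfs[name]

class NFVO:
    """Orchestrates a network service: a chain of VNFs."""
    def __init__(self, vnfm):
        self.vnfm = vnfm
    def deploy_service(self, chain):
        return [self.vnfm.instantiate(name, flavor) for name, flavor in chain]

vim = VIM()
nfvo = NFVO(VNFM(vim))
service = nfvo.deploy_service([("slb", "large"), ("nat", "small")])
assert len(vim.instances) == 2   # NFVO drove VNFM, which drove VIM
```

The key point of the layering is that the orchestrator never touches infrastructure directly; every resource request flows NFVO → VNFM → VIM.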

In computer networking, a network element is a manageable logical entity that unites one or more physical devices. In a cloud network, service network elements are components such as the NAT gateway, Server Load Balancer (SLB), transit router, PrivateLink, and VPN gateway, which provide specific network logic functions.

In the earlier cloud network, taking load balancing as an example, LVS (Linux Virtual Server) clusters were built on dedicated hardware, with each data center's physical cluster planned in advance. When a customer's business grew exponentially in a short time, the shortcomings of network equipment in physical-machine form were amplified: procurement and deployment cycles are long, racks and uplink physical switches must be re-planned, and physical resources are hard to scale out. Cluster deployments also have large fault domains, long release cycles for cloud network functions, and slow iteration of new features; and because physical-machine-based network element nodes lack real elasticity, they run against the very concept of cloud computing.

Border gateways solved the elephant-flow problem with hardware, and raw forwarding performance is what hardware does best; but for stateful, complex services such as load balancing and SNAT, hardware falls short. Alibaba Cloud's integrated software-and-hardware approach balances performance and flexibility: the hardware gateway carries the base forwarding load and diverts complex network application logic to the corresponding NFV network elements, while the NFV elements, deployed on ECS virtual machines, implement the business logic of network functions such as SLB, NAT, CEN TR, and VPN. Building on ECS's capabilities, they achieve elasticity and embrace cloud native. Alibaba Cloud also provides a unified NFV platform, Cyberstar, as a common foundation for network element development: element teams need only focus on their logic code, while elasticity and the NFV layer architecture are supplied by the platform.

The Alibaba Cloud NFV platform follows the MANO model of the ETSI ISG NFV working group and divides NFV management and control into three modules: NFVO, VNFM, and VIM, as shown in the figure:

(Figure: Alibaba Cloud NFV platform, modeled on the ETSI MANO architecture)

In the Alibaba Cloud NFV platform, the VIM (Virtualized Infrastructure Manager) manages the virtual storage, network, and compute resources of the southbound NFVI layer, and is responsible for the lifecycle of virtual compute resources: creation, deletion, bringing online and offline, and grayscale upgrades. The VNFM allocates one or more logical compute resources to each service network element and meets the requirements of high availability, elasticity, fault isolation, and self-healing for compute resource groups. In addition, the NFV platform provides shuffle-sharding capabilities to effectively narrow fault domains.
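The shuffle-sharding idea mentioned above can be illustrated with a small sketch (this is a generic illustration of the technique, not Alibaba Cloud's actual implementation; the function and tenant names are hypothetical). Each tenant is deterministically mapped to its own small combination of nodes, so a failing node, or a noisy tenant, affects only the tenants whose shards happen to overlap it:

```python
import hashlib
from itertools import combinations

def shuffle_shard(tenant_id: str, nodes: list, shard_size: int) -> list:
    """Deterministically pick a per-tenant subset of nodes.

    Because each tenant hashes to its own combination of nodes, the
    blast radius of any single bad node is limited to the tenants
    whose shards include it, rather than the whole cluster.
    """
    all_shards = list(combinations(sorted(nodes), shard_size))
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return list(all_shards[int(digest, 16) % len(all_shards)])

# 8 NFV worker nodes, each tenant pinned to a shard of 2
nodes = ["vnf-%d" % i for i in range(8)]
shard_a = shuffle_shard("tenant-a", nodes, 2)
shard_b = shuffle_shard("tenant-b", nodes, 2)
```

With 8 nodes and shards of 2 there are 28 possible shards, so two random tenants rarely share a full shard; scaling the node count grows the combination space rapidly, which is what narrows the fault domain.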

The NFVO layer orchestrates distributed fast-path and slow-path nodes for a service network element according to the network topology the element registers. Alibaba Cloud's NFVO design implements a distributed fast/slow forwarding architecture composed of two layers of ECS instances: a stateless forwarding layer of Fastpath ECS, and Slowpath ECS that issue offloaded forwarding rules. The first packet of a flow misses on the Fastpath and is sent to the Slowpath for processing; after the Slowpath applies its logical rules and installs the flow into the Fastpath, all subsequent traffic matches on the Fastpath and is forwarded directly to the service. With this architecture, a service network element only needs to focus on its own business logic, that is, the handling of a flow's first packet; other complex logic, such as distributed session synchronization and the matching and forwarding of subsequent packets, is handled entirely by the NFV platform.
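The first-packet-miss mechanism described above can be sketched as a toy model (names and the trivial SLB decision are illustrative, not Alibaba Cloud's code): the Fastpath holds an offloaded flow table, a miss punts to the Slowpath, which runs the network element's business logic exactly once and installs a rule so every later packet of the flow stays on the fast path:

```python
class FastSlowForwarder:
    """Toy model of the distributed fast/slow path architecture."""

    def __init__(self, slowpath_decide):
        self.flow_table = {}                    # five-tuple -> next hop (offloaded rules)
        self.slowpath_decide = slowpath_decide  # NE business logic, runs on first packet only
        self.slowpath_hits = 0

    def forward(self, five_tuple):
        nexthop = self.flow_table.get(five_tuple)
        if nexthop is None:                     # first-packet miss -> punt to Slowpath
            self.slowpath_hits += 1
            nexthop = self.slowpath_decide(five_tuple)
            self.flow_table[five_tuple] = nexthop  # offload the rule into the Fastpath
        return nexthop

# Hypothetical SLB logic: pick one of four backends by hashing the five-tuple
fwd = FastSlowForwarder(lambda ft: "backend-%d" % (hash(ft) % 4))
flow = ("10.0.0.1", 34567, "192.168.0.10", 80, "tcp")
first = fwd.forward(flow)          # miss: handled by the Slowpath
later = fwd.forward(flow)          # hit: served from the Fastpath flow table
```

After the first packet, `slowpath_hits` stays at 1 no matter how many packets the flow carries, which is exactly why the element author only has to reason about first-packet handling.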


Figure: Fastpath uses AVS ECMP capabilities to implement the stateless forwarding layer
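The reason ECMP lets the Fastpath layer stay stateless is that next-hop selection is a pure function of the flow's five-tuple: every packet of a flow hashes to the same Fastpath member, so no per-flow pinning state is needed in the distribution layer. A minimal sketch of this idea (generic hash-based ECMP, not the actual AVS implementation):

```python
import hashlib

def ecmp_pick(five_tuple: tuple, members: list) -> str:
    """Pick an ECMP next hop by hashing the five-tuple.

    Stateless by construction: the same flow always maps to the same
    Fastpath member, so the distribution layer keeps no session state.
    """
    key = "|".join(map(str, five_tuple)).encode()
    h = int(hashlib.md5(key).hexdigest(), 16)
    return members[h % len(members)]

fastpaths = ["fp-ecs-0", "fp-ecs-1", "fp-ecs-2", "fp-ecs-3"]
flow = ("10.0.0.1", 34567, "192.168.0.10", 80, "tcp")
pick = ecmp_pick(flow, fastpaths)
```

The trade-off is that changing the member list reshuffles flows; real deployments mitigate this with consistent hashing or by letting the Slowpath's session state absorb the churn.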

Through the VIM layer of the NFV platform, Alibaba Cloud's service network elements scale out horizontally when traffic surges and scale in when traffic subsides, effectively reducing the cost of network elements; the overall orchestration is similar to how Kubernetes orchestrates services.
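The scale-out/scale-in behaviour is analogous to a Kubernetes HPA-style sizing rule. The toy policy below is only an illustration of that analogy (the target figures and bounds are invented, not Alibaba Cloud's actual policy): size the replica set so that average per-node load approaches a target, within a floor and ceiling:

```python
import math

def desired_replicas(current: int, load_per_node_gbps: float,
                     target_gbps: float, lo: int = 2, hi: int = 64) -> int:
    """HPA-style sizing: scale replicas so per-node load nears the target.

    current            -- replicas currently serving traffic
    load_per_node_gbps -- observed average load per replica
    target_gbps        -- desired load per replica
    lo / hi            -- safety bounds on the replica count
    """
    want = math.ceil(current * load_per_node_gbps / target_gbps)
    return max(lo, min(hi, want))

spike = desired_replicas(4, 9.0, 5.0)   # traffic surge -> scale out
quiet = desired_replicas(8, 1.0, 5.0)   # traffic subsides -> shrink toward floor
```

The floor keeps a minimum of redundant capacity for availability; the ceiling caps cost during pathological load.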

VIII. Summary

  • Classic networks suffer from insufficient security isolation, strong coupling with the physical network, insufficient address space, large fault domains, and the inability of customers to plan their own networks, which prompted Alibaba Cloud to evolve toward VPC.
  • The core goal of VPC controller evolution is to meet ultra-large-scale networking requirements and provide extremely flexible network management capabilities.
  • Moving internal services off the gateway: by looking up the flow table on the first packet, part of the traffic between virtual machines is offloaded onto the east-west path, which removes the bottleneck of centralized gateways.
  • 20% of customers contribute 80% of the traffic, and most of that traffic consists of single-five-tuple elephant flows, which drove the evolution of Alibaba Cloud's gateway from an x86 DPDK cloud gateway to a programmable hardware gateway.
  • Single-virtual-machine traffic bottlenecks were broken through with smart NICs, and Alibaba Cloud's smart NIC likewise adopts the fast/slow path separation model.
  • Service network element NFV gives Alibaba Cloud's service elements elasticity through the capabilities of the NFV platform, effectively saving costs.
  • The fast/slow path separation capability of Alibaba Cloud's NFV platform simplifies the development of business logic.

Appendix:

Achelous, an academic paper co-authored by the Alibaba Cloud VPC team and Zhejiang University, contributed a SIGCOMM paper to cloud networking; it is the third public cloud network platform shared at a top conference, after Azure's VFP (NSDI '17) and GCP's Andromeda (NSDI '18).

Readers who want to learn more about the details of the Alibaba Cloud network can read the Achelous paper: https://dl.acm.org/doi/10.1145/3603269.3604859.
