
Zhuanzhuan Game MQ Refactoring: A Journey of Thinking and Learning

Author: Flash Gene

1 Background

The game business launched in 2017. Nearly seven years on, after a long period of growth, it has quietly accumulated a heavy historical burden. It is like a large tree with lush, sturdy branches alongside plenty of withered ones. This article focuses on one small module, the MQ consumption of product updates, and describes how the game business refactored the original code to bring new life to the tree.

1.1 Reasons for Commencement

One day, I suddenly received an online alert: RPC calls to a downstream interface were being rate-limited, and the rate-limit alert threshold was 600k calls/min. We began investigating what had triggered it. Tracing the traffic back to its source, we found an external update operation that was calling the update interface at only about 3K calls/min. The update traffic was clearly not high, so why had the rate limit been triggered? That question prompted a full investigation of the system.

1.2 The status quo before refactoring

After a preliminary look at the cause of the rate limiting, we went through the product MQ consumption comprehensively and found that the game business has 19 consumers subscribed to the product-update MQ, spread across different clusters. Each consumer performs its own queries and updates internally, and because some of those updates generate new product-update messages, the API calls are amplified further.

The investigation also found that some deprecated consumers were still consuming messages online, and that some identical consumption logic was being executed by multiple consumers.

The problems can be summarized as follows:

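a. Logic is scattered and hard to maintain
b. Service call volume is amplified many times over
c. Concurrent updates and overwrites occur
d. Deprecated and duplicate consumption exists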

1.3 Problem Analysis

Why is this happening?

The author's view is that in the early days requirements iterated quickly, and adding a new consumer was an easy way to respond quickly. But as requirements evolved, more and more consumers were added, and changes in requirements and staff made it increasingly hard for anyone to see the whole system. The constantly changing logic made the system harder to maintain, which led to the problems above.

Reducing the number of MQ-related interface calls comes down to two core points: first, reduce queries by reusing data; second, reduce calls to the update API so that fewer new messages are generated. However, the current system is so fragmented that it is extremely difficult to find a good solution on top of the existing structure. Changing this requires a new structure for the MQ consumption logic. A new structure can not only solve the current problems but also establish new constraints that guide how future features are written and keep the whole system healthy and stable.

2 Refactoring

2.1 Objectives

Before embarking on a refactoring, it's important to be clear about your goals. Objectives help us to develop a plan, clarify the scope, and guide the project to be implemented without deviating from the right track.

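a. A reasonable structure
b. Eliminate duplicate and ineffective consumption logic
c. Improve consumption capacity
d. Optimize the logic
e. Build a new set of conventions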

The expectation is that, with a reasonable code architecture, the logic for consuming product MQ messages becomes highly cohesive and loosely coupled, and the responsibility of each class and method is clear. Refactoring is not simply copying the old system; it also means defining new constraints for the system's future expansion. It is as if new branches sprout on the game tree and determine the direction in which later branches grow.

Beyond a reasonable architecture, the refactoring also needs to eliminate the duplicate and ineffective consumption found earlier, improve overall consumption capacity, and solve the amplification of interface calls. The investigation also turned up some deprecated logic and some problematic code, which we took the opportunity to clean up. (Note: changing logic during a refactoring is generally not recommended. Any modified logic must be tested thoroughly, otherwise new bugs may be introduced.)

2.2 Develop a plan

The overall refactoring plan consists of three parts: architecture design, implementation plan, and test plan.

2.2.1 Architecture Design


The overall architecture mainly uses the Flyweight pattern and the Strategy pattern, and consists of three layers from top to bottom.

a. Data preprocessing
b. Invoke handlers by category for consumption
c. Consolidate calls to the update interface

a: Data preprocessing is mainly responsible for filtering and pre-querying data. It covers batch consumption of MQ messages, filtering out non-game messages, calling batch query interfaces, and any preprocessing logic that would otherwise be repeated later, which reduces duplicate queries and makes interface calls more efficient.
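The preprocessing step can be sketched as follows. This is a minimal illustration, not the article's actual code: `PendingMsg`, `ProductSnapshot`, `ConsumeContext`, and the `batchQuery` function are hypothetical names, and the batch query interface is represented by a plain Java `Function`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical message shape: only the fields this sketch needs.
record PendingMsg(String msgId, long productId, boolean gameProduct) {}

// Hypothetical product data returned by the batch query.
record ProductSnapshot(long productId, String title) {}

// Shared context handed to every handler so that none of them queries again.
record ConsumeContext(List<PendingMsg> messages, Map<Long, ProductSnapshot> productsById) {}

final class Preprocessor {
    // Stand-in for the downstream batch query interface (an assumption).
    private final Function<Set<Long>, Map<Long, ProductSnapshot>> batchQuery;

    Preprocessor(Function<Set<Long>, Map<Long, ProductSnapshot>> batchQuery) {
        this.batchQuery = batchQuery;
    }

    ConsumeContext preprocess(List<PendingMsg> batch) {
        List<PendingMsg> gameMsgs = new ArrayList<>();
        Set<Long> ids = new LinkedHashSet<>();
        for (PendingMsg m : batch) {
            if (!m.gameProduct()) {
                continue;              // filter out non-game messages
            }
            gameMsgs.add(m);
            ids.add(m.productId());    // the set de-duplicates product ids
        }
        // One batch query for the whole batch instead of one query per consumer.
        Map<Long, ProductSnapshot> products =
                ids.isEmpty() ? Map.of() : batchQuery.apply(ids);
        return new ConsumeContext(gameMsgs, products);
    }
}
```

With this shape, two messages for the same product in one batch cost a single lookup, and a batch with no game messages costs none.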

b: Handlers are split by category, plus a common handler, so that responsibilities are clear. The common handler deals with shared logic such as writing tracking (event) logs. Each category handler processes only the business logic of its own category, which decouples the logic and improves maintainability. To make aspects (AOP) easier to apply and to keep related functions cohesive, an additional Manage layer is extracted beneath the handlers. The Manage layer implements the concrete consumption logic and provides reusable components.
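A Strategy-style dispatch along these lines might look like the sketch below. The category names, `UpdateEvent`, `CategoryHandler`, and `HandlerDispatcher` are all hypothetical, since the article does not show its real classification or class names; the in-memory `trackingLog` list merely stands in for the common handler's tracking log.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical game categories; the real classification is not given in the article.
enum Category { ACCOUNT, ITEM }

record UpdateEvent(String msgId, Category category, long productId) {}

// Strategy interface: one implementation per category.
interface CategoryHandler {
    Category category();
    void handle(UpdateEvent event);
}

final class HandlerDispatcher {
    private final Map<Category, CategoryHandler> byCategory = new EnumMap<>(Category.class);
    private final List<String> trackingLog = new ArrayList<>();

    HandlerDispatcher(List<CategoryHandler> handlers) {
        for (CategoryHandler h : handlers) {
            // Exactly one handler per category keeps responsibilities unambiguous.
            if (byCategory.put(h.category(), h) != null) {
                throw new IllegalArgumentException("duplicate handler for " + h.category());
            }
        }
    }

    void dispatch(UpdateEvent event) {
        // Common logic runs for every message, whatever its category.
        trackingLog.add(event.msgId() + ":" + event.category());
        CategoryHandler handler = byCategory.get(event.category());
        if (handler != null) {
            handler.handle(event);   // category-specific business logic only
        }
    }

    List<String> trackingLog() {
        return List.copyOf(trackingLog);
    }
}
```

Adding a new category then means adding one handler class rather than subscribing another consumer to the topic.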

c: Update logic that touches the middle-platform product service is consolidated, mainly to reduce calls to the update interface. (Because these updates generate new messages, replacing individual calls with batch API calls reduces the number of calls and effectively addresses the call amplification problem.)
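One way to picture the consolidation: handlers stage their field changes in a collector, and a single batch call is made at the end. The sketch below assumes a hypothetical `ProductUpdate` shape and represents the batch update API as a `Consumer`; the real interface is not described in the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// One merged update request per product (hypothetical shape).
record ProductUpdate(long productId, Map<String, String> fields) {}

final class UpdateCollector {
    private final Map<Long, Map<String, String>> pending = new LinkedHashMap<>();

    // Handlers record field changes here instead of calling the update API directly.
    // A later value for the same field overwrites the earlier one.
    void stage(long productId, String field, String value) {
        pending.computeIfAbsent(productId, id -> new LinkedHashMap<>()).put(field, value);
    }

    // Sends everything staged so far in one batch call.
    // Returns the number of products included in that call.
    int flush(Consumer<List<ProductUpdate>> batchUpdateApi) {
        if (pending.isEmpty()) {
            return 0;   // nothing staged, so no call and no new message
        }
        List<ProductUpdate> batch = new ArrayList<>();
        pending.forEach((id, fields) -> batch.add(new ProductUpdate(id, Map.copyOf(fields))));
        batchUpdateApi.accept(batch);
        pending.clear();   // cleared only after the call returned normally
        return batch.size();
    }
}
```

If three handlers each change a field of the same product, the downstream service sees one update and emits one follow-up message instead of three.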

2.2.2 Implementation Plan

We divide the entire refactoring into the following three phases.


Phase 1 and Phase 2

Phase 1: Migrate and refactor the non-core MQ logic. Non-core services are released in grayscale to limit the scope of impact and to quickly verify the feasibility and stability of the architecture.

Phase 2: Migrate and refactor the MQ logic related to core business. This is also released in grayscale, with close attention to the impact on core business. Completing this phase completes the migration of all business logic.


Phase 3

Phase 3: Fine-tune the structure, mainly by further splitting and restructuring the relevant functions so that they become more cohesive and less coupled, until the whole system reaches the result envisioned at design time.

The main benefit of refactoring in several steps is that the scope of impact stays under control and results are visible quickly. Because each change is limited in scope, problems are easier to locate, and product requirements can still be supported conveniently along the way.

2.2.3 Test Plan

Before each release, the change must pass three main checks: black-box testing, white-box testing, and log comparison.

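a: Black-box testing. Verify that the data produced by the old and new flows is consistent.
b: White-box testing. Measure line coverage of the code and observe whether the data from the old and new flows is consistent.
c: Pre-call data comparison. Log at each point where the update interface is called, and compare whether the old and new flows pass the same arguments to it.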

Testing is only one part of the picture. After going live, we also need to watch the running state of the whole system and set up alerts at key points. In addition, we keep front-line customer service informed so that they can relay any problems reported by users, and we roll out in grayscale at the granularity of the original consumers.
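A per-consumer grayscale switch could be sketched as below. The article does not describe how its switch is implemented, so everything here is an assumption: the class name, the percentage-based rollout, and routing by a hash of the message id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class GrayscaleSwitch {
    // Percentage (0-100) of traffic sent to the new flow, keyed by original consumer name.
    private final Map<String, Integer> percentByConsumer = new ConcurrentHashMap<>();

    void setPercent(String consumer, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        percentByConsumer.put(consumer, percent);
    }

    // While the percentage is unchanged, the same msgId always lands in the same
    // bucket, so a redelivered message is routed to the same flow as before.
    boolean useNewFlow(String consumer, String msgId) {
        // Consumers that were never configured stay on the old flow.
        int percent = percentByConsumer.getOrDefault(consumer, 0);
        int bucket = Math.floorMod(msgId.hashCode(), 100);   // always in 0..99
        return bucket < percent;
    }
}
```

Keeping one entry per original consumer means each migrated consumer can be ramped up, or rolled back to 0, independently of the others.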

2.3 Part of the detailed design

Unified idempotency and grayscale handling with aspects

This is a refactoring of MQ consumption, so every consumption module must be idempotent. With so many consumers being migrated, writing idempotency handling and similar boilerplate in each one would be extremely inconvenient. I solved this mainly with Spring's AOP capabilities.


The approach is mainly to define a convention: an idempotency annotation, a uniform generic return type, and an annotation-processing class. Consumers are wrapped by the annotation. Before processing, the processing class checks whether the method follows the convention; a method that does not is simply passed through (equivalent to the annotation having no effect). After a message is consumed successfully, the returned result is stored in a cache, and when the same message arrives again it is reported as successfully consumed without being processed again. This makes processing idempotent and reduces repeated consumption. The idempotency cache is keyed by msgId. (The grayscale control scheme works on the same principle, so it is not repeated here.)
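The idea can be illustrated without Spring. The sketch below swaps in a JDK dynamic proxy (`java.lang.reflect.Proxy`) for the Spring AOP aspect and an in-memory map for the cache, so it runs on its own. The `@Idempotent` annotation, `ConsumeResult`, `ProductConsumer`, and the convention that the first argument is the msgId are all hypothetical; the article's real annotation and return type are not shown. It also does not guard against two concurrent deliveries of the same message.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical marker annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

// Uniform generic return type required by the convention.
record ConsumeResult<T>(boolean success, T data) {}

// Convention assumed here: first argument is the msgId, return type is ConsumeResult.
interface ProductConsumer {
    @Idempotent
    ConsumeResult<String> consume(String msgId, String body);
}

final class IdempotentProxy {
    // In-memory stand-in for the shared cache.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> type, T target) {
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type},
            (proxy, method, args) -> {
                boolean conforms = method.isAnnotationPresent(Idempotent.class)
                        && method.getReturnType() == ConsumeResult.class
                        && args != null && args.length > 0 && args[0] instanceof String;
                try {
                    if (!conforms) {
                        return method.invoke(target, args);   // not conforming: pass through
                    }
                    String key = method.getName() + ":" + args[0];   // keyed by msgId
                    Object cached = cache.get(key);
                    if (cached != null) {
                        return cached;                        // already consumed successfully
                    }
                    Object result = method.invoke(target, args);
                    if (result instanceof ConsumeResult<?> r && r.success()) {
                        cache.put(key, result);               // cache successes only
                    }
                    return result;
                } catch (InvocationTargetException e) {
                    throw e.getCause();                       // rethrow the consumer's own exception
                }
            });
    }
}
```

Because only successful results are cached, a message whose consumption failed is processed again on redelivery, while a message that already succeeded returns immediately.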

Handling abnormal failures


When designing the downstream product update, we consolidated the calls to make them easier to manage, but this introduces a problem: our own business data may already have been updated when the downstream call fails. To handle this, we use a RocketMQ-based consumption retry component packaged in-house at Zhuanzhuan. (Put simply, if the synchronous call fails, a message is sent through RocketMQ so that the update is retried asynchronously.) If the data still has not been updated, MQ retries are used to make sure the update eventually succeeds.
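The fallback from a synchronous update to asynchronous retries can be sketched as follows. The in-house retry component is not public, so this sketch replaces it with an in-memory queue; `RetryTask`, `UpdateWithRetry`, the attempt limit, and the `alerts` list (standing in for an alert sent to engineers once retries are exhausted) are assumptions for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical retry task: which product to update and how many attempts were made.
record RetryTask(long productId, int attempts) {}

final class UpdateWithRetry {
    private final Predicate<Long> updateApi;   // returns true when the update succeeded
    private final int maxAttempts;
    private final Queue<RetryTask> retryQueue = new ArrayDeque<>();  // stand-in for the retry topic
    private final List<Long> alerts = new ArrayList<>();             // stand-in for the alert channel

    UpdateWithRetry(Predicate<Long> updateApi, int maxAttempts) {
        this.updateApi = updateApi;
        this.maxAttempts = maxAttempts;
    }

    // Synchronous path: try once, hand over to the retry queue on failure.
    void update(long productId) {
        if (!tryUpdate(productId)) {
            retryQueue.add(new RetryTask(productId, 1));
        }
    }

    // Asynchronous path: what a retry consumer would do for each redelivered task.
    void drainRetries() {
        RetryTask task;
        while ((task = retryQueue.poll()) != null) {
            if (tryUpdate(task.productId())) {
                continue;
            }
            int attempts = task.attempts() + 1;
            if (attempts >= maxAttempts) {
                alerts.add(task.productId());   // give up and ask a human to fix the data
            } else {
                retryQueue.add(new RetryTask(task.productId(), attempts));
            }
        }
    }

    private boolean tryUpdate(long productId) {
        try {
            return updateApi.test(productId);
        } catch (RuntimeException e) {
            return false;   // an exception counts as a failed attempt
        }
    }

    List<Long> alerts() {
        return List.copyOf(alerts);
    }
}
```

A transient downstream failure is absorbed by the retry path, and only an update that keeps failing up to the attempt limit surfaces as an alert.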


Alert on update failure

We also have an alerting mechanism: if a product's information fails to update, an alert containing the product data is sent to the engineers via WeCom (Enterprise WeChat), so that in exceptional cases the data can be corrected manually.

Data isolation

The new consumer uses a dedicated thread pool for processing, which makes it easier to monitor the consumption logic and improves the concurrency of overall processing.
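A dedicated, observable pool can be built from the JDK's `ThreadPoolExecutor`. The sketch below is illustrative: the class name, the pool sizing, and the choice of `CallerRunsPolicy` as the rejection policy are assumptions, not settings taken from the article.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ConsumerPool {
    private final ThreadPoolExecutor executor;

    ConsumerPool(String name, int threads, int queueCapacity) {
        AtomicInteger seq = new AtomicInteger();
        this.executor = new ThreadPoolExecutor(
                threads, threads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Named threads make this pool easy to spot in thread dumps and logs.
                r -> new Thread(r, name + "-" + seq.incrementAndGet()),
                // When the queue is full, the submitting thread runs the task itself,
                // which slows consumption down instead of dropping messages.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    void submit(Runnable task) {
        executor.execute(task);
    }

    // Snapshot that a monitoring job could report periodically.
    String stats() {
        return "active=" + executor.getActiveCount()
                + " queued=" + executor.getQueue().size()
                + " completed=" + executor.getCompletedTaskCount();
    }

    // Stops accepting tasks and waits for the queued ones; false on timeout or interrupt.
    boolean shutdownAndAwait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the pool is not shared with other work, its active count and queue depth reflect the product-update consumption alone.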


Thread pool monitoring

Data monitoring

We established a rich set of monitoring metrics and alert notifications. The log query platform, data dashboards, and WeCom alerts for exceptions let us observe the state of the new flow in real time after launch and locate problems quickly.


MQ production and consumption monitoring


Alert on upstream query failure

3 Summary

Data Performance

After the project went live, calls to downstream core interfaces dropped significantly, by 50% to 80%: calls to update APIs fell by 80%, and calls to query APIs fell by 50%.

Reflection and summary

  1. Be clear about why the system is being refactored. There are two main reasons: the existing system has problems that need solving, or the existing system is holding back new business development.
  2. Know your system well. Before refactoring, the specific business logic and the scope of impact must be assessed and confirmed first-hand. Only with a clear picture of the system as it is can a new technical solution be designed to fit it.
  3. Define good conventions with future development in mind. Good conventions and structure guide the system's later iterations, so that everyone develops along the same lines, collaborates better, and supports business needs.

About the author

Xu Zhifang is a back-end R&D engineer in the Zhuanzhuan order business.

Source-WeChat public account: Zhuan Zhuan Technology

Source: https://mp.weixin.qq.com/s/bSFQVcLPFqOi5uKLxvAXUA
