A little bit about the error code

Author: Flash Gene

1. What is an error code?

In general, error codes are divided into external error codes and internal system error codes.

External error codes are typically used by open interfaces, such as HTTP and RPC interfaces, to give the upstream caller friendlier error prompts and descriptions in the form of error codes.

Internal error codes are used between closely related microservices, or between upstream and downstream programs. When an error is caused by a business failure or by an unavailable system, developers can locate the problem from the error code or error message, and the upstream can make business-logic decisions based on the error code to keep the overall process intact.

In summary, error codes serve to indicate the cause of an error, help locate problems quickly, guide upstream systems to make correct business decisions, and guide users to take the correct action. It is therefore necessary to build a unified, well-structured error code system.

So how should error codes be defined so that they work both externally and internally? So far the industry has no widely accepted scheme or specification. This article explores one step by step, in the course of integrating our old and new payment systems.

2. The current state of the payment system's error codes

For historical reasons, the company currently runs two wallet systems. To improve their availability and performance, the two were migrated, merged, upgraded, and optimized. During the migration and merge, however, we found that the error codes of the two wallet systems and those of the newly optimized system conflicted with each other, used inconsistent types, had confusing definitions, and had been defined quite arbitrarily.

One of the wallet systems was originally developed and maintained by another team and later refactored, so the new wallet system contains two sets of error code definition rules. Judging from the code, both the old and the new wallet systems started out with a reasonable specification for defining error codes. During later development and maintenance, however, fewer and fewer developers followed it, until error codes were effectively defined by whatever error description was at hand. In addition, when error codes were first designed in the new system, conflicts with a future integration were not considered. As a result, some error codes now overlap while carrying very different semantics, and redefining the error code specification has become urgent.

Current state of the wallet systems' error codes:

Wallet 1 error code definition: a 6-digit error code whose first digit indicates the error type, distinguishing system-level errors, validation errors, and RPC service call errors. Advantages: the meaning of the first digit is clear, and the remaining 5 digits are reserved for error numbers, leaving enough room for future error codes. Disadvantages: with so few error types, error codes with the same semantics end up under different first digits. Later, the refactored version changed the original numbering and extended the codes to 7 digits. As a result, 6-digit and 7-digit error codes are mixed at the system's interface layer, and the codes are not classified at all but are simply appended one after another.

Wallet 2 error code definition: an error code utility predefines the error codes that most wallets commonly need, and each system extends it with its own codes. Advantages: clear classification and clear structure. Disadvantages: it is hard to pinpoint the specific error from the error code alone.

In short, the error codes in the wallet systems have the following problems:

1. Inconsistent field types: some error codes are defined as numbers, others as strings.

2. Error code collisions: the same error code has different semantics in different systems.
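Both problems show up as soon as the two wallets' error tables are merged. The following is a minimal Python sketch that assumes each system's error codes can be exported as a dict from code to description. The `find_collisions` helper and the sample codes are hypothetical. The sketch normalizes codes to strings before comparing them, which also shows why mixed numeric and string types cause trouble: without normalization, `100001` and `"100001"` would be treated as different keys.

```python
from collections import defaultdict
from typing import Dict, Set, Union

Code = Union[int, str]

def find_collisions(*tables: Dict[Code, str]) -> Dict[str, Set[str]]:
    """Return every code that maps to more than one distinct description."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for table in tables:
        for code, desc in table.items():
            # Normalize numeric and string codes to one representation.
            seen[str(code)].add(desc)
    # Keep only codes whose descriptions disagree across tables.
    return {code: descs for code, descs in seen.items() if len(descs) > 1}

# Hypothetical excerpts from the two wallet systems.
wallet1 = {100001: "System busy", 200003: "Invalid amount"}
wallet2 = {"100001": "Account frozen", "300001": "Order not found"}
print(find_collisions(wallet1, wallet2))
```

Normalizing numbers to strings only goes so far. A numeric code can never carry leading zeros or a service prefix, and that is one reason the survey below leans toward string error codes.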

3. A survey of industry interface definitions

The current payment system's error codes raise several open questions, such as whether an error code should be numeric or a string and what length and naming format it should use. To answer them, we surveyed the API specifications and interface definitions of several major companies, looking for an error code specification suited to the payment system.

1. WeChat Pay

a. WeChat Pay v2 interface

WeChat Pay v2 interface: protocol: HTTP, content-type: text/xml

Reference link: https://pay.weixin.qq.com/wiki/doc/api/jsapi.php?chapter=9_1

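The original format is XML; for readability it is converted to JSON here:
{
    "return_code":"SUCCESS",  // SUCCESS/FAIL. A communication flag, not a transaction flag; check result_code to see whether the transaction succeeded
    "return_msg":"OK",
    "result_code":"SUCCESS", // SUCCESS/FAIL, indicates whether the business request succeeded
    "err_code":"SYSTEMERROR", // when result_code is FAIL, carries the business error code
    "err_code_des":"系统错误" // "System error"
}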

WeChat Pay error codes have a three-level structure:

Level 1: common error code (gateway layer, return_code). This level only reports whether communication succeeded or failed.

Level 2: overall business result (result_code), indicating whether the business request was processed successfully.

Level 3: specific business error codes (err_code).
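A caller typically checks the three levels in order. The following is a minimal sketch that assumes the XML reply has already been parsed into a Python dict. The function name `interpret_v2` and its return strings are hypothetical; it is not part of any WeChat Pay SDK.

```python
def interpret_v2(resp: dict) -> str:
    """Walk the three levels of a WeChat Pay v2 reply parsed into a dict."""
    # Level 1: did communication with the gateway succeed?
    if resp.get("return_code") != "SUCCESS":
        return "communication failed: " + resp.get("return_msg", "")
    # Level 2: did the business request succeed overall?
    if resp.get("result_code") == "SUCCESS":
        return "business succeeded"
    # Level 3: which specific business error occurred?
    return "business failed: {} ({})".format(
        resp.get("err_code", ""), resp.get("err_code_des", "")
    )

print(interpret_v2({
    "return_code": "SUCCESS", "return_msg": "OK",
    "result_code": "FAIL", "err_code": "NOTENOUGH",
    "err_code_des": "Insufficient balance",
}))  # business failed: NOTENOUGH (Insufficient balance)
```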

Examples of business error codes:

err_code | err_code_des
NOAUTH | The merchant does not have permission for this API
INVALID_REQUEST | Invalid parameter
NOTENOUGH | Insufficient balance

b. WeChat Pay v3 interface

WeChat Pay v3 interface: protocol: HTTP, content-type: application/json

Reference Links:

err_code | http_code | err_msg
USERPAYING | 202 | The user is in the process of paying
OUT_TRADE_NO_USED | 403 | Duplicate merchant order number
ORDERNOTEXIST | 404 | The order does not exist

According to the WeChat Pay v3 interface documentation, WeChat Pay returns an HTTP status code (http_code) alongside each business error. If http_code falls in [200, 300), the request is considered a valid return. If it is 300 or above, the call has failed. The WeChat Pay v3 SDK encapsulates the handling of both http_code and err_code.

An externally exposed HTTP interface that returns both a business error and an HTTP status code with the same meaning requires developers to understand HTTP status code semantics. The drawback is a higher learning cost that depends on each developer's familiarity with HTTP status codes. The semantics of the HTTP code may also not always match those of the business error.
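Under the v3 rule, the caller checks http_code first and consults err_code only to learn the reason. The following is a minimal sketch of that rule. The function `interpret_v3` is hypothetical, and the sketch assumes err_code has already been extracted from the response body. It does not reproduce the official SDK.

```python
from typing import Optional

def interpret_v3(http_code: int, err_code: Optional[str] = None) -> str:
    """Classify a WeChat Pay v3 reply by http_code first, then err_code."""
    if 200 <= http_code < 300:
        # Valid return. A 2xx reply can still carry a business code,
        # e.g. USERPAYING with 202: the user is still paying.
        return "valid return" + (f" ({err_code})" if err_code else "")
    # 300 and above: the call failed; err_code explains why.
    return f"failed with HTTP {http_code}: {err_code or 'unknown'}"

print(interpret_v3(404, "ORDERNOTEXIST"))  # failed with HTTP 404: ORDERNOTEXIST
```

The USERPAYING case shows the learning cost mentioned above. A 2xx status does not mean the payment is finished, so the caller still has to understand the business code.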

2. Alipay error code definition

Reference link: https://opendocs.alipay.com/open/common/105806

Business error codes differ per interface; for an example, see: https://opendocs.alipay.com/apis#%E4%B8%9A%E5%8A%A1%E9%94%99%E8%AF%AF%E7%A0%81

The sub_code and sub_msg parameters carry the business error code and business error message returned by Alipay:

{
    "code":"",// gateway return code
    "msg":"",// gateway return message
    "sub_code":"ACQ.INVALID_PARAMETER",
    "sub_msg":"参数无效" // "Invalid parameter"
}

Alipay's error code definition is similar to WeChat Pay's V2 interface definition style.

The Alipay error code structure is divided into two levels: Level 1: Gateway, Level 2: Business Error Code.

A request to an Alipay interface first passes through the Alipay gateway, which performs signature verification, encryption and decryption, flow control, and similar functions. Once gateway verification succeeds, the request is handed to the downstream business system, whose sub_code values are all semantically clear error codes with error descriptions.
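With two levels, a caller usually wants the most specific error available. The following is a minimal sketch with a hypothetical helper, `alipay_error`. It assumes that `"10000"` is the gateway code Alipay documents for a successful call; verify this against the reference link above before relying on it.

```python
from typing import Optional, Tuple

# Assumption: "10000" is the gateway code Alipay documents for a successful call.
ALIPAY_SUCCESS_CODE = "10000"

def alipay_error(resp: dict) -> Optional[Tuple[str, str]]:
    """Return (code, message) of the most specific error, or None on success."""
    if resp.get("code") == ALIPAY_SUCCESS_CODE:
        return None
    # Prefer the more specific sub_code when it is present.
    if resp.get("sub_code"):
        return resp["sub_code"], resp.get("sub_msg", "")
    # No sub_code: fall back to the top-level gateway code.
    return resp.get("code", ""), resp.get("msg", "")

print(alipay_error({"code": "40004", "msg": "Business Failed",
                    "sub_code": "ACQ.INVALID_PARAMETER",
                    "sub_msg": "Invalid parameter"}))
# ('ACQ.INVALID_PARAMETER', 'Invalid parameter')
```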

3. Google API Specification

Reference Links:

In Google's API specification, the error returned to callers is defined by the following structure (google.rpc.Status):

message Status {
  // A simple error code that can be easily handled by the client. The
  // actual error code is defined by `google.rpc.Code`.
  int32 code = 1;
  // A developer-facing human-readable error message in English. It should
  // both explain the error and offer an actionable resolution to it.
  string message = 2;
  // Additional error information that the client code can use to handle
  // the error, such as retry delay or a help link.
  repeated google.protobuf.Any details = 3;
}           

Here, code is the error code, message is the specific error message, and details carries additional information the caller can use to handle the error, such as a retry delay or a help link.

Google's definition of error codes is relatively concise. Each broad category of errors is assigned a single code, and several similar error types share that code rather than each getting its own.

In the Google specification, details describes the specific cause of the error using a set of standard error-detail messages. For example, RetryInfo indicates that the failed request can be retried and gives a recommended retry delay. QuotaFailure indicates a quota check failure, such as an exceeded rate limit. BadRequest explains which parts of the request are invalid and why. Other detail types follow the same pattern. With such detailed error information, developers can react correctly to each error.
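To make the role of details concrete, here is a minimal sketch of a client deciding whether to retry. It uses plain Python dataclasses as stand-ins for the protobuf messages; these are not the generated google.rpc classes. The field `retry_delay_seconds` is a simplification, since the real RetryInfo has `retry_delay` of type google.protobuf.Duration. The helper name `suggested_retry_delay` is hypothetical. The numeric code values come from google.rpc.Code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values taken from google.rpc.Code (google/rpc/code.proto).
INVALID_ARGUMENT = 3
UNAVAILABLE = 14

@dataclass
class RetryInfo:
    # Simplified: the real message has retry_delay of type google.protobuf.Duration.
    retry_delay_seconds: float

@dataclass
class Status:
    code: int
    message: str
    # Stand-in for `repeated google.protobuf.Any details`.
    details: List[object] = field(default_factory=list)

def suggested_retry_delay(status: Status) -> Optional[float]:
    """Return the server's suggested delay if it marked the error retryable."""
    for detail in status.details:
        if isinstance(detail, RetryInfo):
            return detail.retry_delay_seconds
    return None  # No RetryInfo: do not retry automatically.

busy = Status(UNAVAILABLE, "Backend is overloaded.", [RetryInfo(2.0)])
print(suggested_retry_delay(busy))  # 2.0
```

The code alone (UNAVAILABLE) only names the category. The retry decision comes from the attached detail, which is what keeps the set of codes small.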

4. Weibo Specification

Reference APIs:

{
    "request" : "/statuses/home_timeline.json",
    "error_code" : "20502",
    "error" : "Need you follow uid."
}           

Composition of error code 20502: a 1-digit error level (system or service) + a 2-digit service module (such as gateway, microblog, comments, or private messages; effectively a service identifier) + a 2-digit specific error code (custom).

2 | 05 | 02
Service-level error (1 = system-level error) | Service module code | Specific error code

Weibo's error codes have clear structural semantics: the service is identified in the error code itself, somewhat like the composition of Alipay's error codes. The first digit distinguishes service-level from system-level errors. The documentation does not make clear whether a system-level code means the error was thrown by the gateway, or whether there is simply a fixed set of system-level errors thrown by category.
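Because the structure is positional, a Weibo-style code can be decoded mechanically. The following is a minimal sketch that assumes exactly the 1 + 2 + 2 digit layout described above. `WeiboCode` and `parse_weibo_code` are hypothetical names, not part of any Weibo SDK.

```python
from typing import NamedTuple

class WeiboCode(NamedTuple):
    level: str   # "system" (leading 1) or "service" (leading 2)
    module: str  # 2-digit service module code
    detail: str  # 2-digit specific error code

_LEVELS = {"1": "system", "2": "service"}

def parse_weibo_code(code: str) -> WeiboCode:
    """Split a 5-digit Weibo-style error code into its three parts."""
    if len(code) != 5 or not code.isdigit() or code[0] not in _LEVELS:
        raise ValueError(f"not a Weibo-style error code: {code!r}")
    return WeiboCode(_LEVELS[code[0]], code[1:3], code[3:])

print(parse_weibo_code("20502"))
# WeiboCode(level='service', module='05', detail='02')
```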

5. Alibaba Java Development Manual

Alibaba Specification

Point 1: error codes should be simple and clear.

Point 2: an error code is best defined as a string of the form source + error number, so that the value itself carries more information.

Point 3: avoid adding error codes arbitrarily, and avoid exposing error codes directly to end users.

In summary, the public documentation of the companies surveyed shows no unified industry specification for defining error codes. The general design ideas, however, are as follows:

a. Error codes are strings.

b. If the system consists of a gateway in front of internal services, error codes are split into two levels.

c. The error code identifies the service that threw the error.

d. Error codes fall into two types: common error codes and business error codes.

4. Reflections and conclusions

This section combines the industry survey above with the current state of the wallet system. The payment system sits at the most basic layer of the overall business flow. It provides payment-related RPC capabilities and will not expose HTTP interfaces directly to the outside world. The multi-level (gateway plus business) error code structures of WeChat Pay and Alipay are therefore not suitable for the wallet system.

Following the APIs surveyed above, error codes are defined as strings, which better fits business scenarios and makes later business expansion easier. Google's specification defines a unified set of error scenarios, which to some extent reduces the chance that developers will define new error codes arbitrarily. The error_code scenario definitions therefore follow the google-api specification.

error_code is a string whose first N characters are the domain identifier of the current business. This has two advantages. First, error codes from other services are easy to tell apart. Second, if services are later split or merged as the business grows or shrinks, the error codes can keep their current values.

Extracting common errors:

Aggregated with reference to the Google specification: https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto

Common RPC interface error | error_code
Internal error | 115
Request not supported | 112
Invalid state | 110
Too many requests | 109
No permission | 108
Permission verification failed | 107
Already exists | 106
Does not exist | 105
Timeout | 104
Invalid parameter | 103
Unknown error (for example, thrown when a call to a downstream interface fails) | 101

Other business error codes can be customized within the 200-999 range. However, if an error's semantics match one of the common errors above, the corresponding common code must be used first.
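A shared registry is one way to keep the common codes and the custom range from drifting. The following is a minimal sketch; the enum member names and the `custom_code` validator are hypothetical illustrations of the table and the 200-999 rule, not an existing library.

```python
from enum import Enum

class CommonError(Enum):
    """Common RPC error codes from the table above (member names are illustrative)."""
    UNKNOWN = "101"
    INVALID_PARAMETER = "103"
    TIMEOUT = "104"
    NOT_FOUND = "105"
    ALREADY_EXISTS = "106"
    PERMISSION_CHECK_FAILED = "107"
    NO_PERMISSION = "108"
    TOO_MANY_REQUESTS = "109"
    INVALID_STATE = "110"
    UNSUPPORTED_REQUEST = "112"
    INTERNAL = "115"

def custom_code(code: str) -> str:
    """Validate a business-defined code: exactly 3 digits within 200-999."""
    if len(code) != 3 or not code.isdigit() or not 200 <= int(code) <= 999:
        raise ValueError(f"custom codes must be in 200-999, got {code!r}")
    return code

print(CommonError("105").name)  # NOT_FOUND
```

The values are kept as strings to match the string-typed error_code chosen above. The rule that a matching common code takes precedence cannot be checked mechanically, so it remains a code-review convention.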

5. Definition of RPC error code structure

exception RpcError{
    1:required string err_code;
    2:required string err_desc;
}           
1. Error code definition:

Composition: business identifier + error code type + custom business code; the custom business code is defined by each system.

Business identifier | Error code type (3 digits) | Custom business code (2 digits)
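Example (transaction does not exist):
{
    "err_code":"WORDER.10501",
    "err_desc":"交易不存在" // "Transaction does not exist"
}
WORDER: the business identifier being processed when the error occurred
105: does not exist
01: transaction does not exist
02: user does not exist
03: order does not exist
...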
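Because the format is fixed, err_code values can be built and validated in one place. The following is a minimal sketch. It assumes the business identifier is uppercase letters, digits, or underscores, followed by a dot, a 3-digit error type, and a 2-digit custom code; that character set is an assumption beyond what is specified above. `ErrCode` and `parse_err_code` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed format: <BUSINESS_ID>.<3-digit error type><2-digit custom code>
_ERR_CODE = re.compile(r"([A-Z][A-Z0-9_]*)\.(\d{3})(\d{2})")

class ErrCode(NamedTuple):
    business: str  # business being processed, e.g. "WORDER"
    err_type: str  # error code type, e.g. "105" (does not exist)
    custom: str    # code chosen by the business system, e.g. "01"

    def __str__(self) -> str:
        return f"{self.business}.{self.err_type}{self.custom}"

def parse_err_code(value: str) -> ErrCode:
    match = _ERR_CODE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed err_code: {value!r}")
    return ErrCode(*match.groups())

code = parse_err_code("WORDER.10501")
print(code.err_type, code.custom)  # 105 01
print(ErrCode("WORDER", "105", "03"))  # WORDER.10503
```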

2. err_desc: Error message

The error message lets developers locate the problem quickly.

3. Errors should be thrown

If an error occurs while an interface is processing a request, wrap the appropriate error code and error description and throw the error, rather than wrapping it in the interface's return value. The company runs Thrift over HTTP, and its monitoring and alerting rely heavily on the HTTP status code. Throwing errors directly therefore lets monitoring cover RPC interfaces more effectively, and it avoids the case where processing fails but the HTTP code returned is still 200.
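The following Python sketch illustrates the "throw, don't return" rule. `RpcError` mirrors the Thrift exception at the start of this section but is a plain Python class, not Thrift-generated code. `_TRANSACTIONS` and `get_transaction` are hypothetical stand-ins for a real handler.

```python
class RpcError(Exception):
    """Python counterpart of the Thrift `exception RpcError` defined above."""

    def __init__(self, err_code: str, err_desc: str) -> None:
        super().__init__(f"{err_code}: {err_desc}")
        self.err_code = err_code
        self.err_desc = err_desc

# Hypothetical in-memory store standing in for a real transaction lookup.
_TRANSACTIONS = {"tx-1": {"amount": 100}}

def get_transaction(tx_id: str) -> dict:
    tx = _TRANSACTIONS.get(tx_id)
    if tx is None:
        # Raise instead of returning an error payload inside a normal result,
        # so the failure cannot be mistaken for a successful call.
        raise RpcError("WORDER.10501", "Transaction does not exist")
    return tx

try:
    get_transaction("tx-404")
except RpcError as err:
    print(err.err_code, err.err_desc)  # WORDER.10501 Transaction does not exist
```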

6. HTTP status code design for gateway-type services

1. HTTP status code basics
Status code | Meaning
2xx | Success
3xx | Redirection
4xx | Client error
5xx | Server error

2. HTTP service response format

{
    "code":"0", // success: 0; failure: the corresponding error code
    "message":"",
    "data":{ // the actual result of the interface call
    },
    "pagination":{
        "is_end":false,
        "is_first":true,
        "offset":20,
        "limit":20,
        "total":1000
    }
}

3. Error message conversion

For HTTP interfaces, return status code 200 when the call succeeds. On failure, return one of the status codes above as appropriate.

HTTP interfaces generally fall into two types: internal HTTP interfaces, and interfaces that interact with the front end. Internal HTTP interfaces can follow the RPC error code scheme. For external interfaces, it is best to convert errors at the HTTP layer into user-facing descriptions instead of directly exposing internal error descriptions.
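The following Python sketch shows the conversion at the HTTP layer. It maps the 3-digit error type of an internal err_code to an HTTP status and a user-facing message, and it keeps the internal description in the logs. The mapping table, the messages, and `to_http_response` are hypothetical. The response body follows the HTTP response format shown above.

```python
import logging
from http import HTTPStatus
from typing import Dict, Tuple

logger = logging.getLogger(__name__)

# Hypothetical mapping from the 3-digit error type to an HTTP status and a
# message that is safe to show to end users.
_USER_FACING: Dict[str, Tuple[HTTPStatus, str]] = {
    "103": (HTTPStatus.BAD_REQUEST, "Please check your input and try again."),
    "105": (HTTPStatus.NOT_FOUND, "The requested record was not found."),
    "108": (HTTPStatus.FORBIDDEN, "You do not have permission to do this."),
    "109": (HTTPStatus.TOO_MANY_REQUESTS, "Too many requests, please retry later."),
}
_FALLBACK = (HTTPStatus.INTERNAL_SERVER_ERROR, "Service is busy, please retry later.")

def to_http_response(err_code: str, internal_desc: str) -> Tuple[int, dict]:
    """Convert an internal error such as "WORDER.10501" into (status, body)."""
    err_type = err_code.split(".", 1)[-1][:3]
    status, user_message = _USER_FACING.get(err_type, _FALLBACK)
    # Keep the internal description in the logs, not in the response.
    logger.warning("%s: %s", err_code, internal_desc)
    body = {"code": err_code, "message": user_message, "data": {}}
    return int(status), body

status, body = to_http_response("WORDER.10501", "Transaction does not exist")
print(status, body["message"])  # 404 The requested record was not found.
```

Unmapped types such as 115 (internal error) fall back to 500 with a generic message, so a new internal code never leaks its description to users by default.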

7. Outlook

When a system is migrated, refactored, and optimized, it is often hard to maintain because of unreasonable designs in the original system and requirements that were later piled on in a hurry. This article focuses on the confusing error code definitions we ran into during migration, presenting our research and my own thoughts. The goal is to unify the error code format and the error-definition specification for both RPC and HTTP interfaces. In subsequent migrations, standardized error definitions will make errors easier for upstream systems to understand and, to some extent, reduce O&M effort.

In follow-up work, standardized error codes could be put to further use, for example through an integrated error SDK that includes error code definitions, error reporting, and error-monitoring dashboards.

Author: Let me stay

Source: https://zhuanlan.zhihu.com/p/411726319