18 Error Handling: How to Design a Scientific Set of Error Codes

Hello, I’m Kong Lingfei. Today, let’s talk about how to design error codes for business systems.

In modern software architecture, many systems expose RESTful APIs externally while using RPC for internal communication. RESTful APIs are standardized, easy to debug, and easy to understand, so they are often used as the interface that faces users directly.

Because these APIs face users directly, the first requirement is that the format of the returned messages be standardized. Second, when an interface call fails, it should give users useful error information, usually a unique error Code (to uniquely identify the error) and a Message (to describe the error). This requires us to design a standardized and scientific set of error codes.

In this lesson, I will introduce in detail how to design a set of standardized and scientific error codes. In the next lesson, I will also explain how to provide an “errors” package to support the error codes we design.

Expected Functionality of Error Code Implementation

In order to design a set of error codes, we need to first determine our requirements.

RESTful APIs follow a set of API design conventions built on the HTTP protocol. After sending an HTTP request, the client needs to know whether the request succeeded so that it can decide what to do next.

To provide the best user experience, it is necessary to have a well-implemented error code system. Here I will introduce the expected functionality when designing error codes.

The first functionality is to have a business code identifier.

Because HTTP status codes are limited in number and tied to the HTTP Transport layer, we want our own error code system. On one hand, this lets us extend the codes as needed; on the other, it lets us identify specific errors precisely. In addition, because codes are typically computer-friendly decimal integers, programs can conveniently branch on them. Of course, business codes should also follow certain rules so that we can quickly tell which category an error belongs to.

The second functionality is to display different error messages for external and internal use for security purposes.

When developing a system for external use, it is necessary to have mechanisms that inform users about the errors that occur and, ideally, provide some help documentation. However, it is not feasible to expose all errors to external users as this is unnecessary and insecure. Therefore, we also need mechanisms to obtain more detailed internal error information, which may include sensitive data that should not be displayed externally but can assist us in problem identification.

Therefore, the error codes we design should be standardized, allowing the client to easily determine whether the HTTP request was successful and providing business codes and error messages.

Common Error Code Design Patterns

In business, there are generally three ways to implement error codes. I will explain each one using an example of a request failure due to a user account not being found:

The first method is to always return HTTP status code 200, regardless of whether the request succeeds, and to include the error message saying the user account was not found in the HTTP Body.

For example, the Facebook API always returns HTTP status code 200 and reports errors in the body:

{
  "error": {
    "message": "Syntax error \"Field picture specified more than once. This is only possible before version 2.1\" at character 23: id,name,picture,picture",
    "type": "OAuthException",
    "code": 2500,
    "fbtrace_id": "xxxxxxxxxxx"
  }
}

There is some justification for using a fixed 200 http status code in this way. For instance, the HTTP Code usually represents the status of the HTTP Transport layer. When we receive an HTTP request and return a response, the HTTP Transport layer is successful, so from that perspective, it is reasonable to have the HTTP Status fixed at 200.

However, the drawback of this approach is also apparent: for each request, we need to parse the HTTP Body, extracting the error code and error message from it. In practice, in most cases, for successful requests, we either forward them directly or parse them into a particular data structure; for failed requests, we also hope to have a more direct way to perceive the failure. This approach has a certain impact on performance and is not user-friendly. Therefore, I do not recommend using this method.

The second method is to return HTTP status code 404 Not Found with a simple error message in the Body.

For example, the error design of the Twitter API returns the appropriate HTTP Code based on the error type and provides an error message and custom business code in the Body:

HTTP/1.1 400 Bad Request
x-connection-hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
set-cookie: guest_id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Date: Thu, 01 Jun 2017 03:04:23 GMT
Content-Length: 62
x-response-time: 5
strict-transport-security: max-age=631138519
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Server: tsa_b

{"errors":[{"code":215,"message":"Bad Authentication data."}]}

This method is better than the first one since the client can directly perceive the request failure through the http status code and receive some error information for reference. However, with only this information, it is still not accurate enough to locate and resolve the problem.

The third method is to return HTTP status code 404 Not Found with detailed error information in the Body.

For example, the error design of Microsoft Bing API returns the appropriate HTTP Code based on the error type and provides detailed error information in the Body:

HTTP/1.1 400
Date: Thu, 01 Jun 2017 03:40:55 GMT
Content-Length: 276
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/10.0
X-Content-Type-Options: nosniff

{"SearchResponse":{"Version":"2.2","Query":{"SearchTerms":"api error codes"},"Errors":[{"Code":1001,"Message":"Required parameter is missing.","Parameter":"SearchRequest.AppId","HelpUrl":"http\u003a\u002f\u002fmsdn.microsoft.com\u002fen-us\u002flibrary\u002fdd251042.aspx"}]}}

This is the method I would recommend, as it allows the client to easily know that the request has failed through the http status code, as well as understand where the error occurred and how to solve the problem based on the returned information. At the same time, a machine-friendly business code is returned, which can be further used by the program for judging and handling when necessary.

Suggestions for Error Code Design

Based on what we discussed earlier, we can summarize the following principles for good error code design:

Business codes should be different from http status codes. Business codes need to follow certain rules that allow us to identify the type of error.
When a request fails, the error can be identified directly through the http status code.
When a request fails, detailed information should be returned. This typically includes three categories of information: business code, error message, and reference documentation (optional).
The error message returned should be safe to display directly to users, meaning it should not contain sensitive information. At the same time, there should be more detailed error information available internally for debugging purposes.
The format of the returned data should be fixed and standardized.
Error messages should be concise and provide useful information.

Two things remain to be worked out: how to design business codes, and how to set the HTTP status code when a request fails.

Next, I will explain each of them in detail.

Business Code Design

To solve the problem of how to design business codes, let’s first understand why business codes are introduced.

In actual development, introducing business codes has the following benefits:

It makes problems easy to locate: each code has a documented meaning, you can grep for it to find the line of code where it is used, and it uniquely identifies a type of error.
Error codes contain certain information, allowing for determination of error levels, error modules, and specific error information.
The net/http package used for HTTP server development in Go defines only about 60 status codes, most of which describe the state of the HTTP request itself. In a large system, these are not enough, and they have nothing to do with specific business requirements. Introducing business codes solves both problems.
In the process of business development, it may be necessary to determine the type of error in order to perform specific logic handling. This can be easily achieved through customized error codes, for example:

if err == code.ErrBind { … }

Please note that business codes can be an integer, an integer string, or a character string. They serve as the unique identification of the error.

After studying the open APIs of Tencent Cloud, Alibaba Cloud, and Sina, I found Sina’s API code design to be more reasonable. Based on it, I summarized my recommended Code Design Specification: use digits only, with different positions representing different services and modules.

Error code description: 100101

10: Service.
01: Module under a service.
01: Sequence number of the error code under a module. Each module can register up to 100 errors.

For example, if service 10 is service A and module 01 is its database module, then 100101 identifies the "record not found" error in the database module of service A.
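The service/module/sequence layout can be decoded with plain integer arithmetic. This sketch assumes codes are always six digits (SSMMNN); `parseCode` and `codeParts` are illustrative names, not part of any library.

```go
package main

import "fmt"

// codeParts holds the three fields of a six-digit business code SSMMNN.
type codeParts struct {
	Service  int // first two digits
	Module   int // middle two digits
	Sequence int // last two digits: 00-99, so at most 100 errors per module
}

// parseCode splits a six-digit business code into service, module and sequence.
func parseCode(code int) (codeParts, error) {
	if code < 100000 || code > 999999 {
		return codeParts{}, fmt.Errorf("code %d is not a six-digit business code", code)
	}
	return codeParts{
		Service:  code / 10000,
		Module:   code / 100 % 100,
		Sequence: code % 100,
	}, nil
}

func main() {
	p, err := parseCode(100101)
	if err != nil {
		panic(err)
	}
	fmt.Printf("service=%02d module=%02d sequence=%02d\n", p.Service, p.Module, p.Sequence)
}
```

For 100101 this prints `service=10 module=01 sequence=01`. The two-digit sequence field is also where the limit of 100 errors per module comes from.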

You may ask: with this design, each module can register at most 100 errors. Isn’t that too few? In my opinion, if a module needs more than 100 error codes, either the module is too large and should be split, or the error codes are poorly designed and not shared where they should be, so they need to be redesigned.

How to Set HTTP Status Codes

Go’s net/http package defines about 60 status codes, which fall into the following 5 categories:

1XX - (Informational) Request received, continuing process.
2XX - (Success) The request was successfully received, understood, and accepted.
3XX - (Redirection) Further action needs to be taken in order to complete the request. Typically, these status codes are used for redirection.
4XX - (Client Error) The request contains syntax errors or cannot be fulfilled. Typically, these status codes indicate a client error and require client intervention.
5XX - (Server Error) The server failed to fulfill an apparently valid request. These status codes are the result of server failures.

As you can see, there are many HTTP status codes, and mapping errors to each code can lead to many issues. For example, developers may have difficulty determining which HTTP status code an error belongs to, which can ultimately lead to errors or mismatched HTTP status codes. Additionally, clients may have difficulty handling so many HTTP error codes.

Therefore, it is recommended to limit the number of HTTP status codes. Essentially, only 3 HTTP codes are needed:

200 - Indicates a successful execution of the request.
400 - Indicates a client-side issue.
500 - Indicates a server-side issue.

If these 3 status codes are not enough, you can add at most the following 3:

401 - Indicates authentication failure.
403 - Indicates authorization failure.
404 - Indicates resource not found. The resource can refer to a URL or a RESTful resource.

By keeping the number of HTTP status codes small, clients can easily handle and interpret errors, and developers can easily map business codes to HTTP status codes.
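One way to enforce this limit is to check the HTTP status whenever a business code is registered. This is a minimal sketch under assumptions: `register`, `httpStatusFor`, the in-memory `registry`, the code `110001`, and the fallback to 500 for unknown codes are all hypothetical, not taken from the IAM source.

```go
package main

import "fmt"

// allowedHTTPStatus is the restricted set suggested above: 200/400/500 plus 401/403/404.
var allowedHTTPStatus = map[int]bool{200: true, 400: true, 401: true, 403: true, 404: true, 500: true}

// coder is a hypothetical record tying a business code to an HTTP status and message.
type coder struct {
	Code       int
	HTTPStatus int
	Message    string
}

// registry maps business codes to their definitions.
var registry = map[int]coder{}

// register adds a business code, refusing statuses outside the allowed set and duplicate codes.
func register(code, httpStatus int, message string) error {
	if !allowedHTTPStatus[httpStatus] {
		return fmt.Errorf("code %d: http status %d is not allowed", code, httpStatus)
	}
	if _, dup := registry[code]; dup {
		return fmt.Errorf("code %d is already registered", code)
	}
	registry[code] = coder{Code: code, HTTPStatus: httpStatus, Message: message}
	return nil
}

// httpStatusFor returns the HTTP status for a business code, defaulting to 500 for unknown codes.
func httpStatusFor(code int) int {
	if c, ok := registry[code]; ok {
		return c.HTTPStatus
	}
	return 500
}

func main() {
	fmt.Println(register(100101, 500, "Database error"))
	fmt.Println(register(110001, 404, "User not found"))
	fmt.Println(register(110002, 409, "User already exists")) // rejected: 409 is not in the set
	fmt.Println(httpStatusFor(110001), httpStatusFor(999999))
}
```

Rejecting the mapping at registration time means a developer who reaches for 409 is forced to choose one of the allowed codes up front, instead of clients discovering an unexpected status later.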

IAM Project Error Code Design Specification

Next, let’s take a look at how the error codes are designed in the IAM project.

Code Design Specification

First, let’s take a look at the code design specification for the IAM project’s business codes. You can refer to the internal/pkg/code directory for specific implementations. The error code design specification for the IAM project conforms to the error code design principles and specifications mentioned above. The specific specifications are as follows.

Business codes start at 100001; codes below 1000 are reserved for github.com/marmotedu/errors.

Error Code Explanation: 100001

Service and Module Explanation

- Common: Describes errors that are applicable to all services, increasing reusability and avoiding reinventing the wheel.

Error Message Specification Explanation

For errors exposed to the outside world, the first letter should be capitalized and there should be no period at the end.
Error messages exposed to the outside world should be concise and accurately describe the problem.
Error messages exposed to the outside world should tell users what to do rather than only what went wrong.

Here, it is important to note that error messages are exposed directly to the users and should not contain sensitive information.
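The two formatting rules can be checked mechanically, for example in a unit test that runs over every registered message; conciseness and the absence of sensitive data still need human review. This sketch, built around a hypothetical helper `checkMessage`, checks only the uppercase first letter and the missing trailing period.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// checkMessage enforces two style rules for external error messages:
// the first letter is uppercase and there is no trailing period.
func checkMessage(msg string) error {
	first, _ := utf8.DecodeRuneInString(msg) // RuneError for an empty string
	if first == utf8.RuneError || !unicode.IsUpper(first) {
		return fmt.Errorf("%q: must start with an uppercase letter", msg)
	}
	if strings.HasSuffix(msg, ".") {
		return fmt.Errorf("%q: must not end with a period", msg)
	}
	return nil
}

func main() {
	for _, msg := range []string{"Database error", "database error", "Database error."} {
		if err := checkMessage(msg); err != nil {
			fmt.Println("invalid:", err)
		} else {
			fmt.Println("ok:", msg)
		}
	}
}
```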

Explanation of IAM API Interface Return Values

If the returned result contains a code field, the API call failed. For example:

{
  "code": 100101,
  "message": "Database error",
  "reference": "https://github.com/marmotedu/iam/tree/master/docs/guide/zh-CN/faq/iam-apiserver"
}

In the above example, code represents the error code, and message represents the specific information about the error. Each error also corresponds to an HTTP status code. For example, the above error code corresponds to HTTP status code 500 (Internal Server Error). Additionally, when an error occurs, the reference field is also returned, which contains a link to the documentation that can help resolve the error.

I have created a table for you to see the error codes that the IAM system supports:

Summary

API interfaces exposed externally need to have a standardized and scientific error code. Currently, there are roughly three design approaches for error codes in the industry. Let me explain using an example where a request fails because the user account is not found:

Always return HTTP status code 200, whether the request succeeds or fails, and include the "user account not found" error information in the HTTP Body.
Return HTTP status code 404 Not Found and include a simple error message in the Body.
Return HTTP status code 404 Not Found and include a detailed error message in the Body.

In this lesson, I drew on these three designs and proposed my own error code design. An error carries both an HTTP status code and a business code, and each business code maps to an HTTP status code. It also exposes two kinds of error information: one shown directly to users, with no sensitive information, and a more detailed one that developers use internally to locate problems. Finally, it can return a reference document link when an error occurs, helping users resolve the issue.

I suggest paying particular attention to the code design specification I summarized: digits only, with different positions representing different services and modules.

For example, in error code 100101, 10 represents the service, the middle 01 represents a module under that service, and the last 01 is the error’s sequence number within the module. Each module can register up to 100 errors.

Practice Exercise

Since the error codes follow a specification, consider whether there is a low-code way to generate error code documentation automatically from them.
Think about any other more scientific error code designs you have encountered. If you have any, feel free to share and discuss in the comments section.

Feel free to share and discuss with me in the comments section. See you in the next lesson.