48 Three-Step Melody for Building Microservice API Gateways - Part Two #

Hello, I’m Wen Ming.

After understanding the core components and abstract concepts of the microservice API gateway, we need to start selecting the technology and get our hands dirty to implement it. Today, let’s take a look at the technology selection issues for the four core components: routing, plugins, schema, and storage.

Storage #

As I mentioned in the previous lesson, storage is a critical underlying component that affects core issues such as how configuration is synchronized, how clusters scale, and how high availability is ensured. Therefore, we place it at the beginning of the selection process.

Let’s first take a look at where the existing API gateway stores data. Kong stores data in PostgreSQL or Cassandra, while Orange, which is also based on OpenResty, stores data in MySQL. However, this choice has many flaws.

First, storage requires a separate high availability solution. Although PostgreSQL and MySQL databases have their own high availability solutions, you still need a DBA and machine resources, making it difficult to achieve quick switching in the event of a failure.

Second, the gateway can only poll the database for configuration changes; the database cannot push them. Polling not only increases the load on the database but also greatly reduces how quickly changes take effect.

Third, you need to maintain historical versions yourself and think about rollbacks and upgrades. After a user publishes a change, a rollback may follow, so your code has to diff the two versions in order to roll the configuration back. At the same time, a system upgrade may change the table structure in the database, so the code also has to handle compatibility between old and new versions and migrate the data.

Fourth, it increases the complexity of the code. Besides implementing the gateway's functionality, you also have to write extra code to work around the first three flaws, which obviously hurts readability.

Fifth, it increases the difficulty of deployment and operations. Deploying and maintaining a relational database is not a simple task, and a database cluster is even more complex. On top of that, quick elastic scaling becomes impossible.

In view of this situation, how should we choose?

Why don’t we go back to the original requirements of the API gateway? The stored information here is simple configuration information, such as URI, plugin parameters, upstream addresses, etc., and does not involve complex join operations or require strict transaction guarantees. Obviously, in this case, using a relational database is like “using a sledgehammer to crack a nut,” right?

In fact, based on the principle of minimalism and being closer to Kubernetes, etcd is a perfect choice:

  • The configuration data of an API gateway changes infrequently, so the performance of etcd is more than sufficient.
  • In terms of clusters and dynamic scaling, etcd has inherent advantages.
  • etcd also has a watch interface, so there is no need for polling to retrieve changes.

In fact, there is one more reason why we can confidently choose etcd—it is already the default choice for storing configurations in the Kubernetes ecosystem, which has been tested in much more complex scenarios than an API gateway.
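
To make this concrete, here is a minimal sketch of what storing a route in etcd could look like, assuming the lua-resty-etcd client library; the key names and route fields are purely illustrative, not a fixed layout.

local etcd = require("resty.etcd")

-- Create a client for the local etcd cluster, using the v3 API.
local cli, err = etcd.new({ protocol = "v3" })
if not cli then
    ngx.log(ngx.ERR, "failed to create etcd client: ", err)
    return
end

-- A route is just a small piece of configuration: URI, plugin parameters,
-- upstream address. No joins, no transactions needed.
local res, err = cli:set("/gateway/routes/1", {
    uri      = "/aa",
    plugins  = { ["limit-count"] = { count = 2, time_window = 60 } },
    upstream = { ["127.0.0.1:1980"] = 1 },
})

-- Reading the route back is a single key lookup ...
local route, err = cli:get("/gateway/routes/1")

-- ... and the same client exposes etcd's watch interface, so the gateway is
-- pushed configuration changes instead of having to poll for them.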

Routing #

Routing is also a very important technology choice: every request is filtered through the router, which determines the list of plugins that need to be loaded; after the plugins run, the request is forwarded to the specified upstream. However, since there may be a large number of routing rules, we need to pay close attention to the time complexity of the matching algorithm when choosing a routing library.

Let’s first see what ready-to-use routers are available under OpenResty. Following our tradition, let’s search through the awesome-resty project one by one, where we can find a dedicated section for Routing Libraries:

  • lua-resty-route — A URL routing library for OpenResty supporting multiple route matchers, middleware, and HTTP and WebSockets handlers to mention a few of its features
  • router.lua — A barebones router for Lua, it matches URLs and executes Lua functions
  • lua-resty-r3 — libr3 OpenResty implementation, libr3 is a high-performance path dispatching library. It compiles your route paths into a prefix tree (trie). By using the constructed prefix trie in the start-up time, you may dispatch your routes with efficiency
  • lua-resty-libr3 — High-performance path dispatching library based on libr3 for OpenResty

As you can see, four routing libraries are listed here. The first two are implemented purely in Lua; they are relatively simple and lack many features, so they cannot meet our requirements.

The next two libraries are actually based on the C library libr3, wrapped with FFI (Foreign Function Interface). libr3 itself is built on a prefix tree, so matching has a time complexity of O(K), where K is the length of the path being matched, not the number N of stored rules.

However, libr3 also has its drawbacks. Its matching rules are different from the familiar Nginx location rules, and it does not support callbacks. Thus, we cannot set routing conditions based on request headers, cookies, or Nginx variables, which is not flexible enough for API gateway scenarios.

Although we did not find a usable routing library in awesome-resty, the implementation of libr3 points us in a new direction: implement the prefix tree in C and wrap it with FFI, which should get us close to the optimum in both time complexity and code performance.

Coincidentally, the author of Redis has open-sourced rax, a C implementation of a radix tree, which is a compressed prefix tree. Following this lead, we can also find an FFI wrapper around rax for OpenResty called lua-resty-radixtree. Here is an example code snippet:

local radix = require("resty.radixtree")
local rx = radix.new({
    {
        path = "/aa",
        host = "foo.com",
        method = {"GET", "POST"},
        remote_addr = "127.0.0.1",
    },
    {
        path = "/bb*",
        host = {"*.bar.com", "gloo.com"},
        method = {"GET", "POST", "PUT"},
        remote_addr = "fe80:fe80::/64",
        vars = {"arg_name", "jack"},
    }
})

ngx.say(rx:match("/aa", {
    host = "foo.com",
    method = "GET",
    remote_addr = "127.0.0.1",
}))

From this example, you can see that lua-resty-radixtree supports routing based on multiple dimensions such as URI, host, HTTP method, HTTP headers, Nginx variables, IP addresses, etc. Additionally, the radix tree has a time complexity of O(K), which is much more efficient than the commonly used “traversal + hash cache” approach in existing API gateways.

Schema #

Choosing a schema solution is actually much easier. As we mentioned before, lua-rapidjson is a very good choice: there is no need to write a validation layer yourself, since JSON Schema is already powerful enough. Here is a simple example:

local schema = {
    type = "object",
    properties = {
        count = {type = "integer", minimum = 0},
        time_window = {type = "integer", minimum = 0},
        key = {type = "string", enum = {"remote_addr", "server_addr"}},
        rejected_code = {type = "integer", minimum = 200, maximum = 600},
    },
    additionalProperties = false,
    required = {"count", "time_window", "key", "rejected_code"},
}
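
For completeness, here is a rough sketch of how a plugin configuration could be checked against this schema with lua-rapidjson's schema interface; the conf values are made up, and you should verify the exact calls against the version of lua-rapidjson you use.

local rapidjson = require("rapidjson")

-- Compile the schema once and reuse the validator for every check.
local sd = rapidjson.SchemaDocument(schema)
local validator = rapidjson.SchemaValidator(sd)

-- A sample plugin configuration submitted by a user (illustrative values).
local conf = {
    count = 2,
    time_window = 60,
    key = "remote_addr",
    rejected_code = 503,
}

-- validate() fails when the data violates the schema, for example a missing
-- field or a rejected_code outside the 200-600 range.
local ok, err = validator:validate(rapidjson.Document(conf))
if not ok then
    ngx.log(ngx.ERR, "invalid plugin configuration: ", err)
end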

Plugins #

With the foundation of storage, routing, and schema mentioned above, it becomes clear how to implement upper-level plugins. There are no ready-to-use open-source libraries for plugins, so we need to implement them ourselves. When designing plugins, there are three aspects we need to consider.

First is how to mount plugins. We want plugins to be able to hook into the rewrite, access, header filter, body filter, and log phases. We can expose these phases in the Nginx configuration file and provide an interface for plugin implementations.
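
As an illustration, a plugin could be organized as a Lua module that exposes one function per phase, and the framework simply calls the functions a plugin defines. This is only a sketch of one possible convention; the module layout, names, and priority value are assumptions rather than a fixed interface.

-- limit-count.lua: a skeleton of a rate-limiting plugin (illustrative only)
local _M = {
    name     = "limit-count",
    priority = 1002,  -- used later to decide execution order among plugins
}

-- Called when the plugin is enabled: validate the user's configuration,
-- for example against the JSON schema shown in the previous section.
function _M.check_schema(conf)
    return true
end

-- One handler per exposed Nginx phase; the framework only invokes the
-- handlers that a plugin actually defines.
function _M.rewrite(conf, ctx)
end

function _M.access(conf, ctx)
    -- e.g. count requests here and reject with conf.rejected_code once the
    -- limit within conf.time_window is exceeded
end

function _M.header_filter(conf, ctx)
end

function _M.body_filter(conf, ctx)
end

function _M.log(conf, ctx)
end

return _M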

Second is how to obtain configuration changes. Since we are no longer tied to a relational database, we can use etcd's watch feature to be notified of configuration changes, which makes the overall framework's code logic clearer and easier to understand.

Last is plugin priority. For example, which should run first, the authentication plugin or the rate-limiting plugin? And when a plugin bound to a route conflicts with the same plugin bound to a service, which one should take precedence? These are all questions we need to answer.
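
One possible answer, sketched below, is to let the route-level configuration of a plugin override the service-level one, and then order the merged set by each plugin's declared priority. This policy and the registry parameter are assumptions for illustration, not the only reasonable design.

-- service_plugins / route_plugins: name -> conf tables bound to the service
-- and the route; registry: name -> plugin module (see the skeleton above).
local function resolve_plugins(service_plugins, route_plugins, registry)
    local merged = {}
    for name, conf in pairs(service_plugins or {}) do
        merged[name] = conf
    end
    for name, conf in pairs(route_plugins or {}) do
        merged[name] = conf  -- route-level config wins on conflict
    end

    -- Turn the map into a list and sort it by the priority declared in each
    -- plugin module, so higher-priority plugins run first.
    local ordered = {}
    for name, conf in pairs(merged) do
        ordered[#ordered + 1] = { plugin = registry[name], conf = conf }
    end
    table.sort(ordered, function(a, b)
        return a.plugin.priority > b.plugin.priority
    end)
    return ordered
end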

After clarifying these three issues, we can draw a flowchart of the internal workflow of a plugin.

Architecture #

Naturally, once these key components of the microservice API gateway are determined, the processing flow of user requests is also determined. Here, I have drawn a diagram to illustrate this process:

From this diagram, we can see that when a user request enters the API gateway,

  • First, it will match the routing rules based on the request method, URI, host, request headers, etc. If a routing rule is matched, the corresponding plugin list will be retrieved from etcd.
  • Then, the intersection between the retrieved plugin list and the locally enabled plugin list is obtained, resulting in the final list of plugins that can run.
  • Next, the plugins are executed one by one according to their priority.
  • Finally, the request is sent to the upstream based on the health checks and load balancing algorithms.
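
The steps above can be condensed into a rough pseudocode sketch of the request path; all module and function names here (router, filter_enabled, balancer, proxy_to) are illustrative placeholders rather than a real API.

local function handle_request(ctx)
    -- 1. Match a route by method, URI, host, request headers, and so on.
    local route = router.match(ctx)
    if not route then
        return ngx.exit(404)
    end

    -- 2. Intersect the plugin list attached to the route (from etcd) with
    --    the plugins enabled locally on this gateway node; assume the result
    --    is already ordered by plugin priority.
    local plugins = filter_enabled(route.plugins, local_enabled_plugins)

    -- 3. Run the plugins one by one in priority order.
    for _, p in ipairs(plugins) do
        p.plugin.access(p.conf, ctx)
    end

    -- 4. Pick a healthy upstream node via health checking and the load
    --    balancing algorithm, then forward the request to it.
    local node = balancer.pick(route.upstream)
    return proxy_to(node, ctx)
end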

Once the architecture design is complete, we can confidently proceed to write the concrete code. This is similar to building a house: only with a blueprint and a solid foundation in place can we start laying bricks.

Conclusion #

In fact, through these two lessons, we have already completed the most important tasks: product positioning and technology selection. Both are more critical than the concrete coding, so I hope you will weigh these choices carefully.

So, have you used an API gateway in your actual work? How does your company select an API gateway? Feel free to leave a message and share your experiences and gains with me. You are also welcome to share this article with others and communicate and improve together.