
06 Knowledge of NGINX Used in OpenResty #

Hello, this is Wen Ming.

Based on the introductions in the previous articles, I believe you already have a rough understanding of OpenResty. In the following sections, I will familiarize you with the two cornerstones of OpenResty: NGINX and LuaJIT. A tall building rises from its foundation, and only by mastering this basic knowledge can we learn OpenResty well.

Today, I will talk about NGINX. Here, I will only introduce the basic NGINX knowledge that you are likely to use in OpenResty, which is just a small subset of NGINX itself. If you need a systematic, in-depth study of NGINX, you can refer to Tao Hui’s “100 Lectures on NGINX Core Knowledge”, a highly praised course on Geek Time.

When it comes to configuration, in OpenResty development, we need to pay attention to the following points:

  • Try to configure nginx.conf as little as possible.
  • Avoid combining multiple directives such as “if”, “set”, and “rewrite”.
  • If possible, use Lua code instead of NGINX configuration, variables, and modules.

By doing so, we can maximize readability, maintainability, and scalability.

The following NGINX configuration is a typical negative example, where configuration directives are used as code:

location ~ ^/mobile/(web|app)\.htm {
    set $type $1;
    set $orig_args $args;
    if ( $http_user_agent ~ "(iPhone|iPad|Android)" ) {
        rewrite ^/mobile/(.*) http://touch.foo.com/mobile/$1 last;
    }
    proxy_pass http://foo.com/$type?$orig_args;
}

This is something we should avoid when using OpenResty for development.
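For contrast, here is a minimal sketch of my own (not code from the article) showing how the same intent could be expressed in Lua instead, keeping the hostnames touch.foo.com and foo.com from the example above:

location ~ ^/mobile/(web|app) {
    rewrite_by_lua_block {
        -- redirect mobile user agents to the touch site, otherwise fall
        -- through to the proxy_pass below
        local ua = ngx.var.http_user_agent or ""
        if ua:find("iPhone") or ua:find("iPad") or ua:find("Android") then
            return ngx.redirect("http://touch.foo.com" .. ngx.var.request_uri)
        end
    }
    proxy_pass http://foo.com;
}

Keeping the matching and branching logic in Lua makes it easier to read, test, and extend later than chaining if, set, and rewrite directives.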

NGINX Configuration #

Let’s first take a look at the NGINX configuration file. NGINX controls its behavior through a configuration file, which can be seen as a simple DSL (Domain-Specific Language). NGINX reads the configuration and loads it into memory when the process starts; if you modify the configuration file, you need to restart or reload NGINX for the changes to take effect. Only the commercial version of NGINX provides some dynamic capabilities at runtime, in the form of an API.

Let’s take a look at the following configuration, which is very simple and I believe most engineers can understand:

worker_processes auto;

pid logs/nginx.pid;
error_log logs/error.log notice;

worker_rlimit_nofile 65535;

events {
    worker_connections 16384;
}

http {
    server {
        listen 80;
        listen 443 ssl;

        location / {
            proxy_pass https://foo.com;
        }
    }
}

stream {
    server {
        listen 53 udp;
    }
}

However, even with a simple configuration, there are some important underlying concepts.

First, each directive has its applicable context, which is the scope of the directive in the NGINX configuration file.

The top-level context is main, which contains some directives unrelated to specific business, such as worker_processes, pid, and error_log, all belonging to the main context. In addition, contexts have a hierarchical relationship. For example, the location context is within the server context, the server context is within the http context, and the http context is within the main context.

A directive cannot be used outside its valid context, and NGINX checks the validity of nginx.conf at startup. For example, if we move listen 80; from the server context to the main context and then start the NGINX service, we will see an error like this:

"listen" directive is not allowed here ......

Second, NGINX can handle not only HTTP requests and HTTPS traffic, but also UDP and TCP traffic.

Layer 7 traffic is placed within the http context, while layer 4 traffic is placed within the stream context. In OpenResty, lua-nginx-module and stream-lua-nginx-module correspond to these two.
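As a minimal sketch of my own (the ports 8080 and 1053 are arbitrary choices), the same content_by_lua_block directive can appear in both contexts, provided by lua-nginx-module and stream-lua-nginx-module respectively:

http {
    server {
        listen 8080;
        location / {
            content_by_lua_block {
                ngx.say("hello from the http (layer 7) context")
            }
        }
    }
}

stream {
    server {
        listen 1053;
        content_by_lua_block {
            ngx.say("hello from the stream (layer 4) context")
        }
    }
}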

One thing to note here is that not all features supported by NGINX are necessarily supported by OpenResty; it depends on the OpenResty version. OpenResty’s version number tracks NGINX’s, which makes this easy to check. For example, NGINX 1.13.10, released in March 2018, added support for gRPC, but as of April 2019 the latest version of OpenResty was 1.13.6.2, so you can infer that OpenResty did not yet support gRPC at that time.

The configuration directives mentioned in the above nginx.conf are all part of the NGINX core modules ngx_core_module, ngx_http_core_module, and ngx_stream_core_module. You can refer to their documentation for the specific details.

Master-Worker Pattern #

After understanding the configuration file, let’s take a look at the multi-process mode of NGINX. Here I have included a diagram to illustrate. You can see that after NGINX starts, there will be one Master process and multiple Worker processes (it can also have only one Worker process, depending on how you configure it).

Let’s first talk about the Master process. As the name suggests, it plays the role of “manager” and is not responsible for handling requests from clients. Its purpose is to manage the Worker processes, including receiving signals sent by the administrator and monitoring the running status of the Worker processes. When a Worker process abnormally exits, the Master process will restart a new Worker process.

The Worker processes are the “front-line employees” responsible for handling requests from clients. They are forked from the Master process and are independent of each other, not interfering with each other. The multi-process mode is much more advanced than Apache’s multi-threading mode, as there is no need for inter-thread locking, making it easier to debug. Even if a process crashes and exits, it will not affect the normal operation of other Worker processes.

In the context of NGINX’s Master-Worker pattern, OpenResty adds its own privileged agent process. This process does not listen on any ports and has the same privileges as the NGINX Master process. Therefore, it can perform tasks that require high privileges, such as certain write operations on local disk files.
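Here is a minimal sketch of how the privileged agent is typically enabled, based on the ngx.process API from lua-resty-core (the 60-second interval and the log message are my own arbitrary choices):

http {
    init_by_lua_block {
        local process = require "ngx.process"
        -- ask OpenResty to spawn the privileged agent process
        local ok, err = process.enable_privileged_agent()
        if not ok then
            ngx.log(ngx.ERR, "failed to enable privileged agent: ", err)
        end
    }

    init_worker_by_lua_block {
        local process = require "ngx.process"
        -- only the privileged agent runs this branch
        if process.type() == "privileged agent" then
            -- do work that needs master-level privileges, e.g. writing
            -- files owned by root, on a recurring timer
            ngx.timer.every(60, function()
                ngx.log(ngx.NOTICE, "privileged agent heartbeat")
            end)
        end
    }
}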

If the privileged agent process works together with the NGINX binary hot upgrade mechanism, OpenResty can achieve the entire process of self-binary hot upgrading without relying on any external programs.

Reducing dependencies on external programs and solving problems inside the OpenResty process as much as possible not only simplifies deployment and lowers operation and maintenance costs, but also reduces the chance of errors. It can be said that the privileged process and the ngx.pipe functionality in OpenResty both serve this purpose.
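For instance, ngx.pipe (available in newer OpenResty releases) lets you run an external command from inside a worker without blocking it. A rough sketch of my own, with the echo command chosen arbitrarily:

location /pipe-demo {
    content_by_lua_block {
        local ngx_pipe = require "ngx.pipe"
        -- spawn an external process non-blockingly, instead of shelling
        -- out with os.execute() or io.popen()
        local proc, err = ngx_pipe.spawn({"echo", "hello world"})
        if not proc then
            ngx.say("spawn failed: ", err)
            return
        end
        local line = proc:stdout_read_line()
        ngx.say(line or "no output")
    }
}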

Execution Phases #

The execution phase is also an important feature of NGINX that is closely related to the specific implementation of OpenResty. NGINX has 11 execution phases, which can be seen from the source code in ngx_http_core_module.h:

typedef enum {
    NGX_HTTP_POST_READ_PHASE = 0,

    NGX_HTTP_SERVER_REWRITE_PHASE,

    NGX_HTTP_FIND_CONFIG_PHASE,
    NGX_HTTP_REWRITE_PHASE,
    NGX_HTTP_POST_REWRITE_PHASE,

    NGX_HTTP_PREACCESS_PHASE,

    NGX_HTTP_ACCESS_PHASE,
    NGX_HTTP_POST_ACCESS_PHASE,

    NGX_HTTP_PRECONTENT_PHASE,

    NGX_HTTP_CONTENT_PHASE,

    NGX_HTTP_LOG_PHASE
} ngx_http_phases;

If you want to learn more about the functions of these 11 phases, you can study Mr. Tao Hui’s video course or the NGINX documentation. Here, I will not elaborate on them.

However, coincidentally, OpenResty also has 11 *_by_lua directives, and their relationship with NGINX phases is shown in the following diagram (image from the lua-nginx-module documentation):

Among them, init_by_lua is only executed when the Master process is created, and init_worker_by_lua is only executed when each Worker process is created. The other *_by_lua directives are triggered by client requests and are executed repeatedly.

Therefore, in the init_by_lua phase, we can pre-load Lua modules and common read-only data, which can save some memory by utilizing the copy-on-write (COW) feature of the operating system.
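A minimal sketch of what that might look like (the commented-out module name is a hypothetical placeholder for your own project’s read-only data):

init_by_lua_block {
    -- modules loaded here in the Master process are inherited by every
    -- Worker, so their memory pages can be shared via copy-on-write
    require "resty.core"
    require "cjson"
    -- require "my_project.constants"  -- hypothetical read-only data module
}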

For business logic, most operations can in fact be completed in the content_by_lua phase. However, the recommended approach is to split the logic across the different phases according to its function, for example:

  • set_by_lua: sets variables
  • rewrite_by_lua: forwards, redirects, etc.
  • access_by_lua: access control, permissions, etc.
  • content_by_lua: generates response content
  • header_filter_by_lua: filters and processes response headers
  • body_filter_by_lua: filters and processes response body
  • log_by_lua: records logs

Let me give you an example to illustrate the benefits of this kind of split. Suppose you expose many plaintext APIs and now need to add custom encryption and decryption logic. Do you need to modify the code of every API?

# Plain text protocol version
location /mixed {
    content_by_lua '...';       # processes the request
}

Of course not. In fact, by leveraging the features of the phases, we only need to decrypt in the access phase and encrypt in the body filter phase. The code in the content phase does not need to be modified at all:

# Encrypted protocol version
location /mixed {
    access_by_lua '...';        # decrypts the request body
    content_by_lua '...';       # processes the request without caring about the communication protocol
    body_filter_by_lua '...';   # encrypts the response body
}
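To make this more concrete, here is a hedged sketch of what those Lua chunks might contain, using base64 purely as a stand-in for real encryption and glossing over the buffering that a chunked response body would need:

location /mixed {
    access_by_lua_block {
        -- "decrypt" the request body so the content phase only sees plaintext
        ngx.req.read_body()
        local body = ngx.req.get_body_data()
        if body then
            local plain = ngx.decode_base64(body)   -- stand-in for decryption
            if plain then
                ngx.req.set_body_data(plain)
            end
        end
    }

    content_by_lua_block {
        -- business logic, unchanged and unaware of the wire protocol
        ngx.req.read_body()
        ngx.say("processed: ", ngx.req.get_body_data() or "")
    }

    body_filter_by_lua_block {
        -- "encrypt" each response chunk before it is sent to the client;
        -- real code would buffer chunks until ngx.arg[2] (eof) is true
        ngx.arg[1] = ngx.encode_base64(ngx.arg[1])
    }
}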

Binary Hot Upgrade #

Finally, let me briefly explain NGINX’s binary hot upgrade. We know that after modifying the NGINX configuration file, we still need to restart or reload NGINX for the changes to take effect, yet the NGINX binary itself can be upgraded on the fly through a hot upgrade. This may seem a bit counterintuitive, but considering that NGINX started out doing traditional static load balancing, reverse proxying, and file caching, it is understandable.

The hot upgrade is accomplished by sending the USR2 and WINCH signals to the old Master process. The former starts a new Master process running the new binary, and the latter gracefully shuts down the old Worker processes.

After these two steps, the new Master and new Worker processes are up and running. However, the old Master process does not exit at this point, and the reason is simple: if you need to roll back, you can still send the HUP signal to the old Master process. Of course, if you are certain that you do not need to roll back, you can send the KILL signal to the old Master process to make it exit.

At this point, the binary hot upgrade is successfully completed.

Regarding binary upgrade, these are the main points I wanted to explain. If you want to learn more detailed information on this topic, you can refer to the official documentation for further study.

Further Reading #

The author of OpenResty wrote an NGINX tutorial many years ago. If you are interested, you can study it on your own. It covers a lot of ground, and it doesn’t matter if you don’t understand all of it; that won’t affect your learning of OpenResty.

Conclusion #

In summary, what we need to know about NGINX for OpenResty mainly involves its configuration, the Master-Worker processes, and the execution phases. For anything that can be solved with Lua code, we should try to use code instead of relying on NGINX modules and configuration. This is a shift in mindset that comes with learning OpenResty.

Lastly, I will leave you with an open-ended question. NGINX officially supports njs, which allows you to write NGINX control logic in JavaScript, similar to OpenResty’s approach. What do you think of this?

Feel free to leave a comment and share your thoughts with me. Also, please feel free to forward this article to your colleagues and friends.