15 Differences between OpenResty and Other Development Platforms #

Hello, I’m Wen Ming.

In the previous module, you learned about the two cornerstones of OpenResty: NGINX and LuaJIT. I believe you are eager to start learning the APIs that OpenResty provides, right?

However, before you rush into it, it is worth spending a little more time understanding OpenResty's principles and basic concepts.

Principle #

In the previous content about LuaJIT, you’ve seen the following architecture diagram:

Here, I will explain it in more detail.

Both the master and worker processes in OpenResty contain a LuaJIT VM. All coroutines within the same process share this VM and execute Lua code within it.

At any given time, each worker process handles only one user request; that is, only one coroutine is running. At this point you may have a question: NGINX supports C10K (tens of thousands of concurrent connections), so doesn't it need to handle ten thousand requests simultaneously?

Of course not. NGINX uses its epoll-based, event-driven mechanism to reduce waiting and idle spinning, so that CPU resources are used as fully as possible for processing user requests. After all, only when each individual request is processed fast enough can overall performance reach a high level. With a multi-threaded model, where each request corresponds to one thread, resources are easily exhausted in a C10K scenario.

At the OpenResty level, Lua coroutines cooperate with NGINX's event mechanism. When the Lua code performs I/O, such as querying a MySQL database, the Lua coroutine first calls yield to suspend itself, and a callback is registered with NGINX. Once the I/O operation completes (or times out, or fails), NGINX calls resume to wake the coroutine back up. This is how Lua coroutines and NGINX's event-driven machinery work together, and it is why you never have to write callbacks in your Lua code.
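The yield/resume pattern itself can be illustrated with a few lines of plain Lua (a toy sketch only: the print calls stand in for real work, and in OpenResty the NGINX event loop, not manual calls, drives the resume):

```lua
-- Toy illustration of the yield/resume pattern (plain Lua, not
-- OpenResty internals): the code below the coroutine plays the
-- role of the event loop.
local function handle_request()
    print("start I/O")
    coroutine.yield()    -- park this coroutine while the "I/O" runs
    print("I/O done, continue processing the request")
end

local co = coroutine.create(handle_request)
coroutine.resume(co)     -- runs up to the yield
-- ... here the event loop would serve other requests ...
coroutine.resume(co)     -- the "I/O" completed: wake the coroutine up
```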

Let's take a look at the diagram below, which describes this whole process. In the diagram, lua_yield and lua_resume are both functions provided by Lua's C API.

On the other hand, if there are no I/O or sleep operations in the Lua code, such as only intensive encryption/decryption operations, the Lua coroutines will occupy the LuaJIT VM until the entire request is processed.

Next, I will provide a segment of the source code for ngx.sleep to help you understand this point more clearly. This code is located in ngx_http_lua_sleep.c, and you can find it in the src directory of the lua-nginx-module project on GitHub.

In ngx_http_lua_sleep.c, we can find the concrete implementation of sleep. First, the C function ngx_http_lua_ngx_sleep is registered as the Lua API ngx.sleep:

void
ngx_http_lua_inject_sleep_api(lua_State *L)
{
    /* expose the C function as the Lua-level "sleep" field */
    lua_pushcfunction(L, ngx_http_lua_ngx_sleep);
    lua_setfield(L, -2, "sleep");
}

The following is the main function for sleep, where I have only extracted a few lines of the key code:

static int
ngx_http_lua_ngx_sleep(lua_State *L)
{
    /* register the callback that fires when the timer expires */
    coctx->sleep.handler = ngx_http_lua_sleep_handler;

    /* add a timer to NGINX's event loop */
    ngx_add_timer(&coctx->sleep, (ngx_msec_t) delay);

    /* suspend the current Lua coroutine */
    return lua_yield(L, 0);
}

You can see that:

  • First, the ngx_http_lua_sleep_handler callback function is registered.
  • Then, the ngx_add_timer interface provided by NGINX is called to add a timer to NGINX’s event loop.
  • Finally, lua_yield is used to pause the Lua coroutine and hand over control to NGINX’s event loop.

After the sleep operation completes, the ngx_http_lua_sleep_handler callback is triggered. It calls ngx_http_lua_sleep_resume, which in turn uses lua_resume to wake up the Lua coroutine. You can trace the exact call chain in the code yourself; I won't go into further detail here.

ngx.sleep is just a simple example, but by analyzing it, you can understand the basic principles of the lua-nginx-module module.

Basic Concepts #

After analyzing the principle, let’s review the important concepts of “phase” and “non-blocking” in OpenResty.

Like NGINX, OpenResty also has the concept of phases, and each phase has its own different purpose:

  • set_by_lua, used to set variables;
  • rewrite_by_lua, used for forwarding, redirection, etc.;
  • access_by_lua, used for access control, permissions, etc.;
  • content_by_lua, used to generate response content;
  • header_filter_by_lua, used for response header filtering;
  • body_filter_by_lua, used for response body filtering;
  • log_by_lua, used for logging.

Of course, if your code logic is not complicated, you can execute it in the rewrite or content phase.
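For instance, a minimal location block that touches several of these phases might look like the sketch below (the token check and log message are hypothetical, just to give each phase something to do):

```nginx
location /demo {
    access_by_lua_block {
        -- access control: reject requests without a token (example check)
        if not ngx.var.arg_token then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }
    content_by_lua_block {
        -- generate the response body
        ngx.say("hello from the content phase")
    }
    log_by_lua_block {
        -- record something after the response has been sent
        ngx.log(ngx.INFO, "request served: ", ngx.var.uri)
    }
}
```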

However, note that each phase restricts which OpenResty APIs may be used: every API has a list of phases in which it is allowed, and calling it outside that range raises an error. This is quite different from most other programming platforms.

Take ngx.sleep as an example. The documentation shows that it can only be used in the following contexts, which notably do not include the log phase:

context: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*, ssl_certificate_by_lua*, ssl_session_fetch_by_lua*

And if you use sleep in the log phase, where it is not supported:

location / {
    log_by_lua_block {
        ngx.sleep(1)
    }
}

An error-level message will appear in the NGINX error log:

[error] 62666#0: *6 failed to run log_by_lua*: log_by_lua(nginx.conf:14):2: API disabled in the context of log_by_lua*
stack traceback:
    [C]: in function 'sleep'

Therefore, before using an API, make sure to consult the documentation to determine whether it can be used in the context of your code.

After reviewing the concept of phases, let’s take a look at non-blocking. First of all, it should be clear that all the APIs provided by OpenResty are non-blocking.

I will continue to use the requirement of waiting for 1 second with sleep as an example to illustrate. If you want to implement it in Lua, you need to do the following:

function sleep(s)
    -- busy-wait: burn CPU until the wall clock has advanced s seconds
    local ntime = os.time() + s
    repeat until os.time() > ntime
end

Because standard Lua has no built-in sleep function, I use a loop here to repeatedly check whether the target time has been reached. This implementation is blocking: during that one second of sleep, the Lua VM is doing useless work, while other pending requests can only sit and wait.

However, if you implement it with ngx.sleep(1), then, as the source-code analysis above shows, OpenResty can still process other requests (say, request B) during that second. The context of the current request (call it request A) is saved, and NGINX's event mechanism later wakes the coroutine up, returning control to request A. This way, the CPU is always doing real work.
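A minimal sketch of the non-blocking version, assuming a location we are free to define:

```nginx
location /non_blocking {
    content_by_lua_block {
        ngx.say("before sleep")
        ngx.flush(true)   -- push out what we have so far
        ngx.sleep(1)      -- yields; the worker serves other requests meanwhile
        ngx.say("after sleep")
    }
}
```

Here ngx.sleep registers a timer and yields exactly as in the ngx_http_lua_sleep.c excerpt above, so the one-second wait costs the worker no CPU.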

Variables and Lifecycles #

In addition to these two important concepts, the lifecycle of variables is also a common area where mistakes can occur in OpenResty development.

As mentioned earlier, I recommend declaring all variables as local in OpenResty and using tools such as luacheck and lua-releng to detect accidental globals. The same applies to requiring modules, as in the following example:

local ngx_re = require "ngx.re"

In OpenResty, except for the init_by_lua and init_worker_by_lua phases, all other phases will set up an isolated global variable table to avoid polluting other requests during processing. Even in the phases where global variables can be defined, you should still try to avoid defining global variables.

Generally speaking, problems you might reach for a global variable to solve can be solved with module variables instead, and more cleanly. Here is an example of a variable defined in a module:

local _M = {}

_M.color = {
    red = 1,
    blue = 2,
    green = 3
}

return _M

In a file named hello.lua, I defined a module that includes a table called color. Then, I added the corresponding configuration in nginx.conf:

location / {
    content_by_lua_block {
        local hello = require "hello"
        ngx.say(hello.color.green)
    }
}

This configuration will require the module in the content phase and print the value of green as the HTTP response body.

You may wonder why module variables are so magical.

In fact, a module is loaded only once per worker process; all subsequent requests handled by that worker share the data in the module. We can describe module data as "global" in quotes because OpenResty workers are completely isolated from one another: each worker loads the module independently, and module data cannot cross worker boundaries.
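One simple way to observe this load-once behavior is to put a log statement at the top level of the module (a hypothetical variant of hello.lua): within a single worker, the message appears only on the first require, because the result is cached.

```lua
-- hypothetical hello.lua: top-level code runs once per worker process
ngx.log(ngx.NOTICE, "hello.lua loaded by worker pid ", ngx.worker.pid())

local _M = {}

_M.color = {
    red = 1,
    blue = 2,
    green = 3
}

return _M
```

With several workers configured, you would see one such log line per worker pid, confirming that each worker holds its own independent copy of the module.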

As for how to handle data that needs to be shared among workers, I will explain it in later chapters, so you don’t need to delve into it for now.

However, there is one trap that is easy to fall into: treat module variables as read-only, and do not try to modify them. Otherwise, race conditions can occur under high concurrency. Such bugs cannot be caught by unit tests; they show up only occasionally in production and are hard to pin down.

For example, suppose the module variable green currently holds 3 and your code increments it. Is green now 4? Not necessarily: it could be 4, 5, or 6, because OpenResty takes no lock when writing to module variables, so multiple requests can race to update the same value.
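As a sketch, imagine a hypothetical counter.lua module whose incr function does a read-modify-write on a module variable. If the coroutine yields between the read and the write (around any I/O call), another request in the same worker can interleave and the intermediate value gets lost:

```lua
-- hypothetical counter.lua: a module variable that gets written to
local _M = {}

_M.green = 3

function _M.incr()
    -- read-modify-write on a module variable: if the coroutine yields
    -- between the read and the write (e.g. an ngx.sleep or a database
    -- query happens here), another request in the same worker can run
    -- its own incr, and one of the two updates will be overwritten
    local v = _M.green
    -- ... imagine an I/O call here that yields the coroutine ...
    _M.green = v + 1
    return _M.green
end

return _M
```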

After discussing global variables, local variables, and module variables, let’s talk about variables that span phases.

In some situations, what we need is a variable that can be read and written across phases. Although variables like $host and $scheme in NGINX, which we are familiar with, satisfy the condition of crossing phases, they cannot be dynamically created. You must first define them in the configuration file before using them. For example:

location /foo {
    set $my_var '';  # the $my_var variable must be declared before use
    content_by_lua_block {
        ngx.var.my_var = 123
    }
}

OpenResty provides ngx.ctx to solve this type of problem. It is a Lua table used to store request-based Lua data, which has the same lifespan as the current request. Let’s take a look at the example in the official documentation:

location /test {
    rewrite_by_lua_block {
        ngx.ctx.foo = 76
    }
    access_by_lua_block {
        ngx.ctx.foo = ngx.ctx.foo + 3
    }
    content_by_lua_block {
        ngx.say(ngx.ctx.foo)
    }
}

As you can see, we defined a variable called foo and stored it in ngx.ctx. This variable spans the rewrite, access, and content phases, and the value is finally printed in the content phase, which is the expected value of 79.

However, ngx.ctx also has its limitations:

  • For example, subrequests created using ngx.location.capture have their own independent ngx.ctx data, which does not affect the ngx.ctx of the parent request.
  • Another example is that internal redirections created using ngx.exec will destroy the ngx.ctx of the original request and generate a blank ngx.ctx.

Both limitations are covered in the official documentation with detailed code examples; refer to them if you are interested.
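To give a flavor of the first limitation, here is a sketch adapted from the official documentation: the subrequest sees its own empty ngx.ctx, and its writes do not leak back into the parent request.

```nginx
location /sub {
    content_by_lua_block {
        -- prints nil: the subrequest starts with its own empty ngx.ctx
        ngx.say("sub pre: ", ngx.ctx.blah)
        ngx.ctx.blah = 32
        ngx.say("sub post: ", ngx.ctx.blah)
    }
}

location /main {
    content_by_lua_block {
        ngx.ctx.blah = 73
        ngx.say("main pre: ", ngx.ctx.blah)
        local res = ngx.location.capture("/sub")
        ngx.print(res.body)
        -- still 73: untouched by the subrequest's writes
        ngx.say("main post: ", ngx.ctx.blah)
    }
}
```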

Final Words #

Finally, a few closing remarks. In this lesson we covered OpenResty's working principles and several key concepts. You don't need to memorize them by rote; they only become meaningful and vivid once combined with real requirements and real code.

How do you understand these ideas? Feel free to leave a comment and discuss with me. And feel free to share this article with your colleagues and friends, so we can exchange ideas and improve together.