09 Why lua-resty-core Offers Higher Performance #

Hello, I’m Wen Ming.

In the previous two lessons, we discussed that Lua is an embedded development language. Lua’s core is kept concise and powerful, allowing you to embed Lua in Redis and NGINX to flexibly complete business logic. Lua can also call existing C functions and data structures to avoid reinventing the wheel.

In Lua, you can use the Lua C API to call C functions, and in LuaJIT, you can also use FFI. For OpenResty:

In the core lua-nginx-module, the API for calling C functions is implemented using the Lua C API;
In lua-resty-core, the existing lua-nginx-module API is re-implemented using FFI.

At this point, you might be wondering: why reimplement it using FFI?

Let’s take a very simple API, ngx.decode_base64, as an example and compare its Lua C API and FFI implementations. This should give you a more concrete sense of where the performance difference comes from.

Lua CFunction #

Let’s first take a look at how the Lua C API is used in lua-nginx-module. We can search for decode_base64 in the project’s code and find its implementation in ngx_http_lua_string.c:

lua_pushcfunction(L, ngx_http_lua_ngx_decode_base64);
lua_setfield(L, -2, "decode_base64");

The code above may look intimidating, but luckily we don’t need to dig into the two functions starting with lua_ or their parameters. We only need to know one thing: a CFunction named ngx_http_lua_ngx_decode_base64 is registered here, and it backs the exposed API ngx.decode_base64.

Let’s continue our search in this C file. We can find the declaration of ngx_http_lua_ngx_decode_base64 at the beginning of the file:

static int ngx_http_lua_ngx_decode_base64(lua_State *L);

A C function that can be called from Lua must follow the interface that Lua requires, which is typedef int (*lua_CFunction)(lua_State* L). It takes a single parameter L, a pointer to a lua_State, and its integer return value is the number of values returned to Lua, not the return value itself.

Its implementation is as follows (I have removed the error handling code here):

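static int
ngx_http_lua_ngx_decode_base64(lua_State *L)
{
    ngx_str_t p, src;

    /* argument 1 on the Lua stack: the Base64 string and its length */
    src.data = (u_char *) luaL_checklstring(L, 1, &src.len);

    /* size of the buffer needed for the decoded data */
    p.len = ngx_base64_decoded_length(src.len);

    /* temporary buffer allocated as Lua userdata */
    p.data = lua_newuserdata(L, p.len);

    if (ngx_decode_base64(&p, &src) == NGX_OK) {
        /* push the decoded bytes onto the stack as a Lua string */
        lua_pushlstring(L, (char *) p.data, p.len);
    } else {
        lua_pushnil(L);
    }

    /* exactly one return value has been pushed */
    return 1;
}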

In this code snippet, the most important parts are ngx_base64_decoded_length and ngx_decode_base64. They are C functions provided by NGINX itself.
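To make the buffer sizing concrete: every 4 Base64 characters decode to at most 3 bytes, so an upper bound on the output size can be computed from the input length alone. Here is a minimal standalone sketch of that arithmetic. The function name decoded_length_upper_bound is hypothetical, and the formula is derived from the 4-to-3 ratio rather than copied from NGINX, so check the NGINX source for the authoritative definition of ngx_base64_decoded_length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of NGINX: upper bound on the number of
 * bytes produced by decoding `len` Base64 characters. Each group of
 * 4 input characters (rounded up) yields at most 3 output bytes.
 */
size_t
decoded_length_upper_bound(size_t len)
{
    return ((len + 3) / 4) * 3;
}
```

For the 8-character input aGVsbG8= this bound is 6 bytes, while the decoded string hello is only 5 bytes long. The exact length is known only after decoding, which is why the decoder reports it separately.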

As we know, a function written in C cannot hand its return values directly to Lua code. Arguments and return values travel between Lua and C through the stack, which is why much of this code is hard to follow at first glance. In addition, these operations cannot be traced by the JIT compiler, so to LuaJIT they are a black box that cannot be optimized.
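The stack-based calling convention can be illustrated with a toy model. The sketch below is not the Lua C API: toy_State, toy_push, toy_get, and toy_divmod are hypothetical names invented for this illustration. It only shows the idea that arguments are read from a stack, results are pushed back onto it, and the C return value is just the number of results.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value stack. NOT the Lua C API; every name here is hypothetical. */
#define TOY_STACK_MAX 8

typedef struct {
    long slots[TOY_STACK_MAX];
    int  top;                   /* number of values currently stored */
} toy_State;

/* Push one value onto the top of the stack. */
void
toy_push(toy_State *S, long v)
{
    assert(S->top < TOY_STACK_MAX);
    S->slots[S->top++] = v;
}

/* Read the value at 1-based index `idx`, counted from the bottom. */
long
toy_get(const toy_State *S, int idx)
{
    assert(idx >= 1 && idx <= S->top);
    return S->slots[idx - 1];
}

/*
 * A "C function" in this convention: both arguments arrive on the
 * stack, both results are pushed onto it, and the C return value is
 * only the count of results.
 */
int
toy_divmod(toy_State *S)
{
    long a = toy_get(S, 1);
    long b = toy_get(S, 2);

    toy_push(S, a / b);         /* first result: quotient */
    toy_push(S, a % b);         /* second result: remainder */

    return 2;                   /* two results were pushed */
}
```

A caller would push 17 and 5, call toy_divmod, and then read the quotient and remainder from slots 3 and 4. Every one of these steps is an opaque C call from the JIT compiler's point of view, which is the cost the text above describes.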

LuaJIT FFI #

The FFI (Foreign Function Interface) is different. The interaction part of FFI is implemented in Lua, and this part of the code can be traced and optimized by the JIT (Just-In-Time) compiler. Of course, the code will also be more concise and easier to understand.

Let’s take ngx.decode_base64 as an example again. Its FFI implementation is split across two repositories: lua-resty-core and lua-nginx-module. First, let’s look at the code in the former:

ngx.decode_base64 = function (s)
    local slen = #s
    local dlen = base64_decoded_length(slen)

    local dst = get_string_buf(dlen)
    local pdlen = get_size_ptr()
    local ok = C.ngx_http_lua_ffi_decode_base64(s, slen, dst, pdlen)
    if ok == 0 then
        return nil
    end
    return ffi_string(dst, pdlen[0])
end

You will find that, compared with the CFunction version, the FFI code is much cleaner. The C side of the call is ngx_http_lua_ffi_decode_base64 in the lua-nginx-module repository. If you are interested, you can look up the implementation of this function yourself. It is quite simple, so I won’t paste the code here.
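For comparison, here is a rough sketch of the shape an FFI-facing C function takes, assuming the C side mirrors the four arguments visible in the Lua call above: source pointer, source length, destination buffer, and a pointer that receives the decoded length. The name sketch_ffi_decode_base64 and the inline decoder are hypothetical stand-ins, not the code in lua-nginx-module. Notice that there is no lua_State and no stack manipulation, only plain C types.

```c
#include <assert.h>
#include <stddef.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is invalid. */
int
b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') { return c - 'A'; }
    if (c >= 'a' && c <= 'z') { return c - 'a' + 26; }
    if (c >= '0' && c <= '9') { return c - '0' + 52; }
    if (c == '+') { return 62; }
    if (c == '/') { return 63; }
    return -1;
}

/*
 * Hypothetical FFI-style entry point. The caller owns `dst`, which must
 * hold at least ((slen + 3) / 4) * 3 bytes. On success the function
 * stores the decoded length in *dlen and returns 1; on invalid input
 * it returns 0.
 */
int
sketch_ffi_decode_base64(const unsigned char *src, size_t slen,
    unsigned char *dst, size_t *dlen)
{
    size_t        i, n = 0;
    unsigned int  acc = 0;      /* bit accumulator */
    int           bits = 0;     /* number of pending bits in acc */
    int           pad = 0, v;

    assert(src != NULL && dst != NULL && dlen != NULL);

    /* ignore up to two trailing '=' padding characters */
    while (slen > 0 && pad < 2 && src[slen - 1] == '=') {
        slen--;
        pad++;
    }

    /* a single leftover character cannot encode a whole byte */
    if (slen % 4 == 1) {
        return 0;
    }

    for (i = 0; i < slen; i++) {
        v = b64_value(src[i]);
        if (v < 0) {
            return 0;
        }

        acc = (acc << 6) | (unsigned int) v;
        bits += 6;

        if (bits >= 8) {
            bits -= 8;
            dst[n++] = (unsigned char) ((acc >> bits) & 0xff);
            acc &= (1u << bits) - 1;    /* keep only the pending bits */
        }
    }

    *dlen = n;
    return 1;
}
```

The 1 or 0 return value lines up with the ok == 0 check on the Lua side, and the length written through dlen plays the role of pdlen[0]. Because the interface uses only pointers and integers, the Lua wrapper can call it directly through FFI without any glue code written against the Lua C API.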

Observant readers may have noticed a pattern in the function names in the code snippets above.

Yes, functions in OpenResty have naming conventions, and you can infer their purposes from the names. For example:

ngx_http_lua_ffi_ is the prefix for C functions in the NGINX HTTP Lua module that are called from Lua through FFI.
ngx_http_lua_ngx_ is the prefix for C functions in the same module that are registered as CFunctions and back the ngx.* API.
Other functions starting with ngx_ or lua_ are built-in functions of NGINX and Lua, respectively.

Furthermore, the C code in OpenResty also follows strict code conventions. I recommend reading the official C coding style guide. It is essential documentation for developers who want to learn OpenResty’s C code and submit pull requests. Otherwise, even if your pull request is well-written, it may receive repeated comments and requests for modification due to code style issues.

For more APIs and details about FFI, I recommend reading the LuaJIT official tutorial and documentation. Technical articles cannot replace official documentation, and I can only help you point out the learning path and avoid some detours within limited time. However, you will still need to tackle the challenging parts yourself.

LuaJIT FFI GC #

When using FFI, we may be confused about who manages the memory allocated in FFI. Should we manually free it in C or let LuaJIT automatically collect it?

Here is a simple rule: LuaJIT is only responsible for resources allocated by itself, while ffi.C is the namespace of C libraries. Therefore, the memory allocated using ffi.C is not managed by LuaJIT and needs to be manually released by you.

For example, if you use ffi.C.malloc to allocate memory, you need to use the corresponding ffi.C.free to free it. The LuaJIT official documentation provides an example:

local p = ffi.gc(ffi.C.malloc(n), ffi.C.free)
...
p = nil -- Last reference to p is gone.
-- GC will eventually run finalizer: ffi.C.free(p)

In this code snippet, ffi.C.malloc(n) allocates a block of memory, and ffi.gc registers ffi.C.free as its finalizer. When the cdata p is garbage collected by LuaJIT, ffi.C.free is called automatically to release the C-level memory. The cdata object itself is owned by the GC, so LuaJIT reclaims p without any further action on your part.

Note that if you need to allocate large chunks of memory in OpenResty, I recommend using ffi.C.malloc instead of ffi.new, for two reasons:

ffi.new returns a cdata object, and that memory is managed by LuaJIT.
LuaJIT has an upper limit on the memory its GC can manage, and the LuaJIT in OpenResty does not enable the GC64 option, so a single worker can use at most 2 GB of GC-managed memory. Exceeding this limit results in an error.

When using FFI, we also need to pay special attention to memory leaks. However, everyone makes mistakes, and bugs can be present in any code written by humans. So, is there any tool to detect memory leaks?

At this point, OpenResty’s powerful testing and debugging toolchain comes into play.

Let’s talk about testing first. In the OpenResty ecosystem, we use Valgrind to detect memory leaks.

The testing framework Test::Nginx, mentioned in the previous lesson, has a dedicated memory leak detection mode for running a set of unit test cases. You only need to set the environment variable TEST_NGINX_USE_VALGRIND=1. Before releasing a new version, the official OpenResty project runs its full regression suite in this mode. We will provide further details in the testing chapter.

The OpenResty CLI resty also has a --valgrind option, which lets you run a piece of Lua code on its own under Valgrind. You can use this option even if you have not written a test case.

Now let’s look at the debugging tools.

OpenResty provides an extension based on SystemTap for dynamic analysis of OpenResty programs. In the toolset of this project, you can search for the keyword ‘gc’ and find the tools lj-gc and lj-gc-objs.

For offline analysis, such as examining core dumps, OpenResty provides a set of tools for GDB. Similarly, you can search for ‘gc’ in it and find the tools lgc, lgcstat, and lgcpath.

We will explain how to use these debugging tools in detail in the upcoming debugging chapter. For now, a general impression is enough. With these tools, you won’t have to guess blindly when you run into memory issues, because OpenResty has a dedicated toolset to help you locate and solve these problems.

lua-resty-core #

From the comparison above, we can see that the FFI approach not only has simpler code but can also be optimized by LuaJIT, which makes it the preferred choice. In fact, the CFunction implementation has been deprecated by OpenResty, and the related code has been removed from the repository. New APIs are now implemented using FFI in the lua-resty-core repository.

Before the release of version 1.15.8.1 in May 2019, lua-resty-core was not enabled by default in OpenResty. This not only caused performance loss, but also potentially introduced bugs. Therefore, I strongly recommend that users still using older versions manually enable lua-resty-core. You only need to add one line of code in the init_by_lua phase:

require "resty.core"

Of course, in the long-delayed version 1.15.8.1, the lua_load_resty_core directive was added, and lua-resty-core is enabled by default. Personally, I feel that OpenResty was too cautious about enabling lua-resty-core; open source projects should turn on features like this by default as early as possible.

In addition to re-implementing some of the APIs in the lua-nginx-module project, such as ngx.re.match and ngx.md5, lua-resty-core also implements many new APIs, such as ngx.ssl, ngx.base64, ngx.errlog, ngx.process, ngx.re.split, ngx.resp.add_header, ngx.balancer, ngx.semaphore, etc. We will introduce them in the following OpenResty API section.

Final Words #

After all this, I want to emphasize that FFI, good as it is, is not a silver bullet for performance. Its efficiency comes mainly from the fact that FFI calls can be traced and optimized by the JIT compiler. If the Lua code you write cannot be JIT-compiled and has to run in interpreted mode, FFI becomes less efficient.

So which operations can be JIT-compiled and which cannot? How can we avoid writing code that cannot be JIT-compiled? I will reveal the answer to these questions in the next section.

Finally, I leave you with a hands-on assignment: can you find one or two APIs that exist in both lua-nginx-module and lua-resty-core, and then compare the performance differences between them? You can also see how much the performance is improved with FFI.

Feel free to leave a comment and share your thoughts and findings. You are also welcome to share this article with your colleagues and friends so that we can discuss and improve together.