21 Playing with Time, Regular Expressions, and Other Common APIs


Hello, I’m Wen Ming. Over the past few lessons, you have gotten to know several important Lua APIs in OpenResty. Today, let’s look at some other commonly used APIs, covering regular expressions, time, processes, and more.

Regular Expressions

Let’s start with regular expressions, the most commonly used and most important of these APIs. In OpenResty, you should always use the ngx.re.* family of APIs for regular-expression logic instead of Lua’s built-in pattern matching. Performance is one reason. The other is that Lua’s patterns are a separate system that does not follow PCRE syntax, which trips up most developers.
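Here is a quick illustration, written as a sketch for the resty CLI with a made-up request line. It applies ngx.re.match to an HTTP request line. The alternation (GET|POST) is ordinary PCRE syntax, but Lua patterns have no equivalent. The "jo" options turn on PCRE JIT compilation and cache the compiled regex.

```lua
-- Run with the resty CLI, e.g. `resty match.lua`.
local line = "GET /index.html HTTP/1.1"

-- (GET|POST) uses PCRE alternation, which Lua patterns cannot express.
-- Options: "j" = PCRE JIT, "o" = compile once and cache the regex.
local m, err = ngx.re.match(line, [[^(GET|POST)\s+(\S+)]], "jo")
if m then
    ngx.say(m[0])  -- whole match: GET /index.html
    ngx.say(m[1])  -- first capture: GET
    ngx.say(m[2])  -- second capture: /index.html
elseif err then
    ngx.log(ngx.ERR, "regex error: ", err)
else
    ngx.say("no match")
end
```

Notice that the code checks err separately from a plain "no match". ngx.re.match returns nil in both cases, and only err tells them apart.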

You have already seen a few ngx.re.* APIs in earlier lessons, and their documentation is very detailed, so I won’t go through them one by one here. Instead, I want to highlight two topics.

ngx.re.split

The first one is ngx.re.split. Splitting strings is a common need, and OpenResty provides an API for it. However, many developers in the community can’t find this function and end up writing their own.

Why is that? ngx.re.split is implemented not in lua-nginx-module but in lua-resty-core. It is not documented on the lua-resty-core home page either. Its documentation lives in the file lua-resty-core/lib/ngx/re.md, three directory levels down. As a result, many developers never learn that this API exists.

We have met other “hidden APIs” like this one before, such as ngx_resp.add_header and enable_privileged_agent. So how do you avoid missing them? Besides reading the documentation on the lua-resty-core home page, you also need to read every .md document in the lua-resty-core/lib/ngx/ directory.

We have praised OpenResty’s documentation many times before, but here it does have room for improvement: there is no single page that lists every API.
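Here is a minimal sketch of calling it, assuming the resty CLI, which loads lua-resty-core. The function comes from requiring the ngx.re module. It is not available as ngx.re.split out of the box. The input string and the separator regex are made up for illustration.

```lua
-- Run with the resty CLI; ngx.re is a lua-resty-core module.
local ngx_re = require "ngx.re"

-- Split on a comma followed by optional whitespace.
-- The third argument takes the same regex options as ngx.re.match ("jo" here).
local parts, err = ngx_re.split("a, b,c", [[,\s*]], "jo")
if not parts then
    ngx.log(ngx.ERR, "split failed: ", err)
    return
end

for i, part in ipairs(parts) do
    ngx.say(i, ": ", part)  -- 1: a, then 2: b, then 3: c
end
```

The result is an ordinary Lua array, so you can iterate it with ipairs like any other list.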

lua_regex_match_limit

The second topic is lua_regex_match_limit. We haven’t specifically discussed the Nginx directives that OpenResty provides before, because in most cases their default values are fine and there is no need to change them. The lua_regex_match_limit directive for regular expressions is an exception.

If the regex engine is a backtracking NFA, as PCRE is, catastrophic backtracking can occur. The engine backtracks so many times during a match that CPU usage hits 100% and normal traffic is blocked.

Once catastrophic backtracking happens, you have to analyze dumps with gdb or trace the live production system with SystemTap to find the cause. It is also hard to catch ahead of time, because only specific requests trigger it. That gives attackers an opening, known as ReDoS (Regular expression Denial of Service).

If you want to detect and fix this problem automatically and thoroughly, see the article I wrote on my official account: “How to Completely Avoid Catastrophic Backtracking in Regular Expressions?”

Here I mainly want to show how to contain this problem effectively in OpenResty. It takes a single line of configuration:

lua_regex_match_limit 100000;

lua_regex_match_limit sets the match limit of the PCRE engine, which bounds how much backtracking a single match may do. So even if catastrophic backtracking occurs, the damage stays within a fixed budget and will not overload your CPU.

Note that the default value of this directive is 0. This does not mean OpenResty applies a limit of its own. With 0, PCRE falls back to the match limit compiled into the library, which is typically far looser than 100000. So if you have not replaced OpenResty’s built-in regex engine and you run many complex regular expressions, consider setting this directive explicitly.
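The directive only bounds the damage. Your Lua code still has to handle a match that gets aborted. When the limit is hit, the ngx.re.* functions return an error string instead of a match, so it pays to check err separately from a plain "no match". This sketch assumes the directive is set in the http block of nginx.conf. It uses ^(a+)+$, a textbook pattern prone to catastrophic backtracking.

```lua
-- Assumes nginx.conf contains, in the http block:
--     lua_regex_match_limit 100000;
-- With nested quantifiers like ^(a+)+$, the trailing "!" can make a
-- backtracking engine try an enormous number of ways to split the a's.
local subject = string.rep("a", 30) .. "!"

local m, err = ngx.re.match(subject, [[^(a+)+$]], "o")
if m then
    ngx.say("matched")
elseif err then
    -- The match was aborted (for example, the match limit was exceeded):
    -- treat it as a failure rather than silently as "no match".
    ngx.log(ngx.ERR, "regex aborted: ", err)
    ngx.say("rejected")
else
    ngx.say("no match")
end
```

Whatever the limit, this subject can never match, so m is nil. The limit decides how quickly PCRE gives up and whether err is set.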

Time APIs

Next, let’s talk about the time APIs. OpenResty provides about 10 time-related APIs, which shows how important they are. The most commonly used one is ngx.now, which returns the current timestamp. For example:

resty -e 'ngx.say(ngx.now())'

The output shows that ngx.now includes a fractional part, so it is more precise, while ngx.time returns only whole seconds. The other APIs, such as ngx.localtime, ngx.utctime, ngx.cookie_time, and ngx.http_time, mainly return or format the time in different ways. They are easy to understand, so I won’t cover them separately. Consult the documentation when you need them.
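To compare the formats side by side, here is a small script for the resty CLI. The labels are only for readability, and the actual values depend on when and where you run it.

```lua
-- Run with the resty CLI. Every call below reads Nginx's cached time.
local now = ngx.now()    -- float: seconds since the epoch, millisecond precision
local sec = ngx.time()   -- integer: whole seconds since the epoch

ngx.say("ngx.now():         ", now)
ngx.say("ngx.time():        ", sec)
ngx.say("ngx.localtime():   ", ngx.localtime())       -- yyyy-mm-dd hh:mm:ss, local time zone
ngx.say("ngx.utctime():     ", ngx.utctime())         -- same format, in UTC
ngx.say("ngx.http_time():   ", ngx.http_time(sec))    -- format used in HTTP headers
ngx.say("ngx.cookie_time(): ", ngx.cookie_time(sec))  -- format used for cookie expiry
```

Because ngx.now and ngx.time read the same cache, the integer part of ngx.now() equals ngx.time() when the two are called back to back.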

It is worth pointing out, though, that these APIs return Nginx’s cached time. Unless a non-blocking operation, such as network I/O, gives that cache a chance to be refreshed, they keep returning the same cached value instead of the real current time. Take a look at the following example:

$ resty -e 'ngx.say(ngx.now())
os.execute("sleep 1")
ngx.say(ngx.now())'

Between the two ngx.now calls, os.execute("sleep 1") blocks for one second by running the shell’s sleep command. Yet the printed result shows that the two timestamps are exactly the same.
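Sometimes code must call something blocking and still needs an accurate timestamp afterwards. For that case, lua-nginx-module also offers ngx.update_time(), which forces Nginx to refresh its cached time at the cost of a system call. This sketch for the resty CLI reuses the blocking sleep from the example above.

```lua
-- Run with the resty CLI.
local t1 = ngx.now()
os.execute("sleep 1")   -- blocking: the cached time is not refreshed
local t2 = ngx.now()    -- still the stale cached value, equal to t1
ngx.update_time()       -- force a refresh of the cached time (costs a syscall)
local t3 = ngx.now()    -- now roughly one second later than t1

ngx.say(t1, " ", t2, " ", t3)
```

Use it sparingly. The better fix is to avoid blocking calls in request handling in the first place.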

Now, what if we use a non-blocking sleep instead, as in the following code?

$ resty -e 'ngx.say(ngx.now())
ngx.sleep(1)
ngx.say(ngx.now())'

This time, the two timestamps differ. The example also introduces ngx.sleep, a non-blocking sleep function. Besides sleeping for a specified time, it has another special use.

Suppose a piece of code performs intensive computation and runs for a relatively long time. Throughout that time, the request running it keeps occupying the worker and its CPU, so other requests queue up and cannot get timely responses. In this case, you can intersperse ngx.sleep(0) calls in the code to yield control, so that other requests can be processed too.
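Here is a sketch of that idea, runnable with the resty CLI. The summing loop stands in for real CPU-bound work. Yielding every 10000 iterations is an arbitrary choice that you would tune for your own workload. ngx.sleep is only usable in contexts that can yield, such as content_by_lua* or timers.

```lua
-- Run with the resty CLI, or call heavy_sum from a content_by_lua* handler.
-- heavy_sum is a hypothetical stand-in for real CPU-heavy work.
local function heavy_sum(n)
    local sum = 0
    for i = 1, n do
        sum = sum + i % 7
        if i % 10000 == 0 then
            -- Yield to the Nginx event loop so other requests on this
            -- worker get a turn; execution resumes here afterwards.
            ngx.sleep(0)
        end
    end
    return sum
end

ngx.say(heavy_sum(100000))  -- 300000
```

The result is the same with or without the ngx.sleep(0) calls. The only difference is that this request no longer monopolizes the worker while it runs.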

Worker and Process APIs

Let’s take a look at the worker and process APIs. OpenResty provides the ngx.worker.* and ngx.process.* APIs for getting information about workers and processes. The former concerns Nginx worker processes only. The latter covers all Nginx processes, including worker processes, the master process, and the privileged agent process.

ngx.worker.* is provided by lua-nginx-module, while ngx.process.* is provided by lua-resty-core. Do you remember the exercise from the previous lesson: how can you make sure only one timer is started when there are multiple workers? The answer is the ngx.worker.id API. You can do a simple check before starting the timer:

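-- ngx.worker.id is a function; ids are 0-based, so exactly one worker matches
if ngx.worker.id() == 0 then
    start_timer()  -- your own timer setup
end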

This way, only one worker starts the timer. Note that worker ids start at 0, unlike Lua array indices, which start at 1, so don’t mix them up.
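Fleshed out a little more, this check usually lives in init_worker_by_lua*. The sketch below shows one possible shape. The names start_singleton_timer and periodic_job are hypothetical. It uses ngx.timer.every, whose callback receives a premature flag that is true when Nginx is shutting down or reloading.

```lua
-- Intended for init_worker_by_lua_block / init_worker_by_lua_file.
-- periodic_job and start_singleton_timer are hypothetical names.
local function periodic_job(premature)
    if premature then
        return  -- Nginx is exiting or reloading; skip this run
    end
    ngx.log(ngx.INFO, "periodic job ran in worker ", ngx.worker.id())
end

-- Starts the recurring timer only in the worker whose id is 0.
-- Returns true if this worker started it, false otherwise.
local function start_singleton_timer(interval)
    if ngx.worker.id() ~= 0 then
        return false
    end
    local ok, err = ngx.timer.every(interval, periodic_job)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
        return false
    end
    return true
end
```

Inside init_worker_by_lua_block you would then call start_singleton_timer(60). The 60-second interval is only a placeholder.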

The other worker- and process-related APIs hold nothing particularly noteworthy, so I’ll leave them for you to learn and practice on your own.

Truthy and Null Values

Lastly, let’s look at truthy and null values. In OpenResty, deciding what counts as true and what counts as null has always been a troublesome and confusing point.

Let’s first recall how Lua defines truth: every value other than nil and false is truthy.

Therefore, 0, empty strings, empty tables, and so on are all truthy as well.
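A few lines of plain Lua make this concrete. The helper name truthy is only for illustration. It returns exactly what an if statement would decide.

```lua
-- Plain Lua; also runs unchanged under LuaJIT and the resty CLI.
-- truthy is a hypothetical helper that mirrors how `if v then` decides.
local function truthy(v)
    if v then
        return true
    end
    return false
end

print(truthy(0))        -- true
print(truthy(""))       -- true
print(truthy({}))       -- true
print(truthy("false"))  -- true: a non-empty string, not the boolean
print(truthy(false))    -- false
print(truthy(nil))      -- false
```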

Next, let’s look at Lua’s null value, nil. It means “undefined”. For example, a variable that has been declared but not yet initialized holds nil:

$ resty -e 'local a
ngx.say(type(a))'

The snippet prints nil, because nil is also a data type of its own in Lua.
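This "undefined" meaning has a consequence for tables that matters in the pitfalls below. A table cannot store nil as a value. Reading a key that was never set gives nil, and assigning nil deletes the key. A short plain-Lua sketch, with a made-up user table:

```lua
-- Plain Lua; also runs unchanged under LuaJIT and the resty CLI.
local user = { name = "dog", email = nil }  -- `email = nil` stores nothing

-- Count the keys that are actually stored.
local count = 0
for _ in pairs(user) do
    count = count + 1
end
print(count)               -- 1: only `name` was stored
print(user.email == nil)   -- true, just like a key that was never mentioned
print(user.phone == nil)   -- true

user.name = nil            -- assigning nil deletes the key
print(next(user) == nil)   -- true: the table is now empty
```

So a table cannot record "this field exists but is empty". That gap is what the null values below try to fill.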

With these two definitions in mind, let’s examine the specific pitfalls they lead to.

ngx.null

The first pitfall is ngx.null. Because Lua’s nil cannot be stored as a value in a table, OpenResty introduces ngx.null to represent a null value inside tables:

$ resty -e 'print(ngx.null)'
null

$ resty -e 'print(type(ngx.null))'
userdata

The snippets above show that ngx.null prints as null and that its type is userdata (more precisely, a lightuserdata). Is it falsy, then? No. ngx.null is actually truthy:

$ resty -e 'if ngx.null then
ngx.say("true")
end'

So remember: only nil and false are falsy. If you overlook this, it is easy to fall into a trap. For example, consider this check when using lua-resty-redis:

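local res, err = red:get("dog")
if not res then
    ngx.say("failed to get dog: ", err)
    return
end
-- ngx.null is truthy, so a missing key gets past the check above
-- and the concatenation below throws an error
res = res .. "test"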

A res of nil means the call itself failed, while ngx.null means the key dog does not exist in Redis. In the latter case, res gets past the if not res check, the concatenation fails on a userdata value, and the request ends with a 500 error.
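One way to avoid this is to check for failure first and then compare explicitly against ngx.null. lua-resty-redis signals failure with a falsy value plus an error string. The sketch below wraps both checks in a hypothetical helper, classify_get. It feeds in simulated replies so it runs under the resty CLI without a Redis server.

```lua
-- Runs under OpenResty (ngx.null comes from lua-nginx-module).
-- classify_get is a hypothetical helper that turns the (res, err) pair
-- from red:get() into either (value) or (nil, reason).
local function classify_get(res, err)
    if not res then
        -- nil or false: the command failed (network error, timeout, error reply)
        return nil, "redis error: " .. tostring(err)
    end
    if res == ngx.null then
        -- the command succeeded, but the key does not exist
        return nil, "not found"
    end
    return res  -- a real string value
end

-- Simulated replies, so the sketch runs without a Redis server:
ngx.say(classify_get("woof"))                     -- woof
ngx.say(select(2, classify_get(ngx.null)))        -- not found
ngx.say(select(2, classify_get(nil, "timeout")))  -- redis error: timeout
```

In a real handler, you can pass both return values straight through with local value, reason = classify_get(red:get("dog")). The rest of your code then only ever sees a string or nil.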

cdata:NULL

The second pitfall is cdata:NULL. When you use the LuaJIT FFI interface to call a C function that returns a NULL pointer, you will encounter another type of null value, which is cdata:NULL.

$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
if cdata_null then
    ngx.say("true")
end'

Like ngx.null, cdata:NULL is truthy. Even more puzzling, the following code prints true, which means cdata:NULL compares equal to nil:

$ resty -e 'local ffi = require "ffi"
local cdata_null = ffi.new("void*", nil)
ngx.say(cdata_null == nil)'

So how should we handle ngx.null and cdata:NULL? It is clearly impractical to expect the application layer to deal with these annoying details. It is better to add a layer of abstraction so that callers never need to know about them.
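For cdata:NULL, the abstraction can sit right at the FFI boundary. The sketch below uses the standard C function getenv, chosen only because it is a familiar call that returns NULL for a missing variable. The wrapper converts a NULL pointer into a plain Lua nil, so callers never see cdata. The name get_env and the variable name in the demo are hypothetical.

```lua
-- Requires LuaJIT (OpenResty's runtime); runs with plain luajit or the resty CLI.
local ffi = require "ffi"

ffi.cdef[[
char *getenv(const char *name);
]]

-- Hides cdata:NULL from callers: returns a Lua string or nil.
local function get_env(name)
    local p = ffi.C.getenv(name)
    if p == nil then  -- true for a NULL pointer, even though p itself is truthy
        return nil
    end
    return ffi.string(p)  -- copy the C string into a Lua string
end

print(get_env("SOME_UNSET_VARIABLE_FOR_DEMO"))  -- nil
```

Callers can now use an ordinary if not value then check on the result.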

cjson.null

Lastly, let’s look at the null value that appears in cjson. The cjson library decodes JSON null into a Lua lightuserdata value, exposed as cjson.null:

$ resty -e 'local cjson = require "cjson"
local data = cjson.encode(nil)
local decode_null = cjson.decode(data)
ngx.say(decode_null == cjson.null)'

After a round trip through JSON, Lua’s nil comes back as cjson.null. As you might guess, it exists for the same reason as ngx.null: nil cannot be stored as a value in a table.
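In practice, cjson.null shows up when a field inside a decoded object is null. Here is a sketch with a made-up payload. It shows how to tell an explicit JSON null apart from a key that is simply absent.

```lua
-- Runs under OpenResty (lua-cjson ships with it) or any Lua with lua-cjson installed.
local cjson = require "cjson"

-- Hypothetical payload: "owner" is explicitly null, "age" is absent.
local obj = cjson.decode('{"name": "dog", "owner": null}')

print(obj.owner == cjson.null)  -- true: JSON null decodes to cjson.null
print(obj.age == nil)           -- true: a missing key is plain nil

-- cjson.null is truthy, so compare against it explicitly.
if obj.owner ~= nil and obj.owner ~= cjson.null then
    print("owner: " .. obj.owner)
else
    print("no owner")
end

-- Encoding turns cjson.null back into JSON null.
print(cjson.encode({ owner = cjson.null }))  -- {"owner":null}
```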

Now that you have met all the null values in OpenResty, do you feel a bit confused? Don’t panic. Read this section a few more times and summarize it in your own words, and it will stop making your head spin. Above all, whenever you write something like if not foo then, think carefully about whether that condition can actually be met.
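Putting the three together, the abstraction layer suggested for ngx.null and cdata:NULL could start as a single predicate. The sketch below runs under OpenResty, and is_null is a hypothetical name. It relies on LuaJIT comparing a NULL pointer cdata equal to nil, as shown in the cdata:NULL section. It deliberately treats false as a real value rather than as null.

```lua
-- Runs under OpenResty (LuaJIT, lua-nginx-module, lua-cjson).
local ffi = require "ffi"
local cjson = require "cjson"

-- Hypothetical helper: true for every flavor of "null" discussed above.
local function is_null(v)
    -- Covers plain nil and, under LuaJIT, a NULL pointer cdata,
    -- because FFI pointer comparisons treat nil as NULL.
    if v == nil then
        return true
    end
    -- ngx.null and cjson.null are both lightuserdata sentinels.
    return v == ngx.null or v == cjson.null
end

print(is_null(ngx.null))               -- true
print(is_null(ffi.new("void*", nil)))  -- true
print(is_null(cjson.decode("null")))   -- true
print(is_null(false))                  -- false: false is a value, not null
print(is_null(0))                      -- false
```

With a helper like this in a shared module, application code can write if is_null(res) then instead of remembering which null each library returns.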

Conclusion

With today’s lesson, we have covered the commonly used Lua APIs in OpenResty. Have you understood them all?

Finally, here is a question to think about: in the ngx.now example, why doesn’t the value change when there is no yielding operation? Feel free to leave a comment with your thoughts, and to share this article so we can learn and improve together.