12 Advanced Secrets Understanding Lua's Unique Concepts and Pits

Hello, I’m Wen Ming.

In the previous section, we explored the table-related library functions in LuaJIT. Beyond these commonly used functions, today I will introduce some concepts in Lua that are unique or less frequently used, as well as common pitfalls when using Lua in OpenResty.

Weak Tables #

First, let’s talk about weak tables, a concept unique to Lua that is related to garbage collection. Like other high-level languages, Lua has automatic garbage collection: you don’t need to care how it is implemented, and you don’t need to call the garbage collector explicitly. Memory that is no longer referenced is reclaimed by the garbage collector automatically.

However, ordinary references are not always what we want. Lua’s collector (a mark-and-sweep collector, not reference counting) only reclaims objects that are no longer reachable, and sometimes we need a more flexible mechanism. For example, when we insert a Lua object Foo (a table or a function) into the table tb, tb holds a reference to Foo. Even if Foo is not referenced anywhere else, the reference from tb keeps it alive, so the GC can never reclaim the memory Foo occupies. In such cases, we have only two options:

  • Manually release Foo.
  • Let it reside in memory permanently.

For example, consider the following code:

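local tb = {}
tb[1] = {"red"}                       -- a table
tb[2] = function() print("func") end  -- a function
print(#tb) -- 2

collectgarbage()  -- force a full garbage collection
print(#tb) -- 2: tb still holds strong references to both objects

table.remove(tb, 1)
print(#tb) -- 1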

However, you don’t want memory to be held by objects that are no longer in use, especially given LuaJIT’s 2GB memory limit (which applies to builds without GC64 mode). Releasing objects manually, on the other hand, is error-prone and adds complexity to the code.

This is where weak tables come in handy. As the name suggests, a weak table is a table whose keys, values, or both are weak references, which do not prevent the garbage collector from reclaiming the objects they point to. Let’s modify the code a bit:

local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "v"})  -- make the values weak
print(#tb)  -- 2

collectgarbage()
print(#tb) -- 0: both values have been collected

As you can see, objects that are not in use are garbage collected. The most important line of code is:

setmetatable(tb, {__mode = "v"})

Does it look familiar? Yes, it’s an operation on a metatable! When a table’s metatable has a __mode field, the table becomes a weak table.

  • If __mode is "k", the keys of the table are weak references.
  • If __mode is "v", the values of the table are weak references.
  • If __mode is "kv", both the keys and the values are weak references.

In all three kinds of weak tables, once a weakly referenced key or value is collected, the entire key-value pair is removed from the table.

In the example above, __mode is "v" and tb is an array, so the values in the array (a table and a function) can be garbage collected automatically. If you change __mode to "k", however, nothing is collected: the keys are the numbers 1 and 2, which are not collectable objects, and the values are still held by strong references. For example:

local tb = {}
tb[1] = {"red"}
tb[2] = function() print("func") end
setmetatable(tb, {__mode = "k"})  -- weak keys, but 1 and 2 are plain numbers
print(#tb)  -- 2

collectgarbage()
print(#tb) -- 2: nothing was collected

Note that so far we have only demonstrated weak values in array-like tables. In the same way, you can use objects as the keys of a hash-like table and make those keys weak. For example:

local tb = {}
tb[{color = "red"}] = "red"
local fc = function() print("func") end
tb[fc] = "func"
fc = nil  -- drop the only other reference to the function

setmetatable(tb, {__mode = "k"})  -- make the keys weak
for k, v in pairs(tb) do
    print(v)
end

collectgarbage()
print("----------")
for k, v in pairs(tb) do
    print(v)  -- nothing is printed: both keys were collected
end

After collectgarbage() forces a full garbage collection, all elements of tb are reclaimed, so nothing is printed after the separator line. In real code, however, there is no need to call collectgarbage() manually: the garbage collector runs automatically and incrementally as the program allocates memory, without any action on our part.
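A common practical use of weak keys is attaching extra data to objects you do not own without keeping those objects alive. The following is a minimal sketch; set_meta, get_meta, and count are hypothetical helper names, and the explicit collectgarbage() call is there only to make the effect visible.

```lua
-- Weak-key side table: an entry disappears once its key object is
-- no longer referenced anywhere else.
local meta = setmetatable({}, { __mode = "k" })

-- Hypothetical helpers, for illustration only.
local function set_meta(obj, info) meta[obj] = info end
local function get_meta(obj) return meta[obj] end

-- Count entries (the # operator does not work on hash-like tables).
local function count(t)
    local n = 0
    for _ in pairs(t) do n = n + 1 end
    return n
end

local req = {}
set_meta(req, { created = os.time() })
local size_before = count(meta)
print(size_before, get_meta(req) ~= nil) -- 1  true

req = nil          -- drop the only strong reference to the key
collectgarbage()   -- force a full cycle, for demonstration only
local size_after = count(meta)
print(size_after)  -- 0
```

Because only the key is weak, the metadata lives exactly as long as the object it describes, and nobody has to remember to delete it.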

Since we mentioned collectgarbage(), let me say a few more words about it. This function accepts several options. By default, it performs a full garbage collection ("collect"). Another useful option is "count", which returns the total memory currently used by Lua, in kilobytes. This statistic helps us detect memory leaks and reminds us to stay well away from the 2GB limit.
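As a rough illustration of the "count" option, the sketch below measures memory before and after building a large table. The absolute numbers depend on the Lua or LuaJIT version and build, so only the relative changes are meaningful.

```lua
-- collectgarbage("count") returns the memory in use by Lua, in kilobytes.
local before = collectgarbage("count")

local big = {}
for i = 1, 100000 do
    big[i] = i
end
local during = collectgarbage("count")

big = nil          -- the table is now unreachable
collectgarbage()   -- same as collectgarbage("collect"): a full cycle
local after = collectgarbage("count")

print(string.format("before %.1f KB, during %.1f KB, after %.1f KB",
                    before, during, after))
```

Logging such a value periodically is a cheap way to spot memory that keeps growing over time.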

The code related to weak tables can be complex and difficult to understand, and it may introduce hidden bugs as well. What are these bugs? Don’t worry, in the next section, I will specifically introduce a memory leak problem caused by using weak tables in an open source project.

Closures and Upvalues #

Now let’s talk about closures and upvalues. As I mentioned before, in Lua, all values are first-class citizens, including functions. This means that functions can be saved in variables, passed as arguments, and returned as values from other functions. For example, in the weak table example mentioned earlier:

tb[2] = function() print("func") end

This code snippet stores an anonymous function as the value of a table.

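In Lua, the following two definitions are nearly equivalent; both store a function in the local variable foo. The one difference is scope: local function foo declares foo before the function body, so the function can call itself recursively, while in the second form, which assigns an anonymous function to a variable, a reference to foo inside the body would resolve to a global: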

local function foo() print("foo") end
local foo = function() print("foo") end
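The two forms behave differently when the function calls itself, because local function brings the name into scope before the body is compiled. A minimal sketch, with illustrative names fact and fib:

```lua
-- 'local function' declares the local first, so the body can see it.
local function fact(n)
    if n <= 1 then return 1 end
    return n * fact(n - 1)
end
print(fact(5)) -- 120

-- Here the local 'fib' is not in scope until the whole statement ends,
-- so the 'fib' inside the body refers to a global, which is nil.
local fib = function(n)
    if n < 2 then return n end
    return fib(n - 1) + fib(n - 2)
end
local ok, err = pcall(fib, 10)
print(ok, err) -- prints false and an "attempt to call" error message
```

The exact wording of the error message differs between Lua and LuaJIT versions, so the comment only describes it.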

Additionally, Lua supports writing one function inside another function, which is called a nested function. Here’s an example:

$ resty -e '
local function foo()
    local i = 1
    local function bar()
        i = i + 1
        print(i)
    end
    return bar
end

local fn = foo()
fn() -- prints 2
'

As you can see, the bar function can read the local variable i defined in the foo function and modify its value, even though this variable is not defined in bar. This feature is called lexical scoping.

In fact, these features of Lua are the basis of closures. In simple terms, a closure is a function that accesses a variable in the lexical scope of another function.
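A classic way to see closures at work is a counter factory: each call to the outer function creates a fresh local variable, and each returned function keeps its own copy of it. A minimal sketch (make_counter is an illustrative name):

```lua
local function make_counter()
    local count = 0            -- a new 'count' is created on every call
    return function()
        count = count + 1      -- the inner function reads and writes it
        return count
    end
end

local c1 = make_counter()
local c2 = make_counter()

local a = c1()
local b = c1()
local c = c2()
print(a, b, c) -- 1  2  1
```

c1 and c2 are built from the same function definition, yet their counts are independent, because each closure captured a different count variable.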

Strictly by this definition, every Lua function is a closure, even one that is not nested. This is because the Lua compiler compiles each script (chunk) as the body of an additional, implicit main function. For example, take the following few lines of code:

local foo, bar
local function fn()
     foo = 1
     bar = 2
end

After compilation, it will become the following:

function main(...)
     local foo, bar
     local function fn()
         foo = 1
         bar = 2
     end
end

The function fn captures the two local variables of this main function, which makes it a closure.

Of course, closures are not unique to Lua; you can compare them with closures in other languages to deepen your understanding. Only once you understand closures can you understand upvalues, which we discuss next.

Upvalues, on the other hand, are a concept specific to Lua. The term can be read literally as “values from above”. In practice, an upvalue is a variable from an enclosing lexical scope that a closure captures. Let’s continue with the code snippet from above:

local foo, bar
local function fn()
     foo = 1
     bar = 2
end

You can see that the function fn captures two local variables, foo and bar, which are not in its own lexical scope. And these two variables are actually the upvalues of the function fn.
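The standard debug library can list a function’s upvalues, which makes the idea concrete. This sketch assumes the code is loaded from source rather than from stripped bytecode, since upvalue names come from debug information. It also shows that two closures created in the same scope share a single upvalue rather than each holding a copy; list_upvalues and make_pair are illustrative names.

```lua
local foo, bar
local function fn()
    foo = 1
    bar = 2
end

-- Collect the names of a function's upvalues via debug.getupvalue.
local function list_upvalues(f)
    local names = {}
    local i = 1
    while true do
        local name = debug.getupvalue(f, i)
        if name == nil then break end
        names[#names + 1] = name
        i = i + 1
    end
    return names
end

print(table.concat(list_upvalues(fn), ", ")) -- foo, bar
fn()
print(foo, bar) -- 1  2

-- Two closures created in the same scope share the same upvalue 'n'.
local function make_pair()
    local n = 0
    local function inc() n = n + 1 end
    local function get() return n end
    return inc, get
end

local inc, get = make_pair()
inc()
inc()
print(get()) -- 2
```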

Common Pitfalls #

After introducing several concepts in Lua, let me talk about some of the pitfalls related to Lua in OpenResty development.

Earlier, we mentioned some differences between Lua and other programming languages, such as indexes starting from 1 and variables being global by default. When developing real code with OpenResty, we will run into more issues related to Lua and LuaJIT. Below, I will discuss some of the more common ones.

I should remind you that even if you know all of these “pitfalls” in advance, you will probably still fall into some of them yourself before the lesson really sticks. The difference is that you will be able to climb out quickly and get to the root cause.

Starting Index: 0 or 1 #

The first pitfall is that indexes in Lua start from 1. We have mentioned this many times before, but I have to say that it is not the whole story.

In LuaJIT, arrays created with ffi.new are indexed from 0, just as in C:

local ffi = require "ffi"
local ffi_new = ffi.new
local buf = ffi_new("char[?]", 128)  -- 128 bytes, valid indexes 0..127

So, if you want to access the cdata buf in the code snippet above, please remember that the index starts from 0, not 1. When using FFI to interact with C, pay special attention to this.
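To make the contrast concrete, here is a small sketch that copies a Lua table into an FFI array. It assumes it runs under LuaJIT (for example via resty), since the ffi module does not exist in PUC-Rio Lua. Note that FFI arrays have no bounds checking, so an off-by-one index silently reads or writes out of range instead of raising an error.

```lua
local ffi = require "ffi"

-- Lua tables: the first element is at index 1.
local t = { 10, 20, 30, 40 }

-- FFI arrays: C semantics, the first element is at index 0.
local n = 4
local arr = ffi.new("int[?]", n)   -- variable-length array of n ints, zero-filled
for i = 0, n - 1 do
    arr[i] = t[i + 1]              -- shift the index by one when copying
end

print(t[1], arr[0])       -- 10  10
print(t[n], arr[n - 1])   -- 40  40
-- arr[n] would be out of bounds, and LuaJIT does not check this.
```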

Regular Expression Pattern Matching #

The second pitfall is related to regular expression pattern matching. OpenResty has two sets of string matching methods: Lua’s built-in string library and OpenResty’s ngx.re.* API.

Lua’s pattern matching has its own syntax, which differs from PCRE regular expressions. Here is a simple example:

$ resty -e 'print(string.match("foo 123 bar", "%d%d%d"))'
123

This code extracts the numeric part from the string. As you can see, the syntax is quite different from the regular expressions we are used to. Lua’s built-in pattern matching is also costly to maintain and slow: it generally cannot be JIT compiled by LuaJIT, and patterns are not compiled once and cached for reuse.

Therefore, whenever you need regex-style matching in operations such as find and match, don’t hesitate to use OpenResty’s ngx.re API instead of Lua’s string library. Only when searching for a fixed string should you consider the string library, with plain matching mode enabled.
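A sketch of both recommendations, assuming it runs inside OpenResty (for example saved to a file and run with resty), where the ngx API is available. The "jo" options ask ngx.re to use PCRE JIT ("j") and to compile the pattern once and cache it ("o"); passing true as the fourth argument of string.find turns on plain matching, so no pattern characters are interpreted.

```lua
-- Regex-style matching: use ngx.re with PCRE syntax.
local m, err = ngx.re.match("foo 123 bar", [[\d+]], "jo")
if m then
    print(m[0])          -- 123 (m[0] holds the whole match)
else
    print("no match or error: ", err)
end

-- Fixed substring: string.find in plain mode (4th argument = true).
local from, to = string.find("foo 123 bar", "123", 1, true)
print(from, to)          -- 5  7
```

The long-bracket string [[\d+]] avoids having to escape the backslash, which would otherwise need to be written as "\\d+".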

A general suggestion: in OpenResty, always prefer OpenResty’s API first, then LuaJIT’s API, and use Lua’s standard libraries with caution.

Unable to Distinguish Array and Dictionary when Encoding JSON #

The third pitfall is that you cannot distinguish between an array and a dictionary when encoding JSON. Because Lua has only one data structure, the table, the encoder cannot tell whether an empty table should be encoded as an array or as a dictionary:

resty -e 'local cjson = require "cjson"
local t = {}
print(cjson.encode(t))
'

In the code snippet above, for example, the output is {}: OpenResty’s cjson library encodes an empty table as a dictionary (a JSON object) by default. We can change this global default with the function cjson.encode_empty_table_as_object:

resty -e 'local cjson = require "cjson"
cjson.encode_empty_table_as_object(false)
local t = {}
print(cjson.encode(t))
'

This time, the empty table is encoded as an array: [].

However, this is a global setting, so changing it affects all code that uses this cjson module. Can we specify the encoding rule for a particular table instead? Yes, and there are two ways to do it.

The first method is to assign the special value cjson.empty_array (a lightuserdata) to the variable instead of a table. When encoded to JSON, it is treated as an empty array:

$ resty -e 'local cjson = require "cjson"
local t = cjson.empty_array
print(cjson.encode(t))
'

However, sometimes we are not sure whether the specified table will always be empty. We want it to be encoded as an array when it is empty, so we need to use the second method, which is cjson.empty_array_mt.

It marks the specified table so that it will be encoded as an array when the table is empty. From the name cjson.empty_array_mt, you can see that it is set using the metatable. Here is an example:

$ resty -e 'local cjson = require "cjson"
local t = {}
setmetatable(t, cjson.empty_array_mt)
print(cjson.encode(t))  -- t is empty, so it is encoded as an array
t[1] = 123
print(cjson.encode(t))  -- the same table, now non-empty, encodes as usual
'

You can execute this code locally and check if the output matches your expectations.
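In practice, cjson.empty_array_mt is most useful when a function builds a list whose length is not known in advance. The sketch below assumes OpenResty’s bundled lua-cjson; collect_even is a hypothetical helper name.

```lua
local cjson = require "cjson"

-- Hypothetical helper: returns the even numbers from a list.
-- The result is tagged with empty_array_mt, so it encodes as [] even when
-- nothing matched, instead of the default {}.
local function collect_even(nums)
    local out = setmetatable({}, cjson.empty_array_mt)
    for _, v in ipairs(nums) do
        if v % 2 == 0 then
            out[#out + 1] = v
        end
    end
    return out
end

print(cjson.encode({ evens = collect_even({ 1, 3 }) }))    -- {"evens":[]}
print(cjson.encode({ evens = collect_even({ 1, 2, 4 }) })) -- {"evens":[2,4]}
```

Tagging the table where it is created keeps the JSON shape stable for API clients, without touching the global encode_empty_table_as_object setting.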

Limitations on the Number of Variables #

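Let’s look at the fourth pitfall: limits on the number of variables. A Lua function can have only a limited number of local variables and upvalues, as you can confirm in the Lua source code. The excerpt below is from Lua 5.1’s luaconf.h; LuaJIT defines equivalent limits (LJ_MAX_LOCVAR and LJ_MAX_UPVAL) in lj_def.h: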

/*
@@ LUAI_MAXVARS is the maximum number of local variables per function
@* (must be smaller than 250).
*/
#define LUAI_MAXVARS            200


/*
@@ LUAI_MAXUPVALUES is the maximum number of upvalues per function
@* (must be smaller than 250).
*/
#define LUAI_MAXUPVALUES        60

These two thresholds are hard-coded as 200 and 60, respectively. You can modify the source code to change them, but as the comments say, they must stay below 250.

In practice, we rarely hit these thresholds. Still, when writing OpenResty code, keep them in mind and avoid declaring too many local variables and upvalues in one function. Where possible, wrap them in do ... end blocks so that they go out of scope early and fewer of them are active in the same function at once.

Let’s take a look at this pseudo code as an example:

local re_find = ngx.re.find
function foo() --[[ uses re_find ]] end
function bar() --[[ does not use re_find ]] end
function fn() --[[ does not use re_find ]] end

If only the foo function uses re_find, then we can refactor it like this:

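do
    local re_find = ngx.re.find  -- visible only inside this do ... end block
    function foo() --[[ uses re_find ]] end
end
function bar() --[[ does not use re_find ]] end
function fn() --[[ does not use re_find ]] end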

This way, the main function has one fewer local variable (re_find). This technique is useful in a single, large Lua file.
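One way to see the local-variable limit for yourself is to generate source code and compile it at runtime. This sketch assumes the default limit of 200 locals per function; because the exact error text differs between Lua and LuaJIT, it only looks for the phrase "local variables". It also shows why do ... end helps: a local declared inside a block goes out of scope at its end, so it no longer counts toward the limit. make_chunk is an illustrative helper name.

```lua
-- Prefer loadstring where it exists (Lua 5.1 / LuaJIT), otherwise load (5.2+).
local compile = loadstring or load

-- Build a chunk with n top-level locals, optionally wrapping each in do ... end.
local function make_chunk(n, wrap)
    local lines = {}
    for i = 1, n do
        local stmt = "local v" .. i .. " = " .. i
        if wrap then
            stmt = "do " .. stmt .. " end"
        end
        lines[#lines + 1] = stmt
    end
    return table.concat(lines, "\n")
end

local f200 = compile(make_chunk(200, false))
print(f200 ~= nil)     -- true: exactly at the limit

local f201, err = compile(make_chunk(201, false))
print(f201, err)       -- nil, plus an error mentioning "local variables"

local fwrapped = compile(make_chunk(300, true))
print(fwrapped ~= nil) -- true: each local's scope ends at its "end"
```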

Final Thoughts #

In the spirit of asking “why” more often: where does Lua’s threshold of 250 come from? Consider this today’s question to think about. Feel free to leave your comments and share this article with your colleagues and friends, so we can communicate and progress together.