11 Dissecting the Unique Data Structures in Lua Table and Metatable Features


Hello, I’m Wen Ming. Today, let’s learn about the only data structure in LuaJIT: table.

Unlike other scripting languages with rich data structures, LuaJIT only has one data structure, table, which does not differentiate between arrays, hashes, sets, etc., but combines them together. Let’s review an example mentioned earlier:

local color = {first = "red", "blue", third = "green", "yellow"}
print(color["first"])                 --> output: red
print(color[1])                         --> output: blue
print(color["third"])                --> output: green
print(color[2])                         --> output: yellow
print(color[3])                         --> output: nil

In this example, the color table has both an array part and a hash part, which can be accessed separately without interfering with each other. For example, you can use the ipairs function to iterate through the array part only:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
for k, v in ipairs(color) do
     print(k)
end
'
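For comparison, here is a minimal sketch (the variable names are only for illustration) that runs both ipairs and pairs over the same table. ipairs stops at the end of the array part, while pairs also visits the hash keys, in no guaranteed order:

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

-- ipairs walks indices 1, 2, ... and stops at the first nil value
local array_keys = {}
for i, v in ipairs(color) do
    array_keys[i] = i
end
print(table.concat(array_keys, ","))  --> 1,2

-- pairs visits every key, in the array part and the hash part alike
local string_keys = 0
for k, v in pairs(color) do
    if type(k) == "string" then
        string_keys = string_keys + 1
    end
end
print(string_keys)  --> 2
```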

The operations on table are so important that LuaJIT has extended the standard Lua 5.1 table library, and OpenResty has further extended the table library of LuaJIT. Now, let’s take a look at these library functions together.

table library functions #

Let’s start with the standard table library functions. There are not many table library functions built into Lua 5.1, so we can quickly go through them.

table.getn Get number of elements #

As we mentioned in the “Standard Lua and LuaJIT” chapter, correctly getting the number of elements in a table is a difficult problem in LuaJIT.

For sequences, you can use table.getn or the unary operator # to correctly return the number of elements. The following example returns the expected value of 3:

$ resty -e 'local t = { 1, 2, 3 }
print(table.getn(t)) '

However, for a table that is not a sequence, neither of them returns the real number of elements. The following example prints 1, even though the table holds two elements:

$ resty -e 'local t = { 1, a = 2 }
print(#t) '

Fortunately, this confusing function has been replaced by an extension in LuaJIT, which we will mention later. So in the context of OpenResty, unless you are certain that you are getting the length of a sequence, please do not use the table.getn function and the unary operator #.

In addition, table.getn and the unary operator # are not O(1) operations: LuaJIT has to search for the end of the array (a binary search inside lj_tab_len, so roughly O(log n)), which is another reason to avoid using them as much as possible.

table.remove Remove specified element #

Next, let’s look at the table.remove function, which is used to remove elements from a table based on their indices. This means it can only remove elements from the array part of the table. Let’s use the color example again:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
  table.remove(color, 1)
  for k, v in pairs(color) do
      print(v)
  end'

This code will remove the element blue at index 1. You may wonder how to remove elements from the hash part of the table. It’s also simple: you just set the value corresponding to the key to nil. For example, in the color example, the value green corresponding to third is removed as follows:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
  color.third = nil
  for k, v in pairs(color) do
      print(v)
  end'
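The two kinds of removal behave differently, which the following sketch makes visible: table.remove returns the removed value and shifts the later array elements down, while assigning nil simply deletes the key.

```lua
local color = {first = "red", "blue", third = "green", "yellow"}

-- table.remove returns the removed value and shifts later elements down
local removed = table.remove(color, 1)
print(removed)      --> blue
print(color[1])     --> yellow
print(color[2])     --> nil

-- assigning nil deletes a hash key; nothing is shifted
color.third = nil
print(color.third)  --> nil
print(color.first)  --> red
```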

table.concat Concatenate elements #

Next, let’s look at the table.concat function, which concatenates elements in a table based on their indices. Since this operation is also based on indices, it is obviously for the array part of the table. Let’s continue with the color example:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
print(table.concat(color, ", "))'

After using the table.concat function, it outputs blue, yellow, skipping the hash part.

In addition, this function can also specify the starting and ending indices for concatenation. For example:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow", "orange"}
print(table.concat(color, ", ", 2, 3))'

This time, it outputs yellow, orange, skipping blue.

You may think these operations are quite simple, but appearances can be deceiving. Do not underestimate this seemingly insignificant function: it can have a surprisingly large effect on performance, and it is one of the protagonists in our upcoming chapter on performance optimization.
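One common pattern, shown here only as a sketch and not necessarily the exact technique the later chapter covers, is to collect string fragments in a table and join them once with table.concat, instead of growing a string with .. inside a loop:

```lua
-- collect the fragments first ...
local parts = {}
for i = 1, 5 do
    parts[i] = "item" .. i
end

-- ... then build the final string in a single call
local joined = table.concat(parts, ",")
print(joined)  --> item1,item2,item3,item4,item5
```

Each .. in a loop creates a new intermediate string, whereas this version creates the final string only once.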

table.insert Insert an element #

Finally, let’s look at the table.insert function, which can insert a new element at a specified index, affecting the array part of the table. Let’s use the color example again:

$ resty -e 'local color = {first = "red", "blue", third = "green", "yellow"}
table.insert(color, 1, "orange")
print(color[1])
'

As you can see, the first element of color is now orange. Of course, you can also omit the index, in which case the element will be inserted at the end of the table.
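Here is a small sketch of both forms side by side. Inserting at an index shifts the existing elements up, while omitting the index appends to the end:

```lua
local color = {"blue", "yellow"}

table.insert(color, 1, "orange")  -- insert at index 1, shifting the rest up
table.insert(color, "green")      -- no index: append at the end

print(table.concat(color, ", "))  --> orange, blue, yellow, green
```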

Here, I must point out that although table.insert is a common operation, its performance is not great. If you do not pass an index, LuaJIT has to call lj_tab_len on every insert to find the end of the array before it can append the element. As we mentioned for table.getn, getting the length of a table is not an O(1) operation.

Therefore, for table.insert operations, we should try to avoid using them in hot code, such as:

local t = {}
for i = 1, 10000 do
     table.insert(t, i)
end
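One way to avoid the length lookup, sketched below, is to keep track of the next index yourself and assign to it directly. The counter n is just a local variable introduced for this illustration:

```lua
local t = {}
local n = 0  -- number of elements stored so far

for i = 1, 10000 do
    n = n + 1
    t[n] = i  -- direct indexed write, no length lookup needed
end

print(n)      --> 10000
print(t[n])   --> 10000
```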

LuaJIT’s Table Extension Functions #

Next, let’s take a look at the table extension functions in LuaJIT. LuaJIT extends two very useful table functions on top of the standard Lua, which are used to create and clear a table. Allow me to introduce them in detail below.

table.new(narray, nhash) - Create a new table #

The first function is table.new(narray, nhash). This function pre-allocates the specified sizes for the array and hash, instead of growing dynamically when elements are inserted. The two parameters, narray and nhash, determine the size of the array and hash respectively. Dynamic growth is a costly operation involving space allocation, resizing, and rehashing, which should be avoided whenever possible.

It’s worth noting that the documentation for table.new is not present on the LuaJIT official website, but rather buried deep in the project’s extension documentation on GitHub. Even with the help of Google, it is hard to find. As a result, only a few engineers are aware of its existence.

Here is a simple example to demonstrate how to use it. First, you need to require it as the function is an extension:

local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 100 do
   t[i] = i
end

As you can see, this code creates a new table with space pre-allocated for 100 array elements and 0 hash elements. Of course, you can pre-allocate space for both 100 array elements and 50 hash elements according to your needs:

local t = new_tab(100, 50)

Additionally, you can still exceed the preset size, but the table then has to grow dynamically again, which degrades performance and defeats the purpose of using table.new.

For example, in the following code, we set the size to 100, but end up using 200:

local new_tab = require "table.new"
local t = new_tab(100, 0)
for i = 1, 200 do
   t[i] = i
end

Therefore, you need to preset the sizes of the array and hash spaces in table.new according to the actual scenario in order to find a balance between performance and memory usage.
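Because table.new is an extension, code that may also run outside LuaJIT often guards the require with pcall. The following is a minimal sketch of that idea. The fallback function is an assumption for interpreters without table.new, and it simply ignores the size hints:

```lua
-- table.new is a LuaJIT extension; the fallback below is an assumption
-- for interpreters that do not provide it
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function(narray, nhash) return {} end
end

local n = 100
local squares = new_tab(n, 0)  -- reserve exactly the array slots we will fill
for i = 1, n do
    squares[i] = i * i
end

print(squares[n])  --> 10000
```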

table.clear() - Clear a table #

The second function is table.clear(), which is used to clear all data in a table without releasing the memory occupied by the array and hash. Thus, it is very useful for reusing Lua tables and avoids the overhead of repeatedly creating and destroying tables.

Here is an example:

$ resty -e 'local clear_tab = require "table.clear"
local color = {first = "red", "blue", third = "green", "yellow"}
clear_tab(color)
for k, v in pairs(color) do
     print(k)
end'

However, in reality, there are not many scenarios where this function can be used. In most cases, we should leave this task to the LuaJIT GC to complete.
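For the cases where reuse does make sense, the idea might look like the sketch below: a scratch table that is cleared and refilled on every call. The helper join_words and the pairs-based fallback are both assumptions made for this illustration, not part of any library:

```lua
local ok, clear_tab = pcall(require, "table.clear")
if not ok then
    -- assumption: portable fallback when table.clear is unavailable
    clear_tab = function(t)
        for k in pairs(t) do
            t[k] = nil
        end
    end
end

local buf = {}  -- scratch table shared by every call

-- hypothetical helper: joins words using the shared scratch table
local function join_words(words)
    clear_tab(buf)  -- drop leftovers from the previous call
    for i, w in ipairs(words) do
        buf[i] = w
    end
    return table.concat(buf, " ")
end

print(join_words({"a", "b", "c"}))  --> a b c
print(join_words({"x"}))            --> x
```

Without the clear_tab call, the second call would still see "b" and "c" left over from the first one.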

table extension functions in OpenResty #

As mentioned earlier, the LuaJIT branch maintained by OpenResty has extended the table module with several new APIs: table.isempty, table.isarray, table.nkeys, and table.clone.

It is important to note that before using these new APIs, please check the version of OpenResty you are using, as most of these APIs are only available in OpenResty 1.15.8.1 or later. This is because OpenResty did not release any new versions for about a year before version 1.15.8.1, and these APIs were added during this release gap.

I have provided a link in the article, and here I will only use table.nkeys as an example. The other three APIs are relatively easy to understand from their names, so you can refer to the documentation on GitHub to understand them. I must say that the quality of OpenResty’s documentation is very high. It includes code examples, whether it can be JIT-ed, and important considerations. Compared to the documentation of Lua and LuaJIT, it is several orders of magnitude better.

Now, let’s focus on the table.nkeys function. Its name may be confusing, but it is actually a function used to get the length of a table. It returns the number of elements in the table, including elements in both array and hash parts. Therefore, we can use it as a replacement for table.getn, as shown in the example below:

local nkeys = require "table.nkeys"

print(nkeys({}))  -- 0
print(nkeys({ "a", nil, "b" }))  -- 2
print(nkeys({ dog = 3, cat = 4, bird = nil }))  -- 2
print(nkeys({ "a", dog = 3, cat = 4 }))  -- 3
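To contrast it with the # operator on our color table, here is a short sketch. The pairs-based fallback is an assumption so that the snippet also runs where table.nkeys is unavailable. It counts the same keys but does not reflect how the real function is implemented:

```lua
local ok, nkeys = pcall(require, "table.nkeys")
if not ok then
    -- assumption: count keys by hand outside OpenResty's LuaJIT
    nkeys = function(t)
        local n = 0
        for _ in pairs(t) do
            n = n + 1
        end
        return n
    end
end

local color = {first = "red", "blue", third = "green", "yellow"}
print(#color)        --> 2
print(nkeys(color))  --> 4
```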

Metatables #

After explaining the table functions, let’s take a look at metatables, which are built on top of tables. Metatables are a concept unique to Lua and are widely used in practical projects. It is not an exaggeration to say that you can see metatables in almost all lua-resty-* libraries.

The behavior of metatables is similar to operator overloading. For example, we can overload __add to calculate the union of two Lua arrays, or overload __tostring to define a function for converting to a string.

Lua provides two functions for handling metatables:

  • The first one is setmetatable(table, metatable), which is used to set a metatable for a table.
  • The second one is getmetatable(table), which is used to get the metatable of a table.
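As a sketch of the operator-overloading idea mentioned above, the following code overloads __add so that adding two arrays yields their union. The metatable name union_mt and the duplicate-skipping logic are choices made for this illustration:

```lua
local union_mt = {}

-- a + b returns a new array holding each value of a and b once
union_mt.__add = function(a, b)
    local seen, result, n = {}, {}, 0
    for _, list in ipairs({a, b}) do
        for _, v in ipairs(list) do
            if not seen[v] then
                seen[v] = true
                n = n + 1
                result[n] = v
            end
        end
    end
    return setmetatable(result, union_mt)
end

local x = setmetatable({1, 2, 3}, union_mt)
local y = setmetatable({3, 4}, union_mt)
local u = x + y

print(table.concat(u, ","))  --> 1,2,3,4
```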

Now that we have introduced these two functions, you are probably more interested in what metatables can actually do, so let’s take a look. Here is a code snippet from a real project:

local version = {
  major = 1,
  minor = 1,
  patch = 1
}
version = setmetatable(version, {
  __tostring = function(t)
    return string.format("%d.%d.%d", t.major, t.minor, t.patch)
  end
})
print(tostring(version))

First, we define a table named version. As you can see, the purpose of this code snippet is to print the version number stored in the version table. However, we cannot print version directly. Without the metatable, the following call only outputs the address of the table:

print(tostring(version))

So, we need to customize the string conversion function for this table, which is __tostring. At this point, metatables come into play. We use setmetatable to redefine the __tostring method of the version table, so we can print the version number: 1.1.1.

In addition to __tostring, another metatable method we often overload in practical projects is __index.

When we look up an element in a table, we first search directly in the table. If not found, we continue to search in the __index of the metatable.

For example, in the following example, we remove patch from the version table:

local version = {
  major = 1,
  minor = 1
}
version = setmetatable(version, {
  __index = function(t, key)
    if key == "patch" then
      return 2
    end
  end,
  __tostring = function(t)
    return string.format("%d.%d.%d", t.major, t.minor, t.patch)
  end
})
print(tostring(version))

In this case, patch does not exist in the version table itself, so the lookup of t.patch falls through to the __index function, and the result printed is 1.1.2.

In fact, __index can be a function or a table. If you run the following code, you will see that they have the same effect:

local version = {
  major = 1,
  minor = 1
}
version = setmetatable(version, {
  __index = {patch = 2},
  __tostring = function(t)
    return string.format("%d.%d.%d", t.major, t.minor, t.patch)
  end
})
print(tostring(version))
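To make the lookup order visible, here is a small sketch using the standard rawget function, which reads a table without consulting its metatable:

```lua
local version = setmetatable({ major = 1, minor = 1 }, {
    __index = { patch = 2 }
})

print(rawget(version, "patch"))  --> nil (not stored in the table itself)
print(version.patch)             --> 2   (found through __index)

-- once the key exists in the table, __index is no longer consulted
version.patch = 3
print(version.patch)                        --> 3
print(getmetatable(version).__index.patch)  --> 2
```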

Another metatable method is __call. It acts like a functor and allows a table to be called.

Let’s modify the code for printing the version number based on the code above and see how to call a table:

local version = {
  major = 1,
  minor = 1,
  patch = 1
}

local function print_version(t)
  print(string.format("%d.%d.%d", t.major, t.minor, t.patch))
end

version = setmetatable(version, {__call = print_version})

version()

In this code, we use setmetatable to add a metatable to the version table, and the __call metamethod refers to the print_version function. So, if we try to call version as a function, the print_version function will be executed.
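The __call function receives the table itself as its first argument, followed by whatever arguments were passed in the call. The following sketch illustrates this with a hypothetical optional separator parameter:

```lua
local version = setmetatable({ major = 1, minor = 1, patch = 1 }, {
    -- self is the table being called; sep is the caller's argument
    __call = function(self, sep)
        return table.concat({ self.major, self.minor, self.patch }, sep or ".")
    end
})

print(version())     --> 1.1.1
print(version("-"))  --> 1-1-1
```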

getmetatable is the counterpart of setmetatable. It can retrieve the set metatable, as shown in the following code:

local version = {
  major = 1,
  minor = 1
}
version = setmetatable(version, {
  __index = {patch = 2},
  __tostring = function(t)
    return string.format("%d.%d.%d", t.major, t.minor, t.patch)
  end
})
print(getmetatable(version).__index.patch)

Besides the three metamethods mentioned today, there are some less frequently used ones. You can refer to the documentation for more information when you encounter them.

Object Orientation #

Finally, let’s talk about object orientation. You may know that Lua is not an object-oriented language, but we can use metatables to achieve OO.

Let’s take a look at a practical example. lua-resty-mysql is the official MySQL client for OpenResty, and it uses metatables to simulate classes and class methods. Here’s an example of how it’s used:

$ resty -e 'local mysql = require "resty.mysql" -- first import the lua-resty library
local db, err = mysql:new() -- create a new instance of the class
db:set_timeout(1000) -- call a method of the class'

You can run the above code directly with the resty command line. These few lines of code are easy to understand; the only thing that might confuse you is:

Why is it using a colon instead of a dot when calling class methods?

In fact, both the colon and the dot can be used here. db:set_timeout(1000) and db.set_timeout(db, 1000) are completely equivalent. The colon is syntactic sugar in Lua that lets you omit the first parameter, self, of the function.
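The equivalence can be checked with a plain table, without any MySQL connection. In this sketch, timer is a hypothetical object used only for illustration:

```lua
local timer = { timeout = 0 }  -- hypothetical object

-- defined with a dot: self is an explicit first parameter
function timer.set_timeout(self, ms)
    self.timeout = ms
end

-- defined with a colon: self is declared implicitly
function timer:get_timeout()
    return self.timeout
end

timer:set_timeout(1000)          -- colon call passes timer as self
print(timer.get_timeout(timer))  --> 1000

timer.set_timeout(timer, 2000)   -- dot call passes self by hand
print(timer:get_timeout())       --> 2000
```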

As you may know, there are no secrets in source code. Let’s take a look at the specific implementation corresponding to the above code, so that you can have a better understanding of how metatables are used to simulate object orientation:

local tcp = ngx.socket.tcp -- cosocket constructor provided by OpenResty

local _M = { _VERSION = '0.21' } -- use a table to simulate a class
local mt = { __index = _M } -- mt stands for metatable, __index points to the class itself

-- constructor of the class
function _M.new(self)
    local sock, err = tcp()
    if not sock then
        return nil, err
    end
    return setmetatable({ sock = sock }, mt) -- use a table and metatable to simulate class instances
end

-- member function of the class
function _M.set_timeout(self, timeout) -- use the self parameter to access the instance of the class to be operated on
    local sock = self.sock
    if not sock then
        return nil, "not initialized"
    end

    return sock:settimeout(timeout)
end

As you can see, the table _M simulates a class. During initialization, it only has the _VERSION member variable. Then, it defines member functions like _M.set_timeout. In the constructor _M.new(self), we create a table and set its metatable as mt, and the __index metamethod of mt points to _M. This way, the returned table simulates an instance of the class _M.
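The same pattern can be tried without OpenResty. The sketch below is a simplified, hypothetical class that stores the timeout in a field instead of a socket. It shows that each instance keeps its own data while the methods live only on _M:

```lua
local _M = { _VERSION = "0.01" }  -- hypothetical class, not lua-resty-mysql
local mt = { __index = _M }

function _M.new(self, name)
    return setmetatable({ name = name }, mt)
end

function _M.set_timeout(self, timeout)
    self.timeout = timeout
    return true
end

local a = _M:new("a")
local b = _M:new("b")
a:set_timeout(1000)

print(a.timeout)                       --> 1000
print(b.timeout)                       --> nil
print(rawget(a, "set_timeout"))        --> nil (the method is found via __index)
print(a.set_timeout == b.set_timeout)  --> true
```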

Conclusion #

Alright, that’s it for today. In fact, table and metatable are extensively used in the lua-resty-* libraries of OpenResty and in OpenResty-based open-source projects. I hope that through this lesson, you will find it easier to understand the source code of these libraries.

Of course, besides table, Lua has other commonly used functions as well, which we will study together in the next lesson.

Lastly, I would like to leave you with a question to ponder. Why does the lua-resty-mysql library simulate object-oriented (OO) programming for encapsulation? Feel free to discuss this question in the comments section, and also feel free to share this article with your colleagues and friends. Let’s communicate and progress together.