
46 Q&A Four: Is Shared Dictionary Cache Really Necessary? #

Hello, I’m Wen Ming.

With this latest update, the fourth chapter, OpenResty Performance Optimization, is now complete. Congratulations on keeping pace and actively learning and practicing, and thank you for your enthusiastic comments.

Many of the questions raised in the comments are valuable, and I have already replied to most of them in the app. Today, I have picked out the ones that are hard to answer well on mobile, or that are especially typical and interesting, and will answer them all in one place, so that no one misses any key information.

Now, let’s take a look at today’s five questions.

Question 1: How to dynamically load Lua modules? #

Q: I have a question about dynamic loading implemented in OpenResty: After replacing a new file, how can I use the loadstring function to load the new file? I understand that loadstring can only load strings, so how can I reload a Lua file/module in OpenResty?

A: We know that loadstring loads a string, while loadfile loads a specified file, for example: loadfile("foo.lua"). In the end, the two functions achieve the same result: both compile a chunk of Lua code.

As for loading Lua modules, here is a specific example:

resty -e '-- s holds the source of a complete Lua module as a string
local s = [[
local ngx = ngx
local _M = {}
function _M.f()
    ngx.say("hello world")
end
return _M
]]
-- compile the string into a chunk, then call it to get the module table
local lua = loadstring(s)
local ok, mod = pcall(lua)
mod.f()'

In this example, the string s contains a complete Lua module. So, when you detect that this module’s code has changed, you can reload it with loadstring or loadfile, and the functions and variables inside the module will be updated accordingly.

Furthermore, you can wrap the process of checking for changes and reloading the code in a function called code_loader:

local func = code_loader(name)

This makes the update logic more concise. Inside code_loader, it is also common to use an LRU cache to hold the compiled result, so that loadstring is not called on every request. With that, you have more or less a complete implementation.
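Here is a minimal sketch of what such a code_loader might look like, using lua-resty-lrucache and a short TTL as a crude change-detection mechanism; the read_file helper and the 5-second TTL are my own illustrative assumptions, not the only way to do it:

local lrucache = require "resty.lrucache"
local cache = assert(lrucache.new(100))  -- hold up to 100 compiled modules

local function read_file(path)
    local f, err = io.open(path, "r")
    if not f then
        return nil, err
    end
    local s = f:read("*a")
    f:close()
    return s
end

local function code_loader(name)
    local mod = cache:get(name)
    if mod then
        return mod                      -- cache hit: no loadstring call
    end
    local s, err = read_file(name)      -- "name" is the Lua file path
    if not s then
        return nil, err
    end
    local chunk, cerr = loadstring(s)   -- compile the module source
    if not chunk then
        return nil, cerr
    end
    local ok, ret = pcall(chunk)        -- run the chunk to get the module table
    if not ok then
        return nil, ret
    end
    cache:set(name, ret, 5)  -- 5s TTL: changed code is picked up soon after
    return ret
end

The TTL trades freshness for speed; if your deploy system can push an explicit version number, keying the cache on name plus version is a cleaner trigger.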

Question 2: Why doesn’t OpenResty disable blocking operations? #

Q: Over the years, I have had a doubt. Since these blocking calls are strongly discouraged by the official documentation, why not disable them directly? Or add a flag to allow users to disable them?

A: Let me give you my personal opinion. First of all, the ecosystem around OpenResty is not yet mature enough, and sometimes we have no choice but to use blocking libraries to implement certain functions. For example, before version 1.15.8, calling an external command-line tool required the blocking os.execute rather than lua-resty-shell. Similarly, reading and writing files in OpenResty can only be done with Lua’s blocking I/O library; there is no non-blocking alternative.

Secondly, OpenResty is very cautious about this kind of change. For example, lua-resty-core was developed long ago, but it was never enabled by default; you had to call require 'resty.core' manually. It was not until version 1.15.8 that it was finally turned on by default.

Lastly, the maintainers of OpenResty would rather standardize how blocking operations are called by generating highly optimized Lua code through compilers and DSLs, which is why no such flag has been provided in the OpenResty platform itself. Of course, I have my reservations about whether this direction can solve practical problems.

From the perspective of an outside developer, the more practical question is how to avoid these blocking operations. We can extend Lua code-analysis tools such as luacheck to detect and warn about common blocking calls, or we can modify _G directly to prohibit or rewrite specific functions, for example:

resty -e '-- override ngx.print globally through the _G table
_G.ngx.print = function()
    ngx.say("hello")
end
ngx.print()'

With just these few lines, you can rewrite the ngx.print function globally.
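The same trick covers the “prohibit” case. Here is a small sketch, assuming you run it before any business code (for example in the init phase), that turns a blocking call into a hard error:

resty -e '-- make the blocking os.execute fail loudly instead of blocking
_G.os.execute = function()
    error("os.execute is blocking and disabled; use lua-resty-shell instead")
end
local ok, err = pcall(os.execute, "ls")
print(ok, err)'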

Question 3: Will the NYI operations of LuaJIT have a significant impact on performance? #

Q: The loadstring function is marked as “never” on LuaJIT’s NYI list. Will it have a significant impact on performance?

A: Regarding LuaJIT’s NYI (Not Yet Implemented) primitives, we don’t need to overreact. For operations that can be JIT compiled, letting the JIT compile them is naturally the best option. But for operations that cannot be JIT compiled yet, that doesn’t mean we cannot use them at all.
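If you want to check for yourself whether a hot path trips over an NYI primitive, LuaJIT ships a verbose trace module you can switch on. This is only a quick diagnostic sketch; the exact messages vary across LuaJIT versions:

resty -e '-- print JIT trace events (compiled traces and aborts) to stderr
require("jit.v").on()
local s = "return 1 + 1"
for _ = 1, 200 do
    -- loadstring is never compiled, so traces abort around this call
    loadstring(s)
end'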

When it comes to performance optimization, we need to approach it based on statistical and scientific methods, which is also the purpose of flame graph sampling. Premature optimization is the root of all evil. We only need to optimize hot code that is called frequently and consumes a large amount of CPU.

As for loadstring, we only call it to reload code when the code actually changes, not on every request. It is therefore not a hot operation, and we don’t need to worry about its impact on overall system performance.

This connects to the blocking issue in the second question: in OpenResty, we sometimes make blocking file I/O calls in the init and init_worker phases. Such operations hurt performance more than NYI primitives do, but because they run only once at server startup, they are acceptable.
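As a sketch of what that looks like in practice (the config path and the my_config global are placeholders of mine):

init_by_lua_block {
    -- blocking Lua I/O is acceptable here: this runs once, at startup
    local f, err = io.open("/etc/myapp/config.json", "r")
    if not f then
        ngx.log(ngx.ERR, "failed to open config: ", err)
    else
        my_config = f:read("*a")  -- placeholder global, sketch only
        f:close()
    end
}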

To summarize, performance optimization should be viewed from a macro perspective, which is an important point you need to pay attention to. Otherwise, being obsessed with a certain detail may lead to spending a lot of time on optimization without achieving significant results.

Question 4: Can dynamic upstreams be implemented on your own? #

Q: For dynamic upstreams, my approach is to configure two upstreams for a service, select between them based on routing conditions, and directly modify the IP in the upstream when a machine’s IP changes. Compared with using balancer_by_lua directly, what are the disadvantages or pitfalls of this approach?

A: Looking at this case alone, the advantage of balancer_by_lua is that it lets you choose the load-balancing algorithm yourself, such as round-robin, consistent hashing (chash), or even an algorithm of your own. It is both flexible and high-performance.

Judged by the end result alone, the routing-rule approach achieves the same thing. However, implementing upstream health checks on your own adds a lot of extra work.

We can also extend this question to scenarios such as A/B testing, where different upstreams are needed. How should that be implemented?

You can decide which upstream to use in the balancer_by_lua phase based on the URI, host, request parameters, and so on. You can also hand these decisions to an API gateway as routing rules: in the access phase, determine which route matches the request, then find the specified upstream through the binding between route and upstream. This is a common pattern in API gateways, and we will discuss it more concretely in the hands-on section later. A minimal sketch of the balancer side follows.
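This sketch assumes the target peer was chosen earlier (say, in the access phase) and stashed in ngx.ctx; every host and port here is a placeholder:

upstream dynamic_backend {
    server 0.0.0.1;   # placeholder; the real peer is set in Lua below

    balancer_by_lua_block {
        local balancer = require "ngx.balancer"

        -- fall back to a default when the access phase set nothing
        local host = ngx.ctx.backend_host or "127.0.0.1"
        local port = ngx.ctx.backend_port or 8080

        local ok, err = balancer.set_current_peer(host, port)
        if not ok then
            ngx.log(ngx.ERR, "failed to set the current peer: ", err)
            return ngx.exit(500)
        end
    }
}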

Question 5: Is shared dictionary caching necessary? #

Q: In real production applications, I think the shared-dict caching layer is indispensable. Everyone seems to remember only the benefits of the LRU cache: no restrictions on data format, no need to serialize or deserialize, no need to budget memory by key/value size, per-worker independence with no contention and no read/write locks, high performance, and so on.

However, people forget its one fatal weakness: the lifecycle of the LRU cache is tied to the worker process. Every time Nginx reloads, the cache is completely lost, and if there is no shared dict at that point, the L3 data source will be overwhelmed.

Of course, that is a high-concurrency scenario, but since a cache is in use at all, the business volume is presumably not small, so the analysis above still applies. Is my take correct?

A: In most cases, what you said is correct. Shared dictionary caching is indeed necessary because it will not be lost during reload. However, there is also a special case. If in the init or init_worker phase, you can proactively retrieve all the data from the L3 data source, then having only the LRU cache is also acceptable.

For example, in the open-source API gateway APISIX, the data source is etcd. Data is fetched from etcd and cached in the LRU cache only during the init_worker phase; subsequent cache updates arrive through etcd’s watch mechanism. This way, even if Nginx reloads, there is no cache storm.
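To make the idea concrete, here is a simplified sketch of warming a per-worker LRU cache at startup. fetch_all_from_source is a hypothetical stand-in for the etcd fetch, and this is not APISIX’s actual code:

init_worker_by_lua_block {
    local lrucache = require "resty.lrucache"
    local cache = assert(lrucache.new(1000))
    package.loaded.my_route_cache = cache  -- hypothetical shared slot

    -- cosockets are not available directly in init_worker, so do the
    -- network fetch inside a zero-delay timer
    local ok, err = ngx.timer.at(0, function()
        local data = fetch_all_from_source()  -- hypothetical etcd fetch
        for key, value in pairs(data) do
            cache:set(key, value)
        end
    end)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
}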

Therefore, when making choices for technology, we can have preferences, but we should not make absolute generalizations because there is no silver bullet that can fit all caching scenarios. It is a good idea to build a minimal viable solution based on the needs of the actual scenario, and then gradually add to it.

That’s all for today’s answers to these questions. Finally, feel free to write down your questions in the comments area, and I will continue to answer them. I hope that through communication and answering questions, I can help you turn what you have learned into what you have gained. You are also welcome to share this article and let’s communicate and improve together.