20 Exceeding Web Server Privileges with Processes and Scheduled Tasks #

Hello, I’m Wen Ming.

Earlier, we introduced the OpenResty API, shared-dict caching, and cosockets. The capabilities they provide stay within the scope of Nginx and the web server: they give you a programmable web server with lower development cost and easier maintenance.

However, OpenResty doesn’t stop there. Today, we will look at several OpenResty capabilities that go beyond the web server: scheduled tasks, privileged processes, and the non-blocking ngx.pipe.

Scheduled Tasks #

In OpenResty, we sometimes need to execute certain tasks periodically in the background, such as data synchronization and log cleanup. If you were to design this, how would you do it? The most straightforward approach is to expose an API endpoint for these tasks and then use the system’s crontab to have curl hit that endpoint periodically, fulfilling the requirement in a roundabout way.

However, this not only feels disjointed but also adds operational complexity. OpenResty therefore provides ngx.timer to meet such needs. You can think of ngx.timer as a simulated client request inside OpenResty that triggers the corresponding callback function.

In fact, OpenResty’s scheduled tasks can be divided into the following two types:

  • ngx.timer.at, used to execute one-time scheduled tasks;
  • ngx.timer.every, used to execute fixed-period scheduled tasks.

Do you remember the question I left you with at the end of the previous lesson? The question was how to overcome the limitation of not being able to use cosockets in init_worker_by_lua. The answer is actually ngx.timer.

The following code starts a scheduled task with a delay of 0. It starts the callback function handler and uses cosockets in this function to access a website:

init_worker_by_lua_block {
    local function handler()
        local sock = ngx.socket.tcp()
        local ok, err = sock:connect("www.baidu.com", 80)
        if not ok then
            ngx.log(ngx.ERR, "failed to connect: ", err)
            return
        end
        sock:close()
    end

    local ok, err = ngx.timer.at(0, handler)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
}

This way, we bypass the limitation of not being able to use cosockets at this stage.

Returning to the user requirement mentioned at the beginning of this section: ngx.timer.at does not meet the need to run periodically — the code example above is a one-off task.

So, how can we achieve periodic execution? At first glance, based on the ngx.timer.at API, you have two options:

  • You can use an infinite while loop in the callback function, complete the task, sleep for a period of time, and implement the periodic task on your own;
  • You can create another new timer at the end of the callback function.

However, before making a choice, there is one thing we need to clarify: a timer is essentially a request, although this request is not initiated by the terminal. For a request, it needs to exit after completing its own task; it can’t stay resident forever, otherwise it will easily cause various resource leaks.

Therefore, the first solution of using while true to implement a periodic task on your own is not reliable. The second solution, although feasible, is not easy to understand since it recursively creates timers.
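As a rough sketch of that second option (the 5-second delay and the do_work name here are illustrative), the callback re-arms itself just before it exits:

```lua
local delay = 5

local do_work
do_work = function(premature)
    -- premature is true when the worker is shutting down
    if premature then
        return
    end

    -- ... perform the periodic task here ...

    -- re-create the timer so the task runs again after `delay` seconds
    local ok, err = ngx.timer.at(delay, do_work)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
end

local ok, err = ngx.timer.at(delay, do_work)
```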

So, is there a better solution? In fact, the ngx.timer.every API, added to OpenResty later, is designed precisely for this problem, and it is a solution much closer to crontab.
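A minimal sketch of ngx.timer.every usage (the 60-second interval and the sync_data name are illustrative):

```lua
init_worker_by_lua_block {
    local function sync_data(premature)
        if premature then  -- the worker is exiting
            return
        end
        -- ... periodic work, e.g. pulling configuration from a data source ...
    end

    -- run sync_data every 60 seconds for the lifetime of the worker
    local ok, err = ngx.timer.every(60, sync_data)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
}
```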

However, the downside is that once a timer is started, you have no way to cancel the scheduled task — ngx.timer.cancel is still a to-do feature.

At this point, you will face a problem: scheduled tasks are running in the background and cannot be canceled; if there are a large number of scheduled tasks, it is easy to exhaust system resources.

Therefore, OpenResty provides the directives lua_max_pending_timers and lua_max_running_timers to limit this. The former represents the maximum number of pending scheduled tasks, and the latter represents the maximum number of currently running scheduled tasks.
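Both directives go in the http block of nginx.conf; the numbers below are illustrative, not recommendations:

```nginx
http {
    # at most 1024 timers may be pending (created but not yet running)
    lua_max_pending_timers 1024;
    # at most 256 timer callbacks may run concurrently
    lua_max_running_timers 256;
}
```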

You can also use the Lua API to obtain the number of currently pending and running scheduled tasks. Here are two examples:

content_by_lua_block {
    ngx.timer.at(3, function() end)
    ngx.say(ngx.timer.pending_count())
}

This code will print 1, indicating that there is 1 scheduled task waiting to be executed.

content_by_lua_block {
    ngx.timer.at(0.1, function() ngx.sleep(0.3) end)
    ngx.sleep(0.2)
    ngx.say(ngx.timer.running_count())
}

This code will print 1, indicating that there is 1 scheduled task currently running.

Privileged Processes #

Next, let’s talk about privileged processes. We all know that Nginx is mainly divided into the master process and the worker process. The worker process is the one that actually handles user requests. We can use the process.type API provided by lua-resty-core to obtain the type of process. For example, you can use resty to run the following function:

$ resty -e 'local process = require "ngx.process"
ngx.say("process type:", process.type())'

You will see that it returns single rather than worker, meaning that the Nginx started by resty has only a worker process and no master process. Indeed, in resty’s implementation you can find the following configuration line, which disables the master process:

master_process off;

OpenResty extends Nginx and adds a special type of process called the privileged agent. The privileged agent process is very special:

  • It does not listen on any ports, which means it does not provide any services externally.
  • It has the same privileges as the master process, usually the root user’s privileges, which allows it to perform tasks that the worker process cannot.
  • The privileged agent process can only be enabled in the init_by_lua context.
  • Furthermore, the privileged agent’s Lua code only makes sense to run in the init_worker_by_lua context, because no requests ever reach this process, so it never enters contexts such as content or access.

Next, let’s look at an example of enabling the privileged agent process:

init_by_lua_block {
    local process = require "ngx.process"

    local ok, err = process.enable_privileged_agent()
    if not ok then
        ngx.log(ngx.ERR, "enable privileged agent failed: ", err)
    end
}

Run this code to enable the privileged agent process, then start the OpenResty service, and you will see a privileged agent among the Nginx processes:

nginx: master process
nginx: worker process
nginx: privileged agent process

However, if the privileged process only runs its code once in the init_worker_by_lua phase, that is obviously not very useful. So how do we keep triggering the privileged agent process?

Yes, the answer is hidden in the knowledge we just covered. Since it does not listen on any ports and cannot be triggered by client requests, the only way is to trigger it periodically with the ngx.timer we just introduced:

init_worker_by_lua_block {
    local process = require "ngx.process"

    local function reload(premature)
        if premature then
            return
        end

        local f, err = io.open(ngx.config.prefix() .. "/logs/nginx.pid", "r")
        if not f then
            return
        end
        local pid = f:read()
        f:close()
        os.execute("kill -HUP " .. pid)
    end

    if process.type() == "privileged agent" then
        local ok, err = ngx.timer.every(5, reload)
        if not ok then
            ngx.log(ngx.ERR, err)
        end
    end
}

The above code sends a HUP signal to the master process every 5 seconds. Of course, you can build more interesting functionality on top of this, such as polling a database for tasks meant for the privileged agent and executing them. Since the privileged agent runs with root privileges, this does carry a whiff of a “backdoor” program.

Non-blocking ngx.pipe #

Finally, let’s take a look at the non-blocking ngx.pipe. In the code example above, we used Lua’s standard library to execute an external command line and send a signal to the master process:

os.execute("kill -HUP " .. pid)

This operation is inherently blocking. So, does OpenResty offer a non-blocking way to call external programs? After all, if you are using OpenResty as a full development platform rather than just a web server, this becomes a real need.

For this purpose, the lua-resty-shell library was born, which allows you to call command lines non-blocking:

$ resty -e 'local shell = require "resty.shell"
local ok, stdout, stderr, reason, status =
    shell.run([[echo "hello, world"]])
ngx.say(stdout)'

This code can be considered yet another way of writing “hello world”: it calls the system’s echo command to produce the output. Similarly, you can use resty.shell to replace os.execute calls in Lua.
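For example, the os.execute call from the privileged-agent example could be replaced with a non-blocking equivalent along these lines (a sketch; pid comes from the earlier code):

```lua
local shell = require "resty.shell"

-- non-blocking replacement for os.execute("kill -HUP " .. pid)
local ok, stdout, stderr, reason, status =
    shell.run("kill -HUP " .. pid)
if not ok then
    ngx.log(ngx.ERR, "kill failed, stderr: ", stderr, ", reason: ", reason)
end
```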

We know that the underlying implementation of lua-resty-shell depends on the ngx.pipe API in lua-resty-core. Therefore, the example of using lua-resty-shell to print “hello world” can be written using ngx.pipe as follows:

$ resty -e 'local ngx_pipe = require "ngx.pipe"
local proc = ngx_pipe.spawn({"echo", "hello world"})
local data, err = proc:stdout_read_line()
ngx.say(data)'

This is essentially what lua-resty-shell does under the hood. For more usage, you can refer to the documentation and test cases of ngx.pipe; I won’t go into further detail here.
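For instance, besides reading line by line, ngx.pipe lets you read the whole output stream and wait for the process to exit (a sketch; the command is arbitrary):

```lua
local ngx_pipe = require "ngx.pipe"

local proc, err = ngx_pipe.spawn({"echo", "hello, world"})
if not proc then
    ngx.log(ngx.ERR, "spawn failed: ", err)
    return
end

-- read everything the command writes to stdout
local data, err = proc:stdout_read_all()
ngx.say(data)

-- wait for the process to exit and check how it finished
local ok, reason, status = proc:wait()
```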

Final Thoughts #

With that, I have covered the main content for today. From the features above, we can see that OpenResty is not only improving upon Nginx but also moving toward a more universal platform. The hope is that developers can unify their technology stacks and use OpenResty for their development needs. This is friendly to operations, too: deploying just one OpenResty keeps maintenance costs down.

Finally, I leave you with a question to ponder. Since there may be multiple Nginx workers, a timer will run in every worker, which is unacceptable in most scenarios. How can we ensure that a timer runs only once?

Feel free to leave a comment with your solution, and please share this article with your colleagues and friends. Let us exchange ideas and progress together.