17 Business Scripts Why Programmable Subscription-Based Caching Services Are More Useful #

Hello, I am Xu Changlong.

We are accustomed to using cache clusters to cache data, but this common in-memory cache service has many inconveniences. For example, the cluster occupies a large amount of memory, a specific field inside a cached value cannot be modified atomically, and every extra round trip to the cache adds network overhead.

Moreover, we often need only a few fields when retrieving data, but because the cache does not support filtering, performance drops significantly when data is fetched in batches. These problems are even more pronounced in read- and write-heavy scenarios.

Is there any way to solve these problems? In this lesson, I will introduce you to another interesting data caching approach: the programmable subscription-based cache service. After learning today's content, I believe you will have new ideas about how to implement cache services.

Cache as a Service #

The concept of programmable subscription-based cache service means that we can independently implement a data caching service and provide it directly to business services. This implementation can proactively cache data and provide data organization and computation services based on the needs of the business.

Although implementing a self-designed data caching service may be cumbersome, it also has many advantages. Apart from increasing throughput capacity, we can also implement more interesting custom functions, improve computational capabilities, and even allow our cache to directly provide basic data query services to external users.

[Image: structure of a self-implemented caching service]

The above image illustrates the structure of a self-implemented caching system. It can be said that this type of cache performs better in terms of performance and efficiency because its data processing method differs from the traditional approach.

In the traditional approach, the cache service does not process data and stores serialized strings, making it difficult to directly modify most data. When we use this type of cache for external services, the business service needs to retrieve all data into local memory, then iterate and process it before it can be used.

On the other hand, a programmable cache can store data structurally in a map, which saves more memory compared to the traditional approach of storing serialized strings.

Moreover, our service no longer needs to fetch data from other services for computation, which saves a significant amount of time spent on network interactions. This makes it suitable for scenarios with high real-time requirements. If our hot data volume is large, it can be combined with embedded engines like RocksDB to provide a large amount of data service using limited memory.

In addition to regular data caching, a programmable cache can also support features such as data filtering, statistical computation, querying, sharding, and data aggregation. As for the query service, I recommend exposing it externally through a simple text protocol similar to the one Redis uses, which performs better than HTTP.
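To make the contrast with serialized-string caches concrete, here is a minimal sketch in Go of a structured cache that stores each entry as a map of fields. All names here (`StructuredCache`, `SetField`, `GetFields`) are illustrative, not from the original text; the point is that individual fields can be updated atomically and batch readers can filter on the server side instead of shipping whole objects:

```go
package main

import (
	"fmt"
	"sync"
)

// StructuredCache stores each entry as a map of fields rather than a
// serialized string, so single fields can be read or updated in place.
type StructuredCache struct {
	mu   sync.RWMutex
	data map[string]map[string]any
}

func NewStructuredCache() *StructuredCache {
	return &StructuredCache{data: make(map[string]map[string]any)}
}

// SetField updates a single field atomically -- no full-object round trip.
func (c *StructuredCache) SetField(key, field string, value any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	entry, ok := c.data[key]
	if !ok {
		entry = make(map[string]any)
		c.data[key] = entry
	}
	entry[field] = value
}

// GetFields returns only the requested fields, filtering on the cache side.
func (c *StructuredCache) GetFields(key string, fields ...string) map[string]any {
	c.mu.RLock()
	defer c.mu.RUnlock()
	entry, ok := c.data[key]
	if !ok {
		return nil
	}
	out := make(map[string]any, len(fields))
	for _, f := range fields {
		if v, ok := entry[f]; ok {
			out[f] = v
		}
	}
	return out
}

func main() {
	c := NewStructuredCache()
	c.SetField("user:1001", "name", "Steven")
	c.SetField("user:1001", "points", 230)
	c.SetField("user:1001", "bio", "a large field batch readers rarely need")

	// Batch readers ask for just the fields they need.
	fmt.Println(c.GetFields("user:1001", "name", "points"))
}
```

A real service would layer the Redis-like text protocol and the Lua hooks discussed below on top of a store like this; the sketch only shows the structural-storage idea.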

Lua Script Engine #

While caching can improve the flexibility of business services, it also has many drawbacks. The biggest drawback is that after modifying the business logic, we need to restart the service to update our logic. Since a large amount of data is stored in memory, each restart requires time-consuming preheating and synchronization costs.

Therefore, we need to upgrade the design again. In this case, a Lua script engine is a good choice. Lua is a lightweight embedded scripting language that can be used to implement a high-performance and hot-updatable script service, enabling efficient and flexible interaction with embedded services.

I have drawn a diagram to illustrate how to implement a programmable cache service using Lua scripts:

[Image: implementing a programmable cache service with Lua scripts]

As shown in the above figure, we provide Kafka consumption, periodic task management, in-memory caching, support for various data formats, and various data driver adapters for these services. Furthermore, in order to reduce the frequent restarts caused by logic changes, we embed a Lua script engine in the caching service, enabling dynamic updates of the business logic.

Using the Lua engine is very convenient. Let’s take a look at an example in Go language that demonstrates how to embed Lua. The code is as follows:

package main

import "github.com/yuin/gopher-lua"

// VarChange is a function invoked by Lua
func VarChange(L *lua.LState) int {
   lv := L.ToInt(1)            // Get the first parameter of the called function and convert it to int
   L.Push(lua.LNumber(lv * 2)) // Multiply the parameter by 2 and return the result to Lua
   return 1                    // Return the number of result parameters
}

func main() {
   L := lua.NewState() // Create a new Lua state (interpreter instance)
   defer L.Close()     // Release the state when main returns

   // Register the callable function in the Lua script
   // When the "varChange" function is called in Lua, the Go function VarChange registered here will be executed
   L.SetGlobal("varChange", L.NewFunction(VarChange))

   // Load the Lua script directly
   // The script content is as follows:
   // print "hello world"
   // print(varChange(20)) -- call the registered Go function from Lua
   if err := L.DoFile("hello.lua"); err != nil {
      panic(err)
   }

   // Or execute a string directly
   if err := L.DoString(`print("hello")`); err != nil {
      panic(err)
   }
}

// The output after execution is:
// hello world
// 40
// hello

From this example, we can see that the Lua engine can directly execute Lua scripts, and Lua scripts can call any registered Go function and exchange variables with them.

Let’s recall what we are doing: a data caching service. So, we need Lua to be able to access and modify the cached data in the service. How does Lua exchange data with the embedded language? Let’s take a look at an example of their mutual calls and data exchange:

package main

import (
   "fmt"
   "github.com/yuin/gopher-lua"
)

func main() {
   L := lua.NewState()
   defer L.Close()
   // Load the script
   err := L.DoFile("vardouble.lua")
   if err != nil {
      panic(err)
   }
   // Invoke the function in the Lua script
   err = L.CallByParam(lua.P{
      Fn:      L.GetGlobal("varDouble"), // Specify the function name to be called
      NRet:    1,                        // Specify the number of return values
      Protect: true,                     // Return error on failure
   }, lua.LNumber(15))                     // Support multiple parameters
   if err != nil {
      panic(err)
   }
   // Get the return result
   ret := L.Get(-1)
   // Clean up and wait for the next use
   L.Pop(1)

   // Convert the result type for easy output
   res, ok := ret.(lua.LNumber)
   if !ok {
      panic("unexpected result")
   }
   fmt.Println(res.String())
}

// The output is:
// 30

The content of “vardouble.lua” is as follows:

function varDouble(n)
    return n * 2
end

Through this mechanism, Lua and Go can exchange data and call each other. For cache services that demand high performance, it is recommended to pool the loaded Lua scripts together with their LState objects: this avoids reloading the script on every Lua call and makes it easier to serve multiple threads and coroutines.

Managing Lua Scripts #

Through the previous explanation, we can see that in actual usage, Lua runs many instances in memory. In order to better manage and improve efficiency, it is best to use a script management system to manage all Lua instances, achieving unified script updates, compilation caching, resource scheduling, and singleton control.

Each Lua state runs single-threaded, but states are very lightweight, consuming roughly 144 KB of memory each, so a single service can run hundreds or even thousands of Lua instances.

To improve the parallel processing capability of the service, we can start multiple coroutines, each running an independent Lua thread. For this purpose, the gopher-lua library provides an implementation similar to a thread pool. Through this approach, we do not need to create and close Lua instances frequently. The specific example provided by the official documentation is as follows:

// A pool to save LState instances
type lStatePool struct {
    m     sync.Mutex
    saved []*lua.LState
}

// Get an LState
func (pl *lStatePool) Get() *lua.LState {
    pl.m.Lock()
    defer pl.m.Unlock()
    n := len(pl.saved)
    if n == 0 {
        return pl.New()
    }
    x := pl.saved[n-1]
    pl.saved = pl.saved[0 : n-1]
    return x
}

// Create a new LState
func (pl *lStatePool) New() *lua.LState {
    L := lua.NewState()
    // setting the L up here.
    // load scripts, set global variables, share channels, etc...
    // Here we can do some initialization
    return L
}

// Put the LState object back to the pool for later use
func (pl *lStatePool) Put(L *lua.LState) {
    pl.m.Lock()
    defer pl.m.Unlock()
    pl.saved = append(pl.saved, L)
}

// Shutdown all handles
func (pl *lStatePool) Shutdown() {
    for _, L := range pl.saved {
        L.Close()
    }
}

// Global LState pool
var luaPool = &lStatePool{
    saved: make([]*lua.LState, 0, 4),
}

// Task running inside a coroutine
func MyWorker() {
    // Get an LState from the pool
    L := luaPool.Get()
    // Put the LState back to the pool after the task is done
    defer luaPool.Put(L)
    // Various Lua script tasks can be executed using the LState variable
    // For example, calling the varDouble function from the previous example
    err := L.CallByParam(lua.P{
        Fn:      L.GetGlobal("varDouble"), // Specify the function name to call
        NRet:    1,                        // Specify the number of return values
        Protect: true,                     // Return error if an error occurs
    }, lua.LNumber(15)) // Multiple arguments are supported here
    if err != nil {
        panic(err) // For demonstration purposes only, not recommended to use panic in actual production
    }
}

func main() {
    defer luaPool.Shutdown()
    go MyWorker() // Start a coroutine
    go MyWorker() // Start another coroutine
    /* etc... */
}

Using this approach, we can create a batch of LStates in advance, have them load all the necessary Lua scripts, and draw on them directly whenever external requests need to execute Lua, improving the reusability of our resources.

Variable Interaction #

In fact, data can live on either the Lua side or the Go side, and each can retrieve the other's data through mutual calls. I personally prefer to encapsulate the data in Go and expose it to Lua, mainly because that is more standardized and easier to manage; script-held state, after all, carries extra overhead.

As mentioned earlier, we organize data into structs and maps to provide data services externally. So how do Lua and Go exchange struct data?

Here I chose the example provided by the official website, but added a lot of comments to help you understand this interaction process.

    // Struct used for exchange in Go
    type Person struct {
        Name string
    }
    
    // Assign a type name for this type
    const luaPersonTypeName = "person"
    
    // Declare this type in the LState object, which is only executed once during initialization of LState
    // Registers my person type to given L.
    func registerPersonType(L *lua.LState) {
        //Declare this type in LState
        mt := L.NewTypeMetatable(luaPersonTypeName)
        //Expose the metatable as a global named "person"
        //so that in Lua, person behaves like a class declaration
        L.SetGlobal("person", mt)
        
        //Static attributes of person in Lua
        //Define the static methods of person in Lua
        //After this declaration, person.new can be called in Lua to call the newPerson function in Go
        L.SetField(mt, "new", L.NewFunction(newPerson))
        
        //person.new in Lua returns a userdata instance wrapping the Go struct
        //The following statement defines the set of methods callable on it from Lua
        //personMethods is a map[string]lua.LGFunction
        //that tells Lua which Go function backs each method name
        L.SetField(mt, "__index", L.SetFuncs(L.NewTable(), personMethods))
    }
    
    //All methods of the person instance object
    var personMethods = map[string]lua.LGFunction{
        "name": personGetSetName,
    }
    
    //Constructor
    //When person.new is called in Lua, this Go function will be triggered
    func newPerson(L *lua.LState) int {
        //Initialize the go struct object and set the name to the first parameter 
        person := &Person{L.CheckString(1)}
        //Create a lua userdata object for data transmission
        //Generally, userdata wraps go struct, while table is Lua's own object
        ud := L.NewUserData()
        ud.Value = person //Put go struct into the object
        
        //Set the lua object type as person type
        L.SetMetatable(ud, L.GetTypeMetatable(luaPersonTypeName))
        
        //Return the created object to Lua
        L.Push(ud)
        
        //Tell Lua script the number of returned data
        return 1
    }
    
    //Checks whether the first lua argument is a *LUserData 
    //with *Person and returns this *Person.
    func checkPerson(L *lua.LState) *Person {
        //Check if the first argument is a userdata passed by other languages
        ud := L.CheckUserData(1)
        
        //Check if the conversion is successful
        if v, ok := ud.Value.(*Person); ok {
            return v
        }
        
        L.ArgError(1, "person expected")
        return nil
    }
    
    //Getter and setter for Person.Name
    func personGetSetName(L *lua.LState) int {
        //The first argument on the stack is always the person userdata
        p := checkPerson(L)
        if L.GetTop() == 2 {
            //Two values on the stack: the second is the new name (setter)
            p.Name = L.CheckString(2)
            
            //A setter returns no values; it only modifies data
            return 0
        }
        
        //Only one value on the stack: this is a getter, so push the name
        L.Push(lua.LString(p.Name))
        
        //One return value
        return 1
    }
    
    func main() {
        //Create a lua LState
        L := lua.NewState()
        defer L.Close()
        
        //Initialize and register
        registerPersonType(L)
        
        //Execute the lua script
        if err := L.DoString(`
            -- Create a person and set his name
            p = person.new("Steven")
            print(p:name()) -- "Steven"
            
            -- Modify his name
            p:name("Nico")
            print(p:name()) -- "Nico"
        `); err != nil {
            panic(err)
        }
    }

As you can see, mutual calls and data exchange are easy to implement through the Lua script engine, which lets us build many practical features; small datasets can even be written directly as Lua scripts and loaded by the service. In addition, gopher-lua provides module support to help us better organize scripts and code. If you are interested, you can explore it further in the reference materials.

Cache Preheating and Data Sources #

After understanding Lua, let's look at how the service loads data. When the service starts, we need to load the data into the cache and preheat it; only after all the data is loaded is the API port opened for external service.

If Lua scripts are used during the loading process, different formats of data can be adapted and processed when the service starts, which also makes the data sources more diverse.

The most common data source is offline files generated periodically by big-data jobs. The latest files are mounted via NFS or HDFS and refreshed and loaded on a schedule. This method suits large, slowly updated datasets. The downside is that the data must be organized during loading; if it is complex enough, loading 800 MB of data may take 1 to 10 minutes.

In addition to using files, we can also recover data by scanning data tables after the program starts. However, this puts pressure on the database, so it is recommended to use dedicated replicas. Compared with the disk offline file method, this method loads more slowly.

Both of the above methods are slow. Alternatively, we can embed RocksDB in the process, which greatly increases our data storage capacity and delivers high-performance reads and writes spanning memory and disk, at the cost of somewhat lower query performance.

RocksDB data can be generated by big data and saved as RocksDB-formatted database files, which can be directly loaded into our service. This method greatly reduces the time for organizing and loading data during system startup, enabling more data queries.

In addition, if we have relational data query requirements locally, we can also embed the SQLite engine, through which various relational data queries can be done. The data for SQLite can also be generated in advance using tools and used directly by our service. But please note that this database should not exceed 100,000 records, otherwise it may cause the service to freeze.

Finally, for offline file loading, it is best to have a checksum file (or something similar) to verify a file's integrity before loading it. Since the files arrive over a network mount, we cannot be sure whether a copy is still in progress, so we need a safeguard for data integrity. The most direct way is for the producer to generate a same-named companion file after each copy completes, recording the data file's checksum for verification before loading.

Offline files help us quickly achieve data sharing and consistency across multiple nodes. If multiple nodes must maintain eventual consistency of the data, we need to combine offline files with subscription-based synchronization.

Subscription-based Data Synchronization and Startup Synchronization #

So, how do we synchronize and update our data?

In normal circumstances, our data comes from multiple underlying data services. If we want to synchronize changes to the data in real time, we usually use the binlog subscription to synchronize the change information to Kafka, and then use Kafka’s group consumption to notify the caches distributed in different clusters.

The services that receive the message changes will trigger a Lua script to perform data synchronization updates. Through Lua, we can trigger synchronous updates to other related caches. For example, when a user purchases a product, we need to synchronize and refresh their points, orders, and the number of messages in their message list.
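The fan-out described here (one change triggering refreshes of several related caches) can be sketched without the Kafka and Lua machinery. In this simplified Go stand-in, the `ChangeEvent` type and `refreshHooks` table are hypothetical names; the hook table plays the role the Lua script plays in the real service:

```go
package main

import "fmt"

// ChangeEvent is a simplified stand-in for a binlog change message
// consumed from Kafka.
type ChangeEvent struct {
	Table string
	Key   string
}

// refreshHooks maps a changed table to the related caches that must also be
// refreshed -- the role played by the Lua update script in the text.
var refreshHooks = map[string][]string{
	"orders": {"user_points", "user_orders", "user_messages"},
}

// handleChange refreshes the cache for the changed table plus every
// related cache registered for it, and reports what was refreshed.
func handleChange(ev ChangeEvent) []string {
	refreshed := []string{ev.Table}
	for _, related := range refreshHooks[ev.Table] {
		// In the real service this would re-query or recompute the entry.
		refreshed = append(refreshed, related)
	}
	return refreshed
}

func main() {
	// A purchase lands in the orders table; the hook fans out the refresh.
	fmt.Println(handleChange(ChangeEvent{Table: "orders", Key: "user:1001"}))
}
```

Keeping this routing table in a hot-updatable Lua script, rather than hard-coded as here, is exactly what lets the cache's refresh logic change without a restart.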

Periodic Tasks #

When it comes to task management, we have to mention periodic tasks. Periodic tasks are usually used for data refreshing and analysis. By combining periodic tasks with Lua custom logic scripts, we can achieve regular statistics, providing us with more convenience.

When executing tasks at fixed intervals or with delayed refreshes, a common approach is to manage them with a time wheel. With this approach, timed tasks are triggered as events, making it easy to manage the list of pending tasks in memory and run multiple periodic tasks in parallel, without polling in sleep loops. If you are interested in time wheels, you can study a concrete implementation in the reference materials.

In addition, as mentioned earlier, much of our data is updated in batches through offline files. If we refresh once per hour, the data updated within that hour still needs to be synchronized.

Generally, we handle it like this: when we load the offline files during service startup, we save the time when the offline files are generated. We use this time to filter the messages in the data update queue. When our queue task progress catches up to a time near the current time, we then start the data service for external use.

Summary #

In the field of read-heavy, write-heavy services, many real-time interactive services demand high data freshness, which a centralized cache struggles to provide. Therefore, most such services in the industry serve real-time interaction from in-process memory. However, this approach is cumbersome to maintain, since data must be recovered after every restart. To update business logic without restarting, the industry usually adopts hot-update solutions based on embedded scripts.

The most commonly used general-purpose script engine is Lua, a very popular and convenient embedded scripting language. Many well-known high-performance services and games use Lua to implement customizable business logic, such as Redis and Nginx (via OpenResty).

By combining Lua with our customized cache service, we can create powerful features to deal with different scenarios. Since Lua is very memory efficient, we can open thousands of Lua threads in the process, and even provide state machine-like services to clients with one LState thread per user.

Using the above method, combined with data exchange and mutual calls between Lua and a static language, plus our task management and various data-driven adapters, we can build an almost universal cache service. I recommend practicing this approach in some small projects; I believe it will let you see familiar services from new perspectives and gain more insights.

Thought question #

How can a Go goroutine access data stored in an LState?

Feel free to leave a comment in the discussion area to communicate and discuss with me. See you in the next class!