15 Lightweight Object Recycling Station Recycler, Object Pool Technology Analysis #

In the previous two lessons, we learned about the high-performance design principles of Netty’s memory pool. In this lesson, we will introduce another pooling technology in Netty: the Recycler object pool. When you first come across the concept of object pooling in Netty, you might have similar questions:

  • What is the difference between an object pool and a memory pool? Do they have any connection?
  • There are many ways to implement an object pool. Did Netty implement its own, and how?
  • How should we use an object pool in practice?

With these questions in mind, let’s dive into today’s lesson.

Getting Started with Recycler #

To get a hands-on understanding of how Recycler is used, let’s go through an example. Let’s assume we have a User class and we need to reuse User objects. Here is the implementation code:

import io.netty.util.Recycler;

public class UserCache {

    private static final Recycler<User> userRecycler = new Recycler<User>() {
        @Override
        protected User newObject(Handle<User> handle) {
            return new User(handle);
        }
    };

    static final class User {
        private String name;
        private final Recycler.Handle<User> handle;

        public User(Recycler.Handle<User> handle) {
            this.handle = handle;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public void recycle() {
            handle.recycle(this);
        }
    }

    public static void main(String[] args) {
        User user1 = userRecycler.get(); // 1. Get a User object from the object pool
        user1.setName("hello");          // 2. Set the properties of the User object
        user1.recycle();                 // 3. Recycle the object back into the object pool
        User user2 = userRecycler.get(); // 4. Get an object from the object pool again
        System.out.println(user2.getName());
        System.out.println(user1 == user2);
    }
}

The output in the console is as follows:

hello

true

In the code example, an object pool instance, userRecycler, is defined, and its newObject() method is implemented; this method is called to create a new object when the pool has none available. A Recycler.Handle object is created by the pool and bound to each User object. We can then retrieve a User object with userRecycler.get(), and when the object is no longer needed, return it to the pool by calling the recycle() method implemented in the User class.

The usage of Recycler is quite simple, and it can be used as a utility class in a project.

Design Philosophy of Recycler #

Both object pools and memory pools are used to improve the concurrency capability of Netty. We know that frequent creation and destruction of objects in Java can be very costly, so many people cache some common objects and prioritize retrieving object instances from the object pool when needed. By reusing objects, we not only avoid the performance overhead of frequent creation and destruction, but also make it friendly to JVM GC. This is the purpose of an object pool.
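The reuse idea can be sketched with a deliberately minimal, hypothetical pool (the SimplePool class below is illustrative, not Netty's implementation): released objects are parked in a stack and handed back on the next acquire, so steady-state traffic allocates nothing new.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// A deliberately minimal object pool sketch (not Netty's Recycler):
// released objects are parked in a stack and reused on the next acquire.
public class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    public SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    public T acquire() {
        T obj = free.poll();                       // reuse a parked object if one exists
        return obj != null ? obj : factory.get();  // otherwise allocate a new one
    }

    public void release(T obj) {
        free.push(obj);                            // park the object for reuse
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new);
        StringBuilder a = pool.acquire();
        pool.release(a);
        StringBuilder b = pool.acquire();  // the same instance comes back
        System.out.println(a == b);        // prints "true"
    }
}
```

Netty's Recycler follows the same acquire/release contract, but adds thread-private stacks and cross-thread queues on top of it, as we will see next.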

Recycler is a custom lightweight object recycling station provided by Netty. With the help of Recycler, objects can be obtained and recycled. Since Recycler is Netty’s own implementation of an object pool, how is it designed? First, let’s take a look at the internal structure of Recycler, as shown in the following diagram:

[Figure: UML class diagram of Recycler's core components]

From the UML diagram of Recycler, we can see that it contains four core components: Stack, WeakOrderQueue, Link, and DefaultHandle. Next, we will introduce them one by one.

First, let’s take a look at the relationship between each component in the internal structure of Recycler, as described in the following diagram:

[Figure: relationship between Stack, WeakOrderQueue, Link, and DefaultHandle across threads]

The first core component is Stack. Stack is the top-level data structure of the object pool: it describes the overall layout of the pool and stores the objects recycled by the owning thread. In a multi-threaded scenario, to avoid lock contention, each thread holds its own object pool, and FastThreadLocal is used internally to make it thread-private. For now you can think of FastThreadLocal as Java's ThreadLocal; it will be covered in a dedicated lesson later.
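The thread-private design can be sketched with plain ThreadLocal (a stand-in for FastThreadLocal; the class below is illustrative, not Netty code): each thread sees its own stack, so the acquire/release hot path needs no locks.

```java
import java.util.ArrayDeque;

// Sketch of thread-private pooling: each thread gets its own stack via
// ThreadLocal, so no synchronization is needed on the hot path. Netty uses
// FastThreadLocal the same way to give each thread its own Stack.
public class ThreadLocalStacks {
    static final ThreadLocal<ArrayDeque<Object>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void main(String[] args) throws InterruptedException {
        STACK.get().push("owned-by-main");
        Thread t = new Thread(() ->
                // the other thread sees a fresh, empty stack of its own
                System.out.println(STACK.get().isEmpty())); // prints "true"
        t.start();
        t.join();
        System.out.println(STACK.get().size()); // prints "1" - main's stack is untouched
    }
}
```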

It is necessary to understand the data structure of Stack first. Let’s take a look at the source code definition of Stack:

static final class Stack<T> {

    final Recycler<T> parent; // The Recycler this Stack belongs to
    final WeakReference<Thread> threadRef; // A weak reference to the owning thread
    final AtomicInteger availableSharedCapacity; // The maximum number of this Stack's objects that other threads may hold when recycling them
    final int maxDelayedQueues; // The maximum number of WeakOrderQueues
    private final int maxCapacity; // The maximum size of the object pool, 4K by default
    private final int ratioMask; // Controls the recycle ratio of objects, 1/8 by default
    private DefaultHandle<?>[] elements; // The array that stores the cached objects
    private int size; // The number of cached DefaultHandle objects
    private int handleRecycleCount = -1; // Counter used together with ratioMask to throttle recycling
    // The three important pointers into the WeakOrderQueue linked list
    private WeakOrderQueue cursor, prev;
    private volatile WeakOrderQueue head;

    // Other code omitted
}

According to the internal structure diagram of Recycler, Stack holds the DefaultHandle array that stores the cached objects and maintains the three important pointers into the WeakOrderQueue linked list. We will introduce WeakOrderQueue in detail later. The other important properties of Stack are annotated in the source code above, and most of them are self-explanatory.

One property that deserves extra attention is availableSharedCapacity. Each Stack maintains a linked list of WeakOrderQueue nodes, and each node stores objects released by a thread other than the owning thread. For example, in the diagram Thread A is the owning thread, and the WeakOrderQueue linked list stores objects released by other threads such as Thread B and Thread C. availableSharedCapacity is initialized to new AtomicInteger(max(maxCapacity / maxSharedCapacityFactor, LINK_CAPACITY)), which is 2K by default. When other threads recycle objects allocated by Thread A, the total number of such objects they may hold cannot exceed availableSharedCapacity.

This raises a question: since each Stack is private to one thread, why does availableSharedCapacity need to be an AtomicInteger? Because several threads, such as Thread B and Thread C, may create WeakOrderQueues for Thread A concurrently, and may therefore operate on availableSharedCapacity at the same time.
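The reservation idea can be shown with a small CAS loop modeled on this mechanism (the class and method names below are illustrative): several recycling threads decrement one shared AtomicInteger, so the combined space they reserve for the owner's objects can never exceed the budget.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the capacity-reservation idea behind availableSharedCapacity:
// recycling threads CAS on one shared counter, so the combined space they
// reserve for the owning thread's objects can never exceed the budget.
public class SharedCapacity {
    static boolean reserveSpace(AtomicInteger availableSharedCapacity, int space) {
        for (;;) {
            int available = availableSharedCapacity.get();
            if (available < space) {
                return false; // budget exhausted, caller must drop the objects
            }
            if (availableSharedCapacity.compareAndSet(available, available - space)) {
                return true;  // reservation succeeded
            }
            // lost a race with another thread, retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger budget = new AtomicInteger(32); // e.g. space for two 16-slot Links
        System.out.println(reserveSpace(budget, 16)); // prints "true"
        System.out.println(reserveSpace(budget, 16)); // prints "true"
        System.out.println(reserveSpace(budget, 16)); // prints "false" - budget used up
    }
}
```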

The second component to introduce is WeakOrderQueue. WeakOrderQueue is used to store objects recycled from other threads to the current thread and, at an appropriate time, Stack will harvest these objects from the WeakOrderQueue of other threads. As shown in the above diagram, when Thread B recycles memory allocated by Thread A, it is put into the WeakOrderQueue of Thread A.

The third component is Link. Each WeakOrderQueue contains a Link linked list, and recycled objects are stored on nodes in the Link linked list. Each Link node by default stores 16 objects. When each Link node becomes full, a new Link node is created and put at the tail of the linked list.
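The Link structure can be sketched as a linked list of fixed-size nodes (the class below is a simplified illustration, not Netty's Link, whose slots actually hold DefaultHandle instances and whose write index is atomic):

```java
// Sketch of the Link idea: fixed-size nodes holding recycled objects,
// with a fresh node appended once the tail is full (Netty's LINK_CAPACITY is 16).
public class LinkList {
    static final int LINK_CAPACITY = 16;

    static final class Link {
        final Object[] elements = new Object[LINK_CAPACITY];
        int writeIndex; // next free slot in this node
        Link next;
    }

    Link head = new Link();
    Link tail = head;

    void add(Object o) {
        if (tail.writeIndex == LINK_CAPACITY) { // tail is full
            tail = tail.next = new Link();      // append a fresh node at the end
        }
        tail.elements[tail.writeIndex++] = o;
    }

    public static void main(String[] args) {
        LinkList list = new LinkList();
        for (int i = 0; i < 17; i++) {
            list.add(i); // the 17th object forces a second Link node
        }
        System.out.println(list.head == list.tail); // prints "false"
        System.out.println(list.tail.writeIndex);   // prints "1"
    }
}
```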

The fourth component is DefaultHandle. A recycled object is wrapped in a DefaultHandle instance; both Stack and WeakOrderQueue use DefaultHandle to store recycled objects. The elements array in Stack holds DefaultHandle instances, and the 16 slots of each Link node hold DefaultHandle instances as well.

So far we have covered the internal structure of Recycler and have a preliminary understanding of it. As a high-performance object pool, how does Netty ensure that objects are allocated and recycled efficiently in a multi-threaded scenario? Next, let's look at how Recycler obtains and recycles objects.

Obtaining Objects from Recycler #

Previously, we introduced how Recycler is used. From the code example, we can see that the entry point for obtaining objects from the object pool is the Recycler#get() method, so let's go straight to its source code:

public final T get() {
    if (maxCapacityPerThread == 0) {
        return newObject((Handle<T>) NOOP_HANDLE);
    }
    Stack<T> stack = threadLocal.get(); // Get the Stack cached by the current thread
    DefaultHandle<T> handle = stack.pop(); // Pop a DefaultHandle object from the Stack
    if (handle == null) {
        handle = stack.newHandle();
        handle.value = newObject(handle); // Create an object and save it in DefaultHandle
    }
    return (T) handle.value;
}

The logic of the Recycler#get() method is very clear. It first uses FastThreadLocal to obtain the current thread's private Stack, then tries to pop a DefaultHandle instance from the top of the stack. If no DefaultHandle is available, it calls newObject() to create a new object and binds the handle to both the user object and the Stack.

So how does the Stack pop a DefaultHandle object instance from the elements array? Is it just taking an instance from the elements array? Let’s trace the source code of stack.pop() together:

DefaultHandle<T> pop() {
    int size = this.size;
    if (size == 0) {
        // Try to transfer some objects from the objects recycled by other threads to the elements array
        if (!scavenge()) {
            return null;
        }
        size = this.size;
    }
    size--;
    DefaultHandle ret = elements[size]; // Pop the instance from the top of the stack
    elements[size] = null;
    if (ret.lastRecycledId != ret.recycleId) {
        throw new IllegalStateException("recycled multiple times");
    }
    ret.recycleId = 0;
    ret.lastRecycledId = 0;
    this.size = size;
    return ret;
}

If there are available object instances in the stack's elements array, one is popped directly. Otherwise scavenge() is called. The job of scavenge is to transfer object instances that other threads have recycled back into the elements array, which means it tries to migrate objects from the WeakOrderQueue linked list. Each Stack has a WeakOrderQueue linked list, and each node in it holds the objects recycled by one external thread. So what strategy is used to transfer object instances from the list? Let's continue tracing the source code of scavenge:

boolean scavenge() {
    // Try to transfer object instances from the WeakOrderQueue to the Stack
    if (scavengeSome()) {
        return true;
    }
    // If the transfer fails, reset the cursor pointer to the head node
    prev = null;
    cursor = head;
    return false;
}

boolean scavengeSome() {
    WeakOrderQueue prev;
    WeakOrderQueue cursor = this.cursor; // The cursor pointer points to the reading position of the current WeakOrderQueue linked list
    // If the cursor pointer is null, it means getting objects from the WeakOrderQueue linked list for the first time
    if (cursor == null) {
        prev = null;
        cursor = head;
        if (cursor == null) {
            return false;
        }
    } else {
        prev = this.prev;
    }
    boolean success = false;
    // Continuously loop to find an available object instance from the WeakOrderQueue linked list
    do {
        // Try to transfer some object instances from the WeakOrderQueue to the Stack
        if (cursor.transfer(this)) {
            success = true;
            break;
        }
        WeakOrderQueue next = cursor.next;
        if (cursor.owner.get() == null) {
            // The owning thread has exited, but its queue may still hold data;
            // transfer whatever is left, then unlink the dead node
            if (cursor.hasFinalData()) {
                for (;;) {
                    if (cursor.transfer(this)) {
                        success = true;
                    } else {
                        break;
                    }
                }
            }
            if (prev != null) {
                prev.setNext(next);
            }
        } else {
            prev = cursor;
        }
        cursor = next;
    } while (cursor != null && !success);
    this.prev = prev;
    this.cursor = cursor;
    return success;
}

scavengeSome walks the WeakOrderQueue linked list starting from the cursor position and calls transfer() on each node, which tries to move a batch of objects from that node's Link list into the Stack's elements array. If the thread that owned a queue has exited, any remaining data is transferred first and the dead node is then unlinked from the list. The loop ends as soon as one transfer succeeds, or after the whole list has been visited.

Recycling Objects into Recycler #

The entry point for recycling is the handle: calling DefaultHandle#recycle() delegates to Stack#push(). Inside push(), Netty distinguishes two scenarios. If the current thread is the thread that owns the Stack, the object is stored on the Stack directly; otherwise it is handed over to the owning thread's WeakOrderQueue linked list:

void push(DefaultHandle<?> item) {
    Thread currentThread = Thread.currentThread();
    if (threadRef.get() == currentThread) {
        pushNow(item); // The current thread owns the Stack, recycle directly
    } else {
        pushLater(item, currentThread); // The object is being recycled by a different thread
    }
}

Recycling objects in the same thread #

Let's look at the same-thread case first, implemented in pushNow():

private void pushNow(DefaultHandle<?> item) {
    int size = this.size;
    // Drop the object if: 1. the maximum capacity is exceeded, or 2. the recycle ratio says so
    if (size >= maxCapacity || dropHandle(item)) {
        return;
    }
    // Grow the elements array if it is full, up to maxCapacity
    if (size == elements.length) {
        elements = Arrays.copyOf(elements, min(size << 1, maxCapacity));
    }
    elements[size] = item;
    this.size = size + 1;
}

The logic for recycling objects in the same thread is very simple: the object is written directly into the Stack's elements array at the position pointed to by the top of the stack. If the size reaches the maximum capacity of the Stack, the object is discarded. In addition, the dropHandle method throttles the recycle rate: by default only one out of every eight objects is actually recycled into the Stack, and the rest are dropped.
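The ratio check is a cheap bitmask test; the sketch below is modeled on dropHandle (simplified: Netty's version also marks the handle so it is not dropped twice), with ratioMask = 8 - 1:

```java
// Sketch of the recycle-ratio check: ratioMask is 7 (ratio 8 minus 1), so only
// every 8th candidate passes the mask test and the rest are dropped.
public class RecycleRatio {
    static final int RATIO_MASK = 8 - 1; // the ratio must be a power of two
    int handleRecycleCount = -1;         // -1 so the very first object is kept

    boolean dropHandle() {
        return (++handleRecycleCount & RATIO_MASK) != 0; // true -> drop the object
    }

    public static void main(String[] args) {
        RecycleRatio r = new RecycleRatio();
        int kept = 0;
        for (int i = 0; i < 64; i++) {
            if (!r.dropHandle()) {
                kept++;
            }
        }
        System.out.println(kept); // prints "8" - 1 out of every 8 is kept
    }
}
```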

Recycling objects in a different thread #

Next, let’s analyze the scenario where objects are recycled in a different thread. As you may have guessed, when objects are recycled in a different thread, they are not added to the Stack directly, but interact with WeakOrderQueue instead. Let’s first take a look at the source code for pushLater:

private void pushLater(DefaultHandle<?> item, Thread thread) {
    Map<Stack<?>, WeakOrderQueue> delayedRecycled = DELAYED_RECYCLED.get(); // Per-thread cache used to help other threads recycle objects
    WeakOrderQueue queue = delayedRecycled.get(this); // Retrieve the WeakOrderQueue corresponding to this Stack
    if (queue == null) {
        // Each thread helps recycle objects for at most 2 * CPU cores stacks
        if (delayedRecycled.size() >= maxDelayedQueues) {
            delayedRecycled.put(this, WeakOrderQueue.DUMMY); // DUMMY means the current thread can no longer help recycle objects for this Stack
            return;
        }
        // Create a new WeakOrderQueue
        if ((queue = WeakOrderQueue.allocate(this, thread)) == null) {
            // Drop object
            return;
        }
        delayedRecycled.put(this, queue);
    } else if (queue == WeakOrderQueue.DUMMY) {
        // Drop object
        return;
    }
    queue.add(item); // Add the object to the WeakOrderQueue's Link linked list
}

The implementation process of pushLater can be summarized in two steps: getting the WeakOrderQueue and adding the object to the WeakOrderQueue.

First, let's see how the WeakOrderQueue object is obtained. The DELAYED_RECYCLED cache of the current thread is retrieved through FastThreadLocal. DELAYED_RECYCLED stores the mapping that records which stacks the current thread is helping to recycle for. If item was allocated by ThreadA and the current thread is ThreadB, and ThreadB is helping ThreadA recycle item, then the key stored in DELAYED_RECYCLED is StackA. Next, the WeakOrderQueue corresponding to StackA is looked up in delayedRecycled. If it does not exist, a new WeakOrderQueue is created for StackA and added to the DELAYED_RECYCLED cache. WeakOrderQueue.allocate() first checks whether the total capacity reserved for recycling StackA's objects would exceed the 2K limit. If it would not, the head pointer of StackA's queue list is set to the newly created WeakOrderQueue; otherwise allocation fails and no more objects will be recycled for StackA.

Of course, ThreadB does not only help ThreadA recycle objects, it can help multiple threads recycle objects. Therefore, DELAYED_RECYCLED uses a Map structure. To prevent memory inflation of DELAYED_RECYCLED, Netty has taken protective measures. As can be seen from delayedRecycled.size() >= maxDelayedQueues, each thread can help recycle objects for up to twice the number of CPU cores. If the threshold is exceeded, if the current object is bound to StackX, a special WeakOrderQueue.DUMMY will be added to StackX in the Map, indicating that the current thread cannot help StackX recycle objects.
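The map-with-a-cap behavior can be sketched as follows (the class, DUMMY marker, and queueFor method below are illustrative stand-ins for Netty's DELAYED_RECYCLED, WeakOrderQueue.DUMMY, and pushLater lookup):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the DELAYED_RECYCLED idea: each thread keeps a map from the owning
// Stack to the queue it fills for that Stack. Once the map holds maxDelayedQueues
// entries, a DUMMY marker is stored so the thread permanently refuses new stacks.
public class DelayedRecycleMap {
    static final Object DUMMY = new Object(); // "will not help this stack" marker
    final int maxDelayedQueues;
    final Map<Object, Object> delayedRecycled = new HashMap<>();

    DelayedRecycleMap(int maxDelayedQueues) {
        this.maxDelayedQueues = maxDelayedQueues;
    }

    /** Returns the queue for this stack, or null if recycling must be refused. */
    Object queueFor(Object stack) {
        Object queue = delayedRecycled.get(stack);
        if (queue == null) {
            if (delayedRecycled.size() >= maxDelayedQueues) {
                delayedRecycled.put(stack, DUMMY); // cap reached: refuse from now on
                return null;
            }
            queue = new Object(); // stands in for a real WeakOrderQueue
            delayedRecycled.put(stack, queue);
        } else if (queue == DUMMY) {
            return null;
        }
        return queue;
    }

    public static void main(String[] args) {
        DelayedRecycleMap map = new DelayedRecycleMap(2); // help at most 2 stacks
        Object stackA = new Object(), stackB = new Object(), stackC = new Object();
        System.out.println(map.queueFor(stackA) != null); // prints "true"
        System.out.println(map.queueFor(stackB) != null); // prints "true"
        System.out.println(map.queueFor(stackC) != null); // prints "false" - capped
        System.out.println(map.queueFor(stackA) != null); // prints "true" - still helped
    }
}
```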

Next, let’s analyze how objects are added to the WeakOrderQueue. We directly trace the source code of queue.add(item):

void add(DefaultHandle<?> handle) {
    handle.lastRecycledId = id;
    Link tail = this.tail;
    int writeIndex;
    // If the Link at the tail of the linked list is already full, create a new one and append it to the tail
    if ((writeIndex = tail.get()) == LINK_CAPACITY) {
        // Check whether the total number of objects recycled for the corresponding Stack would exceed the limit for other threads
        if (!head.reserveSpace(LINK_CAPACITY)) {
            // Drop it.
            return;
        }
        this.tail = tail = tail.next = new Link();
        writeIndex = tail.get();
    }
    tail.elements[writeIndex] = handle; // Add the object to the tail of the Link
    handle.stack = null; // Set the stack attribute of the handle to null
    tail.lazySet(writeIndex + 1);
}

Before writing the object into the WeakOrderQueue, add() first checks whether the Link node at the tail of the linked list still has free space. If it does, the handle is written directly into the tail Link. If the tail Link is full, a new Link must be created before the object can be stored; before creating it, reserveSpace() checks whether the total number of objects recycled on behalf of this Stack would exceed the threshold set by the Stack, and if the threshold is exceeded the object is discarded instead.

After the object is added to the Link, the stack attribute of the handle is set to null, and it is restored to its original value when the object is later retrieved. Why go to this trouble? Because if the Stack is no longer used and should be garbage collected, a handle that still holds a strong reference to it would keep the Stack alive, resulting in a memory leak.
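The reachability argument can be sketched with a WeakReference (the classes below are illustrative, not Netty code): once neither the handle nor any local variable holds a strong reference, the "Stack" becomes weakly reachable and is eligible for collection.

```java
import java.lang.ref.WeakReference;

// Sketch of why the handle drops its Stack reference while parked in a
// WeakOrderQueue: a weakly reachable Stack can be collected, while a handle
// that kept a strong reference would pin it in memory (a leak).
public class StackLeakDemo {
    static final class Handle {
        Object stack; // strong reference, nulled while the handle is parked
    }

    public static void main(String[] args) {
        Handle handle = new Handle();
        Object stack = new Object();
        handle.stack = stack;

        WeakReference<Object> ref = new WeakReference<>(stack);
        stack = null;        // the local strong reference is gone...
        handle.stack = null; // ...and so is the handle's; the Stack is now only weakly reachable

        System.gc(); // a hint only; collection of weakly reachable objects is not guaranteed
        // After a GC cycle, ref.get() is typically null: the Stack was not leaked.
    }
}
```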

So far, the implementation principle of how Recycler recycles objects has been fully analyzed. In a multi-threaded scenario, Netty has considered very carefully. When Recycler recycles objects and stores them in WeakOrderQueue, the objects in WeakOrderQueue will be used as the reserve of the Stack when retrieving objects from Recycler. It effectively solves the problem of cross-thread recycling and is a quite innovative and unique design.
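The whole round trip can be condensed into a runnable sketch (the class below is a toy model, not Netty code): a foreign thread releases into a shared queue standing in for a WeakOrderQueue, and the owner drains it into its private stack only when the stack runs dry, mirroring the scavenge step.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// End-to-end sketch of the cross-thread hand-off: the releasing thread appends
// into a shared queue (standing in for a WeakOrderQueue), and the owner
// "scavenges" it into its private stack only when the stack runs dry.
public class CrossThreadHandOff {
    final ArrayDeque<Object> stack = new ArrayDeque<>();         // owner-private
    final Queue<Object> delayed = new ConcurrentLinkedQueue<>(); // filled by other threads

    Object acquire() {
        Object obj = stack.poll();
        if (obj == null) {
            Object transferred;
            while ((transferred = delayed.poll()) != null) { // the scavenge step
                stack.push(transferred);
            }
            obj = stack.poll();
        }
        return obj != null ? obj : new Object(); // allocate if nothing was reclaimed
    }

    public static void main(String[] args) throws InterruptedException {
        CrossThreadHandOff pool = new CrossThreadHandOff();
        Object obj = pool.acquire();                            // allocated fresh
        Thread other = new Thread(() -> pool.delayed.add(obj)); // foreign-thread release
        other.start();
        other.join();
        System.out.println(pool.acquire() == obj); // prints "true" - scavenged back
    }
}
```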

Application of Recycler in Netty #

Recycler is also widely used in Netty. Let’s take a look at the references related to newObject in Netty’s source code, as shown in the following figure:

[Figure: references to newObject() implementations in the Netty source]

Among them, PooledHeapByteBuf and PooledDirectByteBuf are commonly used, corresponding to the pooled implementations of heap memory and off-heap memory, respectively. For example, when using PooledDirectByteBuf, instead of creating a new object instance every time, a pre-allocated object instance is obtained from the object pool. When PooledDirectByteBuf is no longer used, it is recycled and returned to the object pool.

In addition, the MemoryRegionCache in the memory pool also uses object pools. The MemoryRegionCache stores a queue, and each Entry node in the queue is used to store memory blocks. In Netty, the Entry node is allocated and released in the form of an object pool. I won’t go into detail here. I suggest you go through the source code and learn when and how the Entry node is allocated and released, in order to deepen your understanding of the Recycler object pool.

Conclusion #

Finally, let’s briefly summarize a few important points about object pools:

  • The object pool has two important components: Stack and WeakOrderQueue.
  • When retrieving an object from the Recycler, it first tries to find it from the Stack. If there are no available objects in the Stack, it will try to migrate some objects from WeakOrderQueue to the Stack.
  • When the Recycler recycles objects, there are two scenarios: recycling objects in the same thread and recycling objects in different threads. When recycling objects in the same thread, the objects are directly added to the Stack. When recycling objects in different threads, the objects are added to the Link in WeakOrderQueue.
  • The rate of object recycling is controlled. Every 8 objects, one object is recycled, and the rest are discarded.

After learning about memory pools and object pools, I believe you have gained a lot, and you can also appreciate how important it is to learn data structures. In order to avoid dependencies, Netty did not rely on third-party libraries to implement the object pool. Instead, it used a unique approach to implement a lightweight object pool. The excellent design ideas are worth learning from in development. If you understand Recycler, you can directly use it as a utility class in your project, which can effectively improve the performance of applications in high-concurrency scenarios.