
45 Q&A Where Are the Buffers Located in the Network Send and Receive Process #

Hello, I am Ni Pengfei.

Since this column began updating, we have now completed the network module, the last of the four basic modules. I am glad that you are still actively learning, thinking, and practicing, and enthusiastically leaving comments and interacting. Many students have also shared their methods for analyzing and optimizing the performance problems they hit in real production environments. Thank you all for that.

Today is the fifth session of performance optimization Q&A. As usual, I have taken out some typical questions from the comments of the network module as the content for today’s Q&A, and will reply to them collectively. Similarly, in order to facilitate your learning and understanding, they are not arranged strictly in the order of the articles.

For each question, I have attached a screenshot of the question asked in the comment section. If you need to review the original content, you can scan the QR code in the lower right corner of each question.

Question 1: Location of Buffer in the Network Transmission Process #

The first question concerns where the send/receive queues and buffers sit in the network transmission process.

In my previous article What You Must Know About Linux Networking, I introduced the send/receive process of Linux networking. This process involves multiple queues and buffers, including:

  • A circular buffer that interacts with the network card through DMA during packet sending/receiving;

  • The sk_buff buffer - a kernel data structure allocated by the network card interrupt handler for processing network frames;

  • The socket buffer - the buffer that the application uses to exchange data with the network protocol stack through the socket interface.

However, this raises two questions.

First, where are these buffers located? Are they in the network card hardware, or in memory? Thinking it through carefully, the answer becomes clear: these buffers all reside in memory managed by the kernel.

Among them, the circular buffer belongs to the network card device driver, since the card reads and writes it directly through DMA.

The sk_buff buffers are maintained in a doubly-linked list, where each element represents one network frame (packet). Although the TCP/IP protocol stack has multiple layers, passing a frame between layers only requires manipulating pointers in this data structure, with no copying of the data itself.
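The no-copy idea can be sketched in user space. The following is only a simplified illustration, not the kernel's actual sk_buff implementation; the class and method names are made up here, loosely mirroring the spirit of the kernel's skb_put/skb_push/skb_pull helpers:

```python
# Simplified sketch of the sk_buff idea: one flat buffer with data/tail
# offsets; protocol layers "add" or "strip" headers by moving the data
# offset instead of copying the payload.

class SkBuffSketch:
    def __init__(self, size, headroom):
        self.buf = bytearray(size)
        self.data = headroom        # start of valid data
        self.tail = headroom        # end of valid data

    def put(self, payload):
        """Append payload at the tail (in the spirit of skb_put)."""
        end = self.tail + len(payload)
        self.buf[self.tail:end] = payload
        self.tail = end

    def push(self, header):
        """Prepend a header by moving the data offset down (like skb_push)."""
        start = self.data - len(header)
        self.buf[start:self.data] = header
        self.data = start

    def pull(self, n):
        """Strip n header bytes by moving the data offset up (like skb_pull)."""
        hdr = bytes(self.buf[self.data:self.data + n])
        self.data += n
        return hdr

    def payload(self):
        return bytes(self.buf[self.data:self.tail])


skb = SkBuffSketch(size=256, headroom=64)
skb.put(b"application data")      # payload is written exactly once
skb.push(b"TCP|")                 # transport layer prepends its header
skb.push(b"IP|")                  # network layer prepends its header
print(skb.payload())              # b'IP|TCP|application data'
skb.pull(3)                       # receive side strips the IP header
print(skb.payload())              # b'TCP|application data'
```

Note that the payload bytes never move; only the `data` offset changes as headers come and go, which is why layer transitions are cheap.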

The socket buffer allows the application to configure different sizes of receive or send buffers for each socket. When the application sends data, it writes the data into the buffer; when it receives data, it reads from the buffer. The further processing of the data in the buffer is handled by the TCP or UDP protocol in the transport layer.
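As a small sketch of configuring per-socket buffers, the standard SO_SNDBUF and SO_RCVBUF socket options let an application request different sizes; the sizes actually granted are subject to kernel limits such as net.core.rmem_max and net.core.wmem_max:

```python
import socket

# Each socket gets its own send and receive buffers; an application can
# request different sizes via SO_SNDBUF / SO_RCVBUF.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request a 256 KB receive buffer and a 128 KB send buffer.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 128 * 1024)

# On Linux the kernel doubles the requested value to leave room for
# bookkeeping, and clamps it to net.core.rmem_max / net.core.wmem_max,
# so the values read back usually differ from what was requested.
print("rcvbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
print("sndbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```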

Second, what is the relationship between these buffers and the Buffer and Cache mentioned earlier in the memory section?

This question is not difficult to answer either. As mentioned in the memory module, the Buffers reported in memory statistics are directly tied to block devices, while everything else counts as Cache.

In fact, sk_buff, socket buffers, connection tracking entries, and so on are all managed by the slab allocator. You can check their memory usage directly in /proc/slabinfo.
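Here is a minimal sketch of reading such slab statistics, using two illustrative /proc/slabinfo-style lines; the numbers are made up, and on a live system you would read /proc/slabinfo itself (which typically requires root):

```python
# Sketch: extract sk_buff / conntrack slab usage from /proc/slabinfo-style
# text. The sample lines below are illustrative, not real measurements.
SAMPLE = """\
skbuff_head_cache   1344   1440    256   16    1 : tunables ...
nf_conntrack         408    512    320   12    1 : tunables ...
"""

def slab_usage(text, prefixes=("skbuff", "nf_conntrack")):
    """Return {cache_name: (active_objs, num_objs, objsize)} for matching caches."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefixes):
            usage[fields[0]] = (int(fields[1]), int(fields[2]), int(fields[3]))
    return usage

print(slab_usage(SAMPLE))
# On a real Linux host, try: slab_usage(open("/proc/slabinfo").read())
```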

Question 2: Does the kernel protocol stack run through a kernel thread? #

The second question: does the kernel protocol stack run in a dedicated kernel thread? In other words, how does the kernel execute the network protocol stack?

When it comes to network transmission and reception, as I mentioned in the interrupt handling article, there is a dedicated kernel thread called ksoftirqd for soft interrupts. Each CPU is bound to a ksoftirqd kernel thread. For example, when there are 2 CPUs, there are ksoftirqd/0 and ksoftirqd/1 as the two kernel threads.
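One way to observe this soft-interrupt activity is /proc/softirqs, whose NET_RX and NET_TX rows hold per-CPU counters. Below is a small parsing sketch over sample output mimicking a 2-CPU machine; the counter values are illustrative:

```python
# Sketch: tally NET_RX / NET_TX soft-interrupt counts per CPU from
# /proc/softirqs-style text. The sample mimics a 2-CPU machine.
SAMPLE = """\
                    CPU0       CPU1
          HI:          1          0
      NET_TX:       1337        421
      NET_RX:      90210      88422
"""

def net_softirqs(text):
    """Return {'NET_TX': [per-cpu counts], 'NET_RX': [per-cpu counts]}."""
    counts = {}
    for line in text.splitlines():
        fields = line.replace(":", "").split()
        if fields and fields[0] in ("NET_TX", "NET_RX"):
            counts[fields[0]] = [int(x) for x in fields[1:]]
    return counts

print(net_softirqs(SAMPLE))
# {'NET_TX': [1337, 421], 'NET_RX': [90210, 88422]}
# On a real Linux host, try: net_softirqs(open("/proc/softirqs").read())
```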

However, it is worth noting that not all network functions are handled in the ksoftirqd kernel threads. There are many other mechanisms in the kernel (such as hard interrupts, kworker, slab, etc.) that work together to ensure the normal operation of the entire network protocol stack.

As for the working principle of the network protocol stack in the kernel and how to dynamically trace the execution flow of the kernel, there will be dedicated articles in future columns to discuss. If you are interested in this part, you can try analyzing it using tools such as perf, systemtap, and bcc-tools that we mentioned before.

Question 3: Is the maximum number of connections limited to 65535? #

We know that both TCP and UDP use 16-bit port numbers, which means the maximum value is 65535. Does this mean that if we use the TCP protocol on a single machine with a single IP address, the maximum number of concurrent connections is only 65535?

To answer this question, you first need to understand that the Linux protocol stack uses a five-tuple to identify a connection (i.e., protocol, source IP, source port, destination IP, destination port).
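To make the identification concrete, here is a tiny sketch (the addresses and ports are made up): changing any single element of the five-tuple yields a distinct connection.

```python
# A connection is identified by the five-tuple
# (protocol, src_ip, src_port, dst_ip, dst_port): change any one
# element and the kernel sees a different connection.
conn_a = ("tcp", "10.0.0.2", 40001, "93.184.216.34", 80)
conn_b = ("tcp", "10.0.0.3", 40001, "93.184.216.34", 80)  # same port, different src IP
conn_c = ("udp", "10.0.0.2", 40001, "93.184.216.34", 80)  # same addresses, different protocol

# All three coexist: the tuples differ, so the connection table keeps
# three separate entries.
table = {conn_a, conn_b, conn_c}
print(len(table))  # 3
```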

With this understanding, we can analyze this question based on two scenarios: the client side and the server side.

For the client side, each TCP connection request needs an available local port for connecting to the remote server. Since each local port can be occupied by only one connection at a time, the client can initiate at most 65535 connections.

For the server side, it usually listens on a fixed port (such as port 80) and waits for client connections. From the five-tuple structure, we know that the client’s IP and port are what vary. Setting aside IP addressing schemes and resource limits, the theoretical maximum number of connections to a server can therefore reach 2 to the power of 48 (a 32-bit client IP plus a 16-bit client port), which is far greater than 65535.

Therefore, overall, a client supports at most 65535 connections, while a server can support a massive number of connections. In practice, however, the performance of the Linux protocol stack itself, plus various physical and software resource limits, keeps you far from that theoretical number (in fact, even C10M is quite challenging).
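The arithmetic behind the two limits can be written out directly. These are only the theoretical bounds; real clients are further restricted by the ip_local_port_range sysctl, which is usually much narrower than the full 16-bit space:

```python
# Back-of-the-envelope limits implied by the five-tuple.

# Client side: one exclusive local port per connection, so the 16-bit
# port space caps it.
client_max = 2**16 - 1
print(client_max)            # 65535

# Server side: it listens on one fixed (IP, port); what varies is the
# client's (IP, port) pair, giving 2^32 * 2^16 = 2^48 combinations.
server_max = 2**32 * 2**16
print(server_max == 2**48)   # True
```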

Question 4: After-class reflection on “How to optimize NAT performance” #

At the end of the article How to optimize NAT performance, I left two questions for you to think about.

MASQUERADE is one of the most commonly used SNAT rules, usually used to provide a shared egress IP for multiple internal IP addresses. Suppose there is a Linux server that uses MASQUERADE to provide outbound access for all internal IP addresses. Then,

  • Can MASQUERADE still work if multiple internal IP addresses have the same port number?

  • Are there any potential problems with this usage when there are a large number of internal IP addresses or requests?

For these two questions, I, along with fellow students such as “wo lai ye” and “ninuxer,” have provided good answers:

First, when multiple internal IP addresses use the same port number, MASQUERADE still works correctly. You may have heard the claim that after configuring MASQUERADE, each application must manually change its port number; that is not necessary.

In fact, MASQUERADE records each connection’s information through the conntrack mechanism. As I mentioned in the third question, a connection is identified by its five-tuple; as long as the five elements are not all identical at the same time, the connections can proceed normally.

Next, when the number of internal IP addresses and concurrent connections is relatively small, this usage method doesn’t have major issues. However, when the number of IP addresses or concurrent connections is particularly large, various resource limitations may be encountered.

For example, since MASQUERADE converts multiple internal IP addresses to the same external IP (i.e., SNAT), to ensure that the source ports of the packets sent out are not duplicated, the source ports of the original network packets may also be reassigned. In this case, the port number of the converted external IP becomes an important factor limiting the number of connections.
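The port-reassignment idea can be sketched as follows. This is an illustrative model only, not iptables/conntrack internals; the function name `snat_table` and the port range are assumptions made for the sketch:

```python
# Sketch of the SNAT port reassignment behind MASQUERADE: many internal
# (ip, port) sources share one external IP, so each flow must be given a
# source port that is unique on the external side.

def snat_table(flows, ext_ip, port_range=range(32768, 60999)):
    """Map each internal (src_ip, src_port) flow to a unique (ext_ip, port)."""
    ports = iter(port_range)
    mapping = {}
    for src_ip, src_port in flows:
        mapping[(src_ip, src_port)] = (ext_ip, next(ports))
    return mapping

# Three internal hosts all using source port 5000 still get distinct
# external five-tuples after translation.
flows = [("10.0.0.2", 5000), ("10.0.0.3", 5000), ("10.0.0.4", 5000)]
table = snat_table(flows, "203.0.113.1")
print(table[("10.0.0.2", 5000)])  # ('203.0.113.1', 32768)
print(table[("10.0.0.3", 5000)])  # ('203.0.113.1', 32769)
# Once the external port range is exhausted, next(ports) raises
# StopIteration: the shared IP's port space is the bottleneck.
```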

In addition to this, connection tracking, network bandwidth of the MASQUERADE machine, etc., are potential bottlenecks, and there is also a single point of failure issue. These situations need to be specifically noted in our actual usage.

Today, I mainly answered these questions. As always, feel free to keep writing your questions and thoughts in the comments, and I will continue to answer them. I hope that through each Q&A session you can internalize the knowledge from these articles into your own skills, practicing in real scenarios and improving through exchange.