38 Case Study How to Use tcpdump and Wireshark to Analyze Network Traffic #

Hello, I’m Ni Pengfei.

In the previous section, we learned how to analyze and optimize DNS performance issues. Let’s briefly review. DNS maps domain names to IP addresses, and it is also commonly used to implement global server load balancing (GSLB).

Usually, services that need to be exposed to the public network are bound to a domain name, which is convenient for people to remember and avoids the impact of changes in the backend service IP address on users.

However, it is worth noting that DNS resolution is affected by various network conditions, and its performance may be unstable. For example, DNS responses slow down when public network latency increases, when a cached record expires and must be requested again from the upstream server, or when the DNS server cannot keep up during a traffic peak.

In such cases, you can use the debug output of nslookup or dig to trace the DNS resolution process, and then measure the latency to the DNS server with tools such as ping, in order to locate the bottleneck. DNS performance can usually be improved through caching, prefetching, and HTTPDNS.

In the previous section, we used ping, which is the most commonly used tool for testing service latency. In many cases, ping can help us locate latency issues. However, sometimes ping itself may also have unexpected problems. In this case, we need to capture the network packets sent and received when the ping command is executed, and then analyze these network packets to find the root cause of the problem.

tcpdump and Wireshark are the most commonly used tools for network packet capture and analysis, and they are essential tools for analyzing network performance.

tcpdump only supports command line usage and is commonly used to capture and analyze network packets in servers.
In addition to packet capture, Wireshark also provides a powerful graphical interface and summary analysis tools, which are particularly simple and practical when analyzing complex network scenarios.

Therefore, when analyzing network performance in practice, it is common to capture packets with tcpdump first and then analyze them with Wireshark.

Today, I will show you how to use tcpdump and Wireshark to analyze network performance problems.

Case Preparation #

This case is still based on Ubuntu 18.04 and is also applicable to other Linux systems. The environment I used for this case is as follows:

Machine configuration: 2 CPU, 8GB memory.
Pre-installation of tools such as tcpdump and Wireshark, for example:

# Ubuntu
apt-get install tcpdump wireshark

# CentOS
yum install -y tcpdump wireshark

Since the graphical interface of Wireshark cannot be used via SSH, I recommend installing it on your local machine (e.g. Windows). You can download and install Wireshark from https://www.wireshark.org/.

As before, all commands in the case are run as the root user by default (except when running Wireshark in Windows). If you are logged into the system as a regular user, please run the sudo su root command to switch to the root user.

Revisiting ping #

As mentioned before, ping is one of the most commonly used network tools, often used to probe the connectivity and latency between network hosts. The principles and usage of ping have been briefly introduced in the previous Linux Network Basics. In the case of slow DNS, we have also used ping to test the latency (RTT) of DNS servers multiple times.

However, although ping is relatively simple, sometimes you may find that the ping tool itself can also have anomalies, such as running slowly while the actual network latency is not significant.

Next, let’s open a terminal and SSH into the test machine. Execute the following command to test the connectivity and latency between the test machine and Geekbang’s official website. If everything is normal, you will see the following output:

# ping 3 times (the default interval is 1 second per ping)
# Assuming the DNS server is still set as 114.114.114.114 as in the previous issue
$ ping -c3 geektime.org
PING geektime.org (35.190.27.188) 56(84) bytes of data.
64 bytes from 35.190.27.188 (35.190.27.188): icmp_seq=1 ttl=43 time=36.8 ms
64 bytes from 35.190.27.188 (35.190.27.188): icmp_seq=2 ttl=43 time=31.1 ms
64 bytes from 35.190.27.188 (35.190.27.188): icmp_seq=3 ttl=43 time=31.2 ms

--- geektime.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 11049ms
rtt min/avg/max/mdev = 31.146/33.074/36.809/2.649 ms

We have already learned about the output of ping in the Linux Network Basics. You can review it yourself, interpret and analyze the output of this ping.

However, please note that if you find that ping ends quickly when running, execute the following command and try again. We will explain the meaning of this command later.

# Disallow receiving packets containing "googleusercontent" from the DNS server
$ iptables -I INPUT -p udp --sport 53 -m string --string googleusercontent --algo bm -j DROP

Based on the output of ping, you can see that the IP address resolved for geektime.org is 35.190.27.188, and the three subsequent ping requests all received responses, with latencies (RTT) of slightly over 30ms.

However, the summary is where things get interesting. Three packets were transmitted, three responses were received, and nothing was lost, yet the three round trips took more than 11 s in total (11049 ms). With a 1-second interval between pings and RTTs of about 30 ms, they should have finished in roughly 2 seconds, so this is hard to believe.

Recalling the DNS resolution issue in the previous section, you may suspect that this could be a problem with slow DNS resolution. But is it really?

Looking back at the ping output, the IP address was used for all three ping requests, indicating that ping only needs to resolve the IP once at the beginning and can then use the IP thereafter.

Let’s try nslookup again. Execute the following nslookup command in the terminal. Note that this time we also added the time command to output the execution time of nslookup:

$ time nslookup geektime.org
Server: 114.114.114.114
Address: 114.114.114.114#53

Non-authoritative answer:
Name: geektime.org
Address: 35.190.27.188

real    0m0.044s
user    0m0.006s
sys     0m0.003s

As you can see, domain name resolution is still very fast, taking only 44ms, which is obviously much shorter than 11s.

What should we analyze from here? Actually, at this point, we can use tcpdump to capture packets and see what packets ping is sending and receiving.

Let’s open another terminal (Terminal 2), SSH into the test machine, and execute the following command:

$ tcpdump -nn udp port 53 or host 35.190.27.188

Of course, you can run tcpdump without any parameters, but then you may capture many unrelated packets. Since we have already run the ping command and know that the IP address of geektime.org is 35.190.27.188, and that ping performs DNS queries, the command above filters on exactly those two conditions.

Let me explain this command in detail.

-nn tells tcpdump not to resolve names: IP addresses are not reverse-resolved to host names, and protocol and port numbers are not converted to names.
udp port 53 indicates that only packets with UDP protocol and port number (including source and destination ports) as 53 are displayed.
host 35.190.27.188 indicates that only packets with IP address (including source and destination addresses) as 35.190.27.188 are displayed.
The “or” between these two filtering conditions indicates an OR relationship, which means that as long as either of the two conditions is met, it will be displayed.
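As a small sketch of how the same expression can be assembled for other targets, the helper below builds it from a port and an IP address. The function name build_filter is an assumption made for illustration and is not part of tcpdump; it only prints the expression and does not start a capture.

```shell
# Illustrative helper (not part of tcpdump): print the filter used above
# for a given DNS port and target IP.
build_filter() {
  printf 'udp port %s or host %s\n' "$1" "$2"
}

# Show the expression that would be passed to tcpdump.
build_filter 53 35.190.27.188
```

To capture with it, the result can be passed as a single quoted argument, for example tcpdump -nn "$(build_filter 53 35.190.27.188)".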

Next, go back to Terminal 1 and execute the same ping command:

$ ping -c3 geektime.org
...
--- geektime.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 11095ms
rtt min/avg/max/mdev = 81.473/81.572/81.757/0.130 ms

After the command is executed, return to Terminal 2 and check the output of tcpdump:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:02:31.100564 IP 172.16.3.4.56669 > 114.114.114.114.53: 36909+ A? geektime.org. (30)
14:02:31.507699 IP 114.114.114.114.53 > 172.16.3.4.56669: 36909 1/0/0 A 35.190.27.188 (46)
14:02:31.508164 IP 172.16.3.4 > 35.190.27.188: ICMP echo request, id 4356, seq 1, length 64
14:02:31.539667 IP 35.190.27.188 > 172.16.3.4: ICMP echo reply, id 4356, seq 1, length 64
14:02:31.539995 IP 172.16.3.4.60254 > 114.114.114.114.53: 49932+ PTR? 188.27.190.35.in-addr.arpa. (44)
14:02:36.545104 IP 172.16.3.4.60254 > 114.114.114.114.53: 49932+ PTR? 188.27.190.35.in-addr.arpa. (44)
14:02:41.551284 IP 172.16.3.4 > 35.190.27.188: ICMP echo request, id 4356, seq 2, length 64
14:02:41.582363 IP 35.190.27.188 > 172.16.3.4: ICMP echo reply, id 4356, seq 2, length 64
14:02:42.552506 IP 172.16.3.4 > 35.190.27.188: ICMP echo request, id 4356, seq 3, length 64
14:02:42.583646 IP 35.190.27.188 > 172.16.3.4: ICMP echo reply, id 4356, seq 3, length 64

In this output, the first two lines show tcpdump’s options and basic information about the interface. The captured packets start from the third line. Each packet line has the format: timestamp, protocol, source address.source port > destination address.destination port, packet details (this is the most basic format; options can add more fields).

The earlier fields are relatively easy to understand. However, the detailed information of the network packets varies depending on the protocols. Therefore, in order to understand the meanings of these network packets, you need to have a basic understanding of the format and interaction principles of common network protocols.

Of course, in reality, these contents are all documented in the Request for Comments (RFC) published by the Internet Engineering Task Force (IETF).

For example, the first packet line is an A record query sent from the local IP to 114.114.114.114. The DNS message format is documented in RFC 1035. In this tcpdump output:

36909+ is the query identifier, which also appears in the response. The plus sign indicates that recursion is desired (the RD flag is set).
A? indicates a query for an A record.
geektime.org. is the domain name being queried.
30 is the length of the DNS message in bytes, not including the IP and UDP headers.

The next line is the response from 114.114.114.114 to that query. It carries the same identifier, 36909; 1/0/0 means one answer record, zero authority records, and zero additional records; and the A record for “geektime.org.” is 35.190.27.188.
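The query fields described above can be pulled out of a tcpdump line mechanically. The following is a minimal sketch, assuming the default one-line output format shown in this case (no -v or -e); the function name parse_dns_query is hypothetical.

```shell
# Illustrative helper: split one tcpdump DNS query line into its fields.
# Assumes the default one-line format shown above (no -v or -e).
parse_dns_query() {
  awk '{
    id = $6; rd = "no"
    if (id ~ /[+]$/) { rd = "yes"; sub(/[+]$/, "", id) }  # trailing + means recursion desired
    qtype = $7; sub(/[?]$/, "", qtype)                    # "A?" -> "A"
    size = $9; gsub(/[()]/, "", size)                     # "(30)" -> "30"
    printf "id=%s rd=%s type=%s name=%s bytes=%s\n", id, rd, qtype, $8, size
  }'
}

echo '14:02:31.100564 IP 172.16.3.4.56669 > 114.114.114.114.53: 36909+ A? geektime.org. (30)' | parse_dns_query
```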

The third and fourth lines are the ICMP echo request and the ICMP echo reply. Subtracting the request timestamp 14:02:31.508164 from the reply timestamp 14:02:31.539667 gives the ICMP round-trip time, about 31.5 ms. This seems fine.
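The timestamp subtraction can be scripted so that it does not have to be done by hand for every packet pair. This is a minimal sketch, assuming both timestamps fall on the same day and use tcpdump’s default HH:MM:SS.microseconds format; the function name ts_delta_ms is hypothetical.

```shell
# Illustrative helper: difference between two tcpdump timestamps in milliseconds.
# Usage: ts_delta_ms <earlier> <later>   (both as HH:MM:SS.microseconds)
ts_delta_ms() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    ta = x[1] * 3600 + x[2] * 60 + x[3]   # seconds since midnight
    tb = y[1] * 3600 + y[2] * 60 + y[3]
    printf "%.1f\n", (tb - ta) * 1000
  }'
}

# ICMP echo request -> echo reply for seq 1
ts_delta_ms 14:02:31.508164 14:02:31.539667
# First PTR request -> second PTR request
ts_delta_ms 14:02:31.539995 14:02:36.545104
```

The same helper applied to the PTR request timestamps makes the multi-second gaps in the capture easy to spot.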

However, the two PTR (reverse address resolution) requests that follow are suspicious, because we see only the request packets and no response packets. If you look closely at their timestamps, you will find that each request is followed by a gap of about 5 seconds before the next packet appears, so the two PTR queries together consumed about 10 seconds.
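Spotting queries without responses by eye gets harder as captures grow. The sketch below lists DNS query IDs that never received a response, assuming default-format tcpdump lines captured with -nn on standard input; the function name unanswered_dns is hypothetical, and matching on port 53 alone is a simplification.

```shell
# Illustrative helper: list DNS query IDs that never got a response.
# Reads default-format tcpdump lines on stdin; prints "<id> <times sent>".
unanswered_dns() {
  awk '
    # Query: destination field ends in port 53 and the next field is the ID.
    $5 ~ /[.]53:$/ && $6 ~ /^[0-9]/ { id = $6; gsub(/[^0-9]/, "", id); sent[id]++ }
    # Response: source field ends in port 53.
    $3 ~ /[.]53$/ && $6 ~ /^[0-9]/ { id = $6; gsub(/[^0-9]/, "", id); answered[id] = 1 }
    END { for (id in sent) if (!(id in answered)) print id, sent[id] }
  '
}

unanswered_dns <<'EOF'
14:02:31.100564 IP 172.16.3.4.56669 > 114.114.114.114.53: 36909+ A? geektime.org. (30)
14:02:31.507699 IP 114.114.114.114.53 > 172.16.3.4.56669: 36909 1/0/0 A 35.190.27.188 (46)
14:02:31.508164 IP 172.16.3.4 > 35.190.27.188: ICMP echo request, id 4356, seq 1, length 64
14:02:31.539995 IP 172.16.3.4.60254 > 114.114.114.114.53: 49932+ PTR? 188.27.190.35.in-addr.arpa. (44)
14:02:36.545104 IP 172.16.3.4.60254 > 114.114.114.114.53: 49932+ PTR? 188.27.190.35.in-addr.arpa. (44)
EOF
```

For the lines above, only the PTR query ID is reported, together with the number of times it was sent.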

Moving on, the last four packets are two normal ICMP request and reply pairs, and their timestamps show a delay of about 31 ms each. At this point we have found the root cause of the slow ping: the two PTR requests received no response and timed out. The purpose of a PTR (reverse address resolution) query is to look up the domain name for an IP address. However, not every IP address has a PTR record defined, so PTR queries can easily fail.
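The name queried in the PTR requests is derived from the IP address: for IPv4, the octets are reversed and in-addr.arpa is appended. A minimal sketch of that construction follows; the function name ptr_name is hypothetical, and IPv6 (which uses ip6.arpa) is not handled.

```shell
# Illustrative helper: build the PTR query name for an IPv4 address
# by reversing its octets and appending in-addr.arpa.
ptr_name() {
  echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

ptr_name 35.190.27.188
```

The result matches the name in the captured PTR requests, apart from the trailing dot that tcpdump prints for the DNS root.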

Therefore, when you use ping and find that the latency in the results is not significant, but the ping command itself is slow, don’t panic. It is likely that the PTR is causing the issue.

Once we understand the problem, solving it is relatively simple. We just need to disable PTR. Following the usual approach, execute the man ping command to consult the manual and find the corresponding method, which is to add the -n option to disable name resolution. For example, we can execute the following command in the terminal:

$ ping -n -c3 geektime.org
PING geektime.org (35.190.27.188) 56(84) bytes of data.
64 bytes from 35.190.27.188: icmp_seq=1 ttl=43 time=33.5 ms
64 bytes from 35.190.27.188: icmp_seq=2 ttl=43 time=39.0 ms
64 bytes from 35.190.27.188: icmp_seq=3 ttl=43 time=32.8 ms

--- geektime.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 32.879/35.160/39.030/2.755 ms

You can see that now it only takes 2 seconds to finish, which is much faster than the previous 11 seconds.

So far, I have shown you how to use tcpdump to solve the most common issue of slow ping.

Finally, if you executed the iptables command at the beginning, don’t forget to remove it:

$ iptables -D INPUT -p udp --sport 53 -m string --string googleusercontent --algo bm -j DROP

However, after deleting it, you may still have a question. Why do we filter packets based on the seemingly unrelated string “googleusercontent” when our case has nothing to do with Google?

In fact, if we switch to a different DNS server, we can use PTR to find the domain name corresponding to 35.190.27.188:

$ nslookup -type=PTR 35.190.27.188 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53
Non-authoritative answer:
188.27.190.35.in-addr.arpa	name = 188.27.190.35.bc.googleusercontent.com.
Authoritative answers can be found from:

As you can see, although we obtained the PTR record, the result is not geektime.org but 188.27.190.35.bc.googleusercontent.com. This is why ping becomes slow after we drop packets containing “googleusercontent” at the beginning of the case: iptables discards the PTR response, so the PTR request times out.

tcpdump can be considered as the most effective tool for network performance analysis. Next, let me show you more usage methods of tcpdump.

tcpdump #

We know that tcpdump is also a commonly used network analysis tool. It is based on libpcap and uses the AF_PACKET socket in the kernel to capture network packets transmitted in the network interface. It provides powerful filtering rules to help you extract the most relevant information from a large number of network packets.

tcpdump shows you the details of each network packet, so you need a basic understanding of network protocols before using it. For the detailed design and implementation of network protocols, the RFCs are of course the most authoritative source.

However, RFCs may not be friendly to beginners. If you are not familiar with network protocols, I recommend studying “TCP/IP Illustrated”, especially Volume 1, which covers the TCP/IP protocol suite. This is core knowledge that every programmer should master.

Returning to the tcpdump tool itself, its basic usage method is quite simple, which is tcpdump [options] [filter expression]. Of course, the options and expressions are enclosed in square brackets, indicating that they are optional.

Tip: In the documentation of Linux tools, options enclosed in square brackets are optional. When you see them, check whether they have default values.

If you check the manual for tcpdump and the manual for pcap-filter, you will find that tcpdump provides a large number of options and various filtering expressions. But don’t worry, by mastering some common options and filtering expressions, you can meet the needs of most scenarios.

To help you get started with tcpdump faster, I have also organized some of the most common usages for you and created a table for reference.

First, let’s take a look at several commonly used options. In the ping example above, we used the -nn option to indicate that IP addresses and port numbers do not need to be resolved to names. I will explain other common options in the table below.

tcpdump options table

Next, let’s look at common filtering expressions. We just used udp port 53 or host 35.190.27.188, which means capturing DNS protocol request and response packets, as well as packets with a source or destination address of 35.190.27.188.

I have also organized other commonly used filtering options in the table below.

tcpdump filtering expressions table

Finally, I would like to emphasize the output format of tcpdump, which I have already introduced earlier:

Timestamp Protocol SourceAddress.SourcePort > DestinationAddress.DestinationPort PacketDetails

The packet details depend on the protocol, and different protocols are displayed in different formats. For more detailed usage, consult the tcpdump manual (run man tcpdump to read it).
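In this format the port is appended to the address with a dot, which can be confusing at first for IPv4. The sketch below separates the two, assuming IPv4 addresses only; the function name split_endpoint is hypothetical.

```shell
# Illustrative helper: split a tcpdump IPv4 endpoint into address and port.
# tcpdump appends the port as a fifth dot-separated field.
split_endpoint() {
  awk -v e="$1" 'BEGIN {
    sub(/:$/, "", e)                  # drop the colon after the destination
    n = split(e, p, ".")
    port = (n >= 5) ? p[5] : "-"      # no fifth field means no port (e.g. ICMP)
    print p[1] "." p[2] "." p[3] "." p[4], port
  }'
}

split_endpoint 172.16.3.4.56669
split_endpoint 114.114.114.114.53:
split_endpoint 35.190.27.188:
```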

However, after all this explanation, you should have also noticed that although tcpdump is powerful, its output format is not intuitive. Especially when there are a large number of network packets in the system (such as exceeding several thousand PPS), it is not easy to analyze problems from the network packets captured by tcpdump.

In comparison, Wireshark provides a more user-friendly interface and a series of summary analysis tools through graphical interface, which allows you to quickly solve network performance problems. Next, we will take a detailed look at it.

Wireshark #

Wireshark is also one of the most popular network analysis tools, and its biggest advantage is that it provides a cross-platform graphical interface. Similar to tcpdump, Wireshark also provides powerful filtering rule expressions and includes a range of summary analysis tools.

For example, let’s take the ping example mentioned earlier. You can execute the following command to save the captured network packets to a file named ping.pcap:

$ tcpdump -nn udp port 53 or host 35.190.27.188 -w ping.pcap

Then, copy it to the machine where you have Wireshark installed. For example, you can use scp to copy it locally:

$ scp host-ip:/path/ping.pcap .

Next, open it with Wireshark. After opening it, you will see the following interface:

From Wireshark’s interface, you can see that it not only displays the header information of each network packet in a more structured format, but also uses different colors to differentiate between the DNS and ICMP protocols. You can also see at a glance that the two middle PTR queries have no response packets.

Furthermore, when you select a network packet from the list, you can also see detailed information about each layer in the network packet details below. For example, take the PTR packet with the number 5 as an example:

You can see the source and destination addresses at the IP layer (Internet Protocol), the UDP protocol (User Datagram Protocol) at the transport layer, and the DNS protocol (Domain Name System) at the application layer.

By clicking on the arrows on the left side of each layer, you can see all the information about the protocol header for that layer. For example, clicking on DNS will show you the values and meanings of various fields of the DNS protocol, such as Transaction ID, Flags, and Queries.

Of course, Wireshark has many more features than this. Next, let me show you an example involving HTTP and help you understand the workings of TCP three-way handshake and four-way handshake.

In this example, we will access http://example.com/. In Terminal 1, execute the following commands to first find the IP address of example.com. Then, use the tcpdump command to filter the obtained IP address and save the results to web.pcap:

$ dig +short example.com
93.184.216.34
$ tcpdump -nn host 93.184.216.34 -w web.pcap

In fact, you can directly use the domain name in the host expression, that is, tcpdump -nn host example.com -w web.pcap.

Next, switch to Terminal 2 and execute the following curl command to access http://example.com:

$ curl http://example.com

Finally, go back to Terminal 1 and press Ctrl+C to stop tcpdump, and then copy the resulting web.pcap file.

After opening web.pcap with Wireshark, you will see the following interface in Wireshark:

Since HTTP is based on TCP, the first three packets you see are the TCP three-way handshake packets. Next, the middle packets are the HTTP request and response packets, and the final three packets are the three-way handshake packets when the TCP connection is closed.

From the menu bar, click Statistics -> Flow Graph, and then select TCP Flows for the Flow type in the pop-up window. This allows you to see the execution process of the TCP flows throughout the entire process more clearly:

In fact, this is very similar to what you usually see in tutorials about TCP three-way handshake and four-way handshake. As a comparison, the typical TCP three-way handshake and four-way handshake processes are usually like this:

(Images from CoolShell)

However, when you compare these two images, you will find that the captured packets here are not exactly the same as the four-way handshake mentioned above. The capture contains only three packets for closing the connection, not four.

The reason for having three packets is that after the server receives the client’s FIN, the server also needs to close the connection, so the ACK and FIN can be combined and sent together, saving one packet and making it a “three-way handshake”.

Usually, when the server receives the client’s FIN, it may still have not finished sending data, so it will first reply to the client with an ACK packet. After waiting for a while and completing the sending of all data packets, it will send the FIN packet. This is the four-way handshake.
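The difference between the two closing sequences can be illustrated by counting packets from the first FIN onward. This is a simplified sketch, assuming a clean capture of a single connection with no retransmissions and no data sent after the first FIN; the input is the TCP flag field as tcpdump prints it ([F.] for FIN+ACK, [.] for a bare ACK), and the function name close_packets is hypothetical.

```shell
# Illustrative helper: count packets from the first FIN to the end of a
# connection. Input: one tcpdump TCP flag field per line, in capture order.
close_packets() {
  awk '/F/ { seen = 1 } seen { n++ } END { print n + 0 }'
}

# FIN+ACK, FIN+ACK, ACK: the server merged its ACK and FIN.
printf '%s\n' '[F.]' '[F.]' '[.]' | close_packets
# FIN+ACK, ACK, FIN+ACK, ACK: the server acknowledged first and closed later.
printf '%s\n' '[F.]' '[.]' '[F.]' '[.]' | close_packets
```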

For comparison, a four-packet close looks like this in Wireshark (the original network packets are from the Wireshark TCP 4-times close example):

Of course, Wireshark has many more features than those discussed here. You can also refer to the official documentation and WIKI for more usage methods.

Summary #

Today, we learned how to use tcpdump and Wireshark, and through several examples, we learned how to use these two tools to analyze the process of network communication and identify potential performance issues.

When you find that using an IP address is fast for the same network service, but using a domain name is much slower, you might suspect that DNS is causing trouble. DNS resolution includes not only A record requests that translate domain names into IP addresses, but also PTR requests that “cleverly” look up domain names based on IP addresses.

In fact, many network tools look up domain names from IP addresses and protocol names from port numbers by default, and this often makes the tools themselves run slowly. Therefore, network performance tools usually provide an option (such as -n or -nn) to disable name resolution.

In your work, when you encounter network performance issues, don’t forget about the powerful tools tcpdump and Wireshark. You can use them to capture the actual transmitted network packets and investigate potential performance issues.

Reflection #

Finally, I would like to discuss with you how you use tcpdump and Wireshark. What network problems have you solved using tcpdump or Wireshark? How do you troubleshoot, analyze, and solve them? You can summarize your own ideas based on the network knowledge you have learned today.

Feel free to discuss with me in the comments section, and feel free to share this article with your colleagues and friends. Let’s practice in real-world scenarios and progress through communication.