15 Visualization Tools: How to Realize Visual Monitoring of Training

Hello, I’m Fang Yuan. Welcome to the 15th lesson.

In the previous lesson, we walked through the entire process of building and training a model, using linear regression as an example. Training is an essential step in deep learning, and during training we often need to visualize and monitor the model’s parameters, evaluation metrics, and other information.

Today, we will mainly learn about two visualization tools and how to use them to achieve visual monitoring during the training process.

In TensorFlow, the most commonly used visualization tool is Tensorboard, and TensorboardX brings Tensorboard’s functionality to PyTorch. In addition, Facebook has developed an interactive visualization tool called Visdom for use with PyTorch, which provides rich visualizations of live data to help us monitor experiments in real time.

Let’s start with TensorboardX.

TensorboardX #

Tensorboard is a companion tool for TensorFlow that records information such as a model’s parameters, evaluation metrics, and images during training. It serves a web interface, so we can view and analyze this information in a browser, observe how a neural network trains, and follow the training trends.

Since Tensorboard is such a convenient tool, other deep learning frameworks besides TensorFlow also want to have access to its features. This is where TensorboardX comes in.

Installation #

Installing Tensorboard is easy, and we can use pip to install it. The command is as follows:

pip install tensorboard

If you already have TensorFlow installed, you do not need to install Tensorboard separately.

Next, we need to install TensorboardX. Note that PyTorch 1.8 and above already ship a SummaryWriter with the same interface in torch.utils.tensorboard, so TensorboardX itself does not need to be installed separately; the tensorboard package installed above is still required.

If you are using a version of PyTorch prior to 1.8, installing TensorboardX is also straightforward. We can still use the pip command to install it:

pip install tensorboardX

Usage and Launching #

To use TensorboardX, we first create an instance of SummaryWriter, and then call its add_scalar or add_image method to record scalar values or images.

The definition of the SummaryWriter class is as follows:

torch.utils.tensorboard.writer.SummaryWriter(log_dir=None)

The log_dir parameter represents the path to save the log. By default, it is saved in the “runs/current_time_hostname” folder.
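
As a small illustration of log_dir, the sketch below writes one run into an explicitly named folder instead of the default one. The folder name "linear_regression_lr1e-4" and the temporary root directory are hypothetical choices made for this example only; it assumes torch and tensorboard are installed.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Hypothetical layout: one sub-folder per experiment under a common root.
# A temporary directory stands in for a real "runs" folder here.
log_root = tempfile.mkdtemp()
log_dir = os.path.join(log_root, "linear_regression_lr1e-4")

# Passing log_dir overrides the default "runs/..." location.
writer = SummaryWriter(log_dir=log_dir)
writer.add_scalar("Loss/train", 0.5, 0)
writer.close()

# The writer stores its records in event files inside log_dir.
event_files = [name for name in os.listdir(log_dir) if "tfevents" in name]
print(event_files)
```

Giving every experiment its own sub-folder under one root makes it possible to point Tensorboard at the root and compare the experiments as separate runs.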

Once the instance is created, let’s take a look at the add_scalar method, which is used to record scalar values. Its definition is as follows:

add_scalar(tag, scalar_value, global_step=None, walltime=None)

Let’s go through the parameters one by one:

tag: A string giving the name of the data. Data recorded under different tags is displayed as separate curves.
scalar_value: A floating-point number, the value to be saved.
global_step: An integer, the training step at which the value is recorded. It is used as the x-axis of the curve.
walltime: A floating-point number, the time of the record. By default, it is time.time().

We typically use the add_scalar method to record the changes in loss, accuracy, learning rate, and other numerical values during the training process. This allows us to monitor the training process visually.
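
When two related values should share one chart, SummaryWriter also offers add_scalars (plural), which takes a main tag and a dictionary of values. The following is a minimal sketch with synthetic loss values; the tag name "Loss", the decay formula, and the temporary log directory are assumptions made for illustration.

```python
import math
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

log_dir = tempfile.mkdtemp()  # stand-in for a real log folder
writer = SummaryWriter(log_dir=log_dir)

for step in range(100):
    # Synthetic values: an exponentially decaying training loss and a
    # test loss that stays slightly above it.
    train_loss = math.exp(-step / 30)
    test_loss = train_loss + 0.1
    # Both values are recorded under the main tag "Loss" at the same step.
    writer.add_scalars("Loss", {"train": train_loss, "test": test_loss}, step)

writer.close()

logged_entries = sorted(os.listdir(log_dir))
print(logged_entries)
```

The trade-off is that add_scalars stores each dictionary key through its own sub-writer, so additional folders appear inside the log directory.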

The add_image method is used to record individual image data (requires the support of the Pillow library). Its definition is as follows:

add_image(tag, img_tensor, global_step=None, walltime=None, dataformats='CHW')

The meanings of tag, global_step, and walltime are the same as in the add_scalar method. The remaining parameters are as follows.

img_tensor: A PyTorch tensor or NumPy array containing the image data.
dataformats: A string representing the format of the image data. The default is “CHW”, which stands for Channel x Height x Width. It can also be “HWC”, “HW”, and so on.
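
To make the format strings concrete, here is a small NumPy-only sketch of how the same image looks in the “CHW”, “HWC”, and “HW” layouts. It does not call add_image itself; the commented calls at the end show how each array would be passed, and the tag names in them are placeholders.

```python
import numpy as np

# A 100 x 100 color image in channel-first layout (C, H, W).
img_chw = np.zeros((3, 100, 100))
img_chw[0] = np.arange(0, 10000).reshape(100, 100) / 10000
img_chw[1] = 1 - img_chw[0]

# The same image in channel-last layout (H, W, C).
img_hwc = np.transpose(img_chw, (1, 2, 0))

# A single-channel (grayscale) image has only height and width (H, W).
img_hw = img_chw[0]

print(img_chw.shape, img_hwc.shape, img_hw.shape)

# Each array would be recorded with the matching format string, e.g.:
#   writer.add_image('my_image', img_chw, 0)                      # default 'CHW'
#   writer.add_image('my_image_hwc', img_hwc, 0, dataformats='HWC')
#   writer.add_image('my_image_gray', img_hw, 0, dataformats='HW')
```

Passing the wrong format string does not reorder the data for us, so the string must describe the layout the array already has.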

Let’s take an example to deepen our understanding. The specific code is as follows:

from torch.utils.tensorboard import SummaryWriter
# For PyTorch versions prior to 1.8, use:
# from tensorboardX import SummaryWriter
import numpy as np

# Create an instance of SummaryWriter
writer = SummaryWriter()

for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)
    
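# Build a 3 x 100 x 100 image in CHW layout with values in [0, 1]
img = np.zeros((3, 100, 100))
# Channel 0 ramps up and channel 1 ramps down; channel 2 stays at 0
img[0] = np.arange(0, 10000).reshape(100, 100) / 10000
img[1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000

writer.add_image('my_image', img, 0)
# Flush pending records to disk and close the log file
writer.close()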

Let me summarize what this code does for you.

First, create an instance of SummaryWriter. Please note that for PyTorch versions before 1.8, you should use “from tensorboardX import SummaryWriter”, and for PyTorch versions 1.8 and later, you should use “from torch.utils.tensorboard import SummaryWriter”.

Then, we generate some random numbers to simulate the Loss and Accuracy on the training and test sets, and record them with the add_scalar method. Finally, we generate an image and record it with the add_image method.

After running the code above, a “runs” folder will be created in the current directory, which stores the data we need to record.

Next, execute the following command in the current directory to start Tensorboard.

tensorboard --logdir=runs

After starting Tensorboard, enter “http://127.0.0.1:6006/” in your browser (Tensorboard’s default port is 6006) to visualize the data we just recorded.

The interface of Tensorboard is shown in the following picture. The right part of the picture shows the Loss and Accuracy recorded using the add_scalar method. As you can see, Tensorboard has plotted the curves for us according to the iteration steps, making it very intuitive to monitor the training process.

In the “IMAGES” tab, you can see the image data recorded using the add_image method, as shown in the following picture.

Visualizing the Training Process #

Up to this point, we have installed and started TensorboardX and demonstrated its basic usage.

Now, let’s see how to visualize a real training run. We will use the linear regression model constructed and trained in the previous lesson as an example.

The following code was covered in the previous lesson, and it defines a linear regression model and randomly generates training set X and the corresponding labels Y.

import random
import numpy as np
import torch
from torch import nn

# Model definition
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(1))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, input):
        return (input * self.weight) + self.bias

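# Data: noisy samples of the line y = w * x + b
w = 2
b = 3
xlim = [-10, 10]
# 30 random integer inputs in [-10, 10) (the upper bound is exclusive)
x_train = np.random.randint(low=xlim[0], high=xlim[1], size=30)
# Labels with integer noise of 0, 1, or 2 added to each point
y_train = [w * x + b + random.randint(0,2) for x in x_train]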

Then, in the training code, we create a SummaryWriter instance and call the add_scalar method as discussed earlier. The specific code is as follows.

# Tensorboard
from torch.utils.tensorboard import SummaryWriter

# Training
model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-2, momentum=0.9)
y_train = torch.tensor(y_train, dtype=torch.float32)

writer = SummaryWriter()

for n_iter in range(500):
    input = torch.from_numpy(x_train)
    output = model(input)
    loss = nn.MSELoss()(output, y_train)
    model.zero_grad()
    loss.backward()
    optimizer.step()
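    # Record the loss as a plain Python float at the current step
    writer.add_scalar('Loss/train', loss.item(), n_iter)

# Flush pending records to disk and close the log file
writer.close()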

With the above code, we record how the Loss changes during training. The trend is shown in the following figure.

As you can see, the Loss is decreasing, indicating that the model is fitting our training data better as the training process progresses. Up to this point, we have completed the entire process of using the TensorboardX tool for training visualization monitoring.

In addition to the common methods mentioned above, TensorboardX also provides many other methods such as add_histogram, add_graph, add_embedding, and add_audio. If you are interested, refer to the official documentation. They follow the same calling pattern as the two add methods you have already learned, so you should be able to pick them up quickly.
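
As one example of these other methods, the sketch below uses add_histogram to record the distribution of a model’s parameters at each step. The nn.Linear model, the random data, the learning rate, and the “Params/...” tag names are all stand-ins chosen for this illustration, not part of the lesson’s linear regression example.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

torch.manual_seed(0)

# Hypothetical stand-in model and data, used only to have weights to record.
model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 10)
y = torch.randn(64, 10)

log_dir = tempfile.mkdtemp()  # stand-in for a real log folder
writer = SummaryWriter(log_dir=log_dir)

for step in range(20):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Record one histogram per parameter tensor (weight and bias).
    for name, param in model.named_parameters():
        writer.add_histogram(f"Params/{name}", param.detach(), step)

writer.close()

event_files = [name for name in os.listdir(log_dir) if "tfevents" in name]
print(event_files)
```

After starting Tensorboard on this log directory, the recorded distributions appear in the “HISTOGRAMS” tab, one entry per parameter tensor.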

Visdom #

Visdom is an interactive visualization tool developed by Facebook that is commonly used with PyTorch. It provides rich visualizations of live data that can be viewed in a web browser, and it makes it easy to share results with others, which helps us monitor experiments running on remote servers in real time.

Installation and Startup #

Installing Visdom is straightforward and can be done using pip. The specific command is as follows:

pip install visdom

After executing the installation command, we can start Visdom using the following command:

python -m visdom.server

The default port for Visdom is 8097. It can be changed with the -port option, for example: python -m visdom.server -port 8098

Once successfully started, open the web browser and enter “http://127.0.0.1:8097/” to access the main interface of Visdom.

The main interface of Visdom is shown in the following image.

Please note that the usage of Visdom is slightly different from Tensorboard. With Tensorboard, we generate log files first and then start the visualization interface. With Visdom, we start the visualization server first, and the plots update dynamically as data is sent to the Visdom window.

Quick Start #

Now let’s get our hands dirty and see how Visdom can plot data.

The process can be divided into four steps: first, instantiate the Visdom class; second, use the line() method to create a line plot window and initialize it; third, update the line plot window with a set of randomly generated data; finally, use the image() method to plot an image.

The specific code for the above process is as follows:

from visdom import Visdom
import numpy as np
import time

# Instantiate the Visdom window class
viz = Visdom()

# Create a window and initialize it with a starting point.
# Note the argument order of line(): Y values first, then X values.
viz.line([0.], [0], win='train_loss', opts=dict(title='train_loss'))

for n_iter in range(10):
    # Randomly get a loss value
    loss = 0.2 * np.random.randn() + 1
    # Append the new point to the existing curve
    viz.line([loss], [n_iter], win='train_loss', update='append')
    time.sleep(0.5)

img = np.zeros((3, 100, 100))
img[0] = np.arange(0, 10000).reshape(100, 100) / 10000
img[1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000
# Visualize the image
viz.image(img)

As can be seen, the usage process is basically the same as Tensorboard, with only differences in function calls. The result of plotting a line graph is shown in the following image.

The corresponding image plot result is shown below. As you can see, Visdom dynamically updates the data when plotting.

Training Visualization Monitoring #

As with TensorboardX, the main reason to learn Visdom is to monitor our training process. Let’s again use the linear regression model as an example.

Monitoring training with Visdom can be roughly divided into three steps:

Instantiate a window;
Initialize the window information;
Update the monitored information.

The process of defining the model and generating training data is the same as before, so I won’t repeat it here. The code for instantiating and initializing the Visdom window, and real-time logging of the loss during training, is as follows:

# Visdom
from visdom import Visdom
import numpy as np

# Training
model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-2, momentum=0.9)
y_train = torch.tensor(y_train, dtype=torch.float32)

# Instantiate a window
viz = Visdom(port=8097)
# Initialize the window information
viz.line([0.], [0.], win='train_loss', opts=dict(title='train loss'))

for n_iter in range(500):
    input = torch.from_numpy(x_train)
    output = model(input)
    loss = nn.MSELoss()(output, y_train)
    model.zero_grad()
    loss.backward()
    optimizer.step()
    # Update the monitored information
    viz.line([loss.item()], [n_iter], win='train_loss', update='append')

In the Visdom interface, we can see the change in loss as shown in the following image. Unlike Tensorboard, Visdom does not automatically rescale or smooth the curves, so after about 50 iterations the fluctuations of the curve are hard to see because the loss values vary over such a small range.
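
If a smoothed curve is wanted in Visdom, the values can be smoothed in code before they are plotted. The sketch below applies an exponential moving average with bias correction, which is similar in spirit to the smoothing slider in Tensorboard. The function name smooth, the weight of 0.6, and the sample loss values are assumptions made for illustration.

```python
def smooth(values, weight=0.6):
    """Return a bias-corrected exponential moving average of values.

    weight is the share kept from the running average at each step;
    a larger weight gives a smoother curve.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values, start=1):
        running = weight * running + (1 - weight) * value
        # Divide by (1 - weight**i) so that early points are not pulled
        # toward the initial value of 0.
        smoothed.append(running / (1 - weight ** i))
    return smoothed


# Hypothetical noisy loss values.
raw_losses = [1.0, 0.2, 0.9, 0.3, 0.8, 0.4]
smoothed_losses = smooth(raw_losses)
print(smoothed_losses)
```

Each smoothed value could then be sent to viz.line() in place of the raw loss, or plotted in a second window next to the raw curve.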

Summary #

In this lesson, we covered two visualization tools: TensorboardX and Visdom.

Through the explanations and practice in this lesson, you should now be familiar with the installation, startup, and basic operations of both tools, such as drawing line graphs and images.

The main purpose of learning visualization tools is to monitor data such as loss values and evaluation metrics while a deep learning model is training. Visualizing this data helps us see how parameters and metrics change and follow the training trend in real time. Therefore, applying visualization tools to the model training process is the focus of this lesson.

TensorboardX and Visdom also have a variety of other functions, such as drawing scatter plots, bar charts, heatmaps, etc. If you are interested, you can refer to the official documentation and try them out using the methods we learned today. With practice, you will definitely become proficient in using them.

Practice for Every Lesson #

Following the example in the Visdom quick start section, suppose we generate two sets of random numbers representing Loss and Accuracy respectively. How can we write code that plots both sets of data at the same time as the iterations proceed?

Please feel free to share your thoughts or questions, and I also recommend sharing the visualization tools you learned today with your colleagues and friends.