03 NumPy Common Operations in Deep Learning #

Hello, I’m Fang Yuan.

After the previous lesson, we have a basic understanding of NumPy arrays. As the saying goes, practice makes perfect. Today, let’s use an image classification project as an example to see which NumPy features matter in a real project.

Let’s start with a common workplace scenario: an online education recommendation platform receives millions of text and image advertisements every day. To give users more accurate recommendations, your boss asks you to design a model that automatically identifies images containing various platform logos (e.g., the GeekTime logo).

To solve this image classification problem, we can break it into three parts: data loading, training, and model evaluation (in fact, almost every deep learning project can be divided this way). NumPy array operations are used most often in the data loading and model evaluation parts.

Now let’s take a look at data loading.

Data Loading Stage #

In this stage, we need to load the training data and use it for model training. The training data usually includes three types: images, text, and structured data similar to two-dimensional tables.

Regardless of whether we use PyTorch, TensorFlow, or the traditional machine learning library scikit-learn, we usually convert the data into NumPy arrays before processing it further.

In our project, we need to read the images from the training set. For image processing, we generally use two modules: Pillow and OpenCV.

Although Pillow and OpenCV offer similar functionality, they differ in some ways. In PyTorch, many image operations are built on Pillow, so when programming with PyTorch or troubleshooting image-related problems, it helps to think in terms of Pillow.

Let’s take the GeekTime logo image as an example and read it using Pillow and OpenCV, then convert it into a NumPy array.

Using Pillow #

First, we read the image with Pillow using the following code:

```python
from PIL import Image

im = Image.open('jk.jpg')
im.size
# Output: (318, 116), i.e. (width, height)
```

Pillow reads the image into its own Image object rather than into a NumPy array. How do we convert it? It’s not difficult: NumPy’s asarray function converts the Pillow image into a NumPy array. Note that Pillow’s size is (width, height), while the array’s shape is (height, width, channels).

```python
import numpy as np

im_pillow = np.asarray(im)

im_pillow.shape
# Output: (116, 318, 3)
```

Using OpenCV #

With OpenCV, no manual conversion is needed: cv2.imread returns the image as a NumPy array, as shown in the code below.

```python
import cv2

im_cv2 = cv2.imread('jk.jpg')
type(im_cv2)
# Output: numpy.ndarray

im_cv2.shape
# Output: (116, 318, 3)
```

The output shows that the last dimension of the array is 3. This is because the image is a color image with three channels: R, G, and B. Most computer vision tasks work with RGB images, and images in other formats are usually converted to RGB before processing. One thing to watch out for: Pillow returns the channels in R, G, B order, while OpenCV returns them in B, G, R order.

The channel order used during training must match the one used during prediction. For example, suppose you train on images read with Pillow but read images with OpenCV at prediction time. The code won’t raise an error, but the results will be wrong. So keep this in mind.
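When a pipeline mixes the two libraries, the channel order has to be converted explicitly. The sketch below uses only NumPy and a made-up 2 x 2 image. Reversing the last axis with ::-1 swaps R and B. OpenCV also offers cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for the same conversion; the sketch avoids it only to stay dependency-free.

```python
import numpy as np

# A tiny synthetic 2 x 2 "image" in R, G, B order (as Pillow would return it).
# The pixel values are made up for illustration.
rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [10, 20, 30]]],
    dtype=np.uint8,
)

# Reversing the last axis turns R, G, B into B, G, R (and vice versa).
bgr = rgb[:, :, ::-1]

# The reversed array is a view with a negative stride; some libraries expect
# contiguous memory, so make a contiguous copy when needed.
bgr_contiguous = np.ascontiguousarray(bgr)

print(rgb[1, 1])  # [10 20 30]
print(bgr[1, 1])  # [30 20 10]
```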

Next, let’s verify that Pillow and OpenCV really use the channel orders described above. Along the way, we’ll cover some common points about NumPy array indexing, slicing, and concatenation.

How do we verify this statement? We just need to extract the data from the R, G, B channels separately, and set the data of the other two channels to 0.

Here is why this works. The RGB color model is an industry-standard color model in which R, G, and B stand for the red, green, and blue channels. Combining these three colors at different intensities produces a wide range of the colors our eyes can see.

In an 8-bit image, each RGB channel has 256 levels (0 to 255) that represent light intensity. A higher number means stronger light, and 0 means the weakest. In our example, suppose we set the other two channels to 0, effectively turning them off. If the resulting image has a reddish tone (the final output appears later in this lesson), we can conclude that the remaining channel holds the R data. The same method works for the G and B channels.
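The same reasoning can be checked on a single synthetic pixel, assuming an 8-bit image stored as uint8. The type’s range is exactly the 0 to 255 described above, and zeroing G and B leaves only the red intensity.

```python
import numpy as np

# 8-bit images store each channel as uint8, so every value lies in 0..255.
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# One white pixel (all channels at full intensity), in R, G, B order.
pixel = np.array([[[255, 255, 255]]], dtype=np.uint8)

# "Turn off" the G and B channels by setting them to 0 (on a copy).
red_only = pixel.copy()
red_only[:, :, 1:] = 0
print(red_only[0, 0])  # [255   0   0]
```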

Okay, let’s first extract the data of the R, G, and B channels. This relies on indexing and slicing, which you should already know if you are familiar with Python.

An array index works like the index of a book: given a page number, you can jump straight to the content you need. In the same way, an index into an array locates a single value. Slicing, on the other hand, is like reading a book from one page to another: it extracts a range of values.

NumPy arrays are indexed the same way as Python lists, and they also support slicing.
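Here is a quick sketch of the list-style indexing and slicing rules on a small NumPy array. It also shows the comma syntax that multi-dimensional arrays use in place of chained brackets.

```python
import numpy as np

a = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]
print(a[3])         # 3: a single element, as with a list
print(a[-1])        # 9: negative indices count from the end
print(a[2:5])       # [2 3 4]: start is included, stop is excluded
print(a[::3])       # [0 3 6 9]: every third element

# A multi-dimensional array takes one index or slice per axis, separated by commas.
m = np.arange(12).reshape(3, 4)
print(m[1, 2])      # 6
print(m[1:, :2])    # rows 1..2, columns 0..1
```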

Pay attention to the colon syntax, which appears very often with NumPy arrays:

```python
im_pillow[:, :, 0]
```

What does this mean? A lone “:” means “select everything along this axis”. After the image is read in, it is stored in an array of shape (height, width, channels), laid out as shown in the figure.

So the code above selects all the data whose index along the third dimension is 0. In other words, it takes all the data of the image’s 0th channel.
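The sketch below shows what the colon does to the shape. It uses a random stand-in array with the same layout as im_pillow; the values are arbitrary. Indexing with a single integer removes that axis, while a length-1 slice such as 0:1 keeps it.

```python
import numpy as np

# Stand-in for im_pillow: random uint8 data in (height, width, channels) layout.
img = np.random.randint(0, 256, size=(116, 318, 3), dtype=np.uint8)

c0 = img[:, :, 0]          # every row, every column, channel 0
print(c0.shape)            # (116, 318): the indexed axis disappears

same = img[..., 0]         # "..." stands for all the leading axes
print(np.array_equal(c0, same))  # True

c0_kept = img[:, :, 0:1]   # a length-1 slice keeps the channel axis
print(c0_kept.shape)       # (116, 318, 1)
```

The second form would have avoided the dimension mismatch that appears later when concatenating channels.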

In this way, we can get the data of each channel through the following code:

```python
# One 2-D array (height x width) per channel
im_pillow_c1 = im_pillow[:, :, 0]
im_pillow_c2 = im_pillow[:, :, 1]
im_pillow_c3 = im_pillow[:, :, 2]
```

After obtaining the data of each channel, we need an array of zeros with the same height and width as im_pillow.

Do you remember how to generate a zero array? You can think about it yourself first. The code for generating it is as follows:

```python
# A single-channel zero array with the same height and width as im_pillow
zeros = np.zeros((im_pillow.shape[0], im_pillow.shape[1], 1))
zeros.shape
# Output: (116, 318, 1)
```
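One detail is worth knowing: np.zeros returns float64 by default, so concatenating it with uint8 image data produces a float64 array. The sketch below shows the options, with a stand-in array in place of im_pillow.

```python
import numpy as np

h, w = 116, 318  # the height and width of the example image

zeros = np.zeros((h, w, 1))
print(zeros.dtype)       # float64: the default dtype of np.zeros

# Requesting uint8 explicitly keeps the zero array in the image's dtype.
zeros_u8 = np.zeros((h, w, 1), dtype=np.uint8)
print(zeros_u8.dtype)    # uint8

# np.zeros_like copies both shape and dtype from an existing array.
img = np.ones((h, w, 3), dtype=np.uint8)  # stand-in for im_pillow
like = np.zeros_like(img)
print(like.shape, like.dtype)  # (116, 318, 3) uint8
```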

Then, we just need to concatenate the zero array with im_pillow_c1, im_pillow_c2, and im_pillow_c3 to obtain the corresponding channel image data.

Array Concatenation #

Just now, we got the data of each channel separately, and now we need to concatenate the separated data with a zero array. As shown in the figure below, the red part can be regarded as single-channel data, and the white part is the zero array.

NumPy provides the np.concatenate((a1, a2, …), axis=0) method for concatenating arrays. Here, a1, a2, … are the arrays we want to merge, and axis is the dimension to merge along (axis 0 by default). The arrays must have the same number of dimensions, and their shapes must match on every axis except the one being concatenated.
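Here is a minimal sketch of the shape rule on small arrays. Along axis 0 the row counts may differ, and along axis 1 the column counts may differ. Any other mismatch raises a ValueError.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# axis=0 stacks rows: shapes must match on every axis except axis 0.
print(np.concatenate((a, b), axis=0))
# [[1 2]
#  [3 4]
#  [5 6]]

c = np.array([[7], [8]])         # shape (2, 1)

# axis=1 appends columns: now only axis 1 may differ.
print(np.concatenate((a, c), axis=1))
# [[1 2 7]
#  [3 4 8]]

# a and b differ on axis 0, so concatenating them along axis 1 fails.
try:
    np.concatenate((a, b), axis=1)
except ValueError as err:
    print("ValueError:", err)
```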

For our problem, we want to merge along axis 2 (the channel axis), and our goal is to obtain the three images shown below:

Let’s first concatenate im_pillow_c1 with two zero arrays to build the array for the leftmost image in the figure above. The code and its output are as follows:

```python
im_pillow_c1_3ch = np.concatenate((im_pillow_c1, zeros, zeros), axis=2)
```

```text
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
<ipython-input-21-e3d53c33c94d> in <module>
----> 1 im_pillow_c1_3ch = np.concatenate((im_pillow_c1, zeros, zeros), axis=2)
<__array_function__ internals> in concatenate(*args, **kwargs)
AxisError: axis 2 is out of bounds for array of dimension 2
```

You may be surprised by this error. The cause is that im_pillow_c1 is a 2-dimensional array. Its only valid axes are 0 and 1, so axis=2 is out of bounds.

Let’s take a look at the shapes of im_pillow_c1 and zeros.

```python
im_pillow_c1.shape
# Output: (116, 318)
zeros.shape
# Output: (116, 318, 1)
```

So it turns out that the two arrays we want to merge have different dimensions. How can we unify the dimensions? We can turn `im_pillow_c1` into `(116, 318, 1)`.

Method 1: Using np.newaxis #

We can use np.newaxis to add an extra dimension to the array. Here's how to do it:

```python
im_pillow_c1 = im_pillow_c1[:, :, np.newaxis]
im_pillow_c1.shape
# Output: (116, 318, 1)
```

Running the code above turns the 2-dimensional array into a 3-dimensional one. This operation appears often in deep learning code. In PyTorch, the equivalent function is unsqueeze(), while in TensorFlow you can use tf.newaxis directly.
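np.newaxis is not the only way to add a length-1 axis. The sketch below shows several equivalent NumPy spellings, using a zero array as a stand-in for im_pillow_c1.

```python
import numpy as np

c1 = np.zeros((116, 318))          # stand-in for the 2-D im_pillow_c1

x1 = c1[:, :, np.newaxis]          # the approach used above
x2 = c1[:, :, None]                # np.newaxis is simply an alias for None
x3 = np.expand_dims(c1, axis=-1)   # function form; axis=-1 appends the new axis
x4 = c1.reshape(c1.shape + (1,))   # explicit reshape to the target shape

print(np.newaxis is None)          # True
print(x1.shape, x2.shape, x3.shape, x4.shape)
# (116, 318, 1) (116, 318, 1) (116, 318, 1) (116, 318, 1)
```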

Next, we can merge im_pillow_c1 with zeros without any errors. Here’s the updated code:

```python
im_pillow_c1_3ch = np.concatenate((im_pillow_c1, zeros, zeros), axis=2)
im_pillow_c1_3ch.shape
# Output: (116, 318, 3)
```
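Another option, not used in this lesson, is np.stack. It joins same-shaped arrays along a new axis, so it skips the np.newaxis step. The sketch below uses stand-in arrays; the constant 200 is arbitrary.

```python
import numpy as np

# Stand-ins for the 2-D single-channel arrays (shape (116, 318)).
c1 = np.full((116, 318), 200, dtype=np.uint8)
z = np.zeros_like(c1)

# np.stack inserts the new axis itself; all inputs must have the same shape.
red_3ch = np.stack((c1, z, z), axis=2)
print(red_3ch.shape, red_3ch.dtype)  # (116, 318, 3) uint8
```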

Method 2: Direct Assignment #

The second method avoids the dimension mismatch altogether by using direct assignment. We create an all-zero array with the same shape as im_pillow and then assign the channel data (im_pillow_c1, im_pillow_c2, or im_pillow_c3) to its corresponding channel. We will use this method to generate the arrays for the middle and right images shown above.

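```python
# np.zeros creates a float64 array with the same shape as im_pillow;
# only the target channel is then filled in.
im_pillow_c2_3ch = np.zeros(im_pillow.shape)
im_pillow_c2_3ch[:, :, 1] = im_pillow_c2

im_pillow_c3_3ch = np.zeros(im_pillow.shape)
im_pillow_c3_3ch[:, :, 2] = im_pillow_c3
```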
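The direct-assignment approach can be wrapped in a small function. isolate_channel below is a hypothetical helper, not part of NumPy. It uses np.zeros_like so the result keeps the image’s uint8 dtype. By contrast, np.zeros(im_pillow.shape) produces float64 and needs astype(np.uint8) before plotting.

```python
import numpy as np

def isolate_channel(img: np.ndarray, channel: int) -> np.ndarray:
    """Return a new array in which every channel except `channel` is zero.

    `isolate_channel` is a hypothetical helper for illustration only.
    """
    out = np.zeros_like(img)                  # same shape and dtype as img
    out[:, :, channel] = img[:, :, channel]   # copy only the chosen channel
    return out

# A made-up 1 x 2 image in R, G, B order.
img = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)
print(isolate_channel(img, 1)[0].tolist())  # [[0, 20, 0], [0, 50, 0]]
```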

Now we can display the three RGB channels of the image. For plotting, you can use matplotlib, a Python plotting library that works directly with NumPy arrays. Its documentation contains many examples you can adapt to your needs, so I won’t go into the details of plotting here.

To validate the channel order, the following code plots the original image together with the red, green, and blue channels:

```python
from matplotlib import pyplot as plt

# The channel arrays are float64 (they were built from np.zeros), and imshow
# expects floats in [0, 1], so they are cast back to uint8 (0-255) for display.
plt.subplot(2, 2, 1)
plt.title('Original Image')
plt.imshow(im_pillow)
plt.axis('off')
plt.subplot(2, 2, 2)
plt.title('Red Channel')
plt.imshow(im_pillow_c1_3ch.astype(np.uint8))
plt.axis('off')
plt.subplot(2, 2, 3)
plt.title('Green Channel')
plt.imshow(im_pillow_c2_3ch.astype(np.uint8))
plt.axis('off')
plt.subplot(2, 2, 4)
plt.title('Blue Channel')
plt.imshow(im_pillow_c3_3ch.astype(np.uint8))
plt.axis('off')
plt.savefig('./rgb_pillow.png', dpi=150)
```

Deep Copy and Shallow Copy #

The channel-extraction exercise above was a bit cumbersome. Its main purpose was to help you master array slicing and concatenation.

In fact, there is a simpler way to isolate a channel: read the image and set the other two channels to 0 directly. For example, to keep only the R channel:

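```python
from PIL import Image
import numpy as np

im = Image.open('jk.jpg')
im_pillow = np.asarray(im)
# Keep channel 0 (R) and zero out channels 1 and 2 (G and B).
# This assignment fails: the array returned by np.asarray is read-only.
im_pillow[:, :, 1:] = 0
```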

However, this raises an error saying that the array is read-only and cannot be modified. What can we do? We can use copy() to create a copy of the array. This brings us to deep copies and shallow copies. In the previous lesson, we mentioned that np.array() performs a deep copy, while np.asarray() performs a shallow copy, avoiding a data copy where it can.
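The read-only behavior is easy to reproduce without Pillow. This sketch marks an array as read-only through its flags, as an assumed stand-in for the array np.asarray returns. It then shows that copy() yields a writable, independent array.

```python
import numpy as np

# Simulate a read-only array such as the one np.asarray returns for a Pillow image.
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
img.flags.writeable = False

try:
    img[:, :, 1:] = 0
except ValueError as err:
    print("ValueError:", err)   # assignment destination is read-only

# copy() returns a new, writable array with its own data.
img_copy = img.copy()
img_copy[:, :, 1:] = 0
print(img_copy.flags.writeable)                     # True
print(img[0, 0].tolist(), img_copy[0, 0].tolist())  # [0, 1, 2] [0, 0, 0]
```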

Simply put, a shallow copy, also known as a view, is an array that shares its data with the original array. Note that only the data is shared, not the shape. We usually create a view with view(), and basic slicing also returns a view of the original array.

Let’s take a look at the following code. Arrays a and b have the same data, but the shapes are different. However, when we modify the data in b, the data in a also changes.

```python
a = np.arange(6)
print(a.shape)
# Output: (6,)
print(a)
# Output: [0 1 2 3 4 5]

b = a.view()
print(b.shape)
# Output: (6,)
b.shape = 2, 3    # reshape the view; a keeps its own shape
print(b)
# Output:
# [[0 1 2]
#  [3 4 5]]
b[0, 0] = 111     # writing through the view also changes a
print(a)
# Output: [111   1   2   3   4   5]
print(b)
# Output:
# [[111   1   2]
#  [  3   4   5]]
```
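The example above uses view(). As mentioned, basic slicing also returns a view. This sketch contrasts a slice, an explicit copy(), and indexing with a list, which always produces a copy.

```python
import numpy as np

a = np.arange(6)

s = a[1:4]               # basic slicing returns a view
s[0] = 100
print(a)                 # [  0 100   2   3   4   5]: the original changed
print(s.base is a)       # True: s borrows a's memory

c = a[1:4].copy()        # deep copy: independent data
c[0] = -1
print(a[1])              # 100: unaffected by the change to c

f = a[[1, 2, 3]]         # indexing with a list ("fancy" indexing) copies
print(np.shares_memory(a, f))  # False
```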

On the other hand, a deep copy refers to a complete copy of the original array, creating a new array. Modifying the new array does not affect the original array. We can use the copy() method to perform a deep copy.

Therefore, we can modify the code that previously resulted in an error as follows:

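```python
# np.array() copies the data, so the new array is writable.
# np.asarray(im).copy() would work as well.
im_pillow = np.array(im)
im_pillow[:, :, 1:] = 0
```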

Don’t underestimate the difference between deep and shallow copies. Here’s a pitfall I once ran into while developing a human segmentation model for mobile phones.

To improve the model’s segmentation performance, I tried a new experimental method: feeding the previous frame’s data into the model as an input for the current frame. Training went without problems, but during debugging I found that the model performed very poorly.

After further investigation, I found the cause. To visualize the segmentation results, I modified the previous frame’s output. However, I had passed that output to the current frame through a shallow copy, so the current frame received the already-modified data instead of the original output.

Naturally, the current frame could not obtain the correct results when receiving the modified data. Due to this pitfall, I almost gave up on the experiment. It was only after I switched to a deep copy that I was able to solve the problem.
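Below is a simplified reconstruction of this kind of bug, not the original project code. draw_overlay is a hypothetical visualization step that modifies its input in place, and the frame data is a made-up 2 x 2 mask.

```python
import numpy as np

def draw_overlay(mask: np.ndarray) -> np.ndarray:
    """Hypothetical visualization step that modifies its input in place."""
    mask[mask > 0] = 255       # brighten the foreground for display
    return mask

# Hypothetical segmentation output for frame t (values 0 or 1).
prev_output = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# Buggy: the next frame receives a view, so the in-place drawing leaks into it.
next_input_view = prev_output.view()
draw_overlay(prev_output)
print(next_input_view.tolist())   # [[0, 255], [255, 0]]: no longer the model output

# Fixed: hand the next frame a deep copy taken before any visualization.
prev_output = np.array([[0, 1], [1, 0]], dtype=np.uint8)
next_input_copy = prev_output.copy()
draw_overlay(prev_output)
print(next_input_copy.tolist())   # [[0, 1], [1, 0]]: the original output survives
```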

Now, with the knowledge of deep copy and shallow copy, can you validate the channel order of the image loaded with OpenCV using the methods mentioned above?

Model Evaluation #

During model evaluation, we usually convert the model’s output into corresponding labels.

Let’s assume our task is to classify images into 2 categories: images that contain GeekTime and images that don’t. The model outputs an array probs of shape (2,) that stores two probabilities. Index 0 holds the probability that the image contains GeekTime, and index 1 holds the probability that it is something else. The two probabilities sum to 1. If the GeekTime probability is larger, we infer that the image contains GeekTime; otherwise, it belongs to the other category.

A simple method is to check if probs[0] is greater than 0.5. If it is, we can consider the image as the one we are looking for.
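Here is a sketch of the two-class case with a hypothetical probs array. argmax is shown alongside the threshold for comparison.

```python
import numpy as np

# Hypothetical model output: [P(contains GeekTime), P(other)].
probs = np.array([0.8, 0.2])

is_geektime = probs[0] > 0.5
print(is_geektime)             # True

# For two probabilities that sum to 1, this matches picking the larger one,
# except for an exact 0.5/0.5 tie: argmax then returns index 0, while the
# threshold test says "not GeekTime".
print(np.argmax(probs) == 0)   # True
```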

While this method works, what if we have many, many categories to classify the images into?

For example, let’s say we have 1000 categories from ImageNet. You might think of iterating through this array and finding the index with the highest value.

But what if your boss asks you to find the top 5 categories with the highest probabilities? Is there a simpler way? Let’s continue reading.

Argmax vs. Argmin: Finding the Index of the Maximum/Minimum Value #

NumPy’s argmax(a, axis=None) method returns the index of the maximum value. If axis is not specified, the array is treated as flattened (1-dimensional), and the returned index refers to that flattened array.
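Here is a sketch of what axis=None means on a 2-D array. The index refers to the flattened array, and np.unravel_index converts it back to a (row, column) pair. The scores values are made up.

```python
import numpy as np

scores = np.array([[0.1, 0.7, 0.2],
                   [0.9, 0.05, 0.05]])

flat_idx = np.argmax(scores)       # axis=None: index into the flattened array
print(flat_idx)                    # 3 (0.9 is the 4th element when flattened)

row, col = np.unravel_index(flat_idx, scores.shape)
print(row, col)                    # 1 0

print(np.argmax(scores, axis=1))   # [1 0]: one index per row
```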

For our problem, the following code returns the index of the category with the highest probability:

```python
np.argmax(probs)  # index of the most likely category
```

The usage of argmin is similar to argmax, but it returns the index of the minimum value.
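In practice, models usually return a batch of outputs with shape (batch_size, num_classes). The sketch below uses a hypothetical 3 x 4 batch; passing axis=1 gives one label per image. When values tie, argmax and argmin return the first occurrence.

```python
import numpy as np

# Hypothetical batch of model outputs: 3 images x 4 classes, each row sums to 1.
batch_probs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.15, 0.30, 0.50],
])

pred_labels = np.argmax(batch_probs, axis=1)   # most likely class per image
print(pred_labels)                             # [1 0 3]

least_likely = np.argmin(batch_probs, axis=1)  # least likely class per image
print(least_likely)                            # [0 1 0]
```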

Argsort: Returning the Indexes of an Array after Sorting #

Now let’s extend the problem. Suppose we classify images into 9 categories and need to find the three categories with the highest probabilities.

The probabilities output by the model are as follows:

```python
# Probabilities for 9 categories (they sum to 1)
probs = np.array([0.075, 0.15, 0.075, 0.15, 0.0, 0.05, 0.05, 0.2, 0.25])
```

We can solve this with np.argsort(a, axis=-1, kind=None). Rather than sorting the original array, np.argsort returns the indexes that would sort it in ascending order. Its important parameters are:

a: the array to be sorted.
axis: the axis along which to sort. The default is -1, which means the last axis.
kind: the sorting algorithm to use. The default is 'quicksort'; the other options are 'mergesort', 'heapsort', and 'stable' (you can review these in any data structures reference on sorting algorithms).
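The sketch below shows how kind affects tied values in probs: with kind="stable", equal values keep their original order. Reversing an ascending result with [::-1] also yields descending order, but then ties come out reversed.

```python
import numpy as np

probs = np.array([0.075, 0.15, 0.075, 0.15, 0.0, 0.05, 0.05, 0.2, 0.25])

# argsort returns the indexes that would sort the array; probs is not modified.
asc = np.argsort(probs, kind="stable")
print(asc)          # [4 5 6 0 2 1 3 7 8]

# Descending order by negating the values; the stable sort keeps tied values
# (e.g. the two 0.15 entries at indexes 1 and 3) in their original order.
desc = np.argsort(-probs, kind="stable")
print(desc)         # [8 7 1 3 0 2 5 6 4]

# Reversing the ascending result also sorts descending, but flips the ties.
rev = asc[::-1]
print(rev)          # [8 7 3 1 2 0 6 5 4]
```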

Let’s understand this with an example. Take a look at the code below, which describes the process of using argsort to sort probs and return the corresponding indexes.

```python
probs_idx_sort = np.argsort(-probs)  # negate the values to sort in descending order
probs_idx_sort
# Output: array([8, 7, 1, 3, 0, 2, 5, 6, 4])
# (with the default kind, the order of tied values is not guaranteed)

# Indexes of the three highest probabilities
probs_idx_sort[:3]
# Output: array([8, 7, 1])
```
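With many classes, such as the 1000 ImageNet categories, a full sort does more work than a top-5 query needs. np.argpartition, a technique not covered in this lesson, finds the k largest entries without sorting everything; only those k are then sorted. The probabilities below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output for 1000 classes, normalized to sum to 1.
probs = rng.random(1000)
probs /= probs.sum()

k = 5
# argpartition places the indexes of the k largest values in the last k slots,
# in no particular order, without fully sorting all 1000 entries.
top_k_unordered = np.argpartition(probs, -k)[-k:]

# Sort only those k candidates, largest first.
top_k = top_k_unordered[np.argsort(-probs[top_k_unordered])]

# Same result as a full descending argsort (there are no ties here).
print(np.array_equal(top_k, np.argsort(-probs)[:k]))  # True
```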

Summary #

Congratulations on completing this lesson. This lesson introduced some commonly used and important functions. You will often use these functions in almost all deep learning related projects, and you will often see them when reading other people’s code.

Let’s review the functions we learned today together. I have created a table summarizing their key functionalities and usage points.

I think the most difficult part of NumPy is the concept of axes that we learned in the previous lesson. If you understand the concept of axes clearly, understanding today’s content will be easier. After understanding the principles, the key is to practice hands-on.

Practice for Each Lesson #

Suppose we have an array scores of shape (256, 256, 2), where scores[:, :, 0] and scores[:, :, 1] sum to 1 at every position. We want to generate an array mask from scores: wherever the value in channel 0 of scores is greater than the value in channel 1, mask should be 0; otherwise, it should be 1.

Here is the scores array. You can try to implement it in code:

```python
import numpy as np

# Random scores whose two channels sum to 1 at every position
scores = np.random.rand(256, 256, 2)
scores[:, :, 1] = 1 - scores[:, :, 0]
```
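Here is one possible solution, so you can check your answer after trying it yourself. np.where picks 0 or 1 at each position based on a boolean condition. np.argmax over the channel axis gives the same mask except on an exact tie.

```python
import numpy as np

scores = np.random.rand(256, 256, 2)
scores[:, :, 1] = 1 - scores[:, :, 0]

# 0 where channel 0 is larger, 1 otherwise (including an exact tie).
mask = np.where(scores[:, :, 0] > scores[:, :, 1], 0, 1)

# Alternative: argmax over the channel axis; on an exact tie it returns 0, not 1.
mask_argmax = np.argmax(scores, axis=2)

print(mask.shape)  # (256, 256)
```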

Feel free to record your questions or gains in the comment section, and I also recommend sharing this lesson with your friends.