25 Answering the Relationship Between GIL and Multithreading

Hello, I am Jingxiao.

Before we knew it, we had finished the second chapter of the advanced section together. I am very happy to see that many of you have kept learning actively and left many high-quality comments that are worth thinking about and discussing together. Some of you have also read the articles carefully and pointed out expressions that were imprecise or inappropriate, and I am very grateful for that.

I have replied to most of the comments under the corresponding articles. Some comments were hard to reply to on a phone, and others are especially valuable and typical, so I have selected them and compiled them into today's FAQ to answer them together.

Question 1: Why x.append(x) Produces an Infinitely Nested List

Let's start with the first question. In the following code, why does x become an infinitely nested list, and why does len(x) still return 2?

>>> x = [1]
>>> x.append(x)
>>> x
[1, [...]]
>>> len(x)
2

To help you understand this, let’s draw a diagram:

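Here, x points to a list whose first element is 1. After the append operation runs, the list's second element is a reference back to x, that is, to the very same list object that x points to. The list therefore contains itself: following that reference again and again gives [1, [1, [1, [1, ...]]]] without end. When printing, Python detects this cycle and shows it as [...] instead of expanding it forever.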

However, even though x is now infinitely nested, the x.append(x) operation does not recursively traverse or copy any elements. It simply adds one new element to the end of the list, a reference pointing to x. Therefore, there is no risk of stack overflow, and no error is raised.

As for the second part of the question, why does len(x) return 2? Look at x again. Although it is infinitely nested, its top level holds only two elements: the integer 1 and a reference to the list itself. len() counts only these top-level elements, which is why len(x) returns 2.
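To check this reasoning yourself, the short snippet below rebuilds the self-referencing list and inspects it. It uses only built-in operations; the identity checks with `is` confirm that the second element is the very same object as `x`, not a copy.

```python
x = [1]
x.append(x)             # the second element is a reference to x itself

print(x)                # repr detects the cycle and prints [1, [...]]
print(len(x))           # only two top-level elements: 2
print(x[1] is x)        # True: the second element is the same list object
print(x[1][1][1] is x)  # True: following the reference repeatedly always returns x
```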

Question 2: A Big-Picture Understanding of Decorators

Next is the second question, Hu Yao's question about decorators. The purpose of decorators is that they let you change or extend a function's behavior without modifying the function itself.

Decorators modify the behavior of a function through a wrapper, so we don't have to actually modify the function.

Any extra functionality a decorator adds is encapsulated in its own decorator function or class. To use it, you just add @decorator above the original function's definition. This keeps your code highly abstracted, decoupled, and concise.

Explaining the concept alone may still feel abstract, so let's look at a real-world scenario to see what makes decorators so appealing. In the backend of a social networking site, countless operations must check whether the user is logged in before they run, such as posting a comment or posting a status update.

Without decorators, using conventional methods, the code you write would probably look like this:

# Post a comment
def post_comment(request, ...):
    if not authenticate(request):
        raise Exception('You must log in first')
    ...

# Post a status
def post_moment(request, ...):
    if not authenticate(request):
        raise Exception('You must log in first')
    ...

Clearly, calling the authentication function authenticate() over and over is redundant. A better solution is to pull the authentication check out into a decorator, as in the example below, which makes the code much more concise:

# Post a comment
@authenticate
def post_comment(request, ...):
    ...

# Post a status
@authenticate
def post_moment(request, ...):
    ...
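Here is a minimal, runnable sketch of what such an `authenticate` decorator might look like. It is an illustration under assumptions, not a real web framework's API: the request is modeled as a plain dict, and `is_logged_in()` is a hypothetical helper that treats a request as logged in if it carries a `"user"` key. `functools.wraps` keeps the decorated function's name and docstring.

```python
import functools


def is_logged_in(request):
    # Hypothetical check: a request counts as logged in if it carries a user.
    return request.get("user") is not None


def authenticate(func):
    """Decorator that runs the login check before calling func."""
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        if not is_logged_in(request):
            raise Exception('You must log in first')
        return func(request, *args, **kwargs)
    return wrapper


# Post a comment (hypothetical signature: request plus the comment text)
@authenticate
def post_comment(request, text):
    return f"{request['user']} commented: {text}"


# Post a status (hypothetical signature: request plus the status text)
@authenticate
def post_moment(request, text):
    return f"{request['user']} posted a status: {text}"
```

Calling `post_comment({"user": "alice"}, "Nice article!")` returns `'alice commented: Nice article!'`, while calling it with an empty dict raises the login exception before the function body runs. Because the check lives in one place, protecting a new endpoint is a one-line change.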

That said, in many cases decorators are not the only solution. What I want to emphasize here are the benefits decorators bring:

  • The code becomes more concise.
  • The logic becomes clearer.
  • The hierarchy and separation of the program become more evident.

These are qualities we should pursue and prioritize in development.

Question 3: The Relationship Between GIL and Multithreading

The third question is about the relationship between the Global Interpreter Lock (GIL) and multithreading in Python.

In fact, the existence of the GIL does not contradict Python's support for multithreading. As we mentioned before, the GIL means that at any given moment, only one thread in the program can execute Python bytecode (in CPython). Multithreading in Python, on the other hand, means that multiple threads take turns executing, producing a "pseudo-parallel" effect. Since only one thread is actually running at any specific moment, this is not true parallel multithreading. I have drawn the following diagram to illustrate this mechanism:

Let's take an example. Suppose I use 10 threads to crawl the content of 50 websites. Thread 1 starts crawling the first website and blocks on I/O, entering a waiting state. At this point, it releases the GIL, and thread 2 begins executing to crawl the second website, and so on. When thread 1's I/O operation completes, it can reacquire the GIL and finish its remaining work. From the user's perspective, the websites are crawled concurrently, and this is what we mean by multithreading in Python.
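The crawling scenario can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a simplified model, not a real crawler: `crawl()` is a hypothetical function, and `time.sleep()` stands in for the network request. Like a real blocking I/O call, `time.sleep()` releases the GIL while waiting, so the other threads get to run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl(site_id):
    # Hypothetical stand-in for a network request: time.sleep blocks like I/O
    # and releases the GIL while waiting, so other threads can run meanwhile.
    time.sleep(0.05)
    return f"content of site {site_id}"


def crawl_all(num_sites=50, num_threads=10):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # map() hands sites to the 10 worker threads and keeps results in order.
        pages = list(executor.map(crawl, range(1, num_sites + 1)))
    return pages, time.perf_counter() - start


pages, elapsed = crawl_all()
print(len(pages), f"{elapsed:.2f}s")
```

With 50 sites and a 0.05-second simulated wait each, a sequential loop would take about 2.5 seconds; with 10 threads the waits overlap, so the run should take closer to 0.25 seconds plus some overhead. Exact timings depend on your machine.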

Question 4: When to Use Multiprocessing vs. Multithreading

The fourth question has been covered several times in earlier articles, but I still want to emphasize it here.

If you want to speed up CPU-intensive tasks, multithreading is ineffective; use multiprocessing instead. CPU-intensive tasks are tasks that consume a large amount of CPU resources, such as calculating the product of the numbers from 1 to 100,000,000, or encoding and decoding a long piece of text.

Multithreading is ineffective here for exactly the reason we just discussed: Python multithreading essentially means multiple threads switching back and forth, with only one thread allowed to run at any given moment. Using multiple threads is therefore essentially no different from using a single main thread, and in many cases it can even lower the program's efficiency because of the extra overhead of thread switching.

Multiprocessing, by contrast, lets multiple processes execute tasks in parallel on different CPU cores, because each process has its own interpreter and its own GIL. This effectively improves the program's efficiency.
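The following sketch contrasts the two approaches on a CPU-bound task using `concurrent.futures`. `cpu_task()` is a hypothetical stand-in for real CPU-heavy work (a pure-Python sum of squares), and the job sizes are arbitrary. On a standard CPython build with the GIL, the thread pool should take about as long as running the jobs one after another, while the process pool should finish noticeably faster on a multi-core machine; actual timings depend on your hardware. The `if __name__ == "__main__":` guard is needed because `multiprocessing` may start workers by re-importing the main module.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    # Pure-Python arithmetic: the thread running this holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, jobs, workers=4):
    # Run every job with the given executor type and measure wall-clock time.
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(cpu_task, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    _, thread_time = timed_run(ThreadPoolExecutor, jobs)    # threads take turns on one GIL
    _, process_time = timed_run(ProcessPoolExecutor, jobs)  # each process has its own GIL
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```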

For I/O-intensive tasks, if you want to speed them up, prefer multithreading or asyncio. Multiprocessing can also achieve the goal, but it is completely unnecessary, because most of the time in I/O-intensive tasks is spent waiting on I/O. While one thread or task is waiting for I/O, we only need to switch to another thread or task to perform other I/O operations.

However, if there are many heavy I/O operations and many connections to establish, we generally choose asyncio, because asyncio's task switching is more lightweight, and it can handle far more tasks than the number of threads you could reasonably start. Of course, if the I/O workload is not that heavy, multithreading is sufficient.
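As a rough illustration of how lightweight asyncio tasks are, the sketch below runs 1,000 concurrent coroutines in a single thread. `fetch()` is hypothetical, and `asyncio.sleep()` stands in for a real non-blocking network call; with real connections you would use an async client library and usually cap concurrency, for example with `asyncio.Semaphore`.

```python
import asyncio
import time


async def fetch(site_id):
    # Hypothetical network call: asyncio.sleep stands in for waiting on I/O
    # and hands control back to the event loop while it waits.
    await asyncio.sleep(0.05)
    return f"content of site {site_id}"


async def crawl_all(num_sites):
    # One coroutine per site; gather runs them concurrently and keeps result order.
    return await asyncio.gather(*(fetch(i) for i in range(1, num_sites + 1)))


start = time.perf_counter()
pages = asyncio.run(crawl_all(1000))
elapsed = time.perf_counter() - start
print(len(pages), f"{elapsed:.2f}s")
```

Because all 1,000 waits overlap on one event loop, the whole run should take only somewhat longer than a single 0.05-second wait, without creating 1,000 threads.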

These are the questions I wanted to answer today. You are welcome to keep writing your questions and thoughts in the comments section, and I will continue to answer them. I hope that each comment and Q&A session brings you new insights and value.