15 Python Object Comparison and Copy

Hello, I’m Jingxiao.

In earlier lessons, we have already seen many examples of comparing and copying Python objects, such as an if statement that checks whether a and b are equal:

if a == b:
    ...

Another example is that l2 is a copy of l1:

l1 = [1, 2, 3]
l2 = list(l1)

But you may not be clear about what happens behind these statements. For example,

Is l2 a shallow copy or a deep copy of l1?
Does a == b compare the values of the two objects, or does it check whether they are the very same object?

I hope that by the end of this lesson, you will have a thorough understanding of these points.

`'=='` VS `'is'` #

The equals (==) and the “is” operators are two commonly used ways to compare objects in Python. Simply put, the == operator compares the values of the objects. For example, the following code compares whether the values pointed to by variables a and b are equal:

a == b

On the other hand, the “is” operator compares the identity of the objects, i.e., whether they are the same object and whether they point to the same memory address.

In Python, the identity of each object can be obtained through the id(object) function. Therefore, the “is” operator is equivalent to comparing the IDs of the objects. Let’s look at the following example:

a = 10
b = 10

a == b
True

id(a)
4427562448

id(b)
4427562448

a is b
True

Here, Python first allocates a block of memory for the value 10, and then variables a and b point to this memory area. Therefore, a and b have the same value, the same ID, and both a == b and a is b return True.

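However, it’s important to note that for integers, the conclusion that a is b is True holds only for numbers in the range of -5 to 256, and this is an implementation detail of CPython rather than a guarantee of the language. For example, when the following statements are entered one at a time in the interactive interpreter: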

a = 257
b = 257

a == b
True

id(a)
4473417552

id(b)
4473417584

a is b
False

Here, we assign the value 257 to both a and b. We can see that a == b still returns True because a and b have the same value. However, surprisingly, a is b returns False, and we also notice that the IDs of a and b are different. Why is that?

In fact, for the sake of performance optimization, Python internally maintains an array for integers in the range of -5 to 256, serving as a cache. Therefore, every time you attempt to create an integer within this range, Python will return the corresponding reference from this array instead of allocating a new block of memory.

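However, if the integer falls outside this range, as with 257 in the example above, Python allocates a separate object for each of the two instances of 257 when the assignments are executed as separate statements in the interactive interpreter. As a result, the IDs of a and b are different, and a is b returns False. (When both assignments are compiled together, for example in a script file, CPython may reuse a single constant object, so the result can differ.)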
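
The cache behavior described above can be checked without relying on how literals are compiled. The following is a minimal sketch, assuming CPython; the helper name parse_twice_is_same is made up for this illustration. It builds each integer at run time with int(), so no constant sharing is involved:

```python
import sys

def parse_twice_is_same(text):
    """Parse the same text twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second

# Assumption: CPython, whose small-integer cache covers -5 to 256.
inside_cache = parse_twice_is_same("256")
outside_cache = parse_twice_is_same("257")

# Value equality does not depend on the cache at all.
values_equal = int("257") == int("257")

print(sys.implementation.name, inside_cache, outside_cache, values_equal)
```

On CPython, the first flag should be True and the second False, while values_equal is True everywhere. Other interpreters are free to behave differently, which is one more reason not to use 'is' for comparing numbers.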

Generally speaking, in practical work, we use the '==' operator much more frequently than the 'is' operator because we usually focus more on the values of variables rather than their internal memory addresses. However, when comparing a variable with a singleton, we typically use the 'is' operator. A typical example is checking whether a variable is None:

if a is None:
    ...

if a is not None:
    ...

Here, it’s worth noting that the 'is' operator is usually faster than the '==' operator. Since 'is' cannot be overloaded, Python does not need to look up and call any user-defined comparison method. Executing 'is' simply compares the IDs of the two objects.

However, the '==' operator is different. Executing a == b is roughly equivalent to executing a.__eq__(b), and most data types in Python override the __eq__ method, which often involves more complex processing. For example, for lists, __eq__ walks through the elements of both lists and compares them position by position.
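
To make the difference concrete, here is a small sketch with a hypothetical Point class (not from any library) that overrides __eq__ and counts how often it is called. It shows that '==' goes through the method while 'is' never does:

```python
class Point:
    """Hypothetical value type used only for this illustration."""

    eq_calls = 0  # counts how many times __eq__ has run

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        Point.eq_calls += 1
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

p1 = Point(1, 2)
p2 = Point(1, 2)

same_value = p1 == p2    # dispatches to Point.__eq__
same_object = p1 is p2   # pure identity check, __eq__ is not called

print(same_value, same_object, Point.eq_calls)
```

Two separately created points with equal coordinates compare equal but are not the same object, and the counter only increases for the '==' comparison.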

However, for immutable objects, if we have already compared them with '==' or 'is', will the result always stay the same?

The answer is no. Let’s look at the following example:

t1 = (1, 2, [3, 4])
t2 = (1, 2, [3, 4])
t1 == t2
True

t1[-1].append(5)
t1 == t2
False

We know that tuples are immutable, but tuples can be nested, and their elements can be mutable objects such as lists. If we modify a mutable element inside a tuple, the tuple's value changes even though the tuple object itself, and therefore its identity, stays the same. As a result, an earlier result obtained with '==' may no longer hold.

It’s crucial to pay attention to this point when writing programs in your daily work. Do not skip a comparison on the assumption that an earlier result is still valid.
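
As a quick check of the point above, the sketch below records the identity and the equality result of the tuple before and after its inner list is mutated. The variable names are chosen just for this illustration:

```python
t1 = (1, 2, [3, 4])
t2 = (1, 2, [3, 4])

id_before = id(t1)
equal_before = t1 == t2

# Mutate the list stored inside the tuple; the tuple object itself is untouched.
t1[-1].append(5)

id_after = id(t1)
equal_after = t1 == t2

print(id_before == id_after, equal_before, equal_after)
```

The identity of t1 is unchanged, but the equality result flips, so a cached '==' result for such objects should be recomputed when it matters.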

Shallow Copy and Deep Copy #

Next, let’s take a look at shallow copy and deep copy in Python.

Instead of starting with concepts that require memorization, let’s first see how each one is performed in code and work out the differences from there.

Let’s start with shallow copy. The common way to perform a shallow copy is to use the constructor of the data type itself. Here are two examples:

l1 = [1, 2, 3]
l2 = list(l1)

l2
[1, 2, 3]

l1 == l2
True

l1 is l2
False

s1 = set([1, 2, 3])
s2 = set(s1)

s2
{1, 2, 3}

s1 == s2
True

s1 is s2
False

Here, l2 is a shallow copy of l1, and s2 is a shallow copy of s1. For mutable sequences, we can also use the slice operator ':' to perform shallow copy, as shown in the example below:

l1 = [1, 2, 3]
l2 = l1[:]

l1 == l2
True

l1 is l2
False

Additionally, Python provides the function copy.copy(), which works for most data types:

import copy
l1 = [1, 2, 3]
l2 = copy.copy(l1)
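
copy.copy() is not limited to lists. As a minimal sketch, the example below applies it to a dict and to an instance of a hypothetical Config class (an assumption made up for this illustration), and checks that only the outer object is new:

```python
import copy

class Config:
    """Hypothetical container class, not part of any library."""

    def __init__(self, tags):
        self.tags = tags

original = Config(tags=["a", "b"])
clone = copy.copy(original)

d1 = {"k": [1, 2]}
d2 = copy.copy(d1)

print(clone is original, clone.tags is original.tags)
print(d2 is d1, d2["k"] is d1["k"])
```

In both cases the copy is a distinct outer object, while the attribute or value it holds still refers to the same inner list as the original.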

However, it’s important to note that for tuples, using tuple() or the slice operator ':' does not create a shallow copy. Instead, it returns a reference to the same tuple:

t1 = (1, 2, 3)
t2 = tuple(t1)

t1 == t2
True

t1 is t2
True

Here, the tuple (1, 2, 3) is only created once, and both t1 and t2 point to this tuple.
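
The same check can be made for the slice operator and for copy.copy(). This is a small sketch, assuming CPython for the tuple() and slice cases; a list is included for contrast:

```python
import copy

t1 = (1, 2, 3)

via_constructor = tuple(t1)  # CPython returns t1 itself for an exact tuple
via_slice = t1[:]            # likewise for a full slice in CPython
via_copy = copy.copy(t1)     # the copy module returns immutable tuples unchanged

l1 = [1, 2, 3]
list_copy = list(l1)         # a list always gets a new outer object

print(via_constructor is t1, via_slice is t1, via_copy is t1, list_copy is l1)
```

Because a tuple cannot be modified in place, handing back the same object is safe, which is why no new tuple is allocated here.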

By now, you should have a clear understanding of shallow copy. Shallow copy means allocating a new block of memory and creating a new object, where the elements inside are references to the sub-objects of the original object. Therefore, if the elements in the original object are immutable, there is no problem. However, if the elements are mutable, shallow copy can often have side effects that need to be carefully considered. Let’s look at the following example:

l1 = [[1, 2], (30, 40)]
l2 = list(l1)
l1.append(100)
l1[0].append(3)

l1
[[1, 2, 3], (30, 40), 100]

l2
[[1, 2, 3], (30, 40)]

l1[1] += (50, 60)

l1 is a list that contains a list and a tuple as its elements. Next, we shallow copy l1 to create l2. In shallow copy, elements in l2 are references to the elements in the original object l1. Therefore, the elements in l2 point to the same list and tuple objects as l1.

Let’s take a look. l1.append(100) means adding the element 100 to the list l1. This operation does not affect l2 because l2 and l1 are two different objects and do not share the same memory address. After the operation, l2 remains the same and l1 changes to:

[[1, 2, 3], (30, 40), 100]

Next, l1[0].append(3) means adding the element 3 to the first list in l1. Since l2 is a shallow copy of l1, the first element of l2 is the very same list object, so the element 3 shows up there as well. After this operation, both l1 and l2 are changed:

l1: [[1, 2, 3], (30, 40), 100]
l2: [[1, 2, 3], (30, 40)]

Finally, l1[1] += (50, 60) means concatenating the tuple in the second position of l1. This operation creates a new tuple as the second element in l1, and l2 does not reference the new tuple. Therefore, l2 remains unchanged while l1 changes to:

l1: [[1, 2, 3], (30, 40, 50, 60), 100]
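
The walkthrough above can be verified with identity checks. The following sketch repeats the two element-level steps and uses 'is' to show which elements are shared between l1 and l2 before and after:

```python
l1 = [[1, 2], (30, 40)]
l2 = list(l1)

# Right after the shallow copy, both elements are shared.
shared_list_before = l1[0] is l2[0]
shared_tuple_before = l1[1] is l2[1]

l1[0].append(3)      # mutates the shared inner list in place
l1[1] += (50, 60)    # builds a new tuple and rebinds l1[1] only

shared_list_after = l1[0] is l2[0]
shared_tuple_after = l1[1] is l2[1]

print(shared_list_before, shared_tuple_before)
print(shared_list_after, shared_tuple_after)
print(l1, l2)
```

The inner list stays shared throughout, which is why the append is visible through l2, while the tuple stops being shared once l1[1] is rebound to the newly created tuple.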

Through this example, you can see the potential side effects of using shallow copy. If you want to avoid these side effects and create a completely independent copy, you need to use deep copy.

Deep copy means allocating a new block of memory, creating a new object, and recursively creating new sub-objects to copy the elements from the original object. Therefore, the new object and the original object are not related at all.

In Python, you can use copy.deepcopy() to achieve deep copy. For example, in the above example, it can be rewritten as follows to use deep copy:

import copy
l1 = [[1, 2], (30, 40)]
l2 = copy.deepcopy(l1)
l1.append(100)
l1[0].append(3)

l1
[[1, 2, 3], (30, 40), 100]

l2 
[[1, 2], (30, 40)]

As you can see, no matter how l1 changes, l2 remains the same, because l1 and l2 are now completely independent of each other.
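
To see why the deep copy is unaffected, compare the identity of the inner list in both objects. A short sketch using the same data as above:

```python
import copy

l1 = [[1, 2], (30, 40)]
l2 = copy.deepcopy(l1)

# The inner list was copied too, so it is a different object in l2.
inner_list_shared = l1[0] is l2[0]

l1[0].append(3)

print(inner_list_shared, l1, l2)
```

Since l2 holds its own inner list, the append on l1[0] has nothing to act on in l2.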

However, deep copy is not perfect and can bring problems of its own. If the object being copied contains a reference to itself, a naive recursive copy would fall into an infinite loop:

import copy
x = [1]
x.append(x)

x
[1, [...]]

y = copy.deepcopy(x)
y
[1, [...]]

In the above example, the list x contains a reference to itself, resulting in an infinitely nested list. However, after deep copying x to y, the program does not encounter stack overflow. Why is that?

It’s because the deepcopy function maintains a dictionary during the copying process. The dictionary records the copied objects and their IDs. If the object to be copied is already stored in the dictionary, it is returned directly from the dictionary. Let’s take a look at the corresponding source code to understand:

def deepcopy(x, memo=None, _nil=[]):
    """Deep copy operation on arbitrary Python objects.

    See the module's __doc__ string for more info.
    """

    if memo is None:
        memo = {}
    d = id(x)  # Get the ID of the object being copied
    y = memo.get(d, _nil)  # Check if the object is already stored in the dictionary
    if y is not _nil:
        return y  # If the object to be copied is already stored in the dictionary, return it
    ...
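
To illustrate the memo idea in isolation, here is a simplified sketch that only handles nested lists. The function deep_copy_list is a made-up helper for this lesson, not the standard library's implementation. The key step is registering the new list in the memo before recursing into its elements:

```python
import copy

def deep_copy_list(obj, memo=None):
    """Toy deep copy that only understands nested lists (hypothetical helper)."""
    if memo is None:
        memo = {}
    if not isinstance(obj, list):
        return obj  # treat everything else as an immutable leaf
    key = id(obj)
    if key in memo:
        return memo[key]  # already copied: reuse the copy instead of recursing
    result = []
    memo[key] = result  # register before recursing so cycles find it
    for item in obj:
        result.append(deep_copy_list(item, memo))
    return result

x = [1]
x.append(x)

y = deep_copy_list(x)
z = copy.deepcopy(x)

print(y is x, y[1] is y, z[1] is z)
```

When the recursion reaches the self-reference, it finds the partially built copy in the memo and returns it, so the copy ends up referring to itself just as the original does. copy.deepcopy() produces the same shape for this input.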

Summary #

In today’s class, we learned about object comparison and copying in Python, focusing on the following key points.

The comparison operator == compares the values of objects, while the is operator compares their identities, i.e., whether they refer to the same memory address.
The is operator is usually more efficient than == because it cannot be overloaded. The is operation simply compares the IDs of the two objects, while the == operation may call an overloaded __eq__ method, which for containers such as lists walks through the elements and compares them one by one.
In shallow copying, the elements of the copy are references to the child objects of the original object. If those child objects are mutable, changing them also affects the copy, which can lead to side effects.
Deep copying, on the other hand, recursively copies every child object in the original object. As a result, the copy and the original object are completely independent. Moreover, deep copying maintains a dictionary that records the copied objects and their IDs, which avoids copying the same object twice and prevents infinite recursion.

Thought Question #

Finally, I have a thought question for you. In this class, I have used deep copy to create a copy of an infinitely nested list. So, what will the output be when we compare them using the equality operator '=='? Will it be True or False or something else? Why? I recommend that you first think about it yourself, and then run the code to check your hypothesis.

import copy
x = [1]
x.append(x)

y = copy.deepcopy(x)

# What will be the output of the following command?
x == y

Feel free to write your answers and learning reflections in the comments section. You are also welcome to share this article with your colleagues and friends. Let’s communicate and improve together.