15 Limited Operations on Pointers

In previous articles, we have mentioned “pointers” many times, so you should be familiar with them by now. Most of the time, however, we were referring to pointer types and their corresponding pointer values. Today we will go a level deeper.

Let’s start with a quick review.

type Dog struct {
    name string
}

func (dog *Dog) SetName(name string) {
    dog.name = name
}

For the base type Dog, *Dog is its pointer type. For a variable dog of type Dog, the result of the address expression &dog is a pointer value that points to the value held by that variable (i.e., the base value).

If a method has a receiver of type *Dog, then that method is a pointer method of the base type Dog.

In this case, the receiver of the method is actually a pointer to the current base value.

Through the pointer value, we can seamlessly access any field of the base value and call any method associated with it. This is probably the kind of “pointer” used most often in Go programs.
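As a quick refresher, here is a minimal sketch of the review above. The helper renameTwice is a name made up for this illustration; only Dog and SetName come from the article.

```go
package main

import "fmt"

type Dog struct {
	name string
}

// SetName is a pointer method: its receiver has type *Dog.
func (dog *Dog) SetName(name string) {
	dog.name = name
}

// renameTwice (hypothetical helper) renames a Dog once through an explicit
// pointer and once through the variable itself, then returns the final name.
func renameTwice() string {
	dog := Dog{"little pig"}
	dogP := &dog           // dog is a variable, so &dog is allowed
	dogP.SetName("piggy")  // call through the pointer value
	dog.SetName("monster") // translated by the compiler to (&dog).SetName("monster")
	return dogP.name       // field access through the pointer, no explicit dereference
}

func main() {
	fmt.Println(renameTwice()) // prints "monster"
}
```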

Traditionally, a pointer is a value that points to a specific memory address. This memory address can be the starting address of any data or code, such as a variable, a field, or a function.

We just mentioned one of these cases, but in Go, there are a few other things that can represent “pointers”. The one closest to the traditional sense is the uintptr type. This type is actually a numeric type and is one of the built-in data types in Go.

Depending on the underlying computer’s architecture, it can store a 32-bit or 64-bit unsigned integer, which represents the bit pattern of any pointer, i.e., the raw memory address.
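If you want to check the width on your own machine, a small sketch like the following should do. The function name uintptrBits is an assumption made for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// uintptrBits (hypothetical helper) reports the width of uintptr, in bits,
// on the platform the program was compiled for.
func uintptrBits() int {
	return int(unsafe.Sizeof(uintptr(0))) * 8
}

func main() {
	// Typically 64 on a 64-bit platform and 32 on a 32-bit one.
	fmt.Println("uintptr bits:", uintptrBits())
	// A uintptr must be wide enough to hold the bits of any pointer.
	fmt.Println("pointer bits:", int(unsafe.Sizeof((*int)(nil)))*8)
}
```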

Next, let’s look at the unsafe package in the Go standard library. The unsafe package has a type called Pointer, which also represents a “pointer”.

unsafe.Pointer can represent a pointer to any addressable value, and it serves as a bridge between ordinary pointer values and the uintptr values mentioned earlier. In other words, through it we can convert in both directions between these two kinds of values. There is a crucial word here: addressable. Before we continue with unsafe.Pointer, we need to understand exactly what this word means.

Today’s question is: Can you list which values in Go are not addressable?

A typical answer to this question is: The values in the following list are not addressable.

Values of constants.
Literal values of basic types.
Result values of arithmetic operations.
Result values of indexing expressions and slicing expressions applied to literals. There is one exception: the result value of indexing a slice literal is addressable.
Result values of indexing expressions and slicing expressions applied to string variables.
Result values of indexing expressions applied to map variables.
Functions and methods denoted by literals or identifiers, as well as the result values of calling them.
Field values of struct literals, i.e., the result values of selector expressions applied to struct literals.
Result values of type conversion expressions.
Result values of type assertion expressions.
Result values of receive expressions.

Analysis of the Problem #

At first glance, these unaddressable values in the answers seem to have no pattern. But don’t worry, let’s go through them together. You can refer to the code in the demo35.go file, which should make it easier for you to understand.

The values of constants are always stored in a specific memory area, and they are immutable. The same applies to literal values of basic types, which can be regarded as constants that simply have no identifier to represent them.

The first keyword: immutable. Since string values in Go are also immutable, the result of indexing or slicing a string variable is unaddressable as well: even if you had the memory address of such a value, you could not change anything through it.

The result of an arithmetic operation is a temporary result. Having the memory address of such a value is meaningless before it has been assigned to a variable or constant.

The second keyword: temporary result. This keyword can be used to explain many phenomena. We can consider the evaluation results of various expressions applied to literal values as temporary results.

As we all know, there are many types of expressions in Go, including the following commonly used ones.

Index expression, to get the element at a given index.
Slice expression, to get a slice of a string, array, or slice.
Selector expression, to access a field.
Call expression, to call a function or method.
Type conversion expression, to convert the type of a value.
Type assertion expression, to check the dynamic type of a value.
Receive expression, to receive an element value from a channel.

When we apply these expressions to a literal value, we generally get a temporary result: for example, the result of indexing an array literal or a map literal, or the result of slicing a slice literal. These are all temporary results and are unaddressable. (Slicing an array literal is rejected by the compiler outright, because an array must itself be addressable before it can be sliced.)

One exception to note is that the indexing result value of a slice literal is addressable. This is because, in any case, each slice value holds an underlying array, and each element value in this underlying array has a specific memory address.
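The exception is easy to try out. The following is only a sketch; firstOfLiteral is a name invented for this example, and the commented-out lines are there to show what the compiler rejects.

```go
package main

import "fmt"

// firstOfLiteral (hypothetical helper) takes the address of an element of a
// slice literal, writes through the pointer, and returns the value read back.
func firstOfLiteral() int {
	p := &[]int{1, 2, 3}[0] // allowed: the element lives in the slice's underlying array
	*p += 10
	return *p
}

func main() {
	fmt.Println(firstOfLiteral()) // prints 11

	// None of the following lines compile if uncommented:
	// _ = &[3]int{1, 2, 3}[0]          // indexing an array literal
	// _ = &map[string]int{"a": 1}["a"] // indexing a map literal
	// _ = &[]int{1, 2, 3}[0:2]         // slicing a slice literal
}
```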

You may ask, why is the slicing result value of a slice literal not addressable? This is because the slice expression always returns a new slice value, and this new slice value is a temporary result before being assigned to a variable.

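You may have noticed that I keep saying that expressions applied to literal values of arrays, slices, or maps produce temporary results. If an index expression is applied to a variable of an array type or a slice type instead, the result is not a temporary result and is addressable. A slice expression is different: even when applied to a variable, it still produces a new slice value, and that value remains unaddressable until it is assigned to a variable.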
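For comparison, here is a small sketch of indexing variables rather than literals. The function editThroughPointers is a made-up name for this illustration.

```go
package main

import "fmt"

// editThroughPointers (hypothetical helper) takes the addresses of elements of
// an array variable and a slice variable and modifies them through the pointers.
func editThroughPointers() ([3]int, []int) {
	arr := [3]int{1, 2, 3}
	s := []int{4, 5, 6}

	pa := &arr[1] // element of an array variable: addressable
	ps := &s[2]   // element of a slice variable: addressable
	*pa, *ps = 20, 60

	return arr, s
}

func main() {
	arr, s := editThroughPointers()
	fmt.Println(arr, s) // prints [1 20 3] [4 5 60]
}
```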

This is mainly because the values of variables themselves are not “temporary”. In comparison, literal values have no “resting place” before they are bound to any variable (or any identifier), and we cannot refer to them in any way. Such values are “temporary”.

Let’s talk about one more exception. The result value obtained by applying an indexing expression to a variable of map type is not a temporary result, but this value is unaddressable. The reason is that the storage location of each key-element pair in a map may change, and this change cannot be known from the outside.

As we all know, a map keeps its key-element pairs in a number of hash buckets, spread as evenly as possible. When certain conditions are met, the map may change the number of hash buckets and move the key-element pairs into the new buckets as needed.

In this case, obtaining a pointer to any element value in the map is meaningless and unsafe. We don’t know when that element value will be moved to where, and we don’t know what else will be stored at the original memory address. Therefore, such values should be unaddressable.
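In practice, this means a struct stored in a map cannot be modified in place. One common way around it, sketched below under the assumption that the map holds Dog values, is to copy the element out, change the copy, and store it back. The name renameInMap is hypothetical.

```go
package main

import "fmt"

type Dog struct {
	name string
}

// renameInMap (hypothetical helper) changes a struct stored in a map by
// copying it out, modifying the copy, and writing the copy back.
func renameInMap() string {
	dogs := map[string]Dog{"a": {"little pig"}}

	// p := &dogs["a"]            // does not compile: map elements are not addressable
	// dogs["a"].name = "monster" // does not compile either, for the same reason

	d := dogs["a"] // d is a variable, hence addressable and assignable
	d.name = "monster"
	dogs["a"] = d

	return dogs["a"].name
}

func main() {
	fmt.Println(renameInMap()) // prints "monster"
}
```

Another option is to store *Dog values in the map, so that the element itself is already a pointer to a value living outside the map's buckets.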

The third keyword: unsafe. “Unsafe” operations may break the consistency of a program, cause unpredictable errors, and seriously affect the functionality and stability of the program.

Let’s talk about functions now. Functions are first-class citizens in Go, so we can assign a literal or identifier that represents a function or method to a variable, pass it to a function, or return it from a function. However, such functions and methods are unaddressable. One reason is that functions are code, and code is immutable.

Another reason is that it is unsafe to have a pointer to a piece of code. In addition, the result values obtained from calling a function or method are also unaddressable because they are all temporary results.

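As for the last few kinds of values in the typical answer (field values of struct literals and the results of type conversions, type assertions, and receive operations), they are all result values of some expression that have not yet been bound to any variable. They are therefore temporary results, and they are unaddressable.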

That was a lot of ground to cover, and I hope it has given you a clearer picture. Let me summarize.

Immutable values are unaddressable. Constants, literal values of basic types, the results of indexing or slicing string values, and functions and methods denoted by literals or identifiers all fall into this category. This rule also exists for safety reasons.
Most values treated as temporary results are unaddressable. The result values of arithmetic operations belong to temporary results, and the result values of expressions applied to literal values belong to temporary results. However, there is one exception: although the indexing result value of a slice literal is also a temporary result, it is an addressable one.
If obtaining the pointer of a value may break the consistency of a program, then it is unsafe and the value is unaddressable. Due to the internal mechanism of maps, taking the address of the indexing result value of a map is unsafe. In addition, obtaining the address of a function or method represented by a literal or identifier is obviously unsafe.

Finally, if we assign a temporary result to a variable, then it becomes addressable. In this case, the obtained pointer points to the value held by that variable.
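A tiny sketch of this last point, using an arithmetic result as the temporary value. The function name addressOfSum is invented for the example.

```go
package main

import "fmt"

// addressOfSum (hypothetical helper) binds a temporary result to a variable,
// takes the variable's address, and modifies the value through the pointer.
func addressOfSum() int {
	// p := &(1 + 2) // does not compile: the sum is a temporary result

	sum := 1 + 2 // once bound to a variable, the value has a home
	p := &sum    // p points to the value held by sum
	*p *= 2

	return sum
}

func main() {
	fmt.Println(addressOfSum()) // prints 6
}
```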

Knowledge Expansion #

Question 1: What are the limitations of non-addressable values?

The first and foremost limitation is that you cannot use the address-of operator & to obtain a pointer to them. Fortunately, an attempt to take the address of a non-addressable value is a compilation error, so there is no need to worry too much. Just remember the rules above and keep them in mind when coding.

Let’s look at the following example. We will continue to use the Dog struct type as an example.

func New(name string) Dog {
    return Dog{name}
}

We write another function called New for the Dog struct. This function takes a parameter of type string named name, initializes a Dog value with it, and returns the value. Now, my question is: if I call this function and directly call its result value’s pointer method SetName in a chained manner, will it have the desired effect?

New("little pig").SetName("monster")

If you remember what I mentioned earlier, you should know that the result value of the New function call is a temporary result and is non-addressable.

But so what? Don’t forget that when I talked about struct types and their methods, I also mentioned that we can call a pointer method on a value of the base type, because Go automatically translates the call for us.

More specifically, for a Dog variable dog, the call expression dog.SetName("monster") will be automatically translated to (&dog).SetName("monster"), which means: first take the pointer value of dog, and then call the SetName method on that pointer value.

Do you see the problem now? Since the result value of the New function call is non-addressable, its address cannot be taken. Therefore, the compiler rejects this chained call. It reports the problem as two errors (the exact wording and number of messages may vary between Go versions): cannot call a pointer method on the result value of New("little pig"), and cannot take the address of New("little pig").
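Two straightforward ways to make the call work are sketched below. NewPtr and renamed are hypothetical names introduced only for this illustration; they are not part of the article's code.

```go
package main

import "fmt"

type Dog struct {
	name string
}

func (dog *Dog) SetName(name string) {
	dog.name = name
}

func New(name string) Dog {
	return Dog{name}
}

// NewPtr is a hypothetical variant of New that returns a pointer value.
func NewPtr(name string) *Dog {
	return &Dog{name}
}

// renamed (hypothetical helper) shows both workarounds and returns the names.
func renamed() (string, string) {
	// New("little pig").SetName("monster") // does not compile

	// Workaround 1: bind the result to a variable first.
	dog := New("little pig")
	dog.SetName("monster")

	// Workaround 2: return a pointer, so no address needs to be taken.
	// NewPtr("little pig").SetName("monster") would compile as well.
	dogP := NewPtr("little pig")
	dogP.SetName("monster")

	return dog.name, dogP.name
}

func main() {
	fmt.Println(renamed()) // prints "monster monster"
}
```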

In addition, we all know that ++ and -- in Go are not operators, but important components of the increment and decrement statements.

Although the syntax definition in the Go language specification states that as long as an expression is added to the left of ++ or --, it can be used to form an increment statement or a decrement statement, it also explicitly specifies an important limitation, which is that the result value of this expression must be addressable. This makes it almost impossible to use expressions for literal values in this case.

However, there is an exception. Although the result values of indexing a map literal or a map variable are non-addressable, such expressions can still be used in increment and decrement statements.

There are two similar rules. First, in an assignment statement, the expression on the left side of the assignment operator must be addressable, but an index expression on a map is also allowed.

Second, in a for statement with a range clause, the expressions on the left side of the range keyword must also be addressable, but an index expression on a map is again allowed. Remember these three rules.
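The three rules can be seen side by side in a short sketch. The function name counters and the map keys are assumptions made for this example.

```go
package main

import "fmt"

// counters (hypothetical helper) uses map index expressions in the three
// places where they are allowed despite being non-addressable.
func counters() map[string]int {
	m := map[string]int{}

	m["hits"]++     // increment statement
	m["hits"] += 2  // assignment operation
	m["total"] = 10 // plain assignment

	// Range clause with "=": each iteration assigns the index to m["last"].
	for m["last"] = range []string{"a", "b", "c"} {
	}

	return m
}

func main() {
	m := counters()
	fmt.Println(m["hits"], m["total"], m["last"]) // prints 3 10 2
}
```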

Compared to these fixed rules, the problem I just mentioned related to pointer methods requires a comprehensive understanding as it involves the combination of two knowledge points. At least in my interviews, it is an optional topic to explore.

Question 2: How to manipulate addressable values using unsafe.Pointer?

The previous foundation is important. But now let’s focus on the usage of pointers again. I mentioned earlier that unsafe.Pointer is a bridge between pointer values like *Dog and uintptr values. So, how can we use the intermediate “bridge” of unsafe.Pointer and the underlying operations of uintptr to manipulate values like dog?

First of all, let me make it clear that this is black magic. It bypasses the checks of the Go compiler and other tools and modifies data by reaching directly into memory. It is not a normal way of programming; using it is dangerous and can easily introduce security risks. We should always prefer the APIs provided by regular packages when writing programs, and packages like reflect and go/ast can serve as fallback options. As a developer of upper-level applications, please be cautious with any program entity from the unsafe package.

With that said, let’s take a closer look. Please consider the following code:

dog := Dog{"little pig"}
dogP := &dog
dogPtr := uintptr(unsafe.Pointer(dogP))

I first declared a variable dog of type Dog and then used the address-of operator & to retrieve its pointer value and assigned it to the variable dogP.

Finally, I used two type conversions. First, I converted dogP to a value of type unsafe.Pointer, and then immediately after, I converted the latter to a value of type uintptr and assigned it to the variable dogPtr. There are some conversion rules underlying these operations, as follows:

A pointer value (such as a value of type *Dog) can be converted to a value of type unsafe.Pointer, and vice versa.
A value of type uintptr can also be converted to a value of type unsafe.Pointer, and vice versa.
A pointer value cannot be directly converted to a value of type uintptr, and vice versa.
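These rules can be checked with a short sketch. The function roundTrip is a hypothetical helper, and the commented-out lines mark the conversions the compiler refuses.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Dog struct {
	name string
}

// roundTrip (hypothetical helper) converts a *Dog to unsafe.Pointer and
// uintptr, then converts the unsafe.Pointer back to *Dog.
func roundTrip() bool {
	dog := Dog{"little pig"}
	dogP := &dog

	up := unsafe.Pointer(dogP) // *Dog -> unsafe.Pointer
	addr := uintptr(up)        // unsafe.Pointer -> uintptr
	back := (*Dog)(up)         // unsafe.Pointer -> *Dog

	// _ = uintptr(dogP) // does not compile: no direct *Dog -> uintptr conversion
	// _ = (*Dog)(addr)  // does not compile: no direct uintptr -> *Dog conversion

	return back == dogP && addr != 0
}

func main() {
	fmt.Println(roundTrip()) // prints true
}
```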

Therefore, when converting between pointer values and values of type uintptr, an intermediate value of type unsafe.Pointer must be used as a bridge. But what is the significance of converting a pointer value to a value of type uintptr?

namePtr := dogPtr + unsafe.Offsetof(dogP.name)
nameP := (*string)(unsafe.Pointer(namePtr))

This is where the unsafe.Offsetof function comes in. It returns the offset, in bytes, between the starting storage addresses of two values: the value of a certain field and the struct value to which that field belongs. When calling this function, we need to pass a field selector expression as the argument, such as dogP.name.

With this offset and knowing the starting storage address of the structure value (represented here by the variable dogPtr), we can add them together to obtain the starting storage address of the name field value of dogP. This address is represented by the variable namePtr.

Following this, we can further convert the value of namePtr into a value of type *string through two more type conversions. In doing so, we obtain a pointer value pointing to the name field value of dogP.
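Putting the steps together, a runnable sketch might look like the following. The function names renameViaOffset and renameViaAdd are invented for this example. Note one deliberate difference from the snippets above: the documentation of the unsafe package asks that the conversion to uintptr, the arithmetic, and the conversion back happen in a single expression, because a uintptr held in a variable is just a number to the garbage collector. The sketch follows that advice, and also shows unsafe.Add, which has been available since Go 1.17.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Dog struct {
	name string
}

// renameViaOffset (hypothetical helper) overwrites the name field using only
// the struct's address and the field's offset, all in one expression.
func renameViaOffset(dogP *Dog, newName string) {
	nameP := (*string)(unsafe.Pointer(
		uintptr(unsafe.Pointer(dogP)) + unsafe.Offsetof(dogP.name)))
	*nameP = newName
}

// renameViaAdd (hypothetical helper) does the same with unsafe.Add, which
// keeps the value an unsafe.Pointer throughout.
func renameViaAdd(dogP *Dog, newName string) {
	nameP := (*string)(unsafe.Add(unsafe.Pointer(dogP), unsafe.Offsetof(dogP.name)))
	*nameP = newName
}

func main() {
	dog := Dog{"little pig"}

	// name is the first field of Dog, so its offset is 0.
	fmt.Println(unsafe.Offsetof(dog.name))

	renameViaOffset(&dog, "monster")
	fmt.Println(dog.name) // prints "monster"

	renameViaAdd(&dog, "piggy")
	fmt.Println(dog.name) // prints "piggy"
}
```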

You may wonder why we went through this whole process instead of simply using the address expression &(dogP.name) to get the pointer value. Well, imagine if we didn’t even know what the structure type is and couldn’t access the variable dogP. Could we still access its name field?

The answer is yes, as long as we have namePtr. It is an unsigned integer, but at the same time, it is also a memory address pointing to internal data of the program. It can bring us some benefits, such as the ability to directly modify deeply hidden internal data.

However, once we intentionally or unintentionally leak this memory address, others are free to modify the value of dogP.name and any data stored at nearby memory addresses. Even if they do not know the structure of that data, they can still overwrite it blindly. Incorrect modifications will bring unpredictable problems to the program and may even crash it, and a crash may well be the mildest of the possible disasters. This is why I said that such unconventional programming techniques are dangerous.

Now that you are aware of this technique and its dangers, please handle it with caution to prevent potential problems.

Summary #

Today we focused on issues related to pointers. The most commonly used and most important pointers are those based on a base type, such as values of type *Dog. How do we obtain such a pointer value? With the address-of operator &.

However, there is a prerequisite here, which is that the operand of the address-of operator must be addressable. Regarding this, you need to remember three keywords: immutable, temporary results, and unsafe. As long as a value meets any of these three keywords, it is not addressable.

But there is one exception: the indexed result value of a slice literal is addressable. So what are the limitations of using non-addressable values? The most important limitation is regarding pointer methods: it is not possible to invoke a pointer method on a non-addressable value. This involves the combination of two knowledge points.

Compared with the points above, the unsafe.Pointer type and the uintptr type are less important. Their values can also represent pointers, and they sit closer to the low level and to memory than the pointer values mentioned earlier.

We can use them to access or modify some internal data, which is far more flexible than the usual approach, but this flexibility often brings significant security risks.

Therefore, in many cases, using them to manipulate data does more harm than good. Still, it is worth understanding the other side of the coin.

Reflection Question #

Today’s reflection question is: Does a pointer value of a reference type have meaning? If it does not have meaning, why? If it does have meaning, what is its significance?
