09 Operations and Contracts of Maps

09 Operations and Contracts of Maps #

So far, we have discussed advanced data types for collections that are designed to hold single elements.

They can store elements either using contiguous storage or by storing pointers to each other, where each element represents an independent value belonging to a particular type.

However, the dictionary (map) that we will discuss today is different. It can store a collection of key-value pairs instead of a collection of single values.

What is a key-value pair? It is a term derived from the English expression “key-value pair”. As the name suggests, a key-value pair represents a pair of a key and a value.

Note that a “key” and a “value” each represent an independent value belonging to a certain type, and binding them together forms a key-value pair.

In the Go language specification, to avoid ambiguity, they have replaced the term “key-value pair” with “key-element pair”. We will also use this clearer term to explain.

Background knowledge: why is the key type of a dictionary constrained? #

The dictionary type in Go is actually a specific implementation of a hash table. In this implementation, the key is of a restricted type, while the value can be of any type.

To understand the reason for this constraint, we need to first understand a crucial process in hash tables: mapping.

You can think of the key as an index for the value. We can use the key to look up the corresponding value in the hash table.

This relationship between keys and values is called “mapping” in mathematics, which is the original meaning of the word “map”. The mapping process in a hash table occurs during operations such as adding, deleting, modifying, and searching for key-value pairs.

For example, if we want to find the value corresponding to a certain key in a hash table, we need to pass the key as a parameter to the hash table.

The hash table will first use a hash function to convert the key into a hash value. The hash value is usually an unsigned integer. A hash table contains a certain number of buckets, which evenly store the key-value pairs in the hash table.

Therefore, the hash table first locates a bucket using the low bits of the hash value of the key, and then searches for the key in this bucket.

Since the key-value pairs are always stored together, once the key is found, the corresponding value can always be found. The hash table then returns the corresponding value as the result.

As long as the key-value pair exists in the hash table, it will always be found during the mapping process of the hash table. This process is consistent with what was described earlier for adding, modifying, and deleting key-value pairs in a hash table.

Now that we know, the first step in the mapping process is: converting the key value into a hash value.

In the dictionary of Go, each key value is represented by its hash value. This means that the dictionary does not store the value of any key independently, but stores their hash values independently.

Do you have a vague feeling about something? Let’s continue reading.

Our question today is: what types cannot be used as keys in a dictionary?

You can find the answer to this question in the Go language specification, but it’s not that simple. The typical answer is: the key type in Go dictionaries cannot be a function type, a dictionary type, or a slice type.

Problem Analysis #

Let’s analyze this problem.

According to the Go language specification, the == and != operators must be applicable between values of the key type. In other words, the values of the key type must support equality checking. Since function types, map types, and slice types do not support equality checking, the key type of a map cannot be any of these types.

Furthermore, if the key type is an interface type, the actual type of the key value cannot be any of the three types mentioned above. Otherwise, it will cause a panic (runtime exception) during program execution.

Let’s take an example:

var badMap2 = map[interface{}]int{
    "1":   1,
    []int{2}: 2, // This will cause a panic.
    3:    3,
}

The variable badMap2 is a map type with the key type interface{} and the value type int. This declaration does not cause any errors. In other words, I have managed to bypass the Go language compiler check with this declaration.

Note that I initialize this map with three key-value pairs using literals. The second key-value pair has a key value of []int{2} and a value of 2. This key value does not cause the Go language compiler to report an error because it is syntactically valid.

However, when we run this code, the Go runtime system will detect the problem and throw a panic, pointing to the line where the second key-value pair is defined in the literal. The later we discover the problem, the higher the cost of fixing it, so it is best to avoid setting the key type of a map to any interface type. If you must do so, make sure the code is within a controlled range.

Also note that if the key type is an array type, ensure that the element type of that type is not a function type, map type, or slice type.

For example, since the element type of the type [1][]string is []string, it cannot be used as the key type of a map. Furthermore, if the key type is a struct type, ensure the legality of the field types. Regardless of how deeply the illegal type is buried, such as map[[1][2][3][]string]int, the Go language compiler will detect it.

You may wonder why the value of the key type must support equality checking. I mentioned earlier that once the Go language locates a hash bucket, it attempts to find the key value in that bucket. How does it do that?

First, each hash bucket stores the hash values of all the keys it contains. The Go language compares the hash value of the key being searched with these hash values one by one to see if there is a match. If there is no match, it means that the key value being searched is not present in that bucket, and the Go language immediately returns the result.

If there is a match, the actual key value is compared again. Why is a comparison necessary? The reason is that different values can have the same hash value. This is known as a “hash collision.”

Therefore, even if the hash values are the same, the key values may not be the same. If the values of the key type cannot be compared for equality, the mapping process cannot continue. Finally, only when both the hash values and the key values are equal can it be determined that a matching key-value pair has been found.

The examples mentioned above are all in demo18.go.

Knowledge Expansion #

Question 1: Which types should be prioritized as key types for dictionaries?

By now, you already understand that in Go, some types support equality checks while others do not. So among the types that support equality checking, which ones are more suitable as key types for dictionaries?

Let’s put aside the context of using dictionaries for now and look at it from a performance perspective. In the mapping process mentioned earlier, “converting the key value into a hash value” and “comparing the key value to the key values in the hash bucket” are obviously two important and time-consuming operations.

Therefore, it can be said that the faster the hashing and equality checks are for a particular type, the more suitable it is as a key type.

For all basic types, pointer types, array types, struct types, and interface types, Go has a set of algorithms corresponding to them. This set of algorithms includes hashing and equality checks. Taking the hashing operation as an example, types with a smaller width usually have faster speed. This applies to boolean types, integer types, floating-point types, complex types, and pointer types. For string types, since their width is variable, the speed of hashing depends on the length of their values. Generally, shorter lengths result in faster hashing.

The width of a type refers to the number of bytes occupied by a single value of that type. For example, the bool, int8, and uint8 types all occupy 1 byte, so the width of these types is 1.

The above discussion pertains to basic types. Let’s now look at advanced types. Hashing an array type value actually involves hashing each element of the array and then combining the hash values, so the speed depends on the element type and the length of the array. The same principles apply as discussed before.

Similarly, hashing a struct type value involves hashing each field value of the struct and then combining the hash values, so it depends on the types of its fields and the number of fields. The specific hashing algorithm for interface types is determined by the actual type of the value.

I do not recommend using these advanced data types as key types for dictionaries, not only because the hashing and equality check operations are slower for them, but also because they have variables in their values.

For example, for an array, I can freely change the values of its elements, but before and after the change, it represents two different key values.

The situation for struct types may be better because if I can control the access permissions of their fields, I can prevent modifications from the outside. It is most dangerous to use interface types as key types for dictionaries.

Do you remember? If the Go runtime system finds that a key value does not support equality checks in this case, it will immediately panic. In the worst case, this is enough to crash the program.

So, among those basic types, which one should be prioritized? The answer is to prioritize numerical types and pointer types, and in general, types with smaller widths are better. If you must choose a string type, it is best to add additional constraints on the length of the key value.

And what is an unusual case? In general, Go sometimes optimizes dictionary operations, such as adding, deleting, modifying, and querying operations.

For example, when the key type of a dictionary is a string type; and when the key type of a dictionary is an integer type with a width of 4 or 8.

Question 2: Will reading from and writing to a dictionary with a value of nil succeed?

Okay, to avoid burning too much brain power, let’s talk about a simpler question. Since a dictionary is a reference type, when we only declare but do not initialize a variable of dictionary type, its value will be nil.

Will attempting to retrieve the corresponding element value using a key or adding a key-value pair succeed on such a variable? This question is simple, but we must keep it in mind, as it relates to the stability of program runtime.

Let me explain the answer. Except for adding a key-value pair, any operation we perform on a dictionary with a value of nil will not cause an error. When we try to add a key-value pair to a dictionary with a value of nil, the Go runtime system will immediately panic. You can try running the demo19.go file to see for yourself.

Summary

This time, we mainly discussed some confusing questions related to dictionary types. For example, why are the key types of dictionaries constrained? And what types should we typically choose as key types for dictionaries?

I started with the Go language specification and based my answers on the Go language source code. After carefully reading this article, you should have a certain understanding of the mapping process in dictionaries.

Furthermore, you should have some understanding of the hashing and equality operations performed by Go on those valid key types.

Once again, always pay attention to operations that may cause panics, such as adding a key-value pair to a dictionary with a value of nil.

Thought Exercise

Today’s thought exercise is about concurrency safety. More specifically, whether it is safe to perform operations on the same value simultaneously but in different goroutines (or go routines) within the same time frame. Safety here means that the value will not be disturbed or face other unpredictable problems due to these operations.

The specific thought exercise is: Are values of dictionary types concurrency-safe? If not, are they still unsafe when we only add or delete key-value pairs from the dictionary? Thank you for listening, and see you next time.

Click here to view the detailed code accompanying this Go language column article.