15 How to Find the Memory Address of Js Objects Through Hash Lookup

15 How to Find the Memory Address of JS Objects Through Hash Lookup #

Hello, I’m Shikawa.

As we have discussed before, in JavaScript, objects are stored in the heap while only referenced in the call stack. So how can we find the actual memory address of an object in the heap? This question can be answered by our understanding of dictionaries. Dictionaries are also referred to as maps, symbol tables, or associative arrays. To make it less abstract, let’s first talk about one implementation of dictionaries: hash tables.

Hash Table: How to Check if a Word Exists #

If you have used some document editing software, you should be familiar with the spell check feature, but how is this check done? At its core logic, it checks whether a word exists or not. So, how is it determined if a word exists or not? This is where a hash table comes into play. The implementation logic of a hash table is based on generating a unique hash value for each word and storing these values in an array. When we want to check if a word is valid, we simply check if its hash value exists in the array.

Assume we have an object composed of key-value pairs of cities as shown in the diagram above. As we can see, during the hashing process, a city’s key is passed through a hash function to generate a unique hash value, which is then placed in an array to form a hash table. Next, when we want to access the data in the hash table, we traverse the table to retrieve the corresponding value.

Here, we can see that the hash function is located in the middle of the diagram. We need a hash function to generate hash values. So, how are hash values generated? There are many ways to generate hash values for implementing a hash table, such as prime number hashing, ASCII hashing, and the djb2 algorithm.

Among these hash algorithms, the most basic one is prime number hashing. Let me give you an example using a prime number (11) as the modulus. In this example, we divide the key in each key-value pair by the modulus and store the remainder in an array to form a hash table. This gives us a unified index.

{key: 7, value: "Nanchang"}
{key: 24, value: "Taiyuan"}
{key: 42, value: "Zhengzhou"}
Prime number: 11
7 % 11  = 7 // remainder is 7
24 % 11 = 2 // remainder is 2
42 % 11 = 9 // remainder is 9

This approach seems to be able to generate hash values, but there is a problem. In the process of putting the remainders into the array, we find that if we handle a large enough amount of data, conflicts will occur. For example, in the diagram below, the keys of the two objects marked in red have the same remainder when divided by the prime number 11. Both 7 and 51 have a remainder of 7, which leads to a collision. A perfect hash table should not have collisions, but such a perfect hash table does not exist in reality. So, we can only try to minimize these situations as much as possible.

To minimize collisions, the industry has also tried other methods, such as combining ASCII code with prime numbers to generate hashes. However, this method, like prime number hashing, cannot completely avoid collisions even with the inclusion of ASCII, it can only reduce conflicts.

asciiHashCode(key) {
  if (typeof key === 'number') { 
    return key;
  }
  const tableKey = this.toStrFn(key); 
  let hash = 0; 
  for (let i = 0; i < tableKey.length; i++) {
    hash += tableKey.charCodeAt(i); 
  }
  return hash % 37; 
}

In addition to this, there is a classic algorithm called djb2 that can further reduce the occurrence of collisions. It works by using a prime number (5381) as the hash value and then iteratively multiplying it by 33 and adding the ASCII code of each character of the string. The result, modulated by the modulus (1013), gives us the final hash value.

You may wonder what the significance of the numbers 33 and 5381 is. Multiplying by 33 is easier for shifting and addition calculations. Using 33 allows the replication of most input bits in the accumulator and then disperses them. The shifts of 5 and 32 are mutually prime, which helps with avalanche effects. ASCII can be seen as a 4-bit character type selector, for example, the first four bits of a digit are all 0x3. So, 2, 4, and 8 bits can lead to similar interactions between bits, while 5 bits can strongly interact between many 4 low bits and 4 high bits within a character. That’s why 33 is chosen. As for the choice of 5381 as a prime number, it is mostly a matter of convention and can be replaced by other large prime numbers.

djb2HashCode(key) {
  const tableKey = this.toStrFn(key); 
  let hash = 5381; 
  for (let i = 0; i < tableKey.length; i++) { 
    hash = (hash * 33) + tableKey.charCodeAt(i); 
  }
  return hash % 1013; 
}

These methods only provide you with a concept. In reality, there are many other ways to implement a hash function, which may require an entire book to cover. However, here, we are only trying to understand its principles and concepts.

Through the hash function, we can basically find the existence of a word in the hash table. After solving the existence problem, the word will ask, “What is my meaning?” At this point, dictionaries come into play.

Dictionary: How to Find the Memory Address of an Object #

A hash table can have values without keys and can be seen as an extension of an array index. After discussing hash tables, let’s take a look at dictionaries. As the name suggests, this data structure is similar to the dictionaries we use in daily life. Its main purpose, just like an index, is to facilitate fast queries and searches. However, when we consult a dictionary, we not only care about whether a word exists, but more importantly, we want to know its corresponding meaning. Therefore, we need to use a set of key-value pairs to indicate their relationship.

Let’s imagine a very basic dictionary where each word has a hash as the key and its meaning as the value. These entries are then placed on each page to form a dictionary. Therefore, dictionaries usually consist of key-value pairs. As mentioned earlier, dictionaries, as a data structure, are also known as maps, symbol tables, or associative arrays. In JavaScript, we can actually regard objects as a type of hash table that can be used to construct dictionaries since objects contain key-value attributes.

Now, let’s come back to the initial question. In frontend development, the most common dictionary we use is the object reference. The browser and JS engine we use will reference objects in the call stack. But how does it find the actual location of an object in the heap? The association between the object’s reference in the stack and its actual storage in the heap is achieved through address mapping. This mapping relationship is stored in a dictionary. We can open any webpage in a browser, then open the developer tools. In the tools, we go to the “Memory” tab and select “Take Heap Snapshot”. After that, we can see a series of numbers after the “@” symbol next to each object, which represents the object’s memory address. Therefore, the dictionary here acts as an address book.

Map and Set: What problems do they solve #

Before ES6, JavaScript only had arrays and objects, and there was no data structure like a dictionary. Map and Set concepts were introduced in ES6.

In JavaScript, Map is the structure of a dictionary, which contains key-value pairs. You may ask, what is the difference between Map and an object? We have mentioned before that an object is a hash table that can be used as a dictionary with key-value pairs. The biggest difference between Map and object is that the keys in Map can be data structures other than strings, for example, an object can be a key name.

In JavaScript, Set is the structure of a set, which contains values and does not have keys. You may also ask, what is the difference between this structure and an array? Its main difference is that in JavaScript, a set is an unordered collection, and it cannot have the same elements.

JavaScript also provides WeakMap and WeakSet, and there are two main reasons for using them. First, they are weakly typed, meaning they represent strong references with no keys. Therefore, JavaScript can clean up the entire record during garbage collection. The second reason, which is also their characteristic, is that since WeakMap does not have key-value iteration, you can only retrieve related values through the keys, ensuring the encapsulation of private properties. This is why we mentioned in the previous Lesson 07 that when talking about private properties of an object, you can use WeakMap for storage.

Hash Collision: Ways to Resolve Hash Collisions #

There are several ways to resolve hash collisions that are worth knowing. Previously, we have introduced several methods to resolve hash collisions from the perspective of the hash function algorithm. Now, let’s take a look at how to resolve hash collisions using data structures. There are several basic methods, including linear probing, quadratic probing, and double hashing.

Linear Probing

Let’s first talk about linear probing. With this method, when a hash collision occurs, the program will continue to search for the next empty position. For example, in the previous example, if 7 is already occupied by “Nanchang”, then “Beijing” will be moved to position 8. This may not be a problem during storage, but it can cause some issues during search, as we need to iterate through the cluster to find the data we are looking for.

Quadratic Probing

Another method is quadratic probing. Quadratic probing uses the square number to replace the method of linear probing, which moves one position forward. This allows for a more evenly distributed distribution based on effective exponents.

Double Hashing

The third method is double hashing, which means hashing again based on the first hash result. In the formula below, x is the result of the first hash, and R is a number smaller than the hash table size. Assuming the iteration number is denoted as i, each hash collision is resolved by i * hash2(x).

hash2(x) = R - (x % R)

HashMap: How does Java resolve hash collisions? #

Let’s set JS aside for now. Actually, we can also delve deeper into the ways of resolving hash collisions through some more advanced data structures in the Java language. If you have studied Java, you may have used HashMap, LinkedHashMap, and TreeMap. So what are the advantages of these data structures in Java and how are they implemented? Let’s take a look.

The underlying logic of HashMap is implemented using linked lists and red-black trees. Its main purpose is to solve hash collisions. Let’s start with linked lists. The rule is that when there is a collision in the hash values generated by the hash function, the conflicting data is placed in a linked list to resolve the hash collision. You may ask, since the linked list has already solved this problem, why do we still need to use red-black trees? This is because when the length of the elements in the linked list is relatively small, the linked list performance is acceptable. However, when there are too many conflicting data, it will result in performance issues. At this time, it is more appropriate to use red-black trees for operations such as insertion, deletion, modification, and search.

Hashing with Linked List: Value Sorting based on Doubly Linked List #

After understanding HashMap, let’s take a look at LinkedHashMap. LinkedHashMap is built on top of HashMap and maintains a doubly linked list internally. By utilizing the performance characteristics of a doubly linked list, it can serve another important purpose, which is to preserve the order of inserted data elements.

TreeMap: Key-Value Sorting Based on Red-Black Tree #

In addition to HashMap and LinkedHashMap, TreeMap is also a type of dictionary implemented in Java based on red-black trees. However, it has a fundamental difference from HashMap - it is not based on a hash table. Instead, TreeMap is implemented based on a red-black tree. In comparison to the unordered nature of HashMap and the insertion order of LinkedHashMap, TreeMap implements key-value sorting. While its query efficiency is not as high as that of HashMap and LinkedHashMap, it is thread-safe compared to the previous two.

Summary #

Through this lesson, you should have learned how to find the memory address of a JS object through hash lookup. However, more importantly, I hope that through today’s learning, you also have a better understanding of hash, hash table, and dictionary, which are concepts that beginners often confuse. Finally, let’s summarize. We say that a dictionary is also called a map, symbol table, or associative array, and a hash table is one way to implement it. After ES6, with the introduction of the dictionary (Map) data structure, it can be used to implement a dictionary. A set is similar to a map, but the difference is that a set only saves values and not keys. For example, this is like a dictionary with only words and no explanations.

Thought Question #

Today’s thought question is, we know that Map was introduced in ES6. Before that, if people wanted to implement a data structure and functionality similar to a dictionary, they would use the object data type. Can you manually implement a dictionary data structure and its related methods using objects? Let’s give it a try.

Feel free to share your answers, exchange learning experiences, or ask questions in the comments section. If you feel that you have gained something, please share today’s content with more friends. See you in the next session!