01 Placing Values in Heap or Stack, That Is the Question

Memory: Should Values Go on the Heap or the Stack? That Is the Question #

Hi, I’m Chen Tian. Today we’re checking in for the first lesson of our Rust learning journey.

You must be eager to dive into Rust, but don’t rush. We won’t start by introducing syntax as most courses do; instead, we’ll begin by revisiting concepts you probably consider very basic, such as memory and functions.

When it comes to basics, you may already be losing interest, thinking, “I know all this stuff, why waste time learning it again?” But that’s not the case. Over the years, I’ve seen many senior engineers who never fully grasped the basics; after working for years, they have to come back and take remedial courses.

Take memory, the most basic concept, as an example. Many people don’t actually understand when data should be put on the stack versus the heap. Not until actual problems occur at work do they realize the way data is stored can severely impact concurrency safety. They have no choice but to go back and relearn the basics, wasting a lot of time and energy.

As developers, we’ll experience many tools, frameworks, and languages over our lifetimes, but no matter how much they change, the underlying logic remains the same.

So today we’ll rethink those programming concepts that feel familiar but are not fully understood, and clarify the logic beneath them. These concepts are also essential for understanding the Rust topics we’ll cover later; where needed, we’ll revisit them in more depth.

The most basic concepts in code are variables and values, and the place that stores them is memory, so we’ll start with memory.

Memory #

Our programs interact with memory all the time. Even the simple statement below, which assigns “hello world” to s, interacts deeply with the read-only data segment (RODATA), the heap, and the stack:

let s = "hello world".to_string();

You can use this code snippet in the Rust playground to get a feel for how strings use memory.

First, “hello world” as a string literal is stored in the .rodata segment (GCC) or .rdata segment (VC++) of the executable at compile time, and gets a fixed memory address when the program is loaded.

When “hello world”.to_string() executes, a new block of memory is allocated on the heap, and “hello world” is copied into it byte by byte.

When we assign the data on the heap to s, s as a variable allocated on the stack needs to know the address of the memory on the heap. Also, since the data size on the heap is variable and can grow, we also need to know its length and current size.

Ultimately, to represent this string, we use three words: the first is a pointer, the second is the current string length (11), and the third is the total capacity of this memory (11). On a 64-bit system, three words take up 24 bytes.
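We can check these three words directly. The minimal sketch below (assuming a 64-bit target) verifies the size of the String handle and its length, and prints its capacity:

```rust
fn main() {
    let s = "hello world".to_string();
    // A String handle is three words on the stack: pointer, length, capacity.
    assert_eq!(std::mem::size_of::<String>(), 24); // on a 64-bit target
    assert_eq!(s.len(), 11);
    // Capacity is at least the length; here it happens to be exactly 11.
    assert!(s.capacity() >= s.len());
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```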

You can also refer to the diagram below for a more intuitive understanding:

[Figure: layout of a String, with pointer, length, and capacity on the stack and the string bytes on the heap]

I just mentioned the string content is on the heap, while the pointer to the string and related info is on the stack. Now is the time to test your basic memory knowledge: When can data be put on the stack, and when does it need to be put on the heap?

For this question, developers using languages with automatic memory management like Java/Python may have some vague impressions or rules:

  • Primitive types are stored on the stack, objects are stored on the heap
  • Small amounts of data are stored on the stack, large amounts of data are stored on the heap

While these are correct, they don’t get to the crux of it. If you just apply formulas from memory at work, when encountering special cases you can easily get confused. But if you understand the derivations behind the formulas, even if you forget them, you can quickly find the answer again through simple reasoning. So next we’ll delve into the design principles of the heap and stack to understand how they actually work.

Stack #

The stack is fundamental for program execution. Whenever a function is called, a contiguous block of memory is allocated at the top of the stack, called a frame.

We know the stack grows downwards. At the very bottom of a program’s call stack, below the entry frame, is the frame corresponding to the main() function. As main() calls functions layer by layer, the stack expands layer by layer. When calls end, the stack unwinds layer by layer, releasing memory back.

During calls, a new frame allocates enough space to store the context of registers. General purpose registers used in the function have a copy saved on the stack. When the function returns, the original register context can be restored from the copy, as if nothing happened. In addition, local variables needed by the function are also reserved when the frame is allocated.

You can refer to the diagram below to understand the process:

[Figure: stack frames being pushed as functions are called and popped as they return]

So when a function runs, how is the size of the frame determined?

This is thanks to the compiler. When compiling and optimizing code, the function is the smallest unit of compilation.

Within a function, the compiler needs to know which registers are used and which local variables go on the stack, and these must be determined at compile time. So the compiler needs to know the exact size of each local variable in order to reserve space for it.

Now we understand: At compile time, any data whose size is unknown or variable cannot be safely put on the stack, and is best put on the heap. For example, a function with a string parameter:

fn say_name(name: String) {
    println!("Hello, {}!", name);
}

fn main() {
    say_name("Lindsey".to_string());
    say_name("Rosie".to_string());
}

The size of a string’s contents is unknown at compile time; we only learn it at runtime, when the code actually executes. As above, “Lindsey” and “Rosie” have different lengths, and say_name() only learns the actual length of its argument at runtime.

Therefore, we cannot put the string itself on the stack, we can only put it on the heap first, then allocate a corresponding pointer on the stack to reference the memory on the heap.
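The contrast can be sketched in a few lines (assuming a 64-bit target): a fixed-size array can live entirely on the stack, while a Vec keeps only its three-word handle there:

```rust
fn main() {
    // Size known at compile time: a fixed-size array can live on the stack.
    let on_stack: [u8; 4] = [1, 2, 3, 4];
    // Size known only at runtime: a Vec's contents live on the heap; the
    // stack holds only its three-word handle (pointer, length, capacity).
    let on_heap: Vec<u8> = vec![1, 2, 3, 4];
    assert_eq!(std::mem::size_of_val(&on_stack), 4);
    assert_eq!(std::mem::size_of_val(&on_heap), 24); // 64-bit target
    println!("array: 4 bytes on the stack, Vec handle: 24 bytes");
}
```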

Problems with the Stack #

From the diagram above, you can also intuitively see that memory allocation on the stack is very efficient. Just modifying the stack pointer reserves the appropriate space. Reverting the stack pointer releases the reserved space. Reserving and releasing just involves modifying registers without extra computation or system calls, thus it’s highly efficient.

In theory, whenever possible, we should allocate variables to the stack to achieve better runtime speed.

Then why do we avoid allocating large amounts of data on the stack in actual work?

This is mainly to limit call stack size and avoid stack overflow. Once the program’s call stack exceeds the maximum stack space the system allows, no new frame can be created for the next function call, and a stack overflow occurs. The program is then terminated by the system, producing crash information.

Excessively large stack memory allocation is one cause of stack overflow. A more well known cause is recursive functions not terminating properly. A recursive function keeps calling itself, forming a new frame on each call. If the recursive function cannot terminate, stack overflow will ultimately occur.
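Here is a minimal sketch of how recursion consumes the stack. The depth limit is an assumption added so the example terminates; remove the base case and the process would eventually be killed with a stack-overflow message:

```rust
// Each call pushes a new frame; without a base case the frames accumulate
// until the stack limit is hit. The depth check keeps this safe to run.
fn recurse(depth: u32) -> u32 {
    let frame_local = [0u8; 512]; // every frame reserves local space
    if depth == 0 {
        return frame_local.len() as u32;
    }
    recurse(depth - 1)
}

fn main() {
    // 1000 frames of ~512 bytes each stay well under the default stack size.
    println!("reached the base case, frame local size: {}", recurse(1000));
}
```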

Heap #

While the stack is efficient to use, its limitations are also obvious. When we need dynamic sized memory, we can only use the heap, such as variable length arrays, lists, hash tables, dictionaries, etc., which are all allocated on the heap.

When allocating memory on the heap, some extra space is generally reserved as a best practice.

For example, if you create a list and add two values:

let mut arr = Vec::new();
arr.push(1); 
arr.push(2);

The actual reserved capacity of this list is 4, not equal to its length of 2. This is because heap allocation typically goes through the malloc() function provided by libc, whose implementation makes system calls to the OS to obtain memory. System calls are expensive, so we want to avoid calling malloc() frequently.

For the code above, if we allocated exactly the amount needed, then each time a new value was added, a new, larger block of memory would have to be allocated, the existing data copied over, the new value appended, and the old memory freed. That would be very inefficient, so the allocation reserves a capacity of 4 rather than the 2 actually needed.
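We can observe this reservation directly. This sketch prints length and capacity; since the exact growth policy is an implementation detail of the allocator, the assertion only checks that capacity keeps ahead of length:

```rust
fn main() {
    let mut arr = Vec::new();
    arr.push(1);
    arr.push(2);
    // Capacity grows ahead of length to amortize the cost of reallocation.
    // The exact values are an implementation detail, so we only assert the
    // invariant rather than a specific number.
    assert_eq!(arr.len(), 2);
    assert!(arr.capacity() >= arr.len());
    println!("len = {}, capacity = {}", arr.len(), arr.capacity());
}
```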

Apart from dynamic sized memory needing allocation on the heap, memory with dynamic lifetime also needs to be allocated on the heap.

As mentioned earlier, memory on the stack is reclaimed when function calls end and the frames are popped, along with the variables allocated in them. So the lifetime of stack memory is uncontrollable by the developer, and limited to the current call stack.

Memory allocated on the heap needs to be explicitly released. This gives heap memory a more flexible lifetime, allowing data to be shared across call stacks.

As shown in the diagram below:

[Figure: heap data with a dynamic lifetime, referenced by pointers from frames in multiple call stacks]
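A minimal sketch of this flexible lifetime: the String below is created inside one frame and is still usable after that frame has been popped, because ownership of the handle moves to the caller (the function name make_greeting is made up for illustration):

```rust
// The String's heap buffer outlives the frame that created it: when
// make_greeting() returns, its frame is popped, but the heap allocation
// survives because ownership moves to the caller.
fn make_greeting() -> String {
    let s = String::from("hello from a popped frame");
    s // moved out, not freed here
}

fn main() {
    let greeting = make_greeting();
    println!("{}", greeting);
}
```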

Problems with the Heap #

However, this flexibility of heap memory also introduces many memory management challenges.

If managing heap memory manually, forgetting to release allocated memory will cause memory leaks. Once there are memory leaks, the program consumes more and more memory over time, eventually being terminated by the OS when memory is exhausted.

If heap memory is referenced from the call stacks of multiple threads, modifying it requires extra care, such as locking for exclusive access, to avoid potential issues. For example, if one thread is traversing a list while another thread frees one of its items, a dangling pointer may be dereferenced, causing a heap out-of-bounds access, the number one memory safety issue.
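In Rust terms, one sketch of this locking discipline wraps the shared heap data in Arc<Mutex<...>>, so every access must acquire the lock first; no thread can see the list mid-modification:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared heap data guarded by a lock: each thread must acquire the
    // Mutex before touching the Vec.
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let data_clone = Arc::clone(&data);

    let handle = thread::spawn(move || {
        data_clone.lock().unwrap().push(4);
    });
    handle.join().unwrap();

    assert_eq!(*data.lock().unwrap(), vec![1, 2, 3, 4]);
    println!("{:?}", data.lock().unwrap());
}
```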

If heap memory is freed but the corresponding pointer on the stack is not cleared, it’s possible to encounter use after free, which can lead to anything from crashes to security vulnerabilities. According to research by the Microsoft Security Response Center (MSRC), this is the number two memory safety issue.

[Figure: MSRC statistics on the root causes of memory safety vulnerabilities]

How GC and ARC Solve These Issues #

To avoid the problems caused by manual heap memory management, a series of languages starting with Java adopted tracing garbage collection (tracing GC) to manage heap memory automatically. This approach periodically marks objects that are no longer referenced and then sweeps them away, reducing the developer’s burden.

Meanwhile, Objective-C and Swift took a different path: automatic reference counting (ARC). At compile time, the compiler inserts retain/release statements into each function to maintain reference counts for objects on the heap. When an object’s reference count reaches zero, the release statement frees it.
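Rust’s Rc<T> is a library-level analogue of reference counting (explicit clones rather than compiler-inserted retain/release, so it is not ARC itself). A minimal sketch of the count rising and falling:

```rust
use std::rc::Rc;

fn main() {
    // Each clone bumps the count; each drop decrements it. When the count
    // reaches zero, the heap value is freed.
    let a = Rc::new(String::from("shared"));
    assert_eq!(Rc::strong_count(&a), 1);

    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);

    drop(b);
    assert_eq!(Rc::strong_count(&a), 1);
    println!("count back to {}", Rc::strong_count(&a));
}
```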

Let’s compare these two approaches.

In terms of efficiency, GC adds no extra work at each memory allocation and free, while ARC inserts a great deal of extra code to maintain reference counts, so GC delivers higher efficiency and throughput.

However, the timing of GC memory reclamation is non-deterministic, and the STW (Stop The World) pauses induced during reclamation also make code execution latency non-deterministic. So languages with GC are generally unsuitable for embedded or real-time systems. Erlang’s VM is an exception: it lowers GC granularity to the individual process, largely eliminating the STW problem.

This is why we occasionally feel Android phones stutter while iOS phones feel smooth. Also when doing backend services, the p99 (99th percentile) response time of APIs or services can be negatively impacted by GC STW.

As an aside, the GC performance discussed above differs from the performance we normally talk about, which is overall perceived throughput and latency and can diverge from actual performance. GC and ARC are a typical example: GC allocates and frees memory with higher efficiency and throughput than ARC, but its occasional high latency hurts perceived performance, so GC feels slower than ARC.

Summary #

Today we reviewed basic concepts again and analyzed the characteristics of the stack and heap.

For values stored on the stack, their size needs to be fixed at compile time. Stack variables have lifetimes within the scope of the current call stack, and cannot be referenced across call stacks.

The heap can store data of unknown or dynamic size. Heap variables live from allocation until release, so they can be referenced across call stacks. But this also makes managing heap variables very complex: manual management leads to many memory safety issues, while automatic management schemes like GC and ARC come with performance costs and other downsides.

To summarize: Data on the stack is static, fixed size, fixed lifetime; data on the heap is dynamic, variable size, variable lifetime.

Next time we’ll discuss basic concepts like values and types, pointers and references, functions, methods and closures, interfaces and vtables, concurrency and parallelism, synchronization and asynchrony, and Promise/async/await that we encounter when learning Rust or any language.

Review Questions #

Finally, we’ve reached the after-class review question section. Please feel free to share your thoughts in the comments.

  1. If there is a data structure that needs to be accessed from multiple threads, can it be put on the stack? Why?

  2. Can a pointer be used to reference a variable on the stack? If so, in what circumstances can this be done?

Also, all reference links in the article will be collated in the “Further Reading” section at the end, so I highly recommend you follow the flow of the article first. After you finish, if interested, you can check out the additional materials I’ve shared.

If you found this helpful, also feel free to share with your friends and invite them to discuss together. See you next time!

Further Reading #

  1. Research by the Microsoft Security Response Center (MSRC)

  2. Tracing garbage collection

  3. Automatic reference counting

  4. Erlang VM lowers GC granularity to each process, maximally solving STW

  5. Course GitHub repo containing reference solutions for review questions and complete code for projects