10 Lifetimes: How Long Can the Values You Create Live? #

Hello, I’m Chen Tian.

As I mentioned earlier, in any language a value on the stack has a lifetime of its own, which matches the lifetime of its stack frame. Rust makes this concept explicit and extends it to heap memory as well.

As we know, in most other languages the lifetime of heap memory is not fixed at compile time. Either developers must manage it manually, or the language performs additional checks at runtime. In Rust, unless you explicitly opt out with something like Box::leak(), Box::into_raw(), or ManuallyDrop, the lifetime of heap memory is by default bound to the lifetime of the stack memory that owns it.
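As a minimal sketch of this default binding, the snippet below uses a hypothetical Tracked type (not part of the original text) whose Drop implementation counts how often it is released. It assumes nothing beyond the standard library and contrasts a scoped Box with one passed to Box::leak().

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type for illustration: records every drop in a shared counter.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// The Box's data is on the heap, but its owner `_boxed` is on the stack,
// so the heap memory is released when `_boxed` goes out of scope.
fn drops_when_scoped() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _boxed = Box::new(Tracked { drops: Rc::clone(&drops) });
    } // `_boxed` leaves scope here and the heap allocation is dropped with it
    drops.get()
}

// Box::leak gives up ownership, so nothing drops the allocation
// when the reference goes out of scope.
fn drops_when_leaked() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _leaked: &'static mut Tracked =
            Box::leak(Box::new(Tracked { drops: Rc::clone(&drops) }));
    } // only the reference ends here; the heap value stays alive
    drops.get()
}

fn main() {
    println!("drops after scoped Box: {}", drops_when_scoped());
    println!("drops after leaked Box: {}", drops_when_leaked());
}
```

The scoped Box is dropped exactly once at the end of its block, while the leaked one is never dropped, which is why leaking should remain a deliberate choice.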

So, in this default case, the compiler can compare the lifetimes of a value and its references within each function's scope to ensure that “the lifetime of a reference does not exceed the lifetime of the value.”

Have you ever thought about how the Rust compiler achieves this?

The Lifetime of Values #

Before going further, let’s define the kinds of lifetime a value can have.

If a value lives for the entire duration of the process, we say it has a static lifetime.

When a value has a static lifetime, references to it can also have a static lifetime. We write such references with 'static. For example, &'static str is a string reference with a static lifetime.

Generally speaking, global variables, static variables, string literals, and the like all have a static lifetime. The heap memory mentioned earlier also gets a static lifetime if it is leaked with Box::leak.
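The three sources of static lifetimes named above can be sketched as follows. GREETING and needs_static() are hypothetical names introduced only for this illustration.

```rust
// A static variable lives in the data section for the whole process.
static GREETING: &str = "hello";

// Hypothetical helper: only accepts references valid for the whole process.
fn needs_static(s: &'static str) -> usize {
    s.len()
}

fn main() {
    // String literals are stored in the read-only data section.
    let literal: &'static str = "world";

    // A heap-allocated String normally dies with its owner, but leaking it
    // yields a reference that stays valid for the rest of the program.
    let owned = String::from("leaked");
    let leaked: &'static str = Box::leak(owned.into_boxed_str());

    println!(
        "{} {} {}",
        needs_static(GREETING),
        needs_static(literal),
        needs_static(leaked)
    );

    // A reference to a local String is NOT 'static. Uncommenting the two
    // lines below makes the compiler reject the call:
    // let local = String::from("local");
    // needs_static(&local);
}
```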

If a value is defined within some scope, that is, it is created on the stack or the heap, then its lifetime is dynamic.

When the value’s scope ends, its lifetime ends too. We write a dynamic lifetime as an apostrophe followed by a lowercase name, such as 'a, 'b, or 'hello. The specific name after the apostrophe does not matter; it simply stands for some dynamic lifetime. &'a str and &'b str indicate that the lifetimes of these two string references may differ.

Let’s summarize with a diagram:

  • Lifetime Visualization

    • Memory allocated on the heap and stack belongs to a particular scope, so its lifetime is dynamic.
    • Global variables, static variables, string literals, and code are compiled into the BSS/Data/RoData/Text sections of the executable file and loaded into memory when the program runs. Their lifetimes therefore match the lifetime of the process, so they are static.
    • Likewise, the lifetime of a function pointer is static: functions live in the Text section, and their memory exists as long as the process is alive.
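A small sketch of the last point, using hypothetical functions double(), identity(), and pick(): a function pointer can be stored in a static, and a function pointer returned from pick() is not tied to the lifetime of the borrowed argument used to select it.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

fn identity(x: i32) -> i32 {
    x
}

// A function pointer can be stored in a static, because the code it points to
// stays in memory for the whole life of the process.
static HANDLER: fn(i32) -> i32 = double;

// The returned pointer carries no lifetime parameter: it does not borrow from
// `name`, so it remains usable after `name` is gone.
fn pick(name: &str) -> fn(i32) -> i32 {
    match name {
        "double" => double,
        _ => identity,
    }
}

fn main() {
    let f;
    {
        let name = String::from("double");
        f = pick(&name);
    } // `name` is dropped here, but `f` is still valid
    println!("{} {}", HANDLER(21), f(4));
}
```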

With these basic concepts in place, let’s look at how the compiler identifies the lifetimes of values and references.

How the Compiler Recognizes Lifetimes #

Let’s start with two of the most basic and simple examples.

In Example 1 (the left diagram), x references the variable y, which is created in an inner scope. A variable’s lifetime runs from its definition to the end of its scope, so x’s lifetime 'a is longer than y’s lifetime 'b. When x references y, the compiler reports an error.

In Example 2 (the right diagram), y and x are in the same scope and x references y. The lifetime 'a of x and the lifetime 'b of y end at almost the same point; in other words, 'a is less than or equal to 'b, so x may reference y.

Lifetime Visualization
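Since the two diagrams are images, here is a reconstruction of what they likely show. The exact code in the diagrams is an assumption; the value 42 and the function name example_2() are made up for illustration.

```rust
// Example 2: x and y share a scope, so the reference never outlives the value.
fn example_2() -> i32 {
    let y = 42; // 'b starts here
    let x = &y; // 'a starts here and is contained in 'b
    *x
} // 'a and 'b both end here

fn main() {
    // Example 1: x is declared in the outer scope but borrows y from an
    // inner one. Uncommenting this block should produce an error saying
    // that `y` does not live long enough.
    // let x;
    // {
    //     let y = 42;
    //     x = &y;
    // } // y is dropped here while x still refers to it
    // println!("x: {}", x);

    println!("x: {}", example_2());
}
```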

These two examples are easy to understand; let’s look at a slightly more complex one.

The example code creates two Strings in the main() function, which are then passed to the max() function for comparison. The max() function accepts two string references and returns a reference to the larger of the two strings (Example Code):

fn main() {
    let s1 = String::from("Lindsey");
    let s2 = String::from("Rosie");

    let result = max(&s1, &s2);

    println!("bigger one: {}", result);
}

fn max(s1: &str, s2: &str) -> &str {
    if s1 > s2 {
        s1
    } else {
        s2
    }
}

This code will not compile and will report an error “missing lifetime specifier,” which means the compiler cannot determine the lifetimes of s1, s2, and the return value when compiling the max() function.

You might wonder why. From a developer’s perspective, this code is very intuitive. In the main() function, s1 and s2 have the same lifetime, references to them are passed to max(), and whichever one is returned, its lifetime will not exceed that of s1 or s2. So this should be correct code, right?

Why did the compiler throw an error and not allow it to compile successfully? Let’s slightly expand this code, and you’ll understand the compiler’s confusion.

Building on the earlier example code, we add a new function get_max(), which accepts a string reference and compares it with the string literal “Cynthia.” As mentioned earlier, a string literal has a static lifetime while s1’s is dynamic, so their lifetimes are clearly not the same (Code):

fn main() {
    let s1 = String::from("Lindsey");
    let s2 = String::from("Rosie");

    let result = max(&s1, &s2);

    println!("bigger one: {}", result);

    let result = get_max(&s1);
    println!("bigger one: {}", result);
}

fn get_max(s1: &str) -> &str {
    max(s1, "Cynthia")
}

fn max(s1: &str, s2: &str) -> &str {
    if s1 > s2 {
        s1
    } else {
        s2
    }
}

When there are multiple parameters with potentially inconsistent lifetimes, determining the lifetime of the return value becomes tricky. The compiler, when compiling a function, does not know who will call the function in the future or how it will be called. Therefore, the information carried by the function itself is all the information used by the compiler at compile time.

Based on this, let’s look at the example code again. When compiling the max() function, the relationship between the lifetimes of parameters s1 and s2, and between the return value and parameters, cannot be determined by the compiler.

At this point, we need to provide lifetime information in the function signature, which is lifetime annotation. In lifetime annotation, the parameters used are called lifetime parameters. Through lifetime annotation, we tell the compiler about the constraints on lifetimes among these references.

Lifetime parameters are declared the same way as generic parameters, but are written with a leading apostrophe and, by convention, a lowercase name. Here, both input parameters s1 and s2 are constrained with 'a. Lifetime parameters describe the relationships among the parameters and between the parameters and the return value; they do not change the actual lifetimes.

Once the lifetimes are annotated, s1 and s2 satisfy the parameter constraints as long as the values they refer to live at least as long as 'a. Likewise, the value that the return value refers to must live at least as long as 'a.

When you compile the sample code, the compiler’s error message already suggests modifying the max() function like this:

fn max<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1 > s2 {
        s1
    } else {
        s2
    }
}

When the main() function calls the max() function, s1 and s2 have the same lifetime 'a, so it meets the constraint (s1: &'a str, s2: &'a str). When the get_max() function calls the max() function, “Cynthia” has a static lifetime, which outlives s1’s lifetime 'a, so it can also meet the constraints of max().
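Putting the pieces together, the following sketch should compile. The helper nested_scopes() is a hypothetical addition that shows how 'a is chosen when the two arguments come from different scopes.

```rust
fn max<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1 > s2 {
        s1
    } else {
        s2
    }
}

// Only one reference input, so the compiler fills in the annotation itself.
fn get_max(s1: &str) -> &str {
    // "Cynthia" is a &'static str; it is accepted wherever &'a str is expected.
    max(s1, "Cynthia")
}

// Hypothetical helper: the arguments live in different scopes, so 'a can only
// cover the region where both borrows are valid (the inner block).
fn nested_scopes() -> String {
    let outer = String::from("Lindsey");
    let copy;
    {
        let inner = String::from("Rosie");
        let result = max(&outer, &inner);
        // Copy the text out, because `result` must not escape this block.
        copy = result.to_string();
    }
    copy
}

fn main() {
    let s1 = String::from("Lindsey");
    let s2 = String::from("Rosie");

    println!("bigger one: {}", max(&s1, &s2));
    println!("bigger one: {}", get_max(&s1));
    println!("bigger one: {}", nested_scopes());
}
```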

Do Your References Need Extra Annotation? #

At this point, you may wonder why the code we wrote earlier, where many function parameters and return values were references, never required extra lifetime annotations.

This is because the compiler tries to reduce the burden on developers as much as possible. In principle, every function that uses references needs lifetime annotations, but in common cases the compiler adds them automatically, saving developers the trouble.

Take this example: the first() function accepts a string reference, finds the first word in it, and returns it (Code):

fn main() {
    let s1 = "Hello world";

    println!("first word of s1: {}", first(s1));
}

fn first(s: &str) -> &str {
    let trimmed = s.trim();
    match trimmed.find(' ') {
        None => "",
        Some(pos) => &trimmed[..pos],
    }
}

Although we did not make any lifetime annotations, the compiler can automatically add annotations for the function through some simple rules:

  1. Every reference-type parameter gets its own lifetime: 'a, 'b, and so on.
  2. If there is exactly one reference-type input, its lifetime is assigned to all outputs.
  3. If there are multiple reference-type parameters and one of them is &self or &mut self, the lifetime of self is assigned to all outputs.

Rule 3 applies to methods on traits or custom data types; we’ll set it aside for now and cover it in detail when we encounter it. The first() function fits rules 1 and 2, so we get this lifetime-annotated version (Code):

fn first<'a>(s: &'a str) -> &'a str {
    let trimmed = s.trim();
    match trimmed.find(' ') {
        None => "",
        Some(pos) => &trimmed[..pos],
    }
}

As you can see, every reference can be annotated without conflict. So why can’t the compiler do the same for the max() function from the earlier example, which returns the larger string (Example Code)?

According to rule 1, we can annotate the parameters s1 and s2 of the max() function separately with 'a and 'b, but how to annotate the return value? Should it be 'a or 'b? The compiler is helpless in the face of these conflicts.

fn max<'a, 'b>(s1: &'a str, s2: &'b str) -> &'??? str

Therefore, only when we understand the logic of the code can we correctly annotate the relationship constraint between the parameters and the return value to compile successfully.
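To illustrate how the logic of the code drives the annotation, here is a sketch with a hypothetical function before(). Its result always borrows from the first parameter and never from the second, so only the first parameter shares a lifetime with the return value.

```rust
// Hypothetical helper: returns the part of `text` before the first `sep`.
// The result is always a slice of `text`, so it is tied to 'a only.
fn before<'a, 'b>(text: &'a str, sep: &'b str) -> &'a str {
    match text.find(sep) {
        Some(pos) => &text[..pos],
        None => text,
    }
}

fn main() {
    let text = String::from("key=value");
    let key;
    {
        // `sep` lives only inside this block...
        let sep = String::from("=");
        key = before(&text, &sep);
    }
    // ...but `key` is still usable here, because it is tied to `text` alone.
    println!("key: {}", key);
}
```

Had both parameters been annotated with the same 'a, as in max(), the call above would be rejected, because the result would then also be constrained by the shorter-lived sep.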

Reference Annotation Practice #

Okay, that covers Rust’s lifetime concept. Next, let’s write a strtok() function and practice adding lifetime annotations.

Developers with C/C++ experience have most likely come across strtok(). It splits off the first token of a string at a delimiter and returns it, then points the input string reference at the remainder of the string.

It’s not difficult to implement in Rust. Because the function must update s in place, s is a mutable reference to a string reference, &mut &str (Exercise Code):

pub fn strtok(s: &mut &str, delimiter: char) -> &str {
    if let Some(i) = s.find(delimiter) {
        let prefix = &s[..i];
        // The delimiter may be a multi-byte UTF-8 character, so skip its full
        // encoded length (len_utf8) rather than assuming it is a single byte
        let suffix = &s[(i + delimiter.len_utf8())..];
        *s = suffix;
        prefix
    } else {
        // If not found, return the entire string and point s to an empty string
        let prefix = *s;
        *s = "";
        prefix
    }
}

fn main() {
    let s = "hello world".to_owned();
    let mut s1 = s.as_str();
    let hello = strtok(&mut s1, ' ');
    println!("hello is: {}, s1: {}, s: {}", hello, s1, s);
}

When you try to compile this code, you will encounter lifetime-related errors. As in the example we discussed earlier, the compiler’s rules expand &mut &str into &'b mut &'a str, so it cannot choose an appropriate lifetime for the return value &str.

To solve this problem, we first need to think about which lifetime the return value is related to. Is it the mutable reference to the string reference &mut, or the string reference &str itself?

Obviously, it’s the latter. So we can add a lifetime annotation for strtok:

pub fn strtok<'b, 'a>(s: &'b mut &'a str, delimiter: char) -> &'a str {...}

Because the return value’s lifetime is tied to the string reference, we only need to annotate that part of the constraint and can let the compiler fill in the rest. The code can therefore be simplified as follows, and the compiler expands it into the form above:

pub fn strtok<'a>(s: &mut &'a str, delimiter: char) -> &'a str {...}

Finally, the normal working code is as follows (Exercise_Code_Changed), which can be compiled successfully:

pub fn strtok<'a>(s: &mut &'a str, delimiter: char) -> &'a str {
    if let Some(i) = s.find(delimiter) {
        let prefix = &s[..i];
        let suffix = &s[(i + delimiter.len_utf8())..];
        *s = suffix;
        prefix
    } else {
        let prefix = *s;
        *s = "";
        prefix
    }
}

fn main() {
    let s = "hello world".to_owned();
    let mut s1 = s.as_str();
    let hello = strtok(&mut s1, ' ');
    println!("hello is: {}, s1: {}, s: {}", hello, s1, s);
}
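Because the returned token is tied to 'a rather than to the short-lived mutable borrow of s, strtok() can be called repeatedly on the same variable. The tokens() helper below is a hypothetical addition that sketches this usage.

```rust
pub fn strtok<'a>(s: &mut &'a str, delimiter: char) -> &'a str {
    if let Some(i) = s.find(delimiter) {
        let prefix = &s[..i];
        let suffix = &s[(i + delimiter.len_utf8())..];
        *s = suffix;
        prefix
    } else {
        let prefix = *s;
        *s = "";
        prefix
    }
}

// Hypothetical helper: keeps calling strtok until the input is exhausted.
// Each token borrows from `input`, not from the temporary `&mut rest`,
// so earlier tokens stay valid while `rest` is mutated again.
fn tokens(input: &str, delimiter: char) -> Vec<&str> {
    let mut rest = input;
    let mut out = Vec::new();
    while !rest.is_empty() {
        out.push(strtok(&mut rest, delimiter));
    }
    out
}

fn main() {
    let s = "hello wonderful world".to_owned();
    println!("{:?}", tokens(&s, ' '));
}
```

If the return value had been annotated with the lifetime of the mutable borrow instead, each token would keep rest mutably borrowed and the loop in tokens() would not compile.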

To help you better understand the lifetime relationship of this function, I’ve drawn a diagram to show the relationship between each variable on the heap and the stack for your reference.

Lifetime Visualization

Here’s a tip for you: If you find a piece of code difficult to understand or analyze, you can also draw similar diagrams, starting from the most basic relationship of data on the heap and stack, and it will be easy to clarify the context.

When dealing with lifetimes, the compiler will automatically annotate the lifetimes according to certain rules. However, when automatic annotation leads to conflicts, we need to manually annotate.

The purpose of lifetime annotation is to establish connections or constraints between parameters and return values. When a function is called, the values passed in must live at least as long as the annotated lifetime.

Once each function has been properly annotated with lifetimes, the compiler can analyze, in the context of the function call, whether the lifetime of the reference matches the lifetime required in the function signature. If they do not match, it violates the principle that “the lifetime of the reference cannot exceed the lifetime of the value,” and the compiler will issue an error.

If you understand lifetime annotation for functions, lifetime annotation for data structures is similar. For example, in the case below, Employee’s name and title are two string references. An Employee value must not outlive them; otherwise, it could access invalid memory, so we need to annotate them properly:

struct Employee<'a, 'b> {
  name: &'a str,
  title: &'b str,
  age: u8,
}

When using data structures, the lifetime of the data structure itself needs to be less than or equal to the lifetime of all references within its internal fields.
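A short usage sketch of this constraint follows. The function name_of() and the sample values are hypothetical; the point is that the struct lives no longer than the shorter-lived of its two borrowed fields, while a reference taken from the name field can outlive the struct itself.

```rust
struct Employee<'a, 'b> {
    name: &'a str,
    title: &'b str,
    age: u8,
}

// Hypothetical helper: the result borrows only from the `name` field,
// so it is tied to 'a and not to the Employee value or to 'b.
fn name_of<'a, 'b>(e: &Employee<'a, 'b>) -> &'a str {
    e.name
}

fn main() {
    let name = String::from("Lindsey");
    let who;
    {
        let title = String::from("Engineer");
        // `employee` must not outlive `name` or `title`.
        let employee = Employee { name: &name, title: &title, age: 30 };
        println!("{} ({}, {})", employee.name, employee.title, employee.age);
        who = name_of(&employee);
    } // `employee` and `title` end here; `name` is still alive
    println!("name: {}", who);
}
```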

Summary #

Today we introduced the concepts of static and dynamic lifetimes, as well as how the compiler identifies the lifetimes of values and references.

  • Lifetime Visualization
  • Based on ownership rules, the lifetime of a value can be confirmed; it can live until the owner leaves the scope. However, the lifetime of a reference cannot exceed the lifetime of the value. This is obvious in the same scope. However, **when a function call occurs, the compiler needs to determine, through the function’s signature, the constraints