09 Can One Value Have Multiple Owners

09 Ownership: Can A Value Have Multiple Owners? #

Hello, I’m Chen Tian.

The single ownership rule introduced earlier can satisfy our memory allocation and use requirements in most scenarios, and the Rust borrow checker can perform static checking during compilation, without affecting runtime efficiency.

However, there are always exceptions to rules. How should we handle some special cases in daily work?

  • In a directed acyclic graph (DAG), a node may be pointed to by two or more nodes. How should this be expressed in the ownership model?
  • What should we do if multiple threads need to access the same shared memory?

We know that these problems are encountered only during program execution. During compilation, the static checks of ownership cannot handle them. Therefore, for better flexibility, Rust provides dynamic checks during runtime to meet the needs of special scenarios.

This is also the approach Rust takes to handle many problems: At compile time, handle most uses to ensure safety and efficiency; at runtime, handle scenarios that cannot be processed at compile time, sacrificing a bit of efficiency for flexibility. This approach is also reflected in static and dynamic dispatch and is well worth learning from.

So how does Rust perform dynamic checks at runtime? And how do dynamic checks at runtime align with static checks during compilation?

Rust’s answer is to use reference-counting smart pointers: Rc (Reference counter) and Arc (Atomic reference counter). It should be noted that Arc is not the same as the ARC (Automatic Reference Counting) found in ObjC/Swift, though they solve problems similarly through reference counting.

Rc #

Let’s look at Rc first. For some data structure T, we can create a reference count Rc, allowing it to have multiple owners. Rc places the corresponding data structure on the heap. As we mentioned in the second lecture, the heap is the only area where dynamically created data can be used everywhere.

use std::rc::Rc;
fn main() {    
  let a = Rc::new(1);
}

Afterwards, if you want to create more owners for the data, you can achieve this through clone().

Cloning an Rc structure does not copy its internal data; it only increases the reference count. And when an Rc structure leaves scope and is dropped, it just decreases the reference count until it reaches zero, at which point the corresponding memory is truly cleared.

use std::rc::Rc;
fn main() {
    let a = Rc::new(1);
    let b = a.clone();
    let c = a.clone();
}

In the code above, we created three Rc instances: a, b, and c. They all point to the same data on the heap, meaning the data on the heap now has three shared owners. When this code ends, c drops first, decreasing the reference count to 2, then b drops, and finally a drops, reducing the count to zero, and the memory on the heap is freed.

You might wonder why we created multiple owners of the same memory, but the compiler doesn’t complain about ownership conflicts?

Look closely at this code: First, a is the owner of Rc::new(1), which is certain; then b and c each call a.clone(), each getting a new Rc, so from the compiler’s perspective, a, b, and c each own one Rc. If you find the description a bit confusing, take a look at the implementation of Rc’s clone() function (source code):

fn clone(&self) -> Rc<T> {
    // Increase reference count
    self.inner().inc_strong();
    // Create a new Rc structure with self.ptr
    Self::from_inner(self.ptr)
}

So, Rc’s clone() does exactly what we said before: it does not copy the actual data, but merely increases the reference count.

You might further wonder how Rc is created on the heap, and why this heap memory is not controlled by the stack memory’s lifecycle?

Box::leak() Mechanism #

We mentioned in the previous lecture that, under the ownership model, the lifecycle of heap memory is kept consistent with that of the stack memory that created it. Thus, the implementation of Rc seems to be incompatible with this. Indeed, if strictly following the single ownership model from the previous lecture, Rust could not handle reference counting like Rc.

Rust must provide a mechanism that allows code to create heap memory that’s not controlled by stack memory, bypassing the ownership rules at compile time, like C/C++ does. Rust’s way is Box::leak().

Box is a smart pointer in Rust, forcing any data structure to be created on the heap, then placing a pointer on the stack pointing to this data structure. However, at this point, the heap memory’s lifecycle is still controlled and consistent with the stack’s pointer. We will go into more detail on smart pointers like Box later.

Box::leak(), as the name suggests, leaks objects on heap memory, free from stack memory control, resulting in a free object whose lifecycle can be as extensive as the lifecycle of the entire process.

So we deliberately created a loophole to allow memory leaks. Note that in C/C++, any heap memory you allocate with malloc is similar to Rust’s Box::leak(). I really like Rust’s design; it adheres to the Principle of Least Privilege, helping developers write secure code to the maximum extent.

With Box::leak(), we can break out of the Rust compiler’s static checks, ensuring that the heap memory pointed to by Rc has the longest lifecycle. Then, through reference counting, we ensure the lifecycle of this memory eventually ends. If interested, you can check the source code of Rc::new().

By the way, when learning a language, do not shy away from standard library source code because you’re a beginner. On the contrary, looking at the source code when something is unclear gives you first-hand knowledge. Once you understand it, your learning will be more profound and invaluable.

Having understood Rc, we’ve further understood how Rust performs both static and dynamic ownership checks:

  • Static checking is ensured by the compiler, guaranteeing that the code follows the ownership rules;
  • Dynamic checking allows the heap memory to have an unrestricted lifecycle through Box::leak and ensures that such heap memory is eventually freed during runtime.

Implementing DAG #

Now let’s use Rc to implement the DAG that we couldn’t before.

Assuming Node only contains an id and pointers to downstream (downstream), because a node in a DAG may be pointed to by other nodes, we use Rc<Node> to express it; a node may not have downstream nodes, so we use Option<Rc<Node>> to express it.

To build such a DAG, we need to provide the following methods for Node:

  • new(): Create a new Node.
  • update_downstream(): Set the Node’s downstream.
  • get_downstream(): Clone a copy of the downstream inside the Node.

With these methods, we can create a DAG with the relationships shown in the graph above (code 1):

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    id: usize,
    downstream: Option<Rc<Node>>,
}

impl Node {
    pub fn new(id: usize) -> Self {
        Self {
            id,
            downstream: None,
        }
    }

    pub fn update_downstream(&mut self, downstream: Rc<Node>) {
        self.downstream = Some(downstream);
    }

    pub fn get_downstream(&self) -> Option<Rc<Node>> {
        self.downstream.as_ref().map(|v| v.clone())
    }
}

fn main() {
    let mut node1 = Node::new(1);
    let mut node2 = Node::new(2);
    let mut node3 = Node::new(3);
    let node4 = Node::new(4);
    node3.update_downstream(Rc::new(node4));

    node1.update_downstream(Rc::new(node3));
    node2.update_downstream(node1.get_downstream().unwrap());
    println!("node1: {:?}, node2: {:?}", node1, node2);
}

RefCell #

When running the above code, you might be wondering if the entire DAG can still be modified after its creation?

In the simplest approach, we can add the following code after the main() function in code 1 to modify Node3 to point to a new Node5 (code 2):

let node5 = Node::new(5);
let node3 = node1.get_downstream().unwrap();
node3.update_downstream(Rc::new(node5));

println!("node1: {:?}, node2: {:?}", node1, node2);

However, it will not compile. The compiler will tell you “node3 cannot borrow as mutable”.

This is because Rc is a read-only reference counter, and you cannot get a mutable reference to its internal data to modify it. What can we do about this?

This is where we need RefCell.

Like Rc, RefCell also bypasses Rust compiler’s static checks, allowing us to mutably borrow read-only data at runtime. This involves another unique and slightly difficult concept in Rust: interior mutability.

Interior Mutability #

With interior mutability, it’s natural to think about exterior mutability, so let’s start with this simpler definition for comparison.

When we explicitly declare a value as mutable with let mut, or declare a mutable reference with &mut, the compiler can perform strict checks at compile time, ensuring that only mutable values or references can modify a value’s internal data. This is called exterior mutability, declared with the mut keyword.

However, this is not flexible enough as sometimes we want to bypass the compile-time checks and modify a value or reference that’s not declared mut. That is, from the compiler’s perspective, the value is read-only, but at runtime, this value can be mutably borrowed to modify its internal data. This is where RefCell comes into play.

Let’s look at a simple example (code 2):

use std::cell::RefCell;

fn main() {
    let data = RefCell::new(1);
    {
        // Obtain a mutable borrow of the RefCell's internal data
        let mut v = data.borrow_mut();
        *v += 1;
    }
    println!("data: {:?}", data.borrow());
}

In this example, data is a RefCell starting with a value of 1. Note that we’ve not declared data as a mutable variable. Afterward, we can use RefCell’s borrow_mut() method to obtain a mutable internal reference and perform an increment operation. Finally, we can use RefCell’s borrow() method to obtain an immutable internal reference. Since we’ve incremented by 1, its value is now 2.

You might wonder why we need to wrap the lines of code that get and operate on the mutable borrow within curly braces?

According to the ownership rules, in the same scope, we cannot have active mutable and immutable borrows at the same time. By using these braces, we explicitly limit the mutable borrow’s lifetime, preventing it from conflicting with subsequent immutable borrows.

Let’s think one step further: without these braces, would this code fail to compile, or would it encounter a runtime error (code 3)?

use std::cell::RefCell;

fn main() {
    let data = RefCell::new(1);

    let mut v = data.borrow_mut();
    *v += 1;

    println!("data: {:?}", data.borrow());
}

If you run code 3, there are no compilation problems, but you will get an error stating “already mutably borrowed: BorrowError” at line 9 during execution. This shows that borrowing rules still apply, but they are checked at runtime.

This is the key distinction between exterior and interior mutability, which we can summarize in the following table:

Implementing a Modifiable DAG #

Now that we have an intuitive understanding of RefCell, let’s see how to use it with Rc to make our DAG modifiable.

First, the data structure’s downstream needs to nest a RefCell within Rc, allowing the RefCell’s interior mutability to obtain a mutable borrow of the data, while Rc still allows the value to have multiple owners.

The complete code is provided here (code 4):

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct Node {
    id: usize,
    // Use Rc<RefCell<T>> to make the node modifiable
    downstream: Option<Rc<RefCell<Node>>>,
}

impl Node {
    pub fn new(id: usize) -> Self {
        Self {
            id,
            downstream: None,
        }
    }

    pub fn update_downstream(&mut self, downstream: Rc<RefCell<Node>>) {
        self.downstream = Some(downstream);
    }

    pub fn get_downstream(&self) -> Option<Rc<RefCell<Node>>> {
        self.downstream.as_ref().map(|v| v.clone())
    }
}

fn main() {
    let mut node1 = Node::new(1);
    let mut node2 = Node::new(2);
    let mut node3 = Node::new(3);
    let node4 = Node::new(4);

    node3.update_downstream(Rc::new(RefCell::new(node4)));
    node1.update_downstream(Rc::new(RefCell::new(node3)));
    node2.update_downstream(node1.get_downstream().unwrap());
    println!("node1: {:?}, node2: {:?}", node1, node2);

    let node5 = Node::new(5);
    let node3 = node1.get_downstream().unwrap();
    // Obtain a mutable reference to modify downstream
    node3.borrow_mut().downstream = Some(Rc::new(RefCell::new(node5)));

    println!("node1: {:?}, node2: {:?}", node1, node2);
}

As we can see, by using the Rc<RefCell<T>> nested structure, our DAG becomes modifiable.

Arc and Mutex/RwLock #

We solved the DAG problem using Rc and RefCell, but for the problem of multiple threads accessing the same memory mentioned at the beginning, can we use Rc to handle it?

No. Because Rc is not thread-safe for the sake of performance. Therefore, we need another reference counting smart pointer: Arc, which implements a thread-safe reference counter.

Arc’s internal reference counting uses Atomic Usize instead of regular usize. You can gauge from the name that Atomic Usize is an atomic type of usize, which uses special CPU instructions to ensure thread safety. If you’re interested in atomic types, you can check out the documentation for std::sync::atomic.

Rust implements two different sets of reference counting structures purely for performance reasons, reflecting Rust’s extreme pursuit of efficiency. Use Rc for very high efficiency in a single-threaded environment; use Arc when cross-thread access is necessary.

Similarly, RefCell is not thread-safe, and if we want to use interior mutability in a multi-threaded environment, Rust provides Mutex and RwLock.

You should be familiar with these data structures. Mutex is a mutual exclusion, where the thread obtaining the mutex has exclusive access to the data. RwLock is a read-write lock, where the thread acquiring the write lock has exclusive access to the data when there are no write locks, but multiple read locks are allowed when there are no write locks. The rules for read-write locks are very similar to Rust’s borrowing rules, so they can be learned by analogy.

Mutex and RwLock are used to protect access to shared data in a multi-threaded environment. If the DAG we just built is to be used in a multi-threaded environment, we need to replace Rc<RefCell<T>> with Arc<Mutex<T>> or Arc<RwLock<T>>. We will discuss more about Arc/Mutex/RwLock in the concurrency section.

Summary #

We have gained a deeper understanding of ownership and mastered the use of data structures such as Rc/Arc, RefCell/Mutex/RwLock.

To bypass the “one value, one owner” restriction, we can use Rc/Arc smart pointers with reference counting. Rc is very efficient but can only be used in a single-threaded environment; Arc uses atomic structures, slightly less efficient, but safe for multi-threaded use.

However, Rc/Arc is immutable. If you want to modify its internal data, you need to introduce interior mutability. In a single-threaded environment, use RefCell inside Rc; in a multi-threaded environment, you can use Arc nested with Mutex or RwLock.

You can review this table for a quick recap:

Thought Questions #

  1. Run the following code, observe the errors, and read the documentation of std::thread::spawn. Find the cause of the problem, then modify the code to make it compile:
fn main() {
    let arr = vec![1];

    std::thread::spawn(|| {
        println!("{:?}", arr);
    });
}
  1. Can you write some code in the main() function to generate a string, then use std::thread::spawn to create a thread that allows the main thread and the new thread to share this string? Hint: Use std::sync::Arc.

  2. We saw an implementation of the Rc’s clone() method:

fn clone(&self) -> Rc {
    // Increase the reference count
    self.inner().inc_strong();
    // Create a new Rc structure with self.ptr
    Self::from_inner(self.ptr)
}

Did you notice that the method’s parameter is &self, which is an immutable reference, yet it calls self.inner().inc_strong()? Just by the function name, it increases self’s reference count, but why can an immutable reference from self change self’s internal data?

Feel free to share your thoughts in the comments section. Congratulations on completing your ninth Rust learning check-in. If you found it beneficial, feel free to share it with friends and invite them to discuss it with you.

Reference Materials #

  1. Implementation source code of clone() function
  2. Principle of Least Privilege
  3. Source code of Rc::new()
  4. Internal reference counting of Arc uses Atomic Usize
  5. Atomic Usize is the atomic type of usize: Documentation for std::sync::atomic
  6. Interior Mutability: Besides RefCell, Rust also provides Cell. If you want to learn more about RefCell and Cell, check out the Rust standard library documentation on cell.