15 Data Structures These Striking Structures Are Actually Smart Pointers

15 Data Structures: These Striking Structures Are Actually Smart Pointers? #

Hello, I am Chen Tian.

So far, we have covered Rust’s ownership and lifetimes, memory management, and the type system. One area of the basics remains untouched: data structures. The most puzzling part of data structures is smart pointers, so today we will tackle that difficulty.

We briefly introduced pointers before, but let’s review them here. A pointer is a value that holds a memory address, and dereferencing it accesses the memory at that address. In theory, a pointer can be dereferenced to any data type. A reference is a special kind of pointer whose dereferencing is restricted: it can only be dereferenced to the type of data it refers to, and cannot be used for anything else.
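To make the distinction concrete, here is a minimal sketch. The function name view_as_bytes is made up for this illustration. It reads a u32 through a reference, then reinterprets the same address through a raw pointer as four bytes, which a reference alone would not allow.

```rust
/// Reads a `u32` through a reference, then reinterprets the same address
/// through a raw pointer as four raw bytes.
fn view_as_bytes(x: &u32) -> [u8; 4] {
    // A reference can only be dereferenced to the type it refers to.
    let _same_type: u32 = *x;

    // A raw pointer is just an address, so it can be cast to point at a
    // different type. Dereferencing it requires `unsafe`.
    let p = x as *const u32 as *const [u8; 4];
    // SAFETY: `p` points at 4 initialized bytes, and `[u8; 4]` has alignment 1.
    unsafe { *p }
}

fn main() {
    let x: u32 = 0x0102_0304;
    // The byte order printed here depends on the platform's endianness.
    println!("{:?}", view_as_bytes(&x));
}
```

The compiler checks the reference for us, while the raw pointer cast is our own responsibility, which is why it sits behind unsafe.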

So what is a smart pointer?

Smart Pointer #

Rust borrows from C++ here: building on pointers and references, it provides smart pointers. A smart pointer is a data structure that behaves very much like a pointer but, in addition to the pointer to the data, also carries metadata that provides additional capabilities.

This definition is a bit vague, so let’s clarify it by comparing smart pointers with other data structures.

Does this remind you of the fat pointer we talked about before? Smart pointers such as String look a lot like fat pointers, since they carry metadata next to the pointer, but a fat pointer is not necessarily a smart pointer. For example, &str is just a fat pointer: it holds a pointer to the string data and metadata recording the string’s length.

Let’s look at the difference between the smart pointer String and the fat pointer &str:

From the picture, we can see that String has an additional capacity field but otherwise doesn’t seem too special. The real difference is ownership: String owns the value on the heap, while &str does not. That is what distinguishes a smart pointer from an ordinary fat pointer in Rust.
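We can check the layout side of this comparison with a small sketch. The helper layout_in_words is a name made up for this example, and the three-word size of String is a detail of the current standard library rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Returns the sizes of `&str` and `String`, measured in machine words.
fn layout_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<&str>() / word, size_of::<String>() / word)
}

fn main() {
    let (str_ref_words, string_words) = layout_in_words();
    // &str: pointer + length. String: pointer + length + capacity.
    println!("&str = {} words, String = {} words", str_ref_words, string_words);

    let owned = String::from("hello");
    // `borrowed` points into the heap buffer that `owned` is responsible for.
    let borrowed: &str = &owned;
    println!("len = {}, capacity = {}", borrowed.len(), owned.capacity());
} // `owned` goes out of scope here and frees the buffer; `borrowed` frees nothing.
```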

This raises another question: what is the difference between a smart pointer and an ordinary struct? After all, String itself is defined as a struct:

pub struct String {
    vec: Vec<u8>,
}

Unlike an ordinary struct, String implements Deref and DerefMut, so dereferencing it yields a &str (or a &mut str). Here is the implementation in the standard library:

impl ops::Deref for String {
    type Target = str;

    fn deref(&self) -> &str {
        unsafe { str::from_utf8_unchecked(&self.vec) }
    }
}

impl ops::DerefMut for String {
    fn deref_mut(&mut self) -> &mut str {
        unsafe { str::from_utf8_unchecked_mut(&mut *self.vec) }
    }
}
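In everyday code, these two implementations show up as deref coercion. The short sketch below uses a made-up helper, shout, to show a &String being accepted where a &str is expected, and str methods being called directly on a String.

```rust
/// Takes a string slice; callers holding a `String` can pass `&String`.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let mut name = String::from("rust");

    // Deref: `&String` is coerced to `&str` at the call site.
    let loud = shout(&name);

    // Deref: `starts_with` is a method on `str`, reached through `String`.
    assert!(name.starts_with("ru"));

    // DerefMut: `make_ascii_uppercase` takes `&mut str`.
    name.make_ascii_uppercase();

    println!("{} {}", loud, name);
}
```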

In addition, because its data is allocated on the heap, String has to release the resources it allocated. Since String uses Vec internally, it can rely on Vec to free the heap memory. Here is Vec’s implementation of the Drop trait in the standard library:

unsafe impl<#[may_dangle] T, A: Allocator> Drop for Vec<T, A> {
    fn drop(&mut self) {
        unsafe {
            // use drop for [T]
            // use a raw slice to refer to the elements of the vector as weakest necessary type;
            // could avoid questions of validity in certain cases
            ptr::drop_in_place(ptr::slice_from_raw_parts_mut(self.as_mut_ptr(), self.len))
        }
        // RawVec handles deallocation
    }
}
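The effect of this Drop implementation can be observed from the outside. In the sketch below, Counted and drops_after_vec_goes_away are hypothetical names introduced only for this illustration: each element bumps a shared counter when it is dropped, so we can see that dropping the Vec drops every element. The bytes inside a String have no destructor of their own, which is why the example uses an element type that does.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical element type that records each drop in a shared counter.
struct Counted(Rc<Cell<usize>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Builds a Vec of `n` elements, drops it, and returns how many
/// element drops were observed.
fn drops_after_vec_goes_away(n: usize) -> usize {
    let counter = Rc::new(Cell::new(0));
    let v: Vec<Counted> = (0..n).map(|_| Counted(Rc::clone(&counter))).collect();
    // Nothing has been dropped while the Vec is alive.
    assert_eq!(counter.get(), 0);
    // Dropping the Vec runs drop on every element, then frees the buffer.
    drop(v);
    counter.get()
}

fn main() {
    println!("elements dropped: {}", drops_after_vec_goes_away(3));
}
```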

So let’s refine the definition: in Rust, any data structure that has to release resources it owns and that implements Deref/DerefMut/Drop can be regarded as a smart pointer.
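As a minimal sketch of this definition, consider a guard type that hands out access to a value and cleans up when it goes away. Slot, SlotGuard, checkout, and demo are all hypothetical names for this illustration, and the "resource" is just a boolean flag rather than heap memory or a real lock.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical resource: a value plus a flag saying whether it is checked out.
struct Slot {
    value: String,
    in_use: bool,
}

impl Slot {
    /// Marks the slot as in use and returns a guard that gives access to it.
    fn checkout(&mut self) -> SlotGuard<'_> {
        self.in_use = true;
        SlotGuard { slot: self }
    }
}

/// The smart pointer: behaves like a pointer to the value, cleans up on drop.
struct SlotGuard<'a> {
    slot: &'a mut Slot,
}

impl Deref for SlotGuard<'_> {
    type Target = String;
    fn deref(&self) -> &String {
        &self.slot.value
    }
}

impl DerefMut for SlotGuard<'_> {
    fn deref_mut(&mut self) -> &mut String {
        &mut self.slot.value
    }
}

impl Drop for SlotGuard<'_> {
    fn drop(&mut self) {
        // Release the resource: mark the slot as free again.
        self.slot.in_use = false;
    }
}

/// Checks out a slot, modifies it through the guard, and reports the result.
fn demo() -> (String, bool) {
    let mut slot = Slot { value: String::from("a"), in_use: false };
    {
        let mut guard = slot.checkout();
        assert!(guard.slot.in_use);
        guard.push('b'); // DerefMut gives us `&mut String`
        assert_eq!(guard.len(), 2); // Deref gives us `&String`
    } // the guard is dropped here, which resets `in_use`
    (slot.value, slot.in_use)
}

fn main() {
    println!("{:?}", demo());
}
```

The same shape, a pointer-like type whose Drop undoes something, appears in guards such as MutexGuard, where the cleanup step releases a lock.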

By this definition, besides String we have already met many smart pointers in earlier lessons, such as Box<T> and Vec<T> for heap memory allocation, and Rc<T> and Arc<T> for reference counting. Many other data structures, such as PathBuf, Cow<'a, B>, MutexGuard<T>, RwLockReadGuard<T>, and RwLockWriteGuard<T>, are smart pointers as well.

Today we will analyze three smart pointers in depth: Box<T>, which allocates memory on the heap; Cow<'a, B>, which provides clone-on-write; and MutexGuard<T>, which is used for data locking.

At the end, we will try to implement a smart pointer of our own. I hope that after this lesson you will not only understand smart pointers better, but also be able to build your own to solve problems when needed.

Box #

Let’s first look at Box<T>. It is the most basic way to allocate memory on the heap in Rust, and most other data types that allocate heap memory do so internally through Box<T>.
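Here is a small sketch of Box<T> in use. The List type and the sum function are illustrative names, not taken from the course. A Box lets a recursive type have a known size, because each node stores a pointer to the next one instead of the node itself.

```rust
use std::mem::size_of;

/// A recursive type: without `Box`, `List` would have infinite size.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

/// Sums the list; `&Box<List>` is coerced to `&List` on the recursive call.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(head, tail) => head + sum(tail),
        List::Nil => 0,
    }
}

fn main() {
    // `Box::new` moves the value to the heap and returns an owning pointer.
    let boxed = Box::new(41);
    println!("{}", *boxed + 1);

    // For a sized `T`, `Box<T>` is a single pointer wide.
    println!("{}", size_of::<Box<i32>>() == size_of::<usize>());

    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("{}", sum(&list));
} // `boxed` and `list` are dropped here, and their heap memory is freed.
```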

Why was Box designed? To answer that, let’s recall how heap memory is allocated in C.

C uses malloc/calloc/realloc/free to manage heap memory. Allocated memory is often passed back and forth between function calls, which makes it hard to tell who is responsible for freeing it and puts a heavy mental burden on developers.
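For contrast, the following sketch shows how the same back-and-forth looks with Box<T> in Rust. The functions produce, fill, and consume are hypothetical. The signatures alone say who owns the buffer at each step, and whoever owns it last frees it, with no explicit free call.

```rust
/// Allocates a buffer on the heap and hands ownership to the caller.
fn produce() -> Box<[u8; 1024]> {
    Box::new([0u8; 1024])
}

/// Takes ownership, writes to the buffer, and hands ownership back.
fn fill(mut buf: Box<[u8; 1024]>, byte: u8) -> Box<[u8; 1024]> {
    buf.fill(byte);
    buf
}

/// Takes ownership and does not return it, so the buffer is freed
/// when this function ends.
fn consume(buf: Box<[u8; 1024]>) -> usize {
    buf.iter().filter(|&&b| b != 0).count()
}

fn main() {
    let buf = produce();
    let buf = fill(buf, 7);
    let non_zero = consume(buf);
    // Using `buf` after this point would be a compile error: it was moved.
    println!("{}", non_zero);
}
```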
