14 Types of Essential Traits to Master

Type System: Essential Traits to Master #

Hello, I am Chen Tian.

When developing software systems, we clarify requirements and carry out architectural analysis and design of those requirements. In this process, rationally defining and using traits can make the code structure highly extensible and make the system very flexible.

Previously, in the "get hands dirty" series, we witnessed the power of traits: we used the From/TryFrom traits for type conversion ([Lecture 5]) and the Deref trait ([Lecture 6]) to expose the methods of an inner structure without exposing its internals.

After the last two lectures, I believe you now have a deeper understanding of traits. When solving practical problems, making good use of these traits gives your code a clearer structure and makes it read and behave the way the Rust ecosystem expects. For example, if a data structure implements the Debug trait, you can print it with {:?}; if it implements From, you can convert values with the into() method.

Traits #

The Rust standard library defines a large number of standard traits. Let’s count the ones we have already learned and see what we’ve accumulated:

  • Clone/Copy trait, defines the behavior of deep and shallow copying of data;
  • Read/Write trait, defines the behavior of I/O reading and writing;
  • Iterator, defines the behavior of iterators;
  • Debug, defines how to display data in debug mode;
  • Default, defines how default values of data types are generated;
  • From/TryFrom, defines the behavior of data conversion between types.

We will also learn about several important traits, including those related to memory allocation and release, marker traits that differentiate types and assist the compiler in type safety checks, traits for type conversion, operator-related traits, and Debug/Display/Default.

While studying these traits, you can also combine previous content to consciously think about why Rust is designed this way. This will deepen your understanding of the language and help you write more elegant Rust code.

First, let’s look at the memory-related Clone/Copy/Drop. We have already learned about these three traits when introducing ownership. Here, let’s study their definitions and use cases in more depth.

Clone trait #

Let’s start with Clone:

pub trait Clone {
  fn clone(&self) -> Self;

  fn clone_from(&mut self, source: &Self) {
    *self = source.clone()
  }
}

Clone trait has two methods, clone() and clone_from(), the latter has a default implementation, so usually we only need to implement the clone() method. You might wonder what the clone_from() method is for? It seems that a.clone_from(&b) is equivalent to a = b.clone().

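Actually, it’s not. If a already exists and cloning would allocate memory, a.clone_from(&b) can reuse the memory that a already holds instead of allocating again, which is more efficient. This only pays off for types that override the default implementation, since the default simply calls clone().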
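A minimal sketch of the difference, assuming that String's clone_from reuses the destination buffer when its capacity is large enough (the standard library describes this as avoiding reallocation "if possible"). The helper name refill is made up for this illustration:

```rust
/// Overwrites `dst` with the contents of `src` via `clone_from` and reports
/// whether the heap buffer of `dst` stayed at the same address.
/// (`refill` is a hypothetical helper, not a standard library API.)
fn refill(dst: &mut String, src: &String) -> bool {
    let before = dst.as_ptr();
    dst.clone_from(src);
    before == dst.as_ptr()
}

fn main() {
    let b = String::from("hello");

    // `a` already owns a buffer with room for `b`, so the bytes can be copied in place.
    let mut a = String::with_capacity(64);
    let reused = refill(&mut a, &b);
    println!("a = {}, buffer reused: {}", a, reused);

    // By contrast, `a = b.clone()` builds a brand-new String first
    // and only then drops the old value of `a`.
    a = b.clone();
    println!("a = {}", a);
}
```

The saving matters most in loops that repeatedly overwrite the same buffer.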

The Clone trait can be implemented directly with a derive macro, which saves a lot of code. If every field in your data structure implements Clone, you can use #[derive(Clone)]. The code below defines a Developer structure and a Language enum:

#[derive(Clone, Debug)]
struct Developer {
  name: String,
  age: u8,
  lang: Language
}

#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Language {
  Rust,
  TypeScript,
  Elixir,
  Haskell
}

fn main() {
    let dev = Developer {
        name: "Tyr".to_string(),
        age: 18,
        lang: Language::Rust
    };
    let dev1 = dev.clone();
    println!("dev: {:?}, addr of dev name: {:p}", dev, dev.name.as_str());
    println!("dev1: {:?}, addr of dev1 name: {:p}", dev1, dev1.name.as_str())
}

If Language did not implement Clone, the Clone derive macro for Developer would produce a compilation error. Running this code shows that for name, which is a String, the heap memory is cloned as well: the two addresses printed are different. So Clone is a deep copy that copies both stack and heap memory.

It’s worth noting that clone takes &self, which suits most situations: cloning only needs a read-only reference to the existing data. Rc, however, has to update its reference count during clone(), so cloning changes the Rc itself. It therefore relies on structures that offer interior mutability, such as Cell, to make that change. If you have a similar requirement, you can follow the same approach.
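The effect on Rc can be observed from the outside. The sketch below only uses the public Rc API; it does not show how the counter is stored internally:

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared"));
    println!("count before clone: {}", Rc::strong_count(&a));

    // `clone` only receives `&a`, yet the shared reference count changes.
    let b = Rc::clone(&a);
    println!("count after clone: {}", Rc::strong_count(&a));

    // Both handles point at the same heap allocation; the String was not copied.
    println!("same allocation: {}", Rc::ptr_eq(&a, &b));
}
```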

Copy trait #

Different from the Clone trait, the Copy trait has no additional methods; it’s just a marker trait. Its trait definition is:

pub trait Copy: Clone {}

Looking at this definition, if you want to implement the Copy trait, you must have implemented the Clone trait first, then implement an empty Copy trait. You may be confused: what’s the use of a trait without any behavior?

Such a trait may not have any behavior, but it can be used as a trait bound to perform type safety checks; hence, we call it a marker trait.
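As a small illustration of a marker trait used as a bound, the sketch below defines a made-up generic function, duplicate, that accepts only Copy types. The function name is an assumption for this example, not a standard library API:

```rust
/// Accepts only types whose values can be duplicated bit by bit.
/// (`duplicate` is a hypothetical helper for illustration.)
fn duplicate<T: Copy>(value: T) -> (T, T) {
    // Using `value` twice is allowed only because T: Copy.
    (value, value)
}

fn main() {
    println!("{:?}", duplicate(42u8));
    println!("{:?}", duplicate(('a', 1.5f64)));

    // String is Clone but not Copy, so the bound rejects it at compile time:
    // duplicate(String::from("hi")); // does not compile
}
```

The trait carries no methods, yet the compiler uses it to decide which calls are legal.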

Like Clone, if all fields of a data structure have implemented Copy, you can also use the #[derive(Copy)] macro to implement Copy for the data structure. Try adding Copy to Developer and Language:

#[derive(Clone, Copy, Debug)]
struct Developer {
  name: String,
  age: u8,
  lang: Language
}

#[derive(Clone, Copy, Debug)]
enum Language {
  Rust,
  TypeScript,
  Elixir,
  Haskell
}

This code fails to compile because the String type does not implement Copy. Therefore, the Developer data structure can only be cloned, not copied. We know that if a type implements Copy, its value is copied during assignment and function calls; otherwise, ownership is moved.

So, once Copy is removed from Developer’s derive list, a Developer is passed with move semantics, while a Language is passed with copy semantics.
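A minimal sketch of the two behaviors, using a trimmed-down Developer (Clone only) and Language (Clone + Copy). The functions describe_lang and describe_dev are hypothetical helpers introduced just for this example:

```rust
#[derive(Clone, Debug)]
struct Developer {
    name: String,
    lang: Language,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Language {
    Rust,
    Elixir,
}

// Takes Language by value: the caller's value is copied.
fn describe_lang(lang: Language) -> String {
    format!("{:?}", lang)
}

// Takes Developer by value: ownership moves into the function.
fn describe_dev(dev: Developer) -> String {
    format!("{} writes {:?}", dev.name, dev.lang)
}

fn main() {
    let lang = Language::Rust;
    println!("{}", describe_lang(lang));
    println!("still usable: {:?}", lang); // lang was copied, not moved

    let dev = Developer { name: "Tyr".to_string(), lang };
    println!("{}", describe_dev(dev.clone())); // pass an explicit deep copy
    println!("{}", describe_dev(dev)); // dev is moved here
    // println!("{:?}", dev); // does not compile: dev was moved
}
```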

When discussing mutable and immutable references under ownership, we mentioned that the immutable reference &T implements Copy, while the mutable reference &mut T does not. Why is that?

Because if mutable references implemented Copy, creating a mutable reference and assigning it to another variable would leave two live mutable references to the same data, violating the borrowing rule that only one mutable reference may be active in a scope at a time. As you can see, the Rust standard library carefully considers which types can be Copy and which cannot.
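The difference can be checked directly. In this sketch, assert_copy and bump are hypothetical helpers: the first compiles only for Copy types, the second mutates through a mutable reference:

```rust
/// Compiles only for types that implement Copy.
fn assert_copy<T: Copy>() {}

fn bump(counter: &mut i32) {
    *counter += 1;
}

fn main() {
    assert_copy::<&i32>();
    // assert_copy::<&mut i32>(); // does not compile: &mut i32 is not Copy

    let x = 10;
    let r1 = &x;
    let r2 = r1; // copies the reference; r1 stays valid
    println!("{} {}", r1, r2);

    let mut y = 0;
    let m1 = &mut y;
    let m2 = m1; // moves the reference; m1 can no longer be used
    bump(m2);
    // bump(m1); // does not compile: m1 was moved into m2
    println!("y = {}", y);
}
```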

Drop trait #

We’ve talked in detail about the Drop trait in memory management. Let’s look at its definition again:

pub trait Drop {
    fn drop(&mut self);
}

Most scenarios don’t need to provide a Drop trait for data structures; the system will default to dropping each field of the data structure one by one. But there are two cases where you might need to implement Drop manually.

The first is when you want to do something when the data reaches the end of its lifecycle, such as logging.
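A minimal sketch of the logging case. The Tracked type and the shared log are assumptions made for this illustration; the log is a plain in-memory Vec so that the order of drops is easy to inspect:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A value that records its own destruction in a shared log.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

/// Creates two values in an inner scope and returns what was logged when it ended.
fn run_scope() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _first = Tracked { name: "first", log: Rc::clone(&log) };
        let _second = Tracked { name: "second", log: Rc::clone(&log) };
    } // both values go out of scope here; locals drop in reverse declaration order
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in run_scope() {
        println!("{}", line);
    }
}
```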

The second is resource reclamation. The compiler doesn’t know which additional resources you are using, so it cannot release them for you. For example, MutexGuard implements Drop in order to release its lock:

impl<T: ?Sized> Drop for MutexGuard<'_, T> {
    #[inline]
    fn drop(&mut self) {
        unsafe {
            self.lock.poison.done(&self.poison);
            self.lock.inner.raw_unlock();
        }
    }
}

It’s important to note that the Copy trait and the Drop trait are mutually exclusive; they cannot coexist. When you attempt to implement Copy and Drop for the same data type, the compiler will generate an error. This is easy to understand: Copy does a shallow copy bit by bit, so it’s assumed that the copied data has no resources to release; Drop, however, is created to release additional resources.

Let’s write a piece of code to aid understanding. In the code, we forcefully use Box::into_raw to get a pointer to the heap memory and put it into the RawBuffer structure, thus taking over the release of that heap memory.

RawBuffer can implement the Copy trait, but then it cannot implement Drop. If the program is written this way, it leaks memory, because heap memory that should be released never is.

But this does not breach Rust’s correctness guarantee: even if you Copy RawBuffer N times, since Drop cannot be implemented, the heap memory that RawBuffer points to is never released, so no use-after-free memory safety issue can arise:

use std::{fmt, slice};

// Note that we implemented Copy here since *mut u8/usize supports Copy
#[derive(Clone, Copy)]
struct RawBuffer {
    // Raw pointers are indicated with *const/*mut, different from references &
    ptr: *mut u8,
    len: usize,
}

impl From<Vec<u8>> for RawBuffer {
    fn from(vec: Vec<u8>) -> Self {
        let slice = vec.into_boxed_slice();
        Self {
            len: slice.len(),
            // After into_raw, Box stops managing this memory, RawBuffer needs to handle the release
            ptr: Box::into_raw(slice) as *mut u8,
        }
    }
}

// If RawBuffer implemented the Drop trait, it could free the heap memory when its owner goes out of scope.
// But Drop conflicts with Copy: a type can implement one or the other, not both.
// Without Drop the memory leaks, but correctness is not harmed:
// for example, there can be no use after free.
// Try uncommenting the following impl to see what error arises
// impl Drop for RawBuffer {
//     #[inline]
//     fn drop(&mut self) {
//         let data = unsafe { Box::from_raw(slice::from_raw_parts_mut(self.ptr, self.len)) };
//         drop(data)
//     }
// }

impl fmt::Debug for RawBuffer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let data = self.as_ref();
        write!(f, "{:p}: {:?}", self.ptr, data)
    }
}

impl AsRef<[u8]> for RawBuffer {
    fn as_ref(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

fn main() {
    let data = vec![1, 2, 3, 4];

    let buf: RawBuffer = data.into();

    // Because buf allows Copy, a copy is made here
    use_buffer(buf);

    // buf is still usable
    println!("buf: {:?}", buf);
}

fn use_buffer(buf: RawBuffer) {
    println!("buf to die: {:?}", buf);

    // No need to drop specifically here, written out just to illustrate that the Copy'd out buf was Dropped
    drop(buf)
}

In terms of code safety, which does more harm: a memory leak or a use after free? Certainly the latter. Rust’s baseline is memory safety, so it chooses the lesser of two evils.

In reality, no programming language can rule out memory leaks caused by programmer error. For example, a long-running program that keeps adding entries to a hash table but never removes them leaks memory. But Rust guarantees that even if developers are negligent, no memory safety issues will occur.

I suggest you read the code comments carefully, try to uncomment the Drop trait that was commented out, then modify the code to compile successfully, and think about the great care taken by Rust in this design.
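One possible way to make the exercise compile, sketched under the assumption that we give up Copy and keep Drop. The type is renamed OwnedBuffer here to distinguish it from RawBuffer, and the sum helper is made up for the example:

```rust
use std::{ptr, slice};

// No Copy (or Clone) here: each OwnedBuffer is the unique owner of its heap memory.
struct OwnedBuffer {
    ptr: *mut u8,
    len: usize,
}

impl From<Vec<u8>> for OwnedBuffer {
    fn from(vec: Vec<u8>) -> Self {
        let boxed = vec.into_boxed_slice();
        Self {
            len: boxed.len(),
            // After into_raw, Box stops managing this memory; Drop below takes over.
            ptr: Box::into_raw(boxed) as *mut u8,
        }
    }
}

impl Drop for OwnedBuffer {
    fn drop(&mut self) {
        // SAFETY: ptr and len come from Box::into_raw on a boxed slice in `from`,
        // and without Copy or Clone no other value can hold the same pointer.
        unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(self.ptr, self.len))) }
    }
}

impl AsRef<[u8]> for OwnedBuffer {
    fn as_ref(&self) -> &[u8] {
        // SAFETY: ptr points to len initialized bytes for as long as self is alive.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

// Borrows the buffer instead of taking it by value, so the caller keeps ownership.
fn sum(buf: &OwnedBuffer) -> u32 {
    buf.as_ref().iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let buf: OwnedBuffer = vec![1, 2, 3, 4].into();
    println!("sum = {}", sum(&buf));
    println!("buf = {:?}", buf.as_ref());
} // buf goes out of scope here and its heap memory is freed exactly once
```

With Copy gone, passing the buffer by value would move it, so the memory still has a single owner and is released once.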

Marker Traits: Sized/Send/Sync/Unpin #

Okay, after discussing the main traits related to memory, let’s look at marker traits.

We’ve seen a marker trait: Copy. Rust also supports several other marker traits: Sized/Send/Sync/Unpin.

Sized trait is used to mark types with a specific size. When using generic parameters, the Rust compiler automatically adds a Sized constraint to the generic parameters, like the Data structure and the process_data function that deals with Data:

struct Data<T> {
    inner: T,
}

fn process_data<T>(data: Data<T>) {
    todo!();
}

This is equivalent to:

struct Data<T: Sized> {
    inner: T,
}

fn process_data<T: Sized>(data: Data<T>) {
    todo!();
}

Most of the time we want this constraint added automatically, because it gives the generic data structure a fixed size at compile time, so it can be passed to functions as a parameter. Without the constraint, T could be a dynamically sized type, and the process_data function would not compile.

But sometimes this automatic constraint is not suitable. In rare cases, we need T to be a dynamically sized type. What then? Rust provides ?Sized to lift the constraint.

If a developer explicitly declares T: ?Sized, then T may or may not have a size known at compile time. If you remember the discussion of Cow in ([Lecture 12]), you may recall that Cow’s generic parameter B is constrained with ?Sized:

pub enum Cow<'a, B: ?Sized + 'a> where B: ToOwned,
{
    // Borrowed data
    Borrowed(&'a B),
    // Owned data
    Owned(<B as ToOwned>::Owned),
}

This allows B to be [T] or str, both of which are dynamically sized. Note that Borrowed(&'a B) has a fixed size, because it holds a reference to B and the size of a reference is fixed.
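A small sketch of ?Sized in practice. The functions show and strip_spaces are hypothetical helpers written for this example; only Cow, Debug, and the str methods come from the standard library:

```rust
use std::borrow::Cow;
use std::fmt::Debug;

/// `T: ?Sized` lets callers pass references to unsized types such as str or [u8].
fn show<T: Debug + ?Sized>(value: &T) -> String {
    format!("{:?}", value)
}

/// Returns the input unchanged when possible and allocates only when it must edit.
fn strip_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', ""))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", show("hello")); // T = str
    println!("{}", show(&[1u8, 2, 3][..])); // T = [u8]
    println!("{}", show(&42)); // T = i32, sized types still work

    println!("{}", strip_spaces("a b c")); // Owned: a new String was built
    println!("{}", strip_spaces("abc")); // Borrowed: no allocation
}
```

Without ?Sized on T, the first two calls to show would be rejected, because str and [u8] have no size known at compile time.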

Send/Sync #

After discussing Sized, let’s look at Send/Sync, defined as:

pub unsafe auto trait Send {}
pub unsafe auto trait Sync {}

Both of these traits are unsafe auto traits, where auto means that the compiler will automatically implement them on structures under appropriate circumstances, and unsafe represents that implementing these traits might violate Rust’s memory safety principles. If developers manually implement these two traits, they are responsible for their safety themselves.

Send/Sync is the foundation of Rust’s concurrency safety:

  • If a type T implements the Send trait, it means T can be safely moved from one thread to another, that is, ownership can move between threads.
  • If a type T implements the Sync trait, it means &T can be safely shared among multiple threads. A type T satisfies the Sync trait if and only if &T satisfies the Send trait.

Looking at the roles of Send/Sync in thread safety, we can see, if a type T: Send, then T’s exclusive access within a thread is thread-safe; if a type T: Sync, then T’s read-only sharing between threads is safe.

For our own defined data structures, if all fields inside implement Send/Sync, then this data structure will automatically be marked as Send/Sync. Almost all primitive data structures support Send/Sync, which means most self-defined data structures meet Send/Sync. In the standard library, the data structures that do not support Send/Sync mainly include:

  • Raw pointers *const T/*mut T. They are unsafe and are neither Send nor Sync.
  • UnsafeCell does not support Sync. Consequently, any data structure that uses Cell or RefCell does not support Sync either.
  • Reference counting Rc does not support Send nor Sync. Therefore Rc cannot cross threads.
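These rules can be checked at compile time with two empty generic functions. The names assert_send, assert_sync, check_all, and the Config struct are assumptions made for this sketch; the lines that would fail to compile are left commented out:

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

// Each function compiles only if T satisfies the corresponding marker trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// A self-defined structure whose fields are all Send + Sync.
#[allow(dead_code)]
struct Config {
    name: String,
    retries: u32,
}

fn check_all() {
    // Derived automatically from the fields.
    assert_send::<Config>();
    assert_sync::<Config>();

    // Arc<Mutex<T>> can be moved to and shared between threads.
    assert_send::<Arc<Mutex<Vec<u8>>>>();
    assert_sync::<Arc<Mutex<Vec<u8>>>>();

    // RefCell is Send but not Sync.
    assert_send::<RefCell<i32>>();
    // assert_sync::<RefCell<i32>>(); // does not compile

    // Rc and raw pointers are neither Send nor Sync.
    // assert_send::<std::rc::Rc<i32>>(); // does not compile
    // assert_send::<*mut u8>(); // does not compile
}

fn main() {
    check_all();
    println!("all marker trait checks passed at compile time");
}
```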

We’ve previously introduced Rc/RefCell ([Lecture 9]), and let’s see what happens if we try to use Rc/RefCell across threads. In Rust, to create a new thread, you need to use std::thread::spawn:

pub fn spawn<F, T>(f: F) -> JoinHandle<T> 
where
    F: FnOnce() -> T,
    F: Send + 'static,
    T: Send + 'static,

Its parameter is a closure (to be discussed later), which must satisfy Send + 'static:

  • 'static means the free variables captured by the closure must either be owned types or references with a static lifetime;
  • Send means the ownership of these captured free variables can move from one thread to another.
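A minimal sketch of a closure that satisfies both bounds. The function sum_in_thread is a hypothetical helper; it moves an owned Vec into the new thread and reads the result back through the JoinHandle:

```rust
use std::thread;

/// Sums the numbers on a separate thread and returns the result.
fn sum_in_thread(numbers: Vec<u64>) -> u64 {
    // `move` transfers ownership of `numbers` into the closure.
    // Vec<u64> is an owned type (so 'static is satisfied) and it is Send.
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());

    // The closure's return value (T: Send + 'static) comes back through join().
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("sum = {}", sum_in_thread(vec![1, 2, 3, 4]));

    // Borrowing a local variable instead of moving it fails the 'static bound:
    // let local = vec![1, 2, 3];
    // thread::spawn(|| println!("{:?}", local)); // does not compile
}
```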

From this interface, we can conclude that passing an Rc to another thread won’t compile, because Rc implements neither Send nor Sync. Let’s verify with code:

use std::{rc::Rc, thread};

// Rc is neither Send nor Sync
fn rc_is_not_send_and_sync() {
    let a = Rc::new(1);
    let b = a.clone();
    let c = a.clone();
    thread::spawn(move || {
        println!("c= {:?}", c);
    });
}

As expected, this code does not compile. The error message indicates that Rc<i32> cannot be sent between threads safely.

So, can RefCell be moved between threads? RefCell implements Send but not Sync, so it looks like it should work:

use std::{cell::RefCell, thread};

fn refcell_is_send() {
    let a = RefCell::new(1);
    thread::spawn(move || {
        println!("a= {:?}", a);
    });
}

The check passes: this code compiles.
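To go one step further than printing, the sketch below moves a RefCell into a thread, mutates it there, and hands the inner value back. The function mutate_in_thread is a hypothetical helper for this example:

```rust
use std::cell::RefCell;
use std::thread;

/// Moves the RefCell into a new thread, increments it there, and returns the value.
fn mutate_in_thread(cell: RefCell<i32>) -> i32 {
    let handle = thread::spawn(move || {
        // Only this thread owns the cell now, so interior mutability is safe here.
        *cell.borrow_mut() += 1;
        cell.into_inner()
    });
    handle.join().expect("worker thread panicked")
}

fn main() {
    let a = RefCell::new(1);
    println!("a after the thread ran: {}", mutate_in_thread(a));
}
```

Ownership moves as a whole, so at any moment exactly one thread can touch the RefCell. That is what Send promises, and it is why Sync is not needed here.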

Since Rc is not Send, we cannot use Rc<RefCell<T>> data across threads. What about Arc, which supports Send/Sync? Can we use Arc<RefCell<T>> to obtain a type that can be shared and modified by multiple threads?

use std::{cell::RefCell, sync::Arc, thread};

// The RefCell is now shared through multiple Arcs; Arc can be Send/Sync, but RefCell is not Sync
fn refcell_is_not_sync() {
    let a = Arc::new(RefCell::new(1));
    let b = a.clone();
    let c = a.clone();
    thread::spawn(move || {
        println!("c= {:?}", c);
    });
}

It does not work.

Because Arc’s internal data is shared and requires