13 Type System: How to Use Traits to Define Interfaces? #

Hello, I am Chen Tian.

In the last lecture, we looked at the essence of Rust’s type system. As a tool for defining, checking, and processing types, the type system ensures that the data an operation works on has the type we expect.

With Rust’s powerful generics support, we can easily define and use generic data structures and generic functions, and use them to handle parametric polymorphism, which makes input and output types more flexible and improves code reuse.

Today we continue with the other two kinds of polymorphism: ad-hoc polymorphism and subtype polymorphism. We will see what problems they solve, how they are implemented, and how to use them.

If you don’t quite remember the definitions, here is a brief review. Ad-hoc polymorphism, which includes operator overloading, means that the same behavior has several different implementations. Subtype polymorphism means using a subtype where its parent type is expected, such as using a Cat as an Animal.

In Rust, both kinds of polymorphism are built on traits, so we first need to understand what a trait is, and then see how traits handle each kind.

What is a trait? #

A trait in Rust is an interface: it defines the behavior of the types that implement it. If you want a comparison with languages you already know, a trait in Rust is like an interface in Java, a protocol in Swift, or a type class in Haskell.

When developing complex systems, we often emphasize the separation of interface and implementation. This is a good design habit because it isolates the caller from the implementer: as long as both sides develop against the interface, neither is affected by the other’s internal changes.

Traits do just that. A trait extracts behavior from a data structure so that it can be shared among multiple types. It can also serve as a constraint in generic programming, requiring that a type parameter provide the behavior the trait specifies.

Basic traits #

Let’s look at how a basic trait is defined, taking std::io::Write from the standard library as an example. This trait defines a series of method interfaces:

pub trait Write {
    fn write(&mut self, buf: &[u8]) -> Result<usize>;
    fn flush(&mut self) -> Result<()>;
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> Result<usize> { ... }
    fn is_write_vectored(&self) -> bool { ... }
    fn write_all(&mut self, buf: &[u8]) -> Result<()> { ... }
    fn write_all_vectored(&mut self, bufs: &mut [IoSlice<'_>]) -> Result<()> { ... }
    fn write_fmt(&mut self, fmt: Arguments<'_>) -> Result<()> { ... }
    fn by_ref(&mut self) -> &mut Self where Self: Sized { ... }
}

These methods are also known as associated functions. In a trait, methods can have default implementations. For the Write trait, you only need to implement two methods, write and flush; the rest have default implementations.

If you compare a trait to a parent class and the types that implement it to subclasses, then methods with default implementations are like parent-class methods that a subclass may override but is not required to.
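To make the idea of default implementations concrete, here is a minimal sketch. The Greet trait and the English and Pirate types are hypothetical names made up for this illustration; they are not part of the standard library.

```rust
// Hypothetical trait, used only to illustrate default methods.
trait Greet {
    // Required: every implementer must provide this method.
    fn name(&self) -> String;

    // Provided: implementers get this for free but may override it.
    fn greet(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

struct English;
struct Pirate;

impl Greet for English {
    fn name(&self) -> String {
        "Alice".to_string()
    }
    // greet() is not implemented here, so the default is used.
}

impl Greet for Pirate {
    fn name(&self) -> String {
        "Blackbeard".to_string()
    }

    // Overrides the default implementation.
    fn greet(&self) -> String {
        format!("Ahoy, {}!", self.name())
    }
}

fn main() {
    println!("{}", English.greet());
    println!("{}", Pirate.greet());
}
```

English only supplies the required method and inherits greet, while Pirate replaces it, which mirrors how BufBuilder-style types only need write and flush for Write.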

In the method definitions above, we saw two special keywords: Self and self.

Self represents the current type. For example, if the File type implements Write, then Self in that implementation refers to File.
self, when used as the first parameter of a method, is actually a shorthand for self: Self, so &self is self: &Self, and &mut self is self: &mut Self.
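The two forms can be compared side by side in a small sketch. The Counter type below is a hypothetical example; the explicit self: ... spellings are written out only to show what the shorthand stands for, and in everyday code the shorthand is preferred.

```rust
// Hypothetical type, used only to illustrate Self and self.
#[derive(Debug, PartialEq)]
struct Counter {
    value: u32,
}

impl Counter {
    // Inside this impl block, Self is an alias for Counter.
    fn new() -> Self {
        Self { value: 0 }
    }

    // Shorthand form: &self means self: &Self.
    fn get(&self) -> u32 {
        self.value
    }

    // Explicit form of &mut self.
    fn increment(self: &mut Self) {
        self.value += 1;
    }

    // Explicit form of self: takes ownership of the value.
    fn into_value(self: Self) -> u32 {
        self.value
    }
}

fn main() {
    let mut c = Counter::new();
    c.increment();
    c.increment();
    println!("{}", c.get());
    println!("{}", c.into_value());
}
```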

Definitions alone don’t give a deep understanding, so let’s build a BufBuilder structure that implements the Write trait and walk through the code (Write trait code):

use std::fmt;
use std::io::Write;

struct BufBuilder {
    buf: Vec<u8>,
}

impl BufBuilder {
    pub fn new() -> Self {
        Self {
            buf: Vec::with_capacity(1024),
        }
    }
}

// Implementing the Debug trait to print a string
impl fmt::Debug for BufBuilder {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", String::from_utf8_lossy(&self.buf))
    }
}

impl Write for BufBuilder {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // Append buf to the end of BufBuilder
        self.buf.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        // Everything happens in memory, so there is nothing to flush
        Ok(())
    }
}

fn main() {
    let mut buf = BufBuilder::new();
    buf.write_all(b"Hello world!").unwrap();
    println!("{:?}", buf);
}

In the code, we implemented the write and flush methods and left the rest to their default implementations, which completes the implementation of the Write trait for BufBuilder. If write or flush is not implemented, the Rust compiler reports an error; you can try it yourself.

Once a data structure implements a trait, all the methods of that trait become available on it, which is why we could call buf.write_all() here.

So how is write_all() called? Let’s go back and see the signature of write_all:

fn write_all(&mut self, buf: &[u8]) -> Result<()>

It accepts two parameters: &mut self and &[u8]. In our call, the first receives a mutable reference to the buf variable, which the compiler takes automatically from the method-call syntax, and the second receives b"Hello world!".
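The implicit first argument becomes visible if the call is also written in its fully qualified form. The sketch below uses Vec<u8>, which already implements Write in the standard library, instead of BufBuilder so that it stays self-contained; the build function is a hypothetical helper.

```rust
use std::io::Write;

// Hypothetical helper that writes into a Vec<u8>, which implements Write
// in the standard library.
fn build() -> std::io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();

    // Method-call syntax: the compiler takes &mut buf for us.
    buf.write_all(b"Hello ")?;

    // The same kind of call written out in full: &mut buf is the self argument.
    Write::write_all(&mut buf, b"world!")?;

    Ok(buf)
}

fn main() {
    let buf = build().unwrap();
    println!("{}", String::from_utf8_lossy(&buf));
}
```

Both calls resolve to the same default write_all method; the method-call syntax simply fills in the &mut borrow.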

Basic trait exercise #

Now that we understand the basic definition and use of traits, let’s define a trait of our own to consolidate what we’ve learned.

Suppose we are building a string parser that can parse part of a string into some type. We might define the trait like this: it has a method parse, which accepts a string reference and returns Self.

pub trait Parse {
  fn parse(s: &str) -> Self;
}

This parse method is a static method of the trait, because its first parameter is not self, so it is called as T::parse(str).

Let’s try implementing Parse for the u8 type. For example, “123abc” should parse to the number 123, while “abcd” should parse to 0.

To achieve this, we bring in a new library, the regex crate, to extract the part we need with a regular expression. We also use the str::parse function to convert the extracted digits into a number.
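As a quick illustration of the call syntax, here is a sketch with a deliberately trivial implementation for bool. The bool implementation and the parse_twice helper are assumptions made up for this example, not part of the exercise.

```rust
pub trait Parse {
    fn parse(s: &str) -> Self;
}

// A deliberately trivial implementation so the call syntax stays in focus.
impl Parse for bool {
    fn parse(s: &str) -> Self {
        s == "true"
    }
}

// In generic code, the method is called through the type parameter.
fn parse_twice<T: Parse>(a: &str, b: &str) -> (T, T) {
    (T::parse(a), T::parse(b))
}

fn main() {
    // Called on the concrete type.
    let x = bool::parse("true");
    // Fully qualified form, naming the trait explicitly.
    let y = <bool as Parse>::parse("false");
    // T is inferred from the annotation on the left.
    let pair: (bool, bool) = parse_twice("true", "nope");
    println!("{} {} {:?}", x, y, pair);
}
```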

The complete code is as follows (Parse trait exercise code):

use regex::Regex;
pub trait Parse {
    fn parse(s: &str) -> Self;
}

impl Parse for u8 {
    fn parse(s: &str) -> Self {
        let re: Regex = Regex::new(r"^[0-9]+").unwrap();
        if let Some(captures) = re.captures(s) {
            // Take the first match and convert its captured digits to u8
            captures
                .get(0)
                .map_or(0, |s| s.as_str().parse().unwrap_or(0))
        } else {
            0
        }
    }
}

#[test]
fn parse_should_work() {
    assert_eq!(u8::parse("123abcd"), 123);
    assert_eq!(u8::parse("1234abcd"), 0);
    assert_eq!(u8::parse("abcd"), 0);
}

fn main() {
    println!("result: {}", u8::parse("255 hello world"));
}

This implementation is not difficult. If you are interested, try implementing the Parse trait for f64, so that, for example, “123.45abcd” is parsed into 123.45.

While implementing it for f64, did you notice that apart from the type and the regular expression, the code is almost identical to the code above? As developers, we try not to repeat ourselves (DRY), so code like this feels uncomfortable. Is there a better way?

Yes! We introduced generic programming in the last lecture, so when implementing traits, we can also use generic parameters. Note that certain restrictions must be placed on generic parameters.

First, not every type can be parsed out of a string. In the example, we can only handle numerical types, and the type must also be handled by str::parse.

Looking at the documentation, str::parse is a generic function returning any type that implements the FromStr trait, so the first restriction on generic parameters is that they must implement the FromStr trait.

Second, the code above returns 0 when a string cannot be parsed. Once we use a generic parameter, we can no longer return 0, because 0 may not be a valid value of the type the generic parameter stands for. What should we do?

The point of returning 0 is to return a default value for input we cannot handle. Rust’s standard library has a Default trait, implemented by most common types, that provides default values for data structures. So the second restriction on the generic parameter is Default.
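Before combining these two restrictions with the regular expression, it may help to see them at work in isolation. The sketch below uses only the standard library; parse_or_default is a hypothetical helper name, and unlike the exercise it parses the whole string rather than a numeric prefix.

```rust
use std::str::FromStr;

// Hypothetical helper: parse the whole string, or fall back to T's default.
fn parse_or_default<T>(s: &str) -> T
where
    T: FromStr + Default,
{
    // str::parse::<T>() is available because T: FromStr;
    // unwrap_or_default() is available because T: Default.
    s.parse().unwrap_or_default()
}

fn main() {
    let a: u8 = parse_or_default("123");
    let b: u8 = parse_or_default("abc");
    let c: f64 = parse_or_default("123.45");
    let d: String = parse_or_default("hello");
    println!("{} {} {} {}", a, b, c, d);
}
```

The function body never mentions a concrete type; everything it does is licensed by the two trait bounds.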

Alright, we have a basic idea, so let’s take a look at the code (Parse trait DRY code):

use std::str::FromStr;

use regex::Regex;
pub trait Parse {
    fn parse(s: &str) -> Self;
}

// We constrain T so that it must implement both FromStr and Default
// This lets us use the methods of these two traits
impl<T> Parse for T
where
    T: FromStr + Default,
{
    fn parse(s: &str) -> Self {
        let re: Regex = Regex::new(r"^[0-9]+(\.[0-9]+)?").unwrap();
        // Create a closure that generates a default value, mainly to simplify subsequent code
        // The type returned by Default::default() can be inferred from context, it's Self
        // And we agreed that Self, that is T, needs to implement the Default trait
        let d = || Default::default();
        if let Some(captures) = re.captures(s) {
            captures
                .get(0)
                .map_or(d(), |s| s.as_str().parse().unwrap_or(d()))
        } else {
            d()
        }
    }
}

#[test]
fn parse_should_work() {
    assert_eq!(u32::parse("123abcd"), 123);
    assert_eq!(u32::parse("123.45abcd"), 0);
    assert_eq!(f64::parse("123.45abcd"), 123.45);
    assert_eq!(f64::parse("abcd"), 0f64);
}

fn main() {
    println!("result: {}", u8::parse("255 hello world"));
}

By implementing a trait for a constrained generic parameter, a single piece of code implements the Parse trait for types such as u32 and f64, which is very concise. But do you see a problem in this code? When a string cannot be parsed, we return a default value. Shouldn’t it return an error instead?

Yes. If a default value is returned here, the result is indistinguishable from parsing “0abcd”: we cannot tell whether the 0 comes from an error or from input that really does parse to 0.

So a better way is for the parse function to return a Result:

pub trait Parse {
    fn parse(s: &str) -> Result<Self, E>;
}

But the E in this Result is a problem. The error type cannot be decided when the trait is defined: different implementers may want different error types, and the author of the trait should leave that flexibility to whoever implements it. What should we do?

Think about it: since a trait can contain methods, that is, associated functions, can it also contain associated types? The answer is yes.

Trait with Associated Types #

Rust allows traits to contain associated types, and an implementation of the trait must then specify those associated types as well. Let’s see how to add an associated type to the Parse trait:

pub trait Parse {
    type Error;
    // Returning Result<Self, ...> requires Self to have a known size
    fn parse(s: &str) -> Result<Self, Self::Error>
    where
        Self: Sized;
}

With the associated type Error, the Parse trait can return a meaningful error when parsing fails. Here is the modified code (Parse trait DRY.2 code):

use std::str::FromStr;

use regex::Regex;
pub trait Parse {
    type Error;
    fn parse(s: &str) -> Result<Self, Self::Error>
    where
        Self: Sized;
}

impl<T> Parse for T
where
    T: FromStr + Default,
{
    // Define the associated type Error as String
    type Error = String;
    fn parse(s: &str) -> Result<Self, Self::Error> {
        let re: Regex = Regex::new(r"^[0-9]+(\.[0-9]+)?").unwrap();
        if let Some(captures) = re.captures(s) {
            // When there is an error, we return Err(String)
            captures
                .get(0)
                .map_or(Err("failed to capture".to_string()), |s| {
                    s.as_str()
                        .parse()
                        .map_err(|_err| "failed to parse captured string".to_string())
                })
        } else {
            Err("failed to parse string".to_string())
        }
    }
}

#[test]
fn parse_should_work() {
    assert_eq!(u32::parse("123abcd"), Ok(123));
    assert_eq!(
        u32::parse("123.45abcd"),
        Err("failed to parse captured string".into())
    );
    assert_eq!(f64::parse("123.45abcd"), Ok(123.45));
    assert!(f64::parse("abcd").is_err());
}

fn main() {
    println!("result: {:?}", u8::parse("255 hello world"));
}

In the code above, the choice of error type is deferred until the trait is implemented. A trait with associated types is therefore more flexible and more abstract than an ordinary trait.

Parameters and return values of trait methods can be expressed in terms of associated types, and an implementation only needs to supply a concrete type for each associated type.
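To show that flexibility, the following sketch implements the same trait for two types that each pick a different Error. Port and Flag are hypothetical newtypes invented for this example, and the sketch avoids the regex crate so that it runs with the standard library alone.

```rust
use std::num::ParseIntError;

pub trait Parse {
    type Error;
    fn parse(s: &str) -> Result<Self, Self::Error>
    where
        Self: Sized;
}

// Hypothetical newtypes, each choosing its own error type.
#[derive(Debug, PartialEq)]
struct Port(u16);

#[derive(Debug, PartialEq)]
struct Flag(bool);

impl Parse for Port {
    // Reuse the standard library's integer parsing error.
    type Error = ParseIntError;

    fn parse(s: &str) -> Result<Self, Self::Error> {
        s.trim().parse::<u16>().map(Port)
    }
}

impl Parse for Flag {
    // A plain String message is enough here.
    type Error = String;

    fn parse(s: &str) -> Result<Self, Self::Error> {
        match s.trim() {
            "on" => Ok(Flag(true)),
            "off" => Ok(Flag(false)),
            other => Err(format!("unknown flag: {}", other)),
        }
    }
}

fn main() {
    println!("{:?}", Port::parse("8080"));
    println!("{:?}", Port::parse("http"));
    println!("{:?}", Flag::parse("on"));
    println!("{:?}", Flag::parse("maybe"));
}
```

The trait definition never names ParseIntError or String; each implementer fills in Self::Error independently.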

Generic Traits #

So far, we have worked through the definition and use of basic traits, as well as the more complex and flexible traits with associated types. Combined with the generics introduced in the last lecture, a question may have occurred to you: can the definition of a trait also be generic?

For example, suppose we want to define a Concat trait that allows data structures to be concatenated. Naturally, we would like a String to be concatenated with a String, with a &str, and even with any data structure that can be converted to a String. For this, traits also need to support generics.

Let’s look at how operators are overloaded in the standard library, taking std::ops::Add, the trait that provides the addition operation, as an example:
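One way such a trait might look is sketched below. This Concat trait is hypothetical: its name, its concat method, and the choice to return a String are all assumptions made for illustration.

```rust
// Hypothetical trait: the type parameter Rhs is the right-hand operand.
pub trait Concat<Rhs = Self> {
    fn concat(self, rhs: Rhs) -> String;
}

// One generic impl covers String, &str, char, and anything else that
// converts into a String.
impl<T> Concat<T> for String
where
    T: Into<String>,
{
    fn concat(mut self, rhs: T) -> String {
        self.push_str(&rhs.into());
        self
    }
}

fn main() {
    let a = String::from("Hello").concat(" "); // right-hand side is &str
    let b = a.concat(String::from("world")); // right-hand side is String
    let c = b.concat('!'); // right-hand side is char
    println!("{}", c);
}
```

Because the trait itself takes a type parameter, String can implement it once per right-hand type, or, as here, once for a whole family of right-hand types.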

pub trait Add<Rhs = Self> {
    type Output;
    #[must_use]
    fn add(self, rhs: Rhs) -> Self::Output;
}

This trait has a generic parameter Rhs, representing the value on the right-hand side of the plus sign, which is used as the second parameter of the add method. Rhs defaults to Self, which means that if you implement the Add trait without providing a generic argument, the left and right operands of the addition must be of the same type.
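The effect of the default can be seen by implementing Add twice for one type, once without a generic argument and once with an explicit one. Millis is a hypothetical newtype chosen for this sketch.

```rust
use std::ops::Add;

// Hypothetical newtype for a duration in milliseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Millis(u64);

// No generic argument: Rhs defaults to Self, so this is Millis + Millis.
impl Add for Millis {
    type Output = Millis;

    fn add(self, rhs: Millis) -> Millis {
        Millis(self.0 + rhs.0)
    }
}

// Explicit Rhs: this is Millis + u64.
impl Add<u64> for Millis {
    type Output = Millis;

    fn add(self, rhs: u64) -> Millis {
        Millis(self.0 + rhs)
    }
}

fn main() {
    let a = Millis(1500);
    let b = Millis(500);
    println!("{:?}", a + b);
    println!("{:?}", a + 250u64);
}
```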

Let’s define a complex number type and try using this trait (Add trait exercise code 1):

use std::ops::Add;

#[derive(Debug)]
struct Complex {
    real: f64,
    imagine: f64,
}

impl Complex {
    pub fn new(real: f64, imagine: f64) -> Self {
        Self { real, imagine }
    }
}

// Implementation for the Complex type
impl Add for Complex {
    type Output = Self;

    // Note that the first parameter of add is self, which moves ownership
    fn add(self, rhs: Self) -> Self::Output {
        let real = self.real + rhs.real;
        let imagine = self.imagine + rhs.imagine;
        Self::new(real, imagine)
    }
}

fn main() {
    let c1 = Complex::new(1.0, 1f64);
    let c2 = Complex::new(2 as f64, 3.0);
    println!("{:?}", c1 + c2);
    // c1 and c2 have been moved, so the following statement would not compile
    // println!("{:?}", c1 + c2);
}

A complex number has a real part and an imaginary part. To add two complex numbers, we add their real parts and their imaginary parts separately to get a new complex number. Note that the first parameter of add is self, which moves ownership, so after c1 and c2 have been added, the ownership rules mean they can no longer be used.

Thus, the Add trait is convenient to use for types that implement the Copy trait such as u32, f64, etc., but for our defined Complex type