23 How to Use Generics in Practical Programming With Type Systems

Type System: How to Use Generics in Practice? #

Hello, I’m Chen Tian.

From this lecture on, we’ve arrived at the advanced section. In the advanced section, we’ll further solidify our understanding of the type system before moving on to topics such as network handling, Unsafe Rust, and FFI.

Why do we consider the type system as the cornerstone of the advanced section? As you may recall from the discussion of the rgrep code, when building a system that’s more readable, flexible, and testable, we have to use traits and generic programming to varying extents.

Thus, it can be said that in Rust development, generic programming is a skill we must master. Whenever you build a data structure or function, you should ask yourself: Do I need to fix the type right now? Can I delay this decision to as late as possible to leave room for the future?

In “Clean Architecture,” Uncle Bob Said: The job of an architect is not to make decisions, but to delay decisions for as long as possible, building a program without making major decisions now so that they can be made later with enough information. So, if we can use generics to delay decisions, the architecture of the system can be flexible enough to better handle future changes.

Today, we’ll talk about how to use generic programming in practice to delay decisions. If you’re not yet confident with Rust’s generic programming, I recommend reviewing lectures [12] and [13], or read Chapter 10 of The Rust Programming Language as supplementary material.

Gradual Constraints on Generic Data Structures #

Before diving into the main topic, let’s quickly review how using generic parameters in the definition and implementation of data structures offers certain benefits, taking the standard library’s BufReader as an example.

Consider this small example:

pub struct BufReader<R> {
    inner: R,
    buf: Box<[u8]>,
    pos: usize,
    cap: usize,
}

BufReader abstracts the R to be read generically. That is to say, it doesn’t matter whether R is currently a File, a Cursor, or directly a Vec. When defining the struct, we didn’t further limit R, which is the most common way to use generics.

At the implementation stage, we can place different restrictions on R as needed. To what extent should this restriction be detailed? Just add just enough restriction to meet implementation needs.

For example, when providing capacity() and buffer() which don’t need any special capabilities of R, you can impose no constraints:

impl<R> BufReader<R> {
    pub fn capacity(&self) -> usize { ... }
    pub fn buffer(&self) -> &[u8] { ... }
}

However, when implementing new(), since Read trait’s methods are used, it’s necessary to specify that the passed R satisfies the Read constraint:

impl<R: Read> BufReader<R> {
    pub fn new(inner: R) -> BufReader<R> { ... }
    pub fn with_capacity(capacity: usize, inner: R) -> BufReader<R> { ... }
}

Similarly, when implementing Debug, it’s possible to demand that R satisfies the Debug trait constraint:

impl<R> fmt::Debug for BufReader<R>
where
    R: fmt::Debug
{
    fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result { ... }
}

If you spend more time reviewing all the implementations of BufReader’s interfaces in bufreader.rs, you’ll also find that BufReader used the Seek trait during its implementation.

Overall, the code that impls BufReader is divided into different blocks according to different constraints. This is a very typical way to implement generic code and is worth learning so that we can apply this method in our own code.

By using generic parameters, BufReader leaves the decision to the users. We also saw this in the midterm exam implementation of rgrep in the last lecture, where we observed how different types are provided for BufReader to meet different usage scenarios in both the test and rgrep implementation code.

Three Common Scenarios for Generic Parameter Usage #

Generic parameter usage is briefly reviewed here, as it seems you’ve already mastered it well. Today, let’s dive into the main event and learn how to use generics in practice.

Let’s start with generic parameters. There are three common scenarios:

  • Using generic parameters to delay the binding of data structures;
  • Using generic parameters and PhantomData to declare types that are not directly used in the data structure but required during the implementation;
  • Using generic parameters to allow the same data structure to have different implementations for the same trait.

Using Generic Parameters for Delayed Binding #

Let’s begin with the familiar concept of using generic parameters for delayed binding. In the [previous part] of the KV server, I constructed a Service data structure:

/// Service data structure
pub struct Service<Store = MemTable> {
    inner: Arc<ServiceInner<Store>>,
}

It employs a generic parameter Store, which has a default value of MemTable. The advantage of specifying a generic parameter default value is that you can omit the generic parameter when using it, relying on the default value instead. This generic parameter can later be gradually constrained in subsequent implementations:

impl<Store> Service<Store> {
    pub fn new(store: Store) -> Self { ... }
}

impl<Store: Storage> Service<Store> {
    pub fn execute(&self, cmd: CommandRequest) -> CommandResponse { ... }
}

Similarly, in generic functions, you can constrain using impl Storage or <Store: Storage>:

pub fn dispatch(cmd: CommandRequest, store: &impl Storage) -> CommandResponse { ... }
// Equivalent to
pub fn dispatch<Store: Storage>(cmd: CommandRequest, store: &Store) -> CommandResponse { ... }

This use of generic parameters for type delay binding is now presumably very familiar to you, allowing you to integrate such delayed type binding into your development process.

Using Generic Parameters and Phantom Data (PhantomData) to Provide Additional Types #

After familiarizing yourself with the basic uses of generic parameters, let’s ask you a question: You need to design a User and Product data structure, both of which have a u64 type id. However, you want each data structure’s id to only be comparable with ids of the same type, meaning that if user.id and product.id are compared, the compiler should be able to automatically report an error, rejecting such behavior. How would you achieve this?

Take a moment to think about it.

The immediate thought might be to use a custom data structure called Identifier to represent id:

pub struct Identifier<T> {
    inner: u64,
}

Then, in User and Product, you would use Identifier to bind Identifier to their types, ensuring ids of different types cannot be compared. With this idea in mind, you could quickly write up code (code):

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Identifier<T> {
    inner: u64,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct User {
    id: Identifier<Self>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Product {
    id: Identifier<Self>,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn id_should_not_be_the_same() {
        let user = User::default();
        let product = Product::default();

        // Two ids cannot be compared because they belong to different types
        // assert_ne!(user.id, product.id);

        assert_eq!(user.id.inner, product.id.inner);
    }
}

However, it won’t compile. Why is that?

That’s because Identifier does not use the generic parameter T in its definition. The compiler considers T to be redundant, so it compels the T to be removed to compile successfully. However, removing T means User and Product ids can now be compared, thwarting our goal. What to do? Right when we were confident about using generics to rule, we face such a trivial issue that leaves us dismayed.

Stay calm. If you’ve worked with any other language that supports generics, whether it’s Java, Swift, or TypeScript, you might have come across the concept of Phantom Type. The method we thought of earlier would work in Swift/TypeScript because their compilers automatically treat extra generic parameters as Phantom types, like the following TypeScript example, which compiles:

// NotUsed is allowed
class MyNumber<T, NotUsed> {
    inner: T;
    add: (x: T, y: T) => T;
}

But Rust has a much stricter type system. Rust doesn’t want redundant generic parameters when defining types, so the Rust compiler ruthlessly rejects such attempts, slamming the door shut.

Don’t worry, though. Having experienced the need for Phantom Types, Rust offers a window called PhantomData: allowing us to hold Phantom Types with PhantomData. PhantomData is usually translated as “Ghost Data” in Chinese, a name that carries an eerie charm, but it is widely used to handle additional, currently unused but implementing-process-needed generic parameters.

Let’s give it a try:

use std::marker::PhantomData;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Identifier<T> {
    inner: u64,
    _tag: PhantomData<T>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct User {
    id: Identifier<Self>,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Product {
    id: Identifier<Self>,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn id_should_not_be_the_same() {
        let user = User::default();
        let product = Product::default();

        // Two ids cannot be compared because they belong to different types
        // assert_ne!(user.id, product.id);

        assert_eq!(user.id.inner, product.id.inner);
    }
}

Bingo! It compiles. By using PhantomData, the compiler permits the existence of the generic parameter T.

Now we’ve confirmed that when defining data structures, additional generic parameters that aren’t needed immediately should be ‘owned’ with PhantomData to prevent compiler errors. PhantomData, true to its name, actually has zero length—it’s a Zero-Sized Type (ZST), as if it does not exist. Its only purpose is as a type marker.

Let’s write another example to deepen our understanding of PhantomData (code):

use std::{
    marker::PhantomData,
    sync::atomic::{AtomicU64, Ordering},
};

static NEXT_ID: AtomicU64 = AtomicU64::new(1);

pub struct Customer<T> {
    id: u64,
    name: String,
    _type: PhantomData<T>,
}

pub trait Free {
    fn feature1(&self);
    fn feature2(&self);
}

pub trait Personal: Free {
    fn advance_feature(&self);
}

impl<T> Free for Customer<T> {
    fn feature1(&self) {
        println!("feature 1 for {}", self.name);
    }

    fn feature2(&self) {
        println!("feature 2 for {}", self.name);
    }
}

impl Personal for Customer<PersonalPlan> {
    fn advance_feature(&self) {
        println!(
            "Dear {}(as our valuable customer {}), enjoy this advanced feature!",
            self.name, self.id
        );
    }
}

pub struct FreePlan;
pub struct PersonalPlan(f32);

impl<T> Customer<T> {
    pub fn new(name: String) -> Self {
        Self {
            id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            name,
            _type: PhantomData::default(),
        }
    }
}

impl From<Customer<FreePlan>> for Customer<PersonalPlan> {
    fn from(c: Customer<FreePlan>) -> Self {
        Self::new(c.name)
    }
}

/// Subscribe to become a paid user
pub fn subscribe(customer: Customer<FreePlan>, payment: f32) -> Customer<PersonalPlan> {
    let _plan = PersonalPlan(payment);
    // Store plan to DB
    // ...
    customer.into()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_customer() {
        // Initially a free user
        let customer = Customer::<FreePlan>::new("Tyr".into());
        // Using free features
        customer.feature1();
        customer.feature2();
        // After some time, finds the product satisfactory and is willing to pay
        let customer = subscribe(customer, 6.99);
        customer.feature1();
        customer.feature1();
        // Paid users unlock new skills
        customer.advance_feature();
    }
}

In this example, Customer has an additional type T.

By using type T, we can classify users into different levels, such as a free user being Customer<FreePlan> and a paid user being Customer<PersonalPlan>. Free users can be converted into paid users, unlocking more benefits. Handling such states with PhantomData allows for compile-time state checking, avoiding the burden and potential errors of runtime checks.

Using Generic Parameters to Provide Multiple Implementations #

The uses of generic parameters to delay binding and combined with PhantomData to provide additional types are common in generic parameter usage.

Sometimes, we might want different implementations for the same trait. For example, an equation could be linear or quadratic, and we might want to implement different Iterators for different types. We can do it like this (code):

use std::marker::PhantomData;

#[derive(Debug, Default)]
pub struct Equation<IterMethod> {
    current: u32,
    _method: PhantomData<IterMethod>,
}

// Linear growth
#[derive(Debug, Default)]
pub struct Linear;

// Quadratic growth
#[derive(Debug, Default)]
pub struct Quadratic;

impl Iterator for Equation<Linear> {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        self.current += 1;
        if self.current >= u32::MAX {
            return None;
        }

        Some(self.current)
    }
}

impl Iterator for Equation<Quadratic> {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        self.current += 1;
        if self.current >= u16::MAX as u32 {
            return None;
        }

        Some(self.current * self.current)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_linear() {
        let mut equation = Equation::<Linear>::default();
        assert_eq!(Some(1), equation.next());
        assert_eq!(Some(2), equation.next());
        assert_eq!(Some(3), equation.next());
    }

    #[test]
    fn test_quadratic() {
        let mut equation = Equation::<Quadratic>::default();
        assert_eq!(Some(1), equation.next());
        assert_eq!(Some(4), equation.next());
        assert_eq!(Some(9), equation.next());
    }
}

This code is straightforward to understand, but you might wonder: What’s the benefit? Why not have two separate data structures, LinearEquation and QuadraticEquation, each implementing Iterator?

The point is, the Equation itself doesn’t share much code in this example. However, if the Equation has a lot of shared code aside from the Iterator implementation logic, and we want to support cubic, quartic equations in the future…, then using a generic data structure to unify the same logic and use the specific type of generic parameter to handle the varying logic is vital.

Let’s look at a real-life example, AsyncProstReader, from the async-prost library we’ve previously used in the KV server. The async-prost library can process the data transmitted in TCP or other protocol streams into individual frames. AsyncProstReader provides different implementations for AsyncDestination and AsyncFrameDestination, which you don’t need to understand in detail, only study the design of the interface (code):

/// A marker that indicates that the wrapping type is compatible with `AsyncProstReader` with Prost support.
#[derive(Debug)]
pub struct AsyncDestination;

/// a marker that indicates that the wrapper type is compatible with `AsyncProstReader` with Framed support.
#[derive(Debug)]
pub struct AsyncFrameDestination;

/// A wrapper around an async reader that produces an asynchronous stream of prost-decoded values
#[derive(Debug)]
pub struct AsyncProstReader<R, T, D> {
    reader: R,
    pub(crate) buffer: BytesMut,
    into: PhantomData<T>,
    dest: PhantomData<D>,
}

Although this data structure uses three generic parameters, the structure actually only uses one, R, which can be any structure that implements AsyncRead (which we’ll see later). The other two generic parameters, T and D, are not needed in the data structure definition but will be needed in the constraints during implementation. Among them,

  • T is the type deserialized from the data read from R, constrained as prost::Message in implementation.
  • D is a type placeholder that will be specified as either AsyncDestination or AsyncFrameDestination, as needed.

We can first imagine how the type parameter D will be used. When implementing AsyncProstReader, we hope to provide one implementation when using AsyncDestination and another one when using AsyncFrameDestination. That is, type parameter D will be specified as a certain type during the impl.

With this idea in mind, let’s see how AsyncProstReader implements Stream for D. You don’t need to worry about what Stream is or how it’s implemented. The implementation is not important