20 | 4 Steps: How to Read Rust Source Code More Effectively?

Hello, I’m Chen Tian.

Up to now, we’ve covered most of the basics of Rust. That doesn’t mean we’ve gone through every detail; what I can offer is an approach to learning Rust and help in clearing the barriers to entry. As the saying goes, “the master leads you to the door, but the practice is up to you.” To level up and battle your way through the world of Rust, you’ll need to explore and put in the effort on your own.

Although I can’t fight those battles for you, we can talk about some basic combat techniques. That’s why, before we start bringing in a lot of new third-party libraries, it’s worth discussing one very important skill: how to read source code better.

In fact, the ability to read source code is a development skill that pays off for a lifetime, yet it is often overlooked. Most of the programming books I’ve read rarely teach how to read code, just as, among the countless books in the world, very few are about “how to read a book”.

Of course, before we address the “how”, we need to understand “why”.

Why Read Source Code?

If you’ve followed every lecture of this course, you’ll have noticed that we constantly reference the source code of the standard library. This lets us not only learn the basics but also go back to the primary source material: the source code itself.

If other people’s summaries of knowledge are the fruit, then the source code is the seed. Eating only the fruit is passive: you accept what others offer, and it’s hard to judge its quality. Growing your own fruit from the seed of the source code does take patience and nurturing in the early stages, but at harvest time, everything is within your control.

As developers, we deal with code every day. After years of basic education and professional training, we all know how to “write” code, or at least how to copy and amend code. However, not many of us can read code, and even fewer can actually understand the source code of some large projects.

If we dig into why, it’s because all the education and training we received emphasized writing code rather than reading it. Moreover, once we start working, most tasks are fairly isolated: we only need to understand one part of the system to do our job, so reading code unrelated to that work seems pointless.

So what’s the problem if we haven’t read much code, as work seems to proceed normally? Let’s compare it to writing, which shares many similarities with coding.

As kids, we all went through the process of reading texts, reciting them, and composing essays. Beyond learning grammar and syntax, we read various works of great writers over the years and went through various writing exercises to build up our writing abilities. Hence, writing is built on extensive reading.

However, the process of writing code is quite different. After learning basic syntax and trying out several examples, we skip the stage of reading masterpieces and rocket straight to writing business code on our own.

Skipping extensive code reading poses three problems:

First, without enough accumulated knowledge, we tend to fall into the habit of Stack Overflow-driven coding.

When we’re not sure how to write something, we scour the web for ready-made answers, copy a highly voted one, and tweak it just enough to meet the immediate need. When problems come up, we switch into debugging mode: either setting countless breakpoints and stepping through, or printing information everywhere in an attempt to patch the faulty code. The whole coding process turns into a painful history of debugging.

Second, because our fundamentals are not solid, learning while coding is the slowest way to make progress. The reason is simple: we have to rediscover, step by step and in the slowest possible way, the lessons our predecessors already learned.

Lastly, there’s a very easy-to-overlook ceiling problem: the upper limit of development skills of the strongest engineer around you becomes your limit.

But if you value reading source code, keep accumulating knowledge over time, and master some reading techniques, these three problems become much easier to solve. It’s like describing a beauty in your writing: you can immediately reach for phrases like “skin as white as snow, eyes as bright as stars, teeth like pearls, a presence as serene as an orchid…” instead of being stuck with a mere “wow, pretty”. Reading source code is the key to moving beyond the beginner stage where all you can say about code is “wow, pretty”.

Three Major Benefits

The first benefit of reading source code is that you go to the source of knowledge, which lets you tell fact from assumption rather than blindly following authority. For instance, when discussing Rc ([Lecture 9]), we used its source code to introduce Box::leak and explain how Rc breaks free of Rust’s single-ownership constraint; when discussing FnOnce ([Lecture 19]), the source code made it clear why an FnOnce closure can only be called once.

In the future, when you share this knowledge with others, you can answer these questions with confidence instead of citing what “Chen Tian’s First Lesson in Rust” says. This addresses the first problem we discussed earlier.

Through the source code, we also picked up many tricks: for example, how Rc::clone() uses interior mutability to update the reference count even though Clone::clone() only receives &self ([Lecture 9]), and how Iterator methods achieve lazy evaluation by repeatedly wrapping the iterator in new Iterator data structures ([Lecture 16]).
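A small std-only sketch of both tricks follows. It is not the standard library’s code: Handle, Lazy, and lazy are hypothetical names made up for illustration. Handle::clone updates a shared counter through &self via Cell, the same interior-mutability idea Rc relies on for its reference counts, and Lazy is a hand-written adapter that, like map(), does no work until next() is called.

```rust
use std::cell::Cell;

/// Hypothetical handle that counts how many times it has been cloned.
/// Clone::clone only gets &self, so the counter lives in a Cell.
struct Handle<'a> {
    clones: &'a Cell<usize>,
}

impl Clone for Handle<'_> {
    fn clone(&self) -> Self {
        // Interior mutability: mutate through a shared reference.
        self.clones.set(self.clones.get() + 1);
        Handle { clones: self.clones }
    }
}

/// Hypothetical lazy adapter: stores an iterator and a closure and
/// applies the closure only when next() is called.
struct Lazy<I, F> {
    inner: I,
    f: F,
}

impl<I, F, B> Iterator for Lazy<I, F>
where
    I: Iterator,
    F: FnMut(I::Item) -> B,
{
    type Item = B;

    fn next(&mut self) -> Option<B> {
        // Pull one item from the wrapped iterator, then transform it.
        self.inner.next().map(&mut self.f)
    }
}

/// Building the adapter does no work at all; it just wraps the pieces.
fn lazy<I, F, B>(inner: I, f: F) -> Lazy<I, F>
where
    I: Iterator,
    F: FnMut(I::Item) -> B,
{
    Lazy { inner, f }
}
```

Creating the adapter with lazy(1..=3, |x| x * 10) runs nothing; the closure fires once per next() call.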

In the future, you can apply these techniques in your own code, advancing from the elementary “wow, pretty” to phrases like “one smile topples a city, a second smile topples a nation”. This is the second benefit of reading source code: seeing others’ code and accumulating material broadens your thinking, so you can write code with inspiration, as if the words just flow.

The last problem that reading source code solves is breaking through the ceiling. Accumulated material is the foundation; inspired ideas then weave that material into a coherent body of knowledge.

The more excellent code you read, the more it stimulates thought, which leads to more reading, forming a flywheel effect that enriches your knowledge. And the integration and mastery of that knowledge ultimately forms the third major benefit of reading code: by understanding and absorbing others’ ideas, you can distill the essence and ultimately form your own thoughts or wisdom.

Of course, going from material to knowledge and then to wisdom takes long-term accumulation and doesn’t happen overnight. Understanding the “why” has given us three directions for learning; now let’s move on to the “how”, where I’ll share my methodology to give your accumulation some momentum.

How Do We Read Source Code?

Let’s take the third-party library bytes as an example of how to read source code. Whether or not you care about how bytes is implemented, I hope you follow along at today’s pace: use it as a template to get familiar with the basic method, then extend your reading to more code, such as hyper, nom, tokio, and tonic.

bytes is an efficient library from the Tokio project for handling network data. Its code is compact: about 3.5k lines (not counting 2.1k lines of comments), or about 5.3k lines including tests. The code structure is straightforward:

```
❯ tree src
src
├── buf
│   ├── buf_impl.rs
│   ├── buf_mut.rs
│   ├── chain.rs
│   ├── iter.rs
│   ├── limit.rs
│   ├── mod.rs
│   ├── reader.rs
│   ├── take.rs
│   ├── uninit_slice.rs
│   ├── vec_deque.rs
│   └── writer.rs
├── bytes.rs
├── bytes_mut.rs
├── fmt
│   ├── debug.rs
│   ├── hex.rs
│   └── mod.rs
├── lib.rs
├── loom.rs
└── serde.rs
```

The structure is clear and easy to read.

First, here is the order I use for reading Rust code: start with the crate’s outline to understand what the code can do and how to use it; then learn the core traits; then understand the main data structures; then write some example code; and finally dig into the parts you’re interested in.

We’ll see why this order works as we read.

Step 1: Start with the Outline

We start with the outline in the documentation. Rust’s documentation system is among the best of any programming language. The documentation is tightly integrated with the code, so you can easily jump back and forth between the two.

Documentation for nearly every published Rust library is hosted on docs.rs; for example, the bytes documentation is at docs.rs/bytes.

First, read the crate-level documentation to quickly understand what the crate is for, much like reading a book’s preface. You can also check the README.md in the root of the source repository for supplementary information.

After getting a general understanding, you can delve deeper into the content of your interest. Let’s see it from the perspective of a beginner.

For bytes, we see two traits, Buf and BufMut, and two data structures, Bytes and BytesMut, with no crate-level functions. The next step is to read in depth.

The order I usually follow is trait → struct → functions/methods. This mirrors how we think when writing code:

  • First, define the system’s behavior from the process requirements, deciding which interfaces, i.e. traits, are needed;
  • Then consider what states the system has, defining the relevant data structures or structs;
  • Finally get into the implementation details, including how to implement traits for data structures, what algorithms the structures themselves have, how to string the whole process together, etc.

Step 2: Familiarize Yourself with Core Traits’ Behaviors

Let’s look at traits first, taking the Buf trait as an example. Click into it, and the page shows the trait’s definition along with a usage example.

Notice the “Required Methods” and “Provided Methods” sections in the left navigation bar. The former must be implemented by any type that implements the trait, while the latter come with default implementations. This means that once you implement the three required methods, advance(), chunk(), and remaining(), you get all of the provided methods for free. Of course, you can also override some of the default methods.
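The required/provided split can be sketched with a toy trait. MiniBuf and ByteReader are hypothetical names, and the method names only mirror those of Buf; the real trait has many more provided methods and different internals. Implementing the three required methods is enough to get get_u8() and has_remaining() for free.

```rust
/// A toy trait modelled on the shape of bytes::Buf (not its real code).
trait MiniBuf {
    // Required methods: every implementor must provide these.
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);

    // Provided methods: written only in terms of the required ones.
    fn has_remaining(&self) -> bool {
        self.remaining() > 0
    }

    fn get_u8(&mut self) -> u8 {
        assert!(self.remaining() >= 1, "buffer exhausted");
        let b = self.chunk()[0];
        self.advance(1);
        b
    }
}

/// Hypothetical owned reader: a byte vector plus a read position.
struct ByteReader {
    data: Vec<u8>,
    pos: usize,
}

impl MiniBuf for ByteReader {
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }

    fn chunk(&self) -> &[u8] {
        &self.data[self.pos..]
    }

    fn advance(&mut self, cnt: usize) {
        assert!(cnt <= self.remaining(), "advance past end");
        self.pos += cnt;
    }
}
```

ByteReader never defines get_u8(), yet it can call it, because the default body only needs chunk() and advance().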

Further down the navigation bar, you can see which foreign types bytes implements the Buf trait for, as well as the trait’s implementors. This is valuable information that reveals the ecosystem around the trait:

For foreign types:

  • The slice &[u8] and VecDeque<u8> both implement the Buf trait;
  • If T implements Buf, then &mut T and Box<T> implement it as well;
  • If T implements AsRef<[u8]>, then Cursor<T> also implements Buf.
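Implementing a trait for a foreign type such as &[u8] can take only a few lines, because the slice itself can act as the cursor: advancing just means re-slicing. The sketch below applies that idea to a toy MiniBuf trait (a hypothetical stand-in for Buf, trimmed to one provided method); it illustrates the pattern rather than reproducing the bytes source.

```rust
/// Toy stand-in for Buf (hypothetical, trimmed down).
trait MiniBuf {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);

    fn get_u8(&mut self) -> u8 {
        let b = self.chunk()[0];
        self.advance(1);
        b
    }
}

// The slice is its own cursor: advancing shrinks the slice from the front.
impl MiniBuf for &[u8] {
    fn remaining(&self) -> usize {
        self.len()
    }

    fn chunk(&self) -> &[u8] {
        self
    }

    fn advance(&mut self, cnt: usize) {
        // Panics if cnt > len, just like out-of-range slicing.
        *self = &self[cnt..];
    }
}
```

Because Self here is &[u8], methods that take &mut self receive a &mut &[u8], which is what lets advance() replace the slice in place.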

This explains why, in the example shown in the previous image, you can call methods from the Buf trait directly on &[u8]:

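```rust
use bytes::Buf;

fn main() {
    let mut buf = &b"hello world"[..];

    // get_u8() reads one byte and advances the cursor past it.
    assert_eq!(b'h', buf.get_u8());
    assert_eq!(b'e', buf.get_u8());
    assert_eq!(b'l', buf.get_u8());

    // copy_to_slice() fills `rest` and advances past the copied bytes.
    let mut rest = [0; 8];
    buf.copy_to_slice(&mut rest);

    assert_eq!(&rest[..], &b"lo world"[..]);
}
```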

This also tells us that if someone’s data structure T implements the Buf trait in the future, they won’t need to go to the trouble of implementing Buf for Box<T>, &mut T, and so on.

Without even opening the source code yet, we can already learn a few lessons about defining a trait:

  • Once you define a trait, consider which standard library data structures might implement it.
  • If someone’s type T implements your trait, consider whether derived types such as &T, &mut T, and Box<T> can implement it automatically.
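The second lesson is usually realized with blanket implementations that forward every required method to the inner value. The sketch below shows the pattern on a toy trait; MiniBuf, Reader, and read_two are hypothetical names, not part of bytes. Thanks to the two forwarding impls, a generic function written once for B: MiniBuf also accepts &mut T and Box<T>, including Box<dyn MiniBuf>.

```rust
/// Toy trait (hypothetical stand-in for Buf).
trait MiniBuf {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);

    fn get_u8(&mut self) -> u8 {
        let b = self.chunk()[0];
        self.advance(1);
        b
    }
}

// Forward everything through a mutable reference...
impl<T: MiniBuf + ?Sized> MiniBuf for &mut T {
    fn remaining(&self) -> usize {
        (**self).remaining()
    }
    fn chunk(&self) -> &[u8] {
        (**self).chunk()
    }
    fn advance(&mut self, cnt: usize) {
        (**self).advance(cnt)
    }
}

// ...and through a Box; ?Sized also covers Box<dyn MiniBuf>.
impl<T: MiniBuf + ?Sized> MiniBuf for Box<T> {
    fn remaining(&self) -> usize {
        (**self).remaining()
    }
    fn chunk(&self) -> &[u8] {
        (**self).chunk()
    }
    fn advance(&mut self, cnt: usize) {
        (**self).advance(cnt)
    }
}

/// Hypothetical concrete implementor.
struct Reader {
    data: Vec<u8>,
    pos: usize,
}

impl MiniBuf for Reader {
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }
    fn chunk(&self) -> &[u8] {
        &self.data[self.pos..]
    }
    fn advance(&mut self, cnt: usize) {
        self.pos += cnt;
    }
}

/// Generic code written once against the trait.
fn read_two<B: MiniBuf>(mut buf: B) -> (u8, u8) {
    let a = buf.get_u8();
    let b = buf.get_u8();
    (a, b)
}
```

Passing &mut r instead of r lets the caller keep using the reader afterwards, which is the main practical payoff of the &mut T impl.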

Next, look at “Implementors” in the left navigation bar. Bytes, BytesMut, Chain, and Take all implement the Buf trait. This tells us which data structures in this crate implement the trait, so we’ll know what they can do when we encounter them.
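Adapters like Chain are just more implementors of the same trait. Here is a sketch of the idea on a toy trait; MiniBuf, Chain, and chain are hypothetical and far simpler than their bytes counterparts. The adapter reads from the first buffer until it runs out, then switches to the second, and a provided method on the trait constructs it.

```rust
/// Toy trait (hypothetical stand-in for Buf).
trait MiniBuf {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);

    fn get_u8(&mut self) -> u8 {
        let b = self.chunk()[0];
        self.advance(1);
        b
    }

    /// Provided adapter constructor: glue `self` and `next` together.
    fn chain<U: MiniBuf>(self, next: U) -> Chain<Self, U>
    where
        Self: Sized,
    {
        Chain { a: self, b: next }
    }
}

impl MiniBuf for &[u8] {
    fn remaining(&self) -> usize {
        self.len()
    }
    fn chunk(&self) -> &[u8] {
        self
    }
    fn advance(&mut self, cnt: usize) {
        *self = &self[cnt..];
    }
}

/// Reads all of `a`, then all of `b`.
struct Chain<A, B> {
    a: A,
    b: B,
}

impl<A: MiniBuf, B: MiniBuf> MiniBuf for Chain<A, B> {
    fn remaining(&self) -> usize {
        self.a.remaining() + self.b.remaining()
    }

    fn chunk(&self) -> &[u8] {
        // Expose the first buffer until it is exhausted.
        if self.a.remaining() > 0 {
            self.a.chunk()
        } else {
            self.b.chunk()
        }
    }

    fn advance(&mut self, mut cnt: usize) {
        // Consume from `a` first, then spill the rest into `b`.
        let a_rem = self.a.remaining();
        if a_rem > 0 {
            let step = cnt.min(a_rem);
            self.a.advance(step);
            cnt -= step;
        }
        self.b.advance(cnt);
    }
}
```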

Now we have a basic understanding of the Buf trait and its ecosystem. From here, you can dig deeper in a few directions:

  • How is a default method of the Buf trait implemented, such as get_u8()?
  • How do other types implement the Buf trait, such as &[u8]?

You don’t even need to clone the bytes source code: docs.rs links each item to its source, so you can do all of this reading directly in the browser, which is very convenient.

Step 3: Mastering the Main Structs

After scanning the basic functionality of the traits, let’s look at the data structures, taking Bytes as an example:

Generally, a good document will provide an introduction to the data structure, how to use it, things to pay attention to when using it, and some code examples. After grasping the basic introduction of the structure, continue looking into its internal structure:

```rust
/// ```text
///
///    Arc ptrs                   +---------+
///    ________________________/| Bytes 2 |
///  /                          +---------+
/// /         +-----------+     |         |
/// |_________/ |  Bytes 1  |     |         |
/// |           +-----------+     |         |
/// |           |           | ___/ data     | tail
/// |      data |      tail |/              |
/// v           v           v               v
/// +-----+---------------------------------+-----+
/// | Arc |     |           |               |     |
/// +-----+---------------------------------+-----+
/// ```
pub struct Bytes {
    ptr: *const u8,
    len: usize,
    // inlined "trait object"
    data: AtomicPtr<()>,
    vtable: &'static Vtable,
}

pub(crate) struct Vtable {
    /// fn(data, ptr, len)
    pub clone: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> Bytes,
    /// fn(data, ptr, len)
    pub drop: unsafe fn(&mut AtomicPtr<()>, *const u8, usize),
}
```

The code of a data structure often contains comments that help you understand its design. Reading the Bytes code, we can see that:

  • It uses a raw pointer and a length, like a slice, to point to a contiguous region of memory;
  • It also uses an AtomicPtr and a hand-built Vtable to simulate the behavior of a trait object;
  • From the structure of Vtable, we can infer that clone() and drop() for Bytes are dispatched dynamically, which is an interesting discovery.
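To see what an “inlined trait object” means in practice, here is a heavily simplified, std-only sketch of the pattern. MiniBytes, its Vtable, and the two static tables are hypothetical and do not reproduce the bytes implementation (which uses AtomicPtr and supports more storage kinds). The point is that each value carries a &'static table of function pointers, so clone() and drop() are chosen at runtime: static data is never freed, while Vec-backed data is shared through an Arc reference count.

```rust
use std::sync::Arc;

/// Hypothetical mini version of the pattern: a view (ptr + len) plus a
/// type-erased owner pointer and a static table of behavior.
struct MiniBytes {
    ptr: *const u8,
    len: usize,
    data: *const (),         // type-erased owner; null for static data
    vtable: &'static Vtable, // selects clone/drop behavior at runtime
}

struct Vtable {
    clone: unsafe fn(*const (), *const u8, usize) -> MiniBytes,
    drop: unsafe fn(*const ()),
}

// Static data: cloning copies the view, dropping does nothing.
static STATIC_VTABLE: Vtable = Vtable { clone: static_clone, drop: static_drop };

unsafe fn static_clone(_data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    MiniBytes { ptr, len, data: std::ptr::null(), vtable: &STATIC_VTABLE }
}

unsafe fn static_drop(_data: *const ()) {}

// Shared data: `data` is an Arc<Vec<u8>> turned into a raw pointer.
static SHARED_VTABLE: Vtable = Vtable { clone: shared_clone, drop: shared_drop };

unsafe fn shared_clone(data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    // Bump the reference count; the buffer itself is not copied.
    unsafe { Arc::increment_strong_count(data as *const Vec<u8>) };
    MiniBytes { ptr, len, data, vtable: &SHARED_VTABLE }
}

unsafe fn shared_drop(data: *const ()) {
    // Rebuild the Arc and let it decrement (and maybe free) the buffer.
    drop(unsafe { Arc::from_raw(data as *const Vec<u8>) });
}

impl MiniBytes {
    fn from_static(s: &'static [u8]) -> Self {
        MiniBytes { ptr: s.as_ptr(), len: s.len(), data: std::ptr::null(), vtable: &STATIC_VTABLE }
    }

    fn from_vec(v: Vec<u8>) -> Self {
        let (ptr, len) = (v.as_ptr(), v.len());
        // Moving the Vec into the Arc does not move its heap buffer.
        let data = Arc::into_raw(Arc::new(v)) as *const ();
        MiniBytes { ptr, len, data, vtable: &SHARED_VTABLE }
    }

    fn as_slice(&self) -> &[u8] {
        // Safety: ptr/len describe memory kept alive by `data` (or 'static).
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for MiniBytes {
    fn clone(&self) -> Self {
        unsafe { (self.vtable.clone)(self.data, self.ptr, self.len) }
    }
}

impl Drop for MiniBytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data) }
    }
}
```

Because it stores raw pointers, MiniBytes is not automatically Send or Sync, which is the same situation Bytes is in.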

However, don’t rush to explore how this behavior is implemented; keep reading the documentation.

As with traits, the left navigation bar (both the image above and the one below) offers valuable information about the data structure: which methods it has (Methods), which traits it implements (Trait Implementations), and its Auto Trait and Blanket implementations.

As you can see, besides the Buf trait we just discussed, Bytes implements quite a few standard traits.

This suggests another lesson: our own data structures should implement the relevant standard traits wherever it makes sense, including but not limited to AsRef, Borrow, Clone, Debug, Default, Deref, Drop, PartialEq/Eq, From, Hash, IntoIterator (for collection types), PartialOrd/Ord, etc.
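As a small illustration, a hypothetical Packet newtype can pick up most of these traits with #[derive] and a few short manual impls. Which traits make sense always depends on the type; this is a sketch of the habit, not a checklist.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::ops::Deref;

/// Hypothetical wrapper around an owned byte buffer.
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Packet(Vec<u8>);

// Construction: what can be turned into a Packet.
impl From<Vec<u8>> for Packet {
    fn from(v: Vec<u8>) -> Self {
        Packet(v)
    }
}

impl From<&[u8]> for Packet {
    fn from(s: &[u8]) -> Self {
        Packet(s.to_vec())
    }
}

// Cheap read-only access to the bytes.
impl AsRef<[u8]> for Packet {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

// Deref makes a Packet behave like a [u8] slice (len(), iter(), ...).
impl Deref for Packet {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.0
    }
}

// Borrow lets a HashSet<Packet> be queried with a plain &[u8]; the derived
// Hash/Eq/Ord delegate to the inner Vec<u8>, which agree with [u8]'s.
impl Borrow<[u8]> for Packet {
    fn borrow(&self) -> &[u8] {
        &self.0
    }
}
```

The Borrow impl is only correct because the derived Hash, Eq, and Ord on Packet agree with those of [u8]; that consistency is part of Borrow’s contract.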

Notice that besides these traits, Bytes also implements Send/Sync. Many data structures we’ve encountered, such as Vec, get Send/Sync automatically, but Bytes has to implement them manually:

```rust
unsafe impl Send for Bytes {}
unsafe impl Sync for Bytes {}
```

This is because, as we discussed earlier, if a data structure contains fields that are not Send/Sync, the compiler does not implement Send/Sync for it automatically, so it cannot be used across threads. Bytes holds a raw pointer (ptr: *const u8), and raw pointers are neither Send nor Sync. If you can guarantee that the type is safe to use across threads, you can implement these traits manually with unsafe impl.
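The effect is easy to reproduce. In this sketch, the hypothetical StaticView holds a raw pointer, so the compiler does not implement Send or Sync for it. The unsafe impls are sound here only because a StaticView can be built solely from a &'static [u8], which is immutable and lives for the whole program. Without the two unsafe impl lines, moving a view into thread::spawn (which needs Send) or sharing one by reference with scoped threads (which needs Sync) would fail to compile.

```rust
/// Hypothetical read-only view into static bytes, stored as ptr + len.
struct StaticView {
    ptr: *const u8,
    len: usize,
}

impl StaticView {
    fn new(s: &'static [u8]) -> Self {
        StaticView { ptr: s.as_ptr(), len: s.len() }
    }

    fn as_slice(&self) -> &[u8] {
        // Safety: the pointer and length came from a &'static [u8].
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

// Raw pointers are neither Send nor Sync, so these are not derived for us.
// Safety: the data is 'static and never mutated, so moving or sharing the
// view across threads cannot cause a data race or a dangling pointer.
unsafe impl Send for StaticView {}
unsafe impl Sync for StaticView {}
```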

Knowing which traits a data structure implements helps a great deal in understanding how it can be used. That’s why we should learn the major traits of the standard library well and use them often, ideally until they become muscle memory; then we can read other people’s code efficiently. For example, when I look at the Bytes data structure, a quick scan of the traits it implements tells me:

  • Which data structures can be converted into Bytes, i.e., how to construct a Bytes value;
  • Which types Bytes can be compared with;
  • Whether Bytes can be used across threads;
  • Which type Bytes behaves like in use (look at the Deref trait: Bytes derefs to [u8]).
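To double-check conclusions like these, a common trick is a set of empty generic functions whose bounds the compiler must prove. The helper names below are hypothetical, and std types are used so the sketch has no dependencies. With bytes added as a dependency, the same checks could be written for bytes::Bytes, for example assert_send_sync::<bytes::Bytes>(), which should compile given the unsafe impls shown above.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Each helper compiles only if T satisfies the bound, so calling it turns
// "I think this type is X" into something the compiler checks for you.
fn assert_send_sync<T: Send + Sync>() {}
fn assert_clone<T: Clone>() {}
fn assert_derefs_to_bytes<T: Deref<Target = [u8]>>() {}
fn assert_from<T: From<U>, U>() {}

fn check_std_types() {
    assert_send_sync::<Vec<u8>>();
    assert_send_sync::<Arc<[u8]>>();
    assert_clone::<Arc<[u8]>>();
    assert_derefs_to_bytes::<Vec<u8>>();
    assert_derefs_to_bytes::<Arc<[u8]>>();
    // Arc<[u8]> can be built from a Vec<u8>.
    assert_from::<Arc<[u8]>, Vec<u8>>();
    // assert_send_sync::<std::rc::Rc<u8>>(); // would not compile: Rc is neither Send nor Sync
}
```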

This is the advantage of muscle memory. You can browse different libraries under the Data structures category on crates.io, like IndexMap, and see which standard traits are implemented by it. If you encounter unknown ones, you can check the documentation of those traits, and you might want to review [Lecture 14] (