43 Production Environment: What Elements Does a Real-World Rust Project Contain? #

Hello, I’m Chen Tian.

As our practical project, the KV server, is coming to an end, the course is also reaching its conclusion. Mastering the features of a language and being able to write code that applies these features to solve some minor problems is just the beginning, like practicing surfing in a swimming pool; to truly master a language, you need to experience the trials of the rough seas. Therefore, in the next three articles, we will focus on understanding the actual Rust application environment and look at how to build complex software systems with Rust.

Today, let’s first learn about the elements that a real-world Rust project should contain. We will mainly introduce content related to the development stage, including: code repository management, testing and continuous integration, documentation, feature management, compile-time processing, logging and monitoring, and we will also briefly introduce how to control the size of the executable file compiled from Rust code.


Code Repository Management #

Let’s start with the structure and management of a code repository. As mentioned before, Rust supports workspaces, which can hold many crates under one workspace. Have you noticed that the repo for this course on GitHub organizes the code for each class into separate crates within the same workspace?

[Image: the course repository's workspace, with each lesson's code as a separate crate]

When building an application or service, we should try to clearly divide the various modules and then implement them using different crates. This way, first, the efficiency of incremental compilation is higher (unchanged crates do not need to be recompiled), and second, crates can enforce boundaries for modules, clearly delineating the public and private interfaces.

Generally speaking, in addition to having a README.md in the root directory of the code repository, it is also best for each crate under the workspace to have its own README.md and examples to help users understand how to use the crate. If your project’s build process is not simply done through cargo build, it is advisable to provide a Makefile or similar scripts to automate the local build process.
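As a rough illustration (the targets and flags here are made up, not taken from the course repo), such a Makefile might simply wrap the common cargo invocations:

```makefile
# Illustrative Makefile sketch; adjust targets to your project's needs.
build:
	cargo build --release

test:
	cargo test --all-features

lint:
	cargo clippy --all-targets -- -D warnings

.PHONY: build test lint
```

Even when the targets are thin wrappers, a Makefile gives newcomers one obvious entry point for building and testing the project.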

When we commit code to the repository, we should perform basic checks locally, including code style checks, compilation checks, static checks, and unit tests. This ensures that each commit is complete and free from basic errors.

If you use Git to manage your code repository, you can use pre-commit hooks. Generally, we don’t need to write our own pre-commit hook scripts; we can use the pre-commit tool. Below is the pre-commit configuration I use in tyrchen/geektime-rust for your reference:

❯ cat .pre-commit-config.yaml
fail_fast: false
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-byte-order-marker
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-yaml
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
      - id: black
  - repo: local
    hooks:
      - id: cargo-fmt
        name: cargo fmt
        description: Format files with rustfmt.
        entry: bash -c 'cargo fmt -- --check'
        language: rust
        files: \.rs$
        args: []
      - id: cargo-check
        name: cargo check
        description: Check the package for errors.
        entry: bash -c 'cargo check --all'
        language: rust
        files: \.rs$
        pass_filenames: false
      - id: cargo-clippy
        name: cargo clippy
        description: Lint rust sources
        entry: bash -c 'cargo clippy --all-targets --all-features --tests --benches -- -D warnings'
        language: rust
        files: \.rs$
        pass_filenames: false
      - id: cargo-test
        name: cargo test
        description: unit test for the project
        entry: bash -c 'cargo test --all-features --all'
        language: rust
        files: \.rs$
        pass_filenames: false

After generating .pre-commit-config.yaml in the root directory and running pre-commit install, every time you run git commit, these series of checks will automatically be performed to ensure the basic correctness of the submitted code.

In addition, it is best for your code repository to declare a deny.toml in the root directory and use cargo-deny to ensure that the third-party dependencies you use do not have inappropriate licenses (such as not using any GPL/AGPL code), do not have suspicious sources (such as not being from a commit under a forked GitHub repo), and do not include versions with security vulnerabilities.

cargo-deny is very important for production code because modern software relies on too many dependencies, and dependency trees are too complex to be scrutinized by the human eye. By using cargo-deny, we can avoid a lot of risky third-party libraries.
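As a rough sketch, a deny.toml might look like the following. The field names follow cargo-deny's documented schema, which has changed across versions, so check the cargo-deny documentation for the version you have installed:

```toml
[licenses]
# only allow permissive licenses; adjust to your organization's policy
allow = ["MIT", "Apache-2.0"]

[advisories]
# fail the check on crates with known vulnerabilities (RustSec database)
vulnerability = "deny"

[sources]
# reject dependencies that do not come from crates.io
unknown-registry = "deny"
unknown-git = "deny"

[bans]
# warn when the same crate appears several times with different versions
multiple-versions = "warn"
```

Running `cargo deny check` then validates the whole dependency tree against this policy.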

Testing and Continuous Integration #

In the course, we have constantly emphasized the importance of unit testing in projects. In addition to being a necessary means to ensure software quality, unit tests are also the best auxiliary tool for interface design and iteration.

Good architecture and clear interface segregation will inevitably make unit testing straightforward and intuitive; conversely, cumbersome and difficult-to-write unit tests are warning you that there is a problem with the architecture or design of the software: either the coupling between modules is too strong (state is entangled), or the interface design is difficult to use.

Writing unit tests in Rust is very intuitive. The test code and module code are placed in the same file, making it easy to read and verify each other. We have already written numerous such unit tests.
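As a refresher, the typical layout keeps a `#[cfg(test)]` module right next to the code it verifies. The function below is a made-up example for illustration, not code from the KV server:

```rust
/// Splits a "key=value" pair; returns None if there is no '='.
pub fn parse_kv(s: &str) -> Option<(&str, &str)> {
    // splitn(2, ..) keeps any further '=' characters inside the value
    let mut parts = s.splitn(2, '=');
    Some((parts.next()?, parts.next()?))
}

#[cfg(test)]
mod tests {
    // The test module lives in the same file as the code under test.
    use super::*;

    #[test]
    fn parse_kv_works() {
        assert_eq!(parse_kv("a=b"), Some(("a", "b")));
        assert_eq!(parse_kv("no-equals"), None);
    }
}
```

Because the module is gated behind `#[cfg(test)]`, it is compiled only for `cargo test` and adds nothing to release builds.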

However, there is another kind of test that lives inside the documentation: the doctest. If, during the course, you have gotten into the habit of reading the source code when you hit a problem, you will have seen many doctests, like this one for the HashMap::get method:

/// Returns a reference to the value corresponding to the key.
///
/// The key may be any borrowed form of the map's key type, but
/// [`Hash`] and [`Eq`] on the borrowed form *must* match those for
/// the key type.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert(1, "a");
/// assert_eq!(map.get(&1), Some(&"a"));
/// assert_eq!(map.get(&2), None);
/// ```
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
    K: Borrow<Q>,
    Q: Hash + Eq,
{
    self.base.get(k)
}

Although we have not explicitly introduced documentation comments before, I’m sure you already know that they can be written with “///” for data structures, traits, methods, and functions.

Such comments are written in Markdown and can be compiled with “cargo doc” into documentation like what you see on docs.rs. The code blocks inside the Markdown are compiled into doctests and run during “cargo test”.
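For example, a doctest for a simple conversion function might look like this (the function and the crate name `mylib` are made up for illustration):

```rust
/// Converts a temperature from Celsius to Fahrenheit.
///
/// # Examples
///
/// ```
/// // `mylib` stands in for your crate's actual name
/// assert_eq!(mylib::c2f(100.0), 212.0);
/// ```
pub fn c2f(c: f64) -> f64 {
    c * 9.0 / 5.0 + 32.0
}
```

During “cargo test”, the fenced block inside the comment is extracted and compiled as its own test, so the documentation can never silently drift out of sync with the interface.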

In addition to unit tests, we often need integration tests and performance tests. In the subsequent implementation of the KV server, we will introduce integration tests to test the server’s basic functions and performance tests to test the performance of pub/sub. We will discuss these in more detail when we encounter them.

Introducing continuous integration early in a project is very necessary, even if there is not yet comprehensive test coverage.

If pre-commit guards each individual's commits, keeping basic errors out of the repository so that code reviews need not dwell on formatting, then continuous integration guards the team's collaboration: it ensures that the code added in a PR, or merged into the master branch, also builds and passes checks in a well-defined environment.

If you use GitHub to manage your code repository, you can use a GitHub workflow to perform continuous integration, such as the following basic Rust GitHub workflow definition:

❯ cat .github/workflows/build.yml
name: build

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  build-rust:
    strategy:
      matrix:
        platform: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.platform }}
    steps:
      - uses: actions/checkout@v2
      - name: Cache cargo registry
        uses: actions/cache@v1
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry
      - name: Cache cargo index
        uses: actions/cache@v1
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-index
      - name: Cache cargo build
        uses: actions/cache@v1
        with:
          path: target
          key: ${{ runner.os }}-cargo-build-target
      - name: Install stable
        uses: actions-rs/toolchain@v1
        with:
          profile: minimal
          toolchain: stable
          override: true
      - name: Check code format
        run: cargo fmt -- --check
      - name: Check the package for errors
        run: cargo check --all
      - name: Lint rust sources
        run: cargo clippy --all-targets --all-features --tests --benches -- -D warnings
      - name: Run tests
        run: cargo test --all-features -- --test-threads=1 --nocapture
      - name: Generate docs
        run: cargo doc --all-features --no-deps

In this workflow, we check code formatting, run basic static checks, execute the unit and integration tests, and generate documentation.

Documentation #

As mentioned before, Rust code documentation comments can be marked with “///”. For the code of our KV server discussed in the previous lecture, you can run “cargo doc” to generate the corresponding documentation.

Note that when you run cargo doc, not only the documentation of the crate you wrote will be generated, but the documentation of all the crates used in the dependencies will also be generated. So, if you want to refer to some used crate documentation without internet access, you can view the locally generated documentation. The image below is a screenshot of the documentation for the previous lecture’s KV server:

[Image: screenshot of the locally generated documentation for the KV server]

Most of the time, you only need to use “///” to write documentation, which is enough. However, if you need to write crate-level documentation, i.e., content that will appear on the crate documentation’s home page, you can use “//!”, like this, at the beginning of lib.rs or main.rs:

//! This is crate documentation

If you want to force yourself to document every public interface to maintain good documentation coverage of the system, you can use #![deny(missing_docs)]. This way, any time you forget to write documentation, it will cause a compilation error. If a compilation error feels too strict, you can downgrade it to a warning with #![warn(missing_docs)]. We previously read the source code of the bytes crate; you can go back and look at the beginning of its lib.rs.

When discussing testing, we mentioned doctests.

Writing example code in the documentation and ensuring that this code can run properly is very important because when users view your crate’s documentation, they will often refer to your example code first to understand how to use the interface. Most of the time, you can write your sample code as you see fit, but when handling asynchronous processes and error handling, you need to do a little extra work.

Let’s look at an example of asynchronous handling in the documentation (code):

use std::task::Poll;
use futures::{prelude::*, stream::poll_fn};

/// Fibonacci algorithm
/// Example:
/// ```
/// use futures::prelude::*;
/// use playground::fib; // playground crate is called playground
/// # futures::executor::block_on(async {
/// let mut st = fib(10);
/// assert_eq!(Some(2), st.next().await);
/// # });
/// ```
pub fn fib(mut n: usize) -> impl Stream<Item = i32> {
    let mut a = 1;
    let mut b = 1;
    poll_fn(move |_cx| -> Poll<Option<i32>> {
        if n == 0 {
            return Poll::Ready(None);
        }
        n -= 1;
        let c = a + b;
        a = b;
        b = c;
        Poll::Ready(Some(b))
    })
}

Note the following two comment lines in the code:

/// # futures::executor::block_on(async {
/// ...
/// # });

A # after /// means that the line will not appear in the rendered example but will be included in the generated test code. The block_on is needed because the test code uses await, so it must be driven by an async runtime.

In fact, this doctest is equivalent to:

fn main() {
    fn _doctest_main_xxx() {
        use futures::prelude::*;
        use playground::fib; // playground crate is called playground

        futures::executor::block_on(async {
            let mut st = fib(10);
            assert_eq!(Some(2), st.next().await);
        });
    }
    _doctest_main_xxx()
}

Let’s look at another example of error handling in the documentation (code):

use std::io;
use std::fs;

/// Writing to a file
/// Example:
/// ```
/// use playground::write_file;
/// write_file("/tmp/dummy_test", "hello world")?;
/// # Ok::<_, std::io::Error>(())
/// ```
pub fn write_file(name: &str, contents: &str) -> Result<(), io::Error> {
    fs::write(name, contents)
}

In this example, we used ? for error handling, so we need to add a line of Ok::<_, io::Error> at the end to clarify the type of error returned.

If you want to learn more about Rust documentation, you can read the rustdoc book.

Feature Management #

As a compiled language, Rust supports conditional compilation.

Through conditional compilation, we can support different features within the same crate to meet various needs. For example, reqwest by default uses asynchronous interfaces, but if you need synchronous interfaces, you can use its “blocking” feature.

Reasonably using features in a production environment can make the core functions of a crate introduce fewer dependencies, and only bring in certain dependencies when a certain feature is enabled, making the final compiled library or executable as small as possible.

Features are advanced tools and are not covered in this course. If you are interested, you can read the cargo book to learn how to use features in your crate and how to use corresponding macros for conditional compilation in the code-writing process.
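As a sketch (the crate and feature names here are hypothetical), an opt-in feature that pulls in an optional dependency is declared in Cargo.toml like this:

```toml
# Cargo.toml sketch: JSON support is opt-in
[dependencies]
serde_json = { version = "1", optional = true }

[features]
default = []
# enabling `json` also enables the optional serde_json dependency
json = ["dep:serde_json"]
```

In code, items gated with #[cfg(feature = "json")] are compiled only when the feature is enabled. Note that the `dep:` prefix requires a relatively recent Cargo (Rust 1.60+); older versions simply write `json = ["serde_json"]`.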

Compile-Time Processing #

When developing a software system, we need to consider which things need to be handled at compile-time, which things at load-time, and which things at run-time.

Some work does not have to happen at runtime: we can preprocess it at compile-time, so that the data is available in a more convenient form at run-time.

For example, when performing Chinese character simplification and traditional conversion, you can pre-read the character correspondence table from a file, process it into Vec<(char, char)>, and then store it as bincode into the executable file. Let’s look at this example (code):

use std::io::{self, BufRead};
use std::{env, fs::File, path::Path};

fn main() {
    // If build.rs or the traditional-to-simplified correspondence table file changes, recompile
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=src/t2s.txt");

    // Generate OUT_DIR/map.bin for lib.rs to access
    let out_dir = env::var_os("OUT_DIR").unwrap();
    let out_file = Path::new(&out_dir).join("map.bin");
    let f = File::create(&out_file).unwrap();
    let v = get_kv("src/t2s.txt");
    bincode::serialize_into(f, &v).unwrap();
}

// Convert the split '&str' to 'char'
fn s2c(s: &str) -> char {
    let mut chars = s.chars();
    let c = chars.next().unwrap();
    assert!(chars.next().is_none());
    assert!(c.len_utf8() == 3);
    c
}

// Read file, convert each line of traditional and simplified correspondence strings to Vec<(char, char)>
fn get_kv(filename: &str) -> Vec<(char, char)> {
    let f = File::open(filename).unwrap();
    let lines = io::BufReader::new(f).lines();
    let mut v = Vec::with_capacity(4096);
    for line in lines {
        let line = line.unwrap();
        let kv: Vec<_> = line.split(' ').collect();
        v.push((s2c(kv[0]), s2c(kv[1])));
    }

    v
}
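The run-time side can then embed and decode that file. The source does not show this part, so the following is only a sketch; it assumes once_cell as an extra dependency, and the static name is made up:

```rust
// lib.rs sketch: embed the bincode blob produced by build.rs and
// decode it once, on first use.
use once_cell::sync::Lazy;
use std::collections::HashMap;

static MAP: Lazy<HashMap<char, char>> = Lazy::new(|| {
    // OUT_DIR is set by cargo; map.bin was written by build.rs
    let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/map.bin"));
    let v: Vec<(char, char)> = bincode::deserialize(bytes).unwrap();
    v.into_iter().collect()
});

/// Converts one traditional character to simplified, if a mapping exists.
pub fn t2s(c: char) -> char {
    *MAP.get(&c).unwrap_or(&c)
}
```

Because the table is baked into the binary and decoded lazily, there is no file I/O and no parsing of the text table at run-time.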

By doing this, we spend some extra time at compile-time, but significantly simplify the processing needed at run-time.