46 Software Architecture: How to Design Complex Systems with Rust? #

Hello, I’m Chen Tian.

Different departments care about different aspects of a software system. Product, operations, and sales departments are concerned with the product’s features, while the testing department is focused on the product’s defects. In addition to developing features and solving defects, the engineering department must continuously maintain and optimize the system’s architecture and reduce technical debt accumulated from the past.

In the long run, defects and technical debt have a negative effect on software systems, while features and architecture have a positive effect.

Architecture and technical debt are far less visible to users than features and defects. Company decision-makers often neglect them for various reasons, especially when their KPIs/OKRs are full of numbers that chase immediate returns and every quarter or half-year is treated as a do-or-die battle. As long as short-term functional goals are met, they are willing to sacrifice almost anything, and architectural design, which is invisible and hard to profit from directly, is often the first thing to go.

However, architecture and architecture-related work can bring long-term returns.

This is because adding new features to a system inevitably adds defects, potentially introduces new technical debt, and disturbs a previously stable architecture. It is a process of increasing entropy. Defects drag down how well features perform and worsen the technical debt in the system; technical debt in turn slows down the introduction of new features, magnifies existing and future defects, and erodes the current architecture. If this continues indefinitely, the entire system enters a downward spiral until it can no longer sustain itself.

To prevent this, we need to maintain the architecture in order to reduce defects, pay down technical debt, improve features, and ultimately bring the entire system back onto an upward trend.

In my view, the outcome of a software system is a result of the interplay and tug-of-war among architecture, features, defects, and technical debt.

At the beginning of a project, in order to quickly achieve product-market fit, it is optimal to introduce technical debt to maximize the speed of construction. But this doesn’t mean we can neglect the design of architecture and bury our heads in coding.

Over the past twenty years, the Agile Manifesto and Lean Startup have had a significant negative impact on the software community. Many people without a deep understanding of software engineering have overemphasized speed and misinterpreted MVP (Minimum Viable Product), skipping the essential architecture and design work needed before starting. As a result, most technical debt is actually debt incurred at the architecture and design stages.

But at the outset of the product, how do we architect the system when the direction is unclear?

A waterfall-like approach that spends a lot of effort on architecture and design at the very beginning often leads to overdesign, introducing unnecessary complexity and “ingenious” structures that may never be used. However, an excessive pursuit of agility, acting first and worrying later, quickly accumulates technical debt beyond sustainable levels.

In such a scenario, we should adopt progressive architectural design: find the core elements of the architecture within the MVP requirements, build a primitive but complete structure (a primitive whole), and then evolve around those core elements. For example (image source: Wikipedia):

Today, let’s talk about how to consider architectural design and how to build typical architectural styles with Rust. I hope that after you finish this lesson, your biggest takeaway is: before doing any development, make it a habit to do the necessary architecture and design first.

How to Think About Architectural Design? #

Architectural design is a very broad concept, and it’s hard to capture it in a nutshell. In the book “Fundamentals of Software Architecture,” the authors discuss architecture from four dimensions:

Structure: The style and structure of architecture, such as MVVM, microservices
Characteristics: The main indicators of architecture, such as scalability, fault tolerance, and performance
Decisions: The rigid rules of architecture, such as service calls that can only be done through interfaces
Design Principles: The architectural design principles, such as prioritizing message communication

You can understand it by referring to the following image (source: Fundamentals of Software Architecture):

Structure of Architecture #

First, let’s look at the style of architecture. Our KV server, which we have been iterating on in practice, adopts a layered structure that separates the network layer, business layer, and storage layer.

Although what the network layer would look like was unclear at first, this layering allowed later iterations of the network layer without affecting the business layer, whether it was adding TLS support or using yamux for multiplexing.

A complex, large system can often be handled using the principle of divide and conquer. We have shown such a diagram before: the most basic and common structure of an internet application:

From a high-level business perspective, we can handle it layer by layer. Each layer can choose a different structure, such as a microservice architecture, an event-driven architecture, or a pipeline architecture, and each resulting component can then be layered internally, for example into a data layer, a business logic layer, and an interface layer. This decomposition can continue until each piece can be implemented within a matter of days.

During execution, we can choose the path related to MVP for development and examine the architectural design continuously, making corresponding modifications. If you look back at the evolution of the KV server, from its initial construction to the current almost complete version, you can appreciate the importance of starting with a complete but primitive structure and then evolving around the core.

Characteristics of Architecture #

Let’s take a look at the main indicators of architecture. As the image shows, a system has many indicators to measure its success, including but not limited to: high performance, availability, reliability, testability, scalability, security, flexibility, fault tolerance, self-healing, readability, and more.

However, these indicators are not equally important; different systems have different priorities.

For the KV server, we care about system performance/security/testability, so we used the most basic in-memory hashmap to ensure query performance, TCP + yamux for network performance, channels and dashmap for concurrency performance, and TLS for security. At the same time, we focus on clear interfaces and testability.

As you can see, once we have made decisions on architectural indicators, further design will prioritize the needs of these indicators.

Decisions of Architecture #

During the architectural design process, introducing rigid constraints or principles is very important. They are like the “basic law” of the architecture and must not be violated. Often, when you adopt a structure, you also take on the constraints it entails. With a microservices structure, for example, the constraint is that all inter-service access must go through publicly exposed interfaces, with no private agreements between services.

This decision, which seems obvious now, was startling two decades ago. Around 2002, Amazon was still a relatively small company, and Jeff Bezos was several Bill Gateses away from becoming the richest person in the world. Although he was running the business rather than writing code, he issued a groundbreaking mandate and enforced it across Amazon. The memo was simple; as later recounted by former Amazon engineer Steve Yegge, it went like this:

All teams will henceforth expose their data and functionality through service interfaces.

Teams must communicate with each other through these interfaces.

There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter.

All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

Anyone who doesn’t do this will be fired.

Thank you; have a nice day!

This memo paved the way for AWS, the giant cloud service empire. Bezos’ architectural vision still amazes me today. He accurately “saw” the future of cloud services and used architectural constraints to promote three points: independent services, inter-service interface calls, and making service interfaces available to outside developers.

Design Principles of Architecture #

Finally, let’s briefly talk about architectural design principles. Unlike architectural constraints, design principles are recommended practices rather than hard prohibitions. When building a system, we need to leave room for flexibility so that during development and iteration, we can choose the right design based on the situation.

For example, for the KV server, it’s recommended to use TCP/yamux for network handling, but that doesn’t mean gRPC or even QUIC cannot be used. Likewise, it’s recommended to use binary protobuf for client/server data transmission, but if a text-based format or a non-protobuf binary format (such as FlatBuffers) is more appropriate in some scenario, that part of the design can be replaced entirely in the future.

How to Build Typical Architectural Styles with Rust? #

Let’s review the four aspects of architectural design mentioned just now:

Structure: the style and structure of the architecture
Characteristics: the main indicators of the architecture
Decisions: the rigid rules of the architecture
Design Principles: the recommended practices of the architecture

Among these, the latter three (indicators, rigid rules, and design principles) are closely tied to specific projects, and there are no ready-made patterns we can simply apply. Architectural styles, however, come with many established approaches, which have gradually formed through trial and error in the practice of software development.

The more common architectural styles in use include layered structure, pipeline structure, plugin structure, microservice structure, event-driven structure, etc.

The microservice structure is familiar to many, so I won’t elaborate here. An event-driven structure can be implemented using channels, and the pub/sub we built in the KV server already has aspects of the event-driven style. A high-performance event-driven structure, however, usually relies on a third-party message queue such as Kafka or NATS; you can explore their recommended event-driven models on your own.
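As a rough illustration of the channel-based approach, here is a minimal sketch that uses only the standard library's `std::sync::mpsc`. The `Event` enum and the `run_event_loop` function are hypothetical names made up for this example; they are not taken from the KV server.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical events for a KV-style service (illustrative names only).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    KeySet { key: String, value: String },
    KeyDeleted { key: String },
}

/// Emits the given events from a producer thread, handles them on the
/// calling thread, and returns a log of what was handled.
pub fn run_event_loop(events: Vec<Event>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Event>();

    let producer = thread::spawn(move || {
        for event in events {
            // `send` only fails if the receiver has been dropped.
            if tx.send(event).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    let mut handled = Vec::new();
    // Iteration ends once all senders are dropped and the queue is drained.
    for event in rx {
        match event {
            Event::KeySet { key, value } => handled.push(format!("set {}={}", key, value)),
            Event::KeyDeleted { key } => handled.push(format!("del {}", key)),
        }
    }

    producer.join().expect("producer thread panicked");
    handled
}
```

The producer knows nothing about how events are handled, and the consumer knows nothing about where they come from; the channel is the only coupling between them.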

However, no matter what kind of distributed architecture you use, the architecture within each service will still use layered structure, pipeline structure, and plugin structure. Let’s briefly talk about these three.

Layered Structure #

We’ve already talked about layering at the beginning; this is the most primitive and practical architectural style. A proverb in the software industry states:

All problems in computer science can be solved by another level of indirection.

This method of using layers to elegantly solve problems pervades the entire software industry.

The operating system is a middle layer between application software and hardware; virtual memory is a middle layer between linear memory and physical memory; virtual machines are a middle layer between operating systems and bare metal; containers are a middle layer between applications and operating systems. ISO’s OSI model divides the network into 7 layers, and we still benefit today from a network structure designed decades ago.

Layering means defining the scope of responsibilities and interfaces for each layer. Once we have clear layering and the rigid rule that layers can only call each other through public interfaces and cannot cross-call, then the system has strong flexibility: the internal implementation of a layer can be completely replaced by different implementations without worrying about affecting the upstream and downstream.

In Rust, we can use traits to define interfaces and then layer the system along those interfaces. As the KV server shows, separating the network layer from the business layer means that iterations on one do not affect the behavior of the other.
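A minimal sketch of this idea might look like the following. The `Storage` trait, `MemTable`, and `Service` are simplified, hypothetical stand-ins and not the KV server's actual definitions; the point is only that the business layer depends on a trait rather than on a concrete storage type.

```rust
use std::collections::HashMap;

/// Storage layer interface (hypothetical, simplified).
pub trait Storage {
    fn get(&self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: String) -> Option<String>;
}

/// One possible storage implementation, backed by an in-memory HashMap.
#[derive(Default)]
pub struct MemTable {
    data: HashMap<String, String>,
}

impl Storage for MemTable {
    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: String) -> Option<String> {
        self.data.insert(key.to_string(), value)
    }
}

/// Business layer: it depends only on the `Storage` trait, never on `MemTable`.
pub struct Service<S: Storage> {
    store: S,
}

impl<S: Storage> Service<S> {
    pub fn new(store: S) -> Self {
        Self { store }
    }

    /// Handles a tiny text command: "SET k v" or "GET k".
    pub fn handle(&mut self, cmd: &str) -> String {
        let parts: Vec<&str> = cmd.split_whitespace().collect();
        match parts.as_slice() {
            ["SET", k, v] => {
                self.store.set(k, v.to_string());
                "OK".to_string()
            }
            ["GET", k] => self
                .store
                .get(k)
                .unwrap_or_else(|| "NOT FOUND".to_string()),
            _ => "ERR unknown command".to_string(),
        }
    }
}
```

Swapping `MemTable` for a disk-backed implementation would only require another `impl Storage`; `Service` would stay untouched.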

Pipeline Structure #

Most systems’ processing can be described with a pipeline structure. We can make each element in the processing a component with a consistent interface and single function, and according to different inputs, select the appropriate components to organize into a complete pipeline, then execute them sequentially.

The advantage is that during execution we don’t need to inspect the input to decide which code to run, because the code to be executed is already laid out in the pipeline. For the most common processes, the pipeline can be pre-built at compile time or load time (fast path). Only less common inputs require constructing an appropriate pipeline at run time (slow path). Once a new pipeline has been constructed, it can be cached so that next time it is executed directly (fast path).
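The fast path / slow path idea can be sketched roughly as follows. Everything here (`PipelineCache`, the step functions, and the "text" and "secret" input kinds) is a hypothetical illustration, assuming that pipelines are keyed by a simple string describing the kind of input.

```rust
use std::collections::HashMap;

/// A pipeline step: a plain function with a uniform signature.
type Step = fn(String) -> String;

fn trim(s: String) -> String {
    s.trim().to_string()
}

fn lowercase(s: String) -> String {
    s.to_lowercase()
}

fn redact_digits(s: String) -> String {
    s.chars()
        .map(|c| if c.is_ascii_digit() { '#' } else { c })
        .collect()
}

/// Caches assembled pipelines by input kind (hypothetical design).
pub struct PipelineCache {
    cache: HashMap<String, Vec<Step>>,
    /// How many times the slow path had to build a pipeline.
    pub builds: usize,
}

impl PipelineCache {
    /// Pre-builds the pipeline for the most common input kind.
    pub fn new() -> Self {
        let mut cache = HashMap::new();
        cache.insert("text".to_string(), vec![trim as Step, lowercase as Step]);
        Self { cache, builds: 0 }
    }

    /// Slow path: decide which steps an uncommon input kind needs.
    fn build(kind: &str) -> Vec<Step> {
        match kind {
            "secret" => vec![trim as Step, redact_digits as Step],
            _ => vec![trim as Step],
        }
    }

    /// Runs the cached pipeline if present; otherwise builds and caches it first.
    pub fn run(&mut self, kind: &str, input: &str) -> String {
        if !self.cache.contains_key(kind) {
            self.builds += 1;
            self.cache.insert(kind.to_string(), Self::build(kind));
        }
        let steps = &self.cache[kind];
        steps.iter().fold(input.to_string(), |acc, step| step(acc))
    }
}
```

The first "secret" request pays for construction; every later one reuses the cached vector of steps, so `builds` stays at 1.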

Let’s take a look at a typical pipeline processing structure:

This structure is very useful in practice; Elixir’s Plug, for example, uses it to handle network requests. The following is a pipeline structure I designed for processing blockchain transactions (TX):

Pipelines can be macro pipeline architectures or micro pipeline functions. Their greatest advantage is in fulfilling various complex and mutable requirements through the combination of different basic functions. Like Lego bricks, the most basic brick components are limited, but we can create infinitely many combinations.

Creating a pipeline structure with Rust is not complicated; you can build it with enums and traits. For example, see the code below:

use std::fmt;

pub use async_trait::async_trait;
pub type BoxedError = Box<dyn std::error::Error>;

/// once a plug has been rerun MAX_RERUN times in a row, consider it a failure
const MAX_RERUN: usize = 5;

/// plug execution result
#[must_use]
pub enum PlugResult<Ctx> {
    Continue,
    Rerun,
    Terminate,
    NewPipe(Vec<Box<dyn Plug<Ctx>>>),
    Err(BoxedError),
}

/// plug trait, any component in the pipeline needs to implement this trait
#[async_trait]
pub trait Plug<Ctx>: fmt::Display {
    async fn call(&self, ctx: &mut Ctx) -> PlugResult<Ctx>;
}

/// pipeline structure
#[derive(Default)]
pub struct Pipeline<Ctx> {
    plugs: Vec<Box<dyn Plug<Ctx>>>,
    pos: usize,
    rerun: usize,
    executed: Vec<String>,
}

impl<Ctx> Pipeline<Ctx> {
    /// create a new pipeline
    pub fn new(plugs: Vec<Box<dyn Plug<Ctx>>>) -> Self {
        Self {
            plugs,
            pos: 0,
            rerun: 0,
            executed: Vec::with_capacity(16),
        }
    }

    /// execute the entire pipeline, either to completion or error out
    pub async fn execute(&mut self, ctx: &mut Ctx) -> Result<(), BoxedError> {
        while self.pos < self.plugs.len() {
            self.add_execution_log();
            let plug = &self.plugs[self.pos];

            match plug.call(ctx).await {
                PlugResult::Continue => {
                    self.pos += 1;
                    self.rerun = 0;
                }
                PlugResult::Rerun => {
                    // pos stays the same, re-execute the current component, start accumulating rerun
                    self.rerun += 1;
                }
                PlugResult::Terminate => {
                    break;
                }
                PlugResult::NewPipe(v) => {
                    self.pos = 0;
                    self.rerun = 0;
                    self.plugs = v;
                }
                PlugResult::Err(e) => return Err(e),
            }

            // if the current plug has been rerun MAX_RERUN times, return an error
            if self.rerun >= MAX_RERUN {
                return Err(anyhow::anyhow!("max rerun").into());
            }
        }

        Ok(())
    }

    pub fn get_execution_log(&self) -> &[String] {
        &self.executed
    }

    fn add_execution_log(&mut self) {
        self.executed.push(self.plugs[self.pos].to_string());
    }
}

You can run the full example code in this playground example.

At the beginning, the pipeline is initialized with two components, [SecurityChecker, Normalizer]. While SecurityChecker executes, the pipeline is replaced with [CacheLoader, DataLoader, CacheWriter]. It then exits with an error during DataLoader. The entire execution flow is shown in the diagram below:
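To make this flow easier to follow without an async runtime, here is a simplified, synchronous sketch of the same walk-through. It is not the playground code: the trait and enum are cut-down stand-ins for the async versions above, and `Context`, `run`, `demo`, and the plug bodies are assumptions made purely for illustration.

```rust
use std::fmt;

type BoxedError = Box<dyn std::error::Error>;

/// Synchronous stand-in for `PlugResult` (only the variants this walk-through needs).
pub enum PlugResult<Ctx> {
    Continue,
    NewPipe(Vec<Box<dyn Plug<Ctx>>>),
    Err(BoxedError),
}

/// Synchronous stand-in for the async `Plug` trait.
pub trait Plug<Ctx>: fmt::Display {
    fn call(&self, ctx: &mut Ctx) -> PlugResult<Ctx>;
}

/// Hypothetical request context shared by all plugs.
#[derive(Default)]
pub struct Context {
    pub checked: bool,
}

/// Declares unit structs whose `Display` output is their own name.
macro_rules! named_plugs {
    ($($name:ident),*) => {$(
        pub struct $name;
        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str(stringify!($name))
            }
        }
    )*};
}
named_plugs!(SecurityChecker, Normalizer, CacheLoader, DataLoader, CacheWriter);

impl Plug<Context> for SecurityChecker {
    fn call(&self, ctx: &mut Context) -> PlugResult<Context> {
        ctx.checked = true;
        // Replace the whole pipeline with a new one.
        let next: Vec<Box<dyn Plug<Context>>> =
            vec![Box::new(CacheLoader), Box::new(DataLoader), Box::new(CacheWriter)];
        PlugResult::NewPipe(next)
    }
}
impl Plug<Context> for Normalizer {
    fn call(&self, _ctx: &mut Context) -> PlugResult<Context> { PlugResult::Continue }
}
impl Plug<Context> for CacheLoader {
    fn call(&self, _ctx: &mut Context) -> PlugResult<Context> { PlugResult::Continue }
}
impl Plug<Context> for DataLoader {
    fn call(&self, _ctx: &mut Context) -> PlugResult<Context> {
        PlugResult::Err("data loader failed".into())
    }
}
impl Plug<Context> for CacheWriter {
    fn call(&self, _ctx: &mut Context) -> PlugResult<Context> { PlugResult::Continue }
}

/// Runs plugs in order; returns the execution log and the final result.
pub fn run(
    mut plugs: Vec<Box<dyn Plug<Context>>>,
    ctx: &mut Context,
) -> (Vec<String>, Result<(), BoxedError>) {
    let mut log = Vec::new();
    let mut pos = 0;
    while pos < plugs.len() {
        log.push(plugs[pos].to_string());
        let result = plugs[pos].call(ctx);
        match result {
            PlugResult::Continue => pos += 1,
            PlugResult::NewPipe(v) => {
                plugs = v;
                pos = 0;
            }
            PlugResult::Err(e) => return (log, Err(e)),
        }
    }
    (log, Ok(()))
}

/// Starts from [SecurityChecker, Normalizer]; returns the log and whether it failed.
pub fn demo() -> (Vec<String>, bool) {
    let mut ctx = Context::default();
    let initial: Vec<Box<dyn Plug<Context>>> =
        vec![Box::new(SecurityChecker), Box::new(Normalizer)];
    let (log, result) = run(initial, &mut ctx);
    (log, result.is_err())
}
```

In this sketch `Normalizer` and `CacheWriter` never run: the first is discarded when the pipeline is replaced, and the second is never reached because `DataLoader` fails first.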

Plugin (Microkernel) Structure #

The plugin structure (Plugin Architecture), also known as the microkernel structure (Microkernel Architecture), gives your system a small core, with new features added around that core in the form of plugins.

VS Code, which we use every day, is a classic plugin structure. Its core function is text editing, but with various plugins, it can support code syntax highlighting, error checking, formatting, and more.

When building a plugin structure, we need to design a stable set of interfaces to ensure interactions between the plugin and the core; we also need a registration mechanism to allow plugins to be registered or removed from the system.
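These two ingredients, a stable interface and a registration mechanism, can be sketched with a trait and a map of trait objects. The `Plugin` trait, the `Editor` core, and the two example plugins are hypothetical and only meant to show the shape of such a design.

```rust
use std::collections::BTreeMap;

/// Stable interface between the core and its plugins (hypothetical).
pub trait Plugin {
    fn name(&self) -> &str;
    /// Transforms the text being edited.
    fn apply(&self, text: &str) -> String;
}

pub struct Uppercase;
impl Plugin for Uppercase {
    fn name(&self) -> &str {
        "uppercase"
    }
    fn apply(&self, text: &str) -> String {
        text.to_uppercase()
    }
}

pub struct TrimTrailing;
impl Plugin for TrimTrailing {
    fn name(&self) -> &str {
        "trim-trailing"
    }
    fn apply(&self, text: &str) -> String {
        text.trim_end().to_string()
    }
}

/// The small core: it only knows how to run whatever plugins are registered.
#[derive(Default)]
pub struct Editor {
    plugins: BTreeMap<String, Box<dyn Plugin>>,
}

impl Editor {
    /// Registers a plugin, replacing any earlier plugin with the same name.
    pub fn register(&mut self, plugin: Box<dyn Plugin>) {
        self.plugins.insert(plugin.name().to_string(), plugin);
    }

    /// Removes a plugin by name; returns true if it was registered.
    pub fn unregister(&mut self, name: &str) -> bool {
        self.plugins.remove(name).is_some()
    }

    /// Applies every registered plugin, in name order.
    pub fn process(&self, text: &str) -> String {
        self.plugins
            .values()
            .fold(text.to_string(), |acc, plugin| plugin.apply(&acc))
    }
}
```

A `BTreeMap` is used here only so that plugins run in a deterministic (name) order; a real system would more likely need explicit priorities or hook points.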

In Rust, besides using traits and trait objects to build plugin mechanisms for use inside the system, you can also let third parties extend the system’s capabilities through plugins, either via WebAssembly (with wasmer or wasmtime) or via an embedded scripting language such as Rhai:

Summary #

Architecture is a complex matter, full of trade-offs. I highly admire a quote by Rich Hickey, the creator of Clojure, which roughly says, “You only talk about trade-offs when you have enough alternatives.”

When developing software, don’t rush to start coding. Let the requirements settle in your brain first, think about how this requirement relates to existing ones,