18 Why Rust's Error Handling Is Different

18 Error Handling: Why is Rust’s Error Handling Unique? #

Hello, I’m Chen Tian.

As developers who have been battered by online services, we all have a deep understanding of Murphy’s Law. In any system, given enough time or a sufficient user base, errors that have a very low probability of occurring are bound to happen. For example, the disk on the host may be filled up, the database system may experience split-brain scenarios, upstream services such as CDNs may go down, or even the hardware carrying the service itself may become damaged, etc.

When we write practice code, we generally pay attention only to the normal path and ignore the error paths that occur with low probability. In a production environment, however, any error that is not properly handled is a hidden danger for the system: in mild cases it degrades the user experience, and in severe cases it brings safety problems to the system. It must not be neglected.

In a programming language, control flow is the core process of the language, and error handling is an important part of control flow.

A language with excellent error handling capabilities greatly reduces how much error handling disrupts the main flow of the code, making our coding smoother and also reducing the cognitive load when reading.

For us developers, error handling includes these parts:

When an error occurs, use the appropriate error type to capture the error.
After the error is captured, it can be processed immediately, or it can be delayed to the place where it must be dealt with later, which involves error propagation.
Finally, based on the type of error, return a suitable error message that helps users understand the issue.

As a programming language that pays extreme attention to user experience, Rust has absorbed the essence of error handling from other excellent languages, especially Haskell, and presents it in its own unique way.

The Mainstream Methods of Error Handling #

Before we go into the details of Rust’s error handling methods, let’s slow down a bit and look at the three mainstream methods of error handling and how other languages apply these methods.

Using Return Values (Error Codes) #

Using return values to represent errors is the oldest and most practical way. It is used everywhere: in function return values, operating system syscall error codes (errno), process exit codes (retval), and even HTTP API status codes.

For example, in the C language, if fopen(filename) cannot open a file, it will return NULL, and the caller needs to check whether the return value is NULL to handle the error accordingly.

Let’s look at another example:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)

Just looking at this interface, it’s hard to understand how errors are returned when reading a file. From the documentation, we know that if the returned count is smaller than the requested nmemb, either an error occurred or the end of file (EOF) was reached, and the caller has to call feof and ferror to tell the two apart.

In C, carrying error information through return values has many limitations. The return value has its original semantics, and forcibly embedding the error type into the original semantics of the return value requires up-to-date and comprehensive documentation to ensure that developers can correctly distinguish between the normal return and error return.

Therefore, Go has extended this approach, allowing functions to carry a separate error object when returning. For example, the fread mentioned above can be defined as follows in Go:

func Fread(file *File, b []byte) (n int, err error)

With such an approach in Go, differentiating error returns from normal returns is a big step forward compared to C.

However, the fundamental problem with using return values remains: the error must be handled or explicitly propagated at the time of the call.

If function A calls function B and B returns an error, A has to convert B’s error into its own error and return it to its caller. The following illustration shows this:

The code written in this way can be very verbose and not very user-friendly for us developers. If not handled, the error information will be lost, causing a hidden danger.

Moreover, most errors in a production environment are nested. An error thrown during an SQL execution might be a server error, but the deeper cause might be an abnormal state of the TLS session connecting to the database server.

In fact, in addition to knowing the surface error of the server error, we need to be more clear about the underlying reasons for the server error. Because the surface error of the server will be provided to the end-user, while the deep reasons for the error need to be provided to us, the maintainers of the service. But such nested errors are difficult to perfectly describe in both C and Go.

Using Exceptions #

Because return values are so restrictive when it comes to propagating errors, many languages, including Java, use exceptions to handle errors.

You can think of exceptions as a kind of Separation of Concerns: The production of errors and the handling of errors are completely separated, callers do not need to worry about errors, and the called party does not insist that callers care about errors.

Any place in the program that may go wrong can throw an exception, and exceptions are automatically passed up layer by layer through stack unwinding until they reach a place that captures them. If an exception travels all the way back to the main function and no one captures it, the program will crash. The following illustration shows this:

Using exceptions to return errors can greatly simplify the error-handling process, solving the propagation problem of return values.

The exception flow shown in the figure above looks straightforward, much like a database transaction that is rolled back in its entirety when an error occurs. In reality, the process is far more complex than it looks, and it requires extra attention to exception safety.

Let’s look at the following (pseudo) code used to switch background images:

void transition(...) {
  lock(&mutex);
  delete background;
  ++changed;
  background = new Background(...);
  unlock(&mutex);
}

Imagine if the creation of the new background fails and throws an exception, bypassing the subsequent processes and backtracking all the way to the try-catch code, then the mutex that is locked here cannot be released. The existing background is cleared, and the new background is not created, putting the program into a strange state.

Indeed, in most cases, it is easier to write code with exceptions, but when exception safety cannot be guaranteed, the correctness of the program is greatly challenged. Therefore, when using exception handling, you need to pay special attention to exception safety, especially in a concurrent environment.

Ironically, the first principle to ensure exception safety is: avoid throwing exceptions. This is also why Go, in its language design, avoided conventional exceptions and reverted to the old practice of using return values.

Another serious problem with exception handling is that developers will abuse exceptions. No matter the error, no matter how serious or recoverable it is, they just throw an exception. In the necessary place, capture it and be done with it. What they don’t realize is that the overhead of exception handling is much greater than handling return values, and abuse can lead to a lot of extra overhead.

Using the Type System #

The third method of error handling is using the type system. In fact, when using return values to handle errors, we already saw the embryonic form of the type system.

Since error information can be carried through existing types or provided through multiple return values, then representing errors through types and using a composite type that internally contains the normal return type and the error return type to enforce error handling and propagation through the type system could reach a better effect, couldn’t it?

Indeed. This approach is used extensively in functional programming languages with strong type system support, such as Haskell/Scala/Swift. The most typical composite type that contains an error type is Haskell’s Maybe and Either types.

Maybe type allows data to contain a value (Just) or no value (Nothing), which is useful for simple errors that do not require types. To use the file opening example again, if we only care about the successfully opened file handle, then Maybe is enough.

For more complex error handling, we can use the Either type. It allows data to be Left a or Right b. Here, a represents the error data type, and b can be a successful data type.

We can see that this method is still returning errors through return values, but the error is encapsulated in a complete, mandatory handling type, which is safer than Go’s approach.

As mentioned earlier, a significant disadvantage of returning errors through return values is that the error needs to be immediately handled or explicitly propagated by the caller. However, the advantage of using types like Maybe/Either to handle errors is that we can use functional programming methods to simplify error handling, such as map, fold, and other functions, to make the code relatively less redundant.

It’s worth noting that many unrecoverable errors, such as the error “disk is full, unable to write,” can avoid error propagation layer by layer using exception handling, making the code concise and efficient. Therefore, most languages that use the type system to handle errors also use exception handling as a supplement.

Rust’s Error Handling #

Since Rust was born later, it has had the opportunity to learn from the various ways existing languages handle errors. For Rust, the best of the available approaches is to use the type system to build the main error handling process.

Rust borrowed from Haskell, constructing the Option type corresponding to Maybe, and the Result type corresponding to Either.

Option and Result #

Option is an enum, defined as follows:

pub enum Option<T> {
    None,
    Some(T),
}

It can express the simplest kind of error: a value is either present or absent.

Result is a more complex enum, defined as follows:

#[must_use = "this `Result` may be an `Err` variant, which should be handled"]
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

When a function fails, it can return Err(E); otherwise, Ok(T).
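As a minimal sketch of this convention, the function below returns Ok on success and Err with a message otherwise. The name parse_port and its rules are hypothetical and exist only for illustration; they do not come from the standard library.

```rust
// Hypothetical helper for illustration: parse a TCP port number from text.
fn parse_port(input: &str) -> Result<u16, String> {
    match input.trim().parse::<u16>() {
        // Parsing worked, but this sketch chooses to reject port 0.
        Ok(0) => Err("port 0 is not allowed".to_string()),
        Ok(port) => Ok(port),
        // Parsing failed: wrap the underlying error in a readable message.
        Err(e) => Err(format!("invalid port {:?}: {}", input, e)),
    }
}

fn main() {
    for input in ["8080", "0", "http"] {
        // The caller cannot reach the u16 without going through the Result.
        match parse_port(input) {
            Ok(port) => println!("listening on port {}", port),
            Err(msg) => println!("error: {}", msg),
        }
    }
}
```

Because the success value lives inside the Result, the caller has to match on it (or use one of the helpers discussed later) before it can use the port.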

We see that there’s a must_use annotation in the Result type declaration. The compiler treats every type carrying the must_use annotation specially: if a value of such a type is not explicitly used, a warning is issued. This nudges callers to deal with errors instead of silently dropping them. As shown in the following illustration:

Here, if we call the read_file function and discard the return value, due to the #[must_use] annotation, the Rust compiler issues a warning, requiring us to use its return value.
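The situation can be sketched in code as follows. The read_file below is a hypothetical stand-in built on std::fs::read_to_string, and the path is made up; the point is only to show where the warning appears and how a value counts as "used".

```rust
use std::fs;
use std::io;

// Hypothetical stand-in for the read_file function mentioned above.
fn read_file(name: &str) -> Result<String, io::Error> {
    fs::read_to_string(name)
}

fn main() {
    // Uncommenting the next line still compiles, but rustc warns:
    // "unused `Result` that must be used".
    // read_file("/no/such/file.txt");

    // Binding to `_` discards the value on purpose and silences the warning;
    // the error is still ignored, but now visibly so.
    let _ = read_file("/no/such/file.txt");

    // Handling both variants is the intended use.
    match read_file("/no/such/file.txt") {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(e) => println!("could not read file: {}", e),
    }
}
```

Note that must_use produces a warning, not a compile error, so it encourages handling rather than strictly enforcing it.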

Although this can greatly prevent the neglect of explicit error handling, if we do not care about the error and only need to propagate the error, we will still write relatively verbose code like in C or Go. What to do?

? Operator #

Fortunately, in addition to having a powerful type system, Rust also has metaprogramming capabilities. Rust originally provided the try! macro to simplify explicit error handling, and later evolved try! into the ? operator to further enhance user experience.

So in Rust code, if you just want to propagate errors without handling them on-site, you can use the ? operator. For example:

use std::fs::File;
use std::io::Read;

fn read_file(name: &str) -> Result<String, std::io::Error> {
  let mut f = File::open(name)?;
  let mut contents = String::new();
  f.read_to_string(&mut contents)?;
  Ok(contents)
}

With the ? operator, Rust makes the cost of writing error propagation as negligible as it is with exception handling, while avoiding many problems of exception handling.

The ? operator behaves roughly like the following code:

match result {
  Ok(v) => v,
  Err(e) => return Err(e.into())
}
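To make the equivalence concrete, here is a small sketch with the same function written twice, once with ? and once with the match spelled out. Both function names are hypothetical, and the hand-written version is an approximation of what the compiler does rather than its exact desugaring.

```rust
use std::num::ParseIntError;

// Propagates the parse error with `?`.
fn double_with_question_mark(s: &str) -> Result<i32, ParseIntError> {
    let n = s.parse::<i32>()?;
    Ok(n * 2)
}

// The same logic with the match written out by hand.
fn double_with_match(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.parse::<i32>() {
        Ok(v) => v,
        // The error type already matches the return type, so the
        // conversion is the identity here.
        Err(e) => return Err(From::from(e)),
    };
    Ok(n * 2)
}

fn main() {
    for s in ["21", "abc"] {
        println!(
            "{:?} / {:?}",
            double_with_question_mark(s),
            double_with_match(s)
        );
    }
}
```

For every input, the two functions return the same Result; the ? version simply hides the early return.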

So, we can easily write code like this, which is simple and easy to understand and highly readable:

fut
  .await?
  .process()?
  .next()
  .await?;

The entire execution process is as follows:

Although the ? operator is very convenient to use, you should note that it cannot be used directly between different error types. Instead, you need to implement the From trait to build a bridge for conversion between the two, which can bring additional trouble. We’ll set this issue aside for now and talk about the solution later.
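For reference, the manual bridge looks roughly like this. ConfigError and parse_timeout are hypothetical names invented for this sketch; the only standard library pieces are the From trait and ParseIntError.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical application-level error type.
#[derive(Debug)]
enum ConfigError {
    Missing(&'static str),
    BadNumber(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(key) => write!(f, "missing setting: {}", key),
            ConfigError::BadNumber(e) => write!(f, "bad number: {}", e),
        }
    }
}

// The bridge: this impl is what lets `?` turn a ParseIntError
// into a ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadNumber(e)
    }
}

fn parse_timeout(raw: Option<&str>) -> Result<u64, ConfigError> {
    // Option -> Result, then `?` (error type already matches).
    let text = raw.ok_or(ConfigError::Missing("timeout"))?;
    // ParseIntError -> ConfigError through the From impl above.
    let secs = text.parse::<u64>()?;
    Ok(secs)
}

fn main() {
    for raw in [Some("30"), Some("soon"), None] {
        match parse_timeout(raw) {
            Ok(secs) => println!("timeout = {}s", secs),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

Without the From impl, the second ? would not compile, because the compiler has no way to convert a ParseIntError into a ConfigError.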

Functional Error Handling #

Rust also provides a lot of auxiliary functions for Option and Result, such as map/map_err/and_then, allowing you to handle part of the data structure conveniently. As shown in the following illustration:

With these functions, you can easily bring the railway-oriented programming paradigm into error handling. For example, in the process of user registration, you need to verify user inputs, process the data, transform it, and then store it in the database. You can write this process like this:

Ok(data)
  .and_then(validate)
  .and_then(process)
  .map(transform)
  .and_then(store)
  .map_err(...)
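To make the chain concrete, here is a runnable sketch of such a registration pipeline. Every name in it (User, validate, process, transform, store, register) is hypothetical, and the store step only pretends to write to a database.

```rust
// All names below are hypothetical; they only illustrate the chain.
#[derive(Debug, PartialEq)]
struct User {
    name: String,
    age: u8,
}

// Reject obviously bad input.
fn validate(input: (String, u8)) -> Result<(String, u8), String> {
    if input.0.trim().is_empty() {
        Err("name must not be empty".to_string())
    } else {
        Ok(input)
    }
}

// Turn the raw input into a domain object.
fn process(input: (String, u8)) -> Result<User, String> {
    let (name, age) = input;
    Ok(User { name: name.trim().to_string(), age })
}

// An infallible step, so it is attached with `map` rather than `and_then`.
fn transform(user: User) -> User {
    User { name: user.name.to_uppercase(), age: user.age }
}

// Stand-in for a database write; a real one could fail.
fn store(user: User) -> Result<String, String> {
    Ok(format!("stored {} ({})", user.name, user.age))
}

fn register(name: &str, age: u8) -> Result<String, String> {
    Ok((name.to_string(), age))
        .and_then(validate)
        .and_then(process)
        .map(transform)
        .and_then(store)
        // Runs only on the error track: add context to whatever failed.
        .map_err(|e| format!("registration failed: {}", e))
}

fn main() {
    println!("{:?}", register(" alice ", 30));
    println!("{:?}", register("   ", 30));
}
```

Once any step returns Err, the remaining and_then and map calls are skipped and only map_err runs, which is the "switch to the error track" behavior the paradigm is named after.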

The execution process is shown in the following illustration:

In addition, the conversion between Option and Result is also very easy, which is also thanks to Rust’s strong functional programming capabilities.
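A short sketch of the two directions, using the standard ok_or_else and ok methods. The functions find_user, require_user, and parse_id are hypothetical and only serve as examples.

```rust
// Hypothetical lookup table for illustration.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        2 => Some("bob"),
        _ => None,
    }
}

// Option -> Result: supply the error to report when the value is absent.
fn require_user(id: u32) -> Result<&'static str, String> {
    find_user(id).ok_or_else(|| format!("no user with id {}", id))
}

// Result -> Option: keep the success value and drop the error details.
fn parse_id(text: &str) -> Option<u32> {
    text.parse::<u32>().ok()
}

fn main() {
    println!("{:?}", require_user(1));
    println!("{:?}", require_user(9));
    println!("{:?}", parse_id("42"));
    println!("{:?}", parse_id("forty-two"));
}
```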

We can see that whether it is through the ? operator or functional programming to handle errors, Rust strives to make error handling flexible and efficient, making it simple and intuitive for developers to use.

panic! and catch_unwind #

Using Option and Result is the preferred way to handle errors in Rust, and we should use them most of the time, but Rust also provides special exception handling capabilities.

In Rust’s view, once you need to throw an exception, it must be a serious error. So, like Go, Rust uses a word like panic! to warn developers: think clearly before you use me. Developers can also call unwrap() or expect() on Option and Result values to force out the inner T; if the value turns out to be None or Err, the call panics as well.

Generally, panic! is for errors that cannot be recovered from, or that we do not want to recover from; at that moment we want the program to stop running and give us crash information. For example, the following code parses a Noise protocol parameter string:

let params: NoiseParams = "Noise_XX_25519_AESGCM_SHA256".parse().unwrap();

If the developer accidentally mistypes the protocol string, it is best to panic! right away so that the mistake is exposed and fixed early.

In some scenarios, we also want to unwind the stack the way exception handling does, restoring the environment to the context where the exception is captured. The Rust standard library provides catch_unwind(), which unwinds the call stack back to the point where catch_unwind was called, working much like try {…} catch {…} in other languages. See the following code:
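The same idea can be tried with standard library types alone. NoiseParams above comes from a third-party crate that is not shown here, so this sketch substitutes std::net::IpAddr; the function name default_bind_addr is hypothetical. It also uses expect() instead of unwrap(), so the panic carries a message explaining the assumption.

```rust
use std::net::IpAddr;

// A hard-coded constant: if it fails to parse, that is a programming mistake,
// so panicking with a clear message is a reasonable response.
fn default_bind_addr() -> IpAddr {
    "127.0.0.1"
        .parse()
        .expect("hard-coded bind address must be a valid IP address")
}

fn main() {
    println!("binding to {}", default_bind_addr());
}
```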

use std::panic;

fn main() {
    let result = panic::catch_unwind(|| {
        println!("hello!");
    });
    assert!(result.is_ok());
    let result = panic::catch_unwind(|| {
        panic!("oh no!");
    });
    assert!(result.is_err());
    println!("panic captured: {:#?}", result);
}

Of course, like exception handling, it does not mean that you can abuse this feature. I guess this is also why Rust calls throwing exceptions panic! and calls capturing exceptions catch_unwind, making beginners wary and afraid to use it lightly. This is also a good user experience.

catch_unwind is very useful in some scenarios. For example, when you use Rust to write a NIF for the Erlang VM, you do not want any panic! in the Rust code to crash the Erlang VM. Such a crash is a very bad experience, and it goes against Erlang’s design principle: processes may “let it crash”, but faulty code should not bring down the VM.

At this point, you can wrap the entire Rust code in the closure required by the catch_unwind() function. That way, any code that could lead to panic!, including third-party crates code, will be captured and converted into a Result.
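A minimal sketch of such a boundary wrapper is shown below. The name run_guarded is hypothetical, and the sketch assumes the default unwinding panic strategy: if the program is built with panic = "abort", catch_unwind has nothing to catch.

```rust
use std::panic::{self, UnwindSafe};

// Hypothetical boundary wrapper: run `f` and turn a panic into an Err
// that carries the panic message.
fn run_guarded<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // The payload is a Box<dyn Any + Send>. A panic! with a plain string
        // literal carries a &'static str; one with format arguments carries
        // a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}

fn main() {
    let ok = run_guarded(|| 1 + 1);
    let err = run_guarded(|| -> i32 { panic!("oh no!") });
    println!("{:?} {:?}", ok, err);
}
```

The default panic hook still prints the panic message to stderr before the unwinding starts; the wrapper only keeps the panic from propagating past the boundary.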

Error trait and Error Type Conversion #

In the earlier section, we mentioned that E in Result is a data type representing an error. To standardize the behavior of this error-representing data type, Rust defines the Error trait:

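pub trait Error: Debug + Display {
    // Returns the lower-level cause of this error, if any.
    fn source(&self) -> Option<&(dyn Error + 'static)> { ... }
    // Nightly-only when this was written; not part of the stable trait.
    fn backtrace(&self) -> Option<&Backtrace> { ... }
    // Deprecated: use the Display implementation instead.
    fn description(&self) -> &str { ... }
    // Deprecated: superseded by source().
    fn cause(&self) -> Option<&dyn Error> { ... }
}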

We can define our own data type and then implement the Error trait for it.
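Written by hand, this could look like the following sketch. SettingError and parse_setting are hypothetical names; the sketch implements Display and overrides only source(), which is enough to expose the nested cause discussed earlier in this article.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// Hypothetical error type: the surface error plus the underlying cause.
#[derive(Debug)]
struct SettingError {
    key: String,
    source: ParseIntError,
}

// Display carries the user-facing message.
impl fmt::Display for SettingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid value for setting `{}`", self.key)
    }
}

// source() exposes the deeper cause for the people maintaining the service.
impl Error for SettingError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn parse_setting(key: &str, raw: &str) -> Result<i64, SettingError> {
    raw.parse::<i64>().map_err(|source| SettingError {
        key: key.to_string(),
        source,
    })
}

fn main() {
    if let Err(e) = parse_setting("retries", "three") {
        println!("error: {}", e);
        // Walk the chain of causes from the surface error downwards.
        let mut cause = e.source();
        while let Some(inner) = cause {
            println!("  caused by: {}", inner);
            cause = inner.source();
        }
    }
}
```

Even for one small error type this is a fair amount of boilerplate, which is the motivation for the helper crates mentioned next.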

However, this work has been simplified for us: we can use thiserror and anyhow to simplify this step. thiserror provides a derive macro to simplify the definition of error types, for example:

use thiserror::Error;