Guest Expert on the Slope of Enlightenment (Part 2): Rust’s Current State, Opportunities, and Challenges #

Hello, I am Zhang Handong.

In the previous article, we covered Rust’s current state and opportunities, looking systematically at how mature it has become in four respects: the language itself, its ecosystem, its application scenarios, and its capacity for sustainable development.

Although Rust is very popular for a young language, it still faces many challenges. That is today’s topic.

The challenges mainly come from two aspects:

  1. Choice of domain. However good a language is, it is of little use if nobody applies it. Rust’s current challenge is adoption in concrete domains. The most closely watched case is Rust’s entry into Linux kernel development; if it succeeds, it will be epoch-making.
  2. Evolution of language features. Rust still has many features that need to be supported and refined. Some of the pending ones are listed later.

Progress and Predictions for Rust for Linux #

Since June 2020, whether Rust should enter the Linux kernel has been hotly debated. At that year’s Open Source Summit and Embedded Linux Conference, Linus Torvalds, the creator of Linux, spoke about the problem of finding future maintainers for the open-source kernel.

Let’s briefly discuss the background.

Linus said: “The kernel is boring, or at least most people think it is boring. Many new technologies are probably more interesting to many people. It turns out to be difficult to find maintainers for the open-source kernel. Although many people write code, it is hard to find someone who can stand upstream and review other people’s code. That trust comes not only from other maintainers but also from everyone who writes code… This just takes time.”

As a language that is safe by design and an alternative to C, Rust can do a great deal to build trust among kernel developers. Two-thirds of Linux kernel security vulnerabilities come from memory safety issues, and there is now broad consensus that introducing Rust would make Linux more secure.

Moreover, at this year’s (2021) Open Source Summit, Linus said: “I think C is a great language, for me, C is indeed a way to control hardware at a very low level. Therefore, when I see C code, I can guess the compiler’s work closely, it is so close to the hardware that you can use it to do anything.”

“But the subtle type interactions in C are not always logical and are traps for almost everyone. They are easy to overlook, and in the kernel this is not always a good thing.”

“The Rust language is the first language I have seen that looks like it can really solve problems. People have been talking about using Rust in the kernel for a long time now, but it has not happened yet. Maybe next year we will start to see some fearless modules written in Rust for the first time, perhaps integrated into the mainline kernel.”

Linus believes that fun is one of the cornerstones of Linux’s longevity, and it is something he has always pursued. When people discuss the possibility of writing some Linux kernel modules in Rust, the fun is there.

Progress of the Conference #

At the Linux Plumbers Conference in September 2021, the progress of Rust’s entry into the Linux kernel was discussed again.

First came the question of what role Rust would play.

Miguel Ojeda, the lead developer of Rust for Linux, argued that if Rust enters the kernel, it should be a first-class citizen. Linus responded that the kernel community would almost certainly experiment with the language.

The issue of reviewing Rust code was also briefly discussed.

If Rust enters the kernel, some maintainers will have to learn the language in order to review Rust code. Linus stated that Rust is not difficult to understand, and that anyone in the kernel community who is capable of reviewing patches should be able to learn Rust well enough to review Rust code.

Additionally, there are some issues regarding the stability of Rust’s own features:

  1. Kernel work currently still relies on some unstable Rust features. This hurts compatibility, because there is no guarantee that future versions of the Rust compiler will compile the code correctly.

Ojeda noted, however, that Rust entering the Linux kernel would change this situation. The Rust team would consider stabilizing some of these unstable features. This acts as a driving force, and sooner or later there will be a kernel that builds with stable Rust only, at which point the compatibility problem disappears.

  2. Another kernel developer, Thomas Gleixner, is concerned that Rust has no official support for memory ordering, which could be a problem.

Paul McKenney, however, a Linux kernel maintainer with thirty years of experience in C++ concurrent programming, wrote a series of articles exploring how the Rust community should handle the memory ordering model as Rust enters the Linux kernel. I have also written an article on this topic: What memory model should Rust language use?

  3. GCC support for Rust. Among the efforts in this area, rustc_codegen_gcc is progressing fastest and has already passed some of the rustc tests, while rustc_codegen_llvm remains the main backend under development. Rust GCC is expected to be completed within one to two years.

There were two conclusions from this conference:

  1. Rust will definitely carry out an epoch-making experiment in the Linux kernel.
  2. The entry of Rust into the Linux kernel holds significant strategic importance in pushing the evolution of Rust.

Latest News #

On November 11, 2021, the Linux Foundation website published another recorded webinar, Rust for Linux: Writing Abstractions and Drivers, in which Miguel Ojeda presented how Rust works within the kernel, including the overall infrastructure, compilation model, documentation, testing, and coding guidelines.

I have written a brief summary of the video. You can use the key points below to find the parts you need and watch them.

  1. Introduction to Unsafe Rust and Safe Rust.
  2. How Rust is used in the Linux kernel. The guiding idea is to encapsulate Unsafe operations and expose safe abstractions for kernel developers to use. These safe abstractions live in the kernel module at https://github.com/Rust-for-Linux/linux/tree/rust/rust.
  3. A simple example showing how to write a kernel driver.
  4. A comparison with the equivalent C example, explaining which behaviors are safe in Rust.
  5. An introduction to the documentation, testing, and the coding guidelines being followed.

From what we know so far, we can speculate that Rust for Linux will enter the kernel on an experimental basis in the near future, and that this experiment will be epoch-making. If it succeeds, it will mean that Rust has formally taken the baton of the era from C.

Improvement of Rust Language Features #

Next, let’s look at which Rust language features have been improved recently. Note that these are advanced concepts and challenges for Rust, so you may not understand them yet. Don’t worry: they are things Rust must improve as it evolves, and the improvements only make Rust better. They do not affect your learning or use of Rust today.

We will talk about four improved features, and finally also introduce some pending features for your reference.

Safe I/O Issues #

Recently, the Rust project merged an RFC that introduces the concept of I/O safety, together with a set of new types and traits that give users of AsRawFd and related traits guarantees about their raw resource handles, closing a gap in Rust’s encapsulation boundaries.

The Rust standard library already comes close to providing I/O safety, that is, a guarantee that if one part of a program holds a raw handle privately, other parts of the program cannot access it.

For example, FromRawFd::from_raw_fd is Unsafe, so Safe Rust code cannot write something like File::from_raw_fd(7) and perform I/O on a file descriptor that may be held privately elsewhere in the program.

However, there is a loophole. Many APIs perform I/O on handles that they accept through AsRawFd:

pub fn do_some_io<FD: AsRawFd>(input: &FD) -> io::Result<()> {
    some_syscall(input.as_raw_fd())
}

AsRawFd places no restriction on the value returned by as_raw_fd, so do_some_io can end up performing I/O on an arbitrary RawFd value. One can even write do_some_io(&7), because RawFd itself implements AsRawFd. This can cause a program to access the wrong resource, or even break encapsulation boundaries by creating aliases of handles that other parts of the program hold privately, producing strange action-at-a-distance effects.

Action at a distance is a programming anti-pattern in which the behavior of one part of a program varies wildly depending on operations in another part of the program, and those operations are difficult or even impossible to identify.

In some special cases, violating I/O safety can even lead to violations of memory safety.
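The loophole can be reproduced in a few lines. The following is a minimal sketch, assuming a Unix target; `target_of` is a hypothetical stand-in for `do_some_io` that only reports which descriptor it would operate on, so no I/O is actually performed on the forged value.

```rust
use std::os::unix::io::{AsRawFd, RawFd};

/// Hypothetical stand-in for `do_some_io`: instead of issuing a syscall,
/// it just returns the descriptor number it was handed.
fn target_of<FD: AsRawFd>(input: &FD) -> RawFd {
    input.as_raw_fd()
}

fn main() {
    // A legitimate handle: standard input.
    let stdin = std::io::stdin();
    println!("stdin is descriptor {}", target_of(&stdin));

    // A forged handle: `RawFd` is a plain integer that implements `AsRawFd`,
    // so safe code can name a descriptor it was never given.
    let forged: RawFd = 7;
    println!("forged descriptor {}", target_of(&forged));
}
```

Nothing in this program is marked unsafe, yet the second call refers to a descriptor that may belong to some unrelated part of the process.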

Therefore, Rust adds the OwnedFd and BorrowedFd<'fd> types to replace RawFd. They give handle values ownership semantics, representing owned and borrowed handles respectively. OwnedFd owns an fd and closes it when dropped. The lifetime parameter in BorrowedFd<'fd> indicates how long access to the fd is borrowed.

For Windows, there are similar types, but in the form of Handle and Socket.

Unlike other types, I/O types do not distinguish between mutable and immutable access. Operating system resources can be shared in many ways outside of Rust’s control, so I/O can be regarded as using interior mutability.

Three conversion APIs are then introduced: AsFd, Into<OwnedFd>, and From<OwnedFd>.

For most use cases, these three are conceptual replacements for AsRawFd::as_raw_fd, IntoRawFd::into_raw_fd, and FromRawFd::from_raw_fd, respectively. They work in terms of OwnedFd and BorrowedFd, so they automatically uphold the I/O safety invariants.

pub fn do_some_io<FD: AsFd>(input: &FD) -> io::Result<()> {
    some_syscall(input.as_fd())
}

Using this trait avoids the earlier problem. Because AsFd is implemented only for types that properly own or borrow their file descriptors, this version of do_some_io does not have to worry about being passed a forged or dangling file descriptor.
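To make the ownership model concrete, here is a minimal sketch, assuming a Unix target and a toolchain whose standard library already ships OwnedFd, BorrowedFd, and AsFd. The helpers `raw_number` and `round_trip` and the scratch file name are hypothetical and exist only for illustration.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::unix::io::{AsFd, AsRawFd, BorrowedFd, OwnedFd};

/// Hypothetical helper: accepts anything that can lend out a descriptor.
/// `raw_number(&7)` does not compile, because integers do not implement `AsFd`.
fn raw_number<Fd: AsFd>(input: &Fd) -> i32 {
    let borrowed: BorrowedFd<'_> = input.as_fd(); // borrow tied to `input`
    borrowed.as_raw_fd()
}

/// Writes `text` to a scratch file, moves the descriptor through `OwnedFd`,
/// and reads the text back. No `unsafe` is needed at any step.
fn round_trip(text: &str) -> io::Result<String> {
    let path = std::env::temp_dir()
        .join(format!("io_safety_sketch_{}.txt", std::process::id()));
    let mut file = File::options()
        .read(true).write(true).create(true).truncate(true)
        .open(&path)?;
    file.write_all(text.as_bytes())?;

    // Moving the file into `OwnedFd` transfers ownership of the descriptor.
    let owned: OwnedFd = file.into();
    let number = raw_number(&owned); // lent out only for the duration of the call

    // Converting back is safe: `owned` is known to be a live descriptor.
    let mut file = File::from(owned);
    debug_assert_eq!(number, raw_number(&file));
    file.seek(SeekFrom::Start(0))?;
    let mut out = String::new();
    file.read_to_string(&mut out)?;
    drop(file); // the descriptor is closed exactly once, here
    std::fs::remove_file(&path)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    println!("read back: {}", round_trip("hello")?);
    Ok(())
}
```

The design point is that a descriptor is always either owned (and closed on drop) or borrowed for a checked lifetime, so safe code never handles a bare integer.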

Improved Error Handling with Try #

Currently, Rust’s ? operator automatically returns the Err(e) of a Result<T, E> early, but the Ok(o) value still has to be wrapped by hand.

For instance:

fn foo() -> Result<PathBuf, io::Error> {
    let base = env::current_dir()?;
    Ok(base.join("foo"))
}

This gives rise to the term Ok-wrapping. Clearly, this style is not elegant enough and leaves plenty of room for improvement.

For this reason, withoutboats, a member of the Rust project, developed the fehler library, which introduces a throw syntax. It is used as follows:

#[throws(i32)]
fn foo(x: bool) -> i32 {
    if x {
        0
    } else {
        throw!(1);
    }
}

// The error handling of the foo function above is equivalent to the bar function below

fn bar(x: bool) -> Result<i32, i32> {
    if x {
        Ok(0)
    } else {
        Err(1)
    }
}

The throws attribute and throw! macro let developers omit manual Ok-wrapping and Err-wrapping. The library sparked discussion in the community at the time and helped push forward improvements to Rust’s error handling experience.

The discussion of error handling therefore came to centre on two paths, Ok-wrapping and Err-wrapping, and on how to design an elegant syntax for them.

After a very long discussion, the try-trait-v2 RFC was merged, which means a concrete design has been settled. It introduces a new type, ControlFlow, and a new trait, FromResidual.

The source code of ControlFlow:

enum ControlFlow<B, C = ()> {
    /// Exit the operation without running subsequent phases.
    Break(B),
    /// Move on to the next phase of the operation as normal.
    Continue(C),
}

impl<B, C> ControlFlow<B, C> {
    fn is_break(&self) -> bool;
    fn is_continue(&self) -> bool;
    fn break_value(self) -> Option<B>;
    fn continue_value(self) -> Option<C>;
}

ControlFlow has two variants:

  • ControlFlow::Break, indicating an early exit. This does not have to be an error; the early exit can also carry a successful result.
  • ControlFlow::Continue, indicating that the operation should continue.
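A Break that is not an error is easy to show with the ControlFlow type from std::ops. The following is a small sketch; the function `first_over` is a hypothetical example, and it relies on Iterator::try_for_each accepting a closure that returns ControlFlow.

```rust
use std::ops::ControlFlow;

/// Returns the first item greater than `limit`, stopping the scan early.
/// Here `Break` carries a successful find, not an error.
fn first_over(limit: i32, items: &[i32]) -> Option<i32> {
    let flow = items.iter().try_for_each(|&x| {
        if x > limit {
            ControlFlow::Break(x) // exit early with the value we found
        } else {
            ControlFlow::Continue(()) // keep scanning
        }
    });
    match flow {
        ControlFlow::Break(found) => Some(found),
        ControlFlow::Continue(()) => None,
    }
}

fn main() {
    println!("{:?}", first_over(10, &[1, 20, 30]));
    println!("{:?}", first_over(100, &[1, 2, 3]));
}
```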

The new trait FromResidual:

trait FromResidual<Residual = <Self as Try>::Residual> {
    fn from_residual(r: Residual) -> Self;
}

Residual means “what remains.” The word makes sense because the intention is to split types such as Result, Option, and ControlFlow into two parts (two paths): the output and whatever is left over.

The Try trait has FromResidual as its supertrait:

pub trait Try: FromResidual {
    /// The type of the value consumed or produced when not short-circuiting.
    type Output;

    /// A type that "colours" the short-circuit value so it can stay associated
    /// with the type constructor from which it came.
    type Residual;

    /// Used in `try{}` blocks to wrap the result of the block.
    fn from_output(x: Self::Output) -> Self;

    /// Determine whether to short-circuit (by returning `ControlFlow::Break`)
    /// or continue executing (by returning `ControlFlow::Continue`).
    fn branch(self) -> ControlFlow<Self::Residual, Self::Output>;
}

pub trait FromResidual<Residual = <Self as Try>::Residual> {
    /// Recreate the type implementing `Try` from a related residual
    fn from_residual(x: Residual) -> Self;
}

Therefore, the Try trait has two associated types:

  • Output, which corresponds to Ok-wrapping if it’s Result.
  • Residual, which corresponds to Err-wrapping if it’s Result.

So, the behavior of the ? operator now becomes:

match Try::branch(x) {
    ControlFlow::Continue(v) => v,
    ControlFlow::Break(r) => return FromResidual::from_residual(r),
}

Try is then implemented internally for Result:

impl<T, E> ops::Try for Result<T, E> {
    type Output = T;
    type Residual = Result<!, E>;

    #[inline]
    fn from_output(c: T) -> Self {
        Ok(c)
    }

    #[inline]
    fn branch(self) -> ControlFlow<Self::Residual, T> {
        match self {
            Ok(c) => ControlFlow::Continue(c),
            Err(e) => ControlFlow::Break(Err(e)),
        }
    }
}

impl<T, E, F: From<E>> ops::FromResidual<Result<!, E>> for Result<T, F> {
    fn from_residual(x: Result<!, E>) -> Self {
        match x {
            Err(e) => Err(From::from(e)),
        }
    }
}

With Try also implemented for Option and Poll, error handling is unified under one mechanism.
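The desugaring can be imitated without the real Try trait. The sketch below defines two local traits, `Branch` and `FromResidual`, and a macro `question!` that expands to the same match as the ? operator. All of these names, as well as `AppError` and `parse_and_double`, are hypothetical stand-ins for illustration, and `Infallible` is used here in place of the `!` type.

```rust
use std::convert::Infallible;
use std::num::ParseIntError;
use std::ops::ControlFlow;

// Local stand-ins for the two halves of the design (not the std traits).
trait Branch {
    type Output;
    type Residual;
    fn branch(self) -> ControlFlow<Self::Residual, Self::Output>;
}
trait FromResidual<R> {
    fn from_residual(r: R) -> Self;
}

impl<T, E> Branch for Result<T, E> {
    type Output = T;
    type Residual = Result<Infallible, E>; // only the Err half can exist
    fn branch(self) -> ControlFlow<Self::Residual, T> {
        match self {
            Ok(v) => ControlFlow::Continue(v),
            Err(e) => ControlFlow::Break(Err(e)),
        }
    }
}

impl<T, E, F: From<E>> FromResidual<Result<Infallible, E>> for Result<T, F> {
    fn from_residual(r: Result<Infallible, E>) -> Self {
        match r {
            Err(e) => Err(F::from(e)), // the error conversion happens here
            Ok(never) => match never {},
        }
    }
}

// Expands to the same match that `?` is described as performing.
macro_rules! question {
    ($e:expr) => {
        match Branch::branch($e) {
            ControlFlow::Continue(v) => v,
            ControlFlow::Break(r) => return FromResidual::from_residual(r),
        }
    };
}

#[derive(Debug, PartialEq)]
struct AppError(String);

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError(e.to_string())
    }
}

fn parse_and_double(s: &str) -> Result<i32, AppError> {
    let n = question!(s.parse::<i32>()); // behaves like `s.parse::<i32>()?`
    Ok(n * 2)
}

fn main() {
    println!("{:?}", parse_and_double("21"));
    println!("{:?}", parse_and_double("not a number"));
}
```

Splitting the work this way is what lets one operator serve several types: branch decides whether to stop, and from_residual decides how the leftover is turned into the function’s return type.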

Generic Associated Types (GAT) #

Generic Associated Types are defined in RFC 1598. The feature is often compared to Haskell’s higher-kinded types (HKT).

Despite the similarity, Rust does not copy Haskell’s HKT as is; instead it proposes GAT as a concept suited to Rust’s own characteristics. Progress on GAT support can be tracked in issue #44265, and the feature may be stabilized within the year.

What are generic associated types? See the code below:

trait Iterable {
    type Item<'a>; // 'a is also a generic parameter
}

trait Foo {
    type Bar<T>;
}

This simple syntax lets an associated type take generic parameters of its own, so it can act as a type constructor. Implementing it in the compiler, however, is quite complex.

However complex it is, the feature is essential for Rust and very useful. The most typical use case is implementing streaming iterators:

trait StreamingIterator {
    type Item<'a>;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

At the time of writing, stable Rust does not support this syntax. Once available, it can solve the performance problem of some current iterators. For example, the lines method of io::BufRead in the standard library returns an iterator, but that iterator can only yield io::Result<String>, which means it allocates a new String for every line and makes iteration slow. There is a discussion of this problem, together with an optimization, on StackOverflow.

With GAT support, the solution becomes very simple:

trait Iterator {
    type Item<'s>;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

impl<B: BufRead> Iterator for Lines<B> {
    type Item<'s> = io::Result<&'s str>;
    fn next(&mut self) -> Option<Self::Item<'_>> {  }
}
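The streaming-iterator pattern can be tried out on a toolchain where GAT is available. The following is a minimal sketch under that assumption; `LendingIterator`, `WindowsMut`, and `prefix_sums` are hypothetical names, and the example lends out overlapping mutable windows of a slice, which the ordinary Iterator trait cannot express.

```rust
/// A streaming ("lending") iterator: each item may borrow from the iterator.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

/// Yields overlapping mutable windows of length `size` over a slice.
struct WindowsMut<'t, T> {
    slice: &'t mut [T],
    start: usize,
    size: usize,
}

impl<'t, T> LendingIterator for WindowsMut<'t, T> {
    type Item<'a> = &'a mut [T] where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.start.checked_add(self.size)?;
        // Returns None once the window would run past the end of the slice.
        let window = self.slice.get_mut(self.start..end)?;
        self.start += 1;
        Some(window)
    }
}

/// Turns a vector into its running totals, in place, with no allocation per step.
fn prefix_sums(mut data: Vec<i32>) -> Vec<i32> {
    let mut windows = WindowsMut { slice: data.as_mut_slice(), start: 0, size: 2 };
    while let Some(w) = windows.next() {
        w[1] += w[0]; // each window is released before the next one is lent out
    }
    data
}

fn main() {
    println!("{:?}", prefix_sums(vec![1, 2, 3, 4]));
}
```

Because every window borrows from the iterator itself, the borrow checker guarantees that two overlapping windows are never alive at the same time.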

Implementing GAT will also advance support for async traits. Rust’s async support still has many limitations, such as traits being unable to have async methods, and this too stems from GAT not yet being complete.

Special #