30 Unsafe Rust: How to Approach Rust the C++ Way? #

Hello, I am Chen Tian.

Up to this point, all the code we’ve written has remained a law-abiding citizen within the carefully constructed realm of Rust’s memory safety. By adhering to rules of ownership, borrow checking, lifetimes, and so on, our code can declare with confidence that it’s safe once it compiles!

However, safe Rust is not adaptable to all scenarios.

Firstly, for memory safety, Rust’s rules are often meant to be universal, and the compiler will stringently halt any dubious behaviors. But this meticulous, ruthless approach may sometimes be overly strict, leading to false positives.

It’s like saying “the owner of a house only ever opens the door with a key, so anyone picking the lock must be a criminal.” Under normal circumstances that makes sense: every lock-picking thief gets caught (a compilation error). But sometimes the owner loses the keys and has to call a locksmith (unsafe code), and in that case picking the lock is perfectly legitimate and should be allowed.

Moreover, no matter how pure and perfect Rust constructs its internal world, it inevitably has to interact with the impure and imperfect external world, be it hardware or software.

Computer hardware is inherently unsafe, such as performing I/O operations on peripherals, or using assembly instructions for special operations (operating GPUs or using the SSE instruction set). For such operations, the compiler cannot guarantee memory safety, which is why we need unsafe to tell the compiler to make an exception to the law.

Similarly, when Rust accesses libraries from other languages like C/C++, since they do not meet Rust’s safety requirements, this cross-language FFI (Foreign Function Interface) is also unsafe.

These two uses of unsafe Rust are unavoidable, which makes them legitimate reasons to reach for it.

There’s also a major use of unsafe Rust purely for performance purposes, such as omitting bounds checks or using uninitialized memory. We should minimize this kind of unsafe unless benchmarking shows that it resolves a specific performance bottleneck; otherwise the risks outweigh the benefits, because when we use unsafe code we reduce Rust’s memory safety to the same level as C++.
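As a rough illustration of omitting bounds checks, the sketch below sums a slice twice, once with ordinary indexing and once with `get_unchecked`. The function names are hypothetical and chosen only for this example. In a loop this simple the compiler can often remove the checks by itself, which is exactly why the unsafe variant should only be adopted after benchmarking.

```rust
/// Sums a slice with ordinary indexing; every `data[i]` is bounds-checked.
fn sum_checked(data: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

/// Same loop, but skips the bounds check on each access.
fn sum_unchecked(data: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..data.len() {
        // SAFETY: the loop range guarantees `i < data.len()`.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    // Both versions must agree; only the safety obligations differ.
    assert_eq!(sum_checked(&data), sum_unchecked(&data));
    println!("sum = {}", sum_unchecked(&data));
}
```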

Scenarios Where Unsafe can be Used #

Now that we understand why we need unsafe Rust, let’s look at specific places in our day-to-day work where unsafe Rust might be used.

Let’s first look at situations where using unsafe is acceptable, and even recommended. In order of importance/usability, they are: implementing unsafe traits (mainly Send and Sync), calling existing unsafe interfaces, dereferencing raw pointers, and using FFI.

Implementing Unsafe Traits #

In Rust, the most well-known unsafe code probably pertains to the Send and Sync traits:

pub unsafe auto trait Send {}
pub unsafe auto trait Sync {}

I believe you are already familiar with these two traits. They are often used in code related to concurrency, especially when declaring interface types to add constraints with Send/Sync. We also know that most data structures implement Send/Sync, although there are exceptions like Rc, RefCell, and raw pointers.

Because Send/Sync are auto traits, in most cases, you don’t need to implement Send/Sync for your own data structures. However, when you use raw pointers within your data structure, since raw pointers do not implement Send/Sync, your data structure also does not implement Send/Sync. Yet, it is very likely that your structure is thread-safe, and you need it to be thread-safe.

At that point, if you can ensure it’s safe to move it across threads, you can implement Send; if it’s safe to share across threads, you can implement Sync. Previously we discussed Bytes, which, while using raw pointers, implements Send/Sync:

pub struct Bytes {
    ptr: *const u8,
    len: usize,
    // inlined "trait object"
    data: AtomicPtr<()>,
    vtable: &'static Vtable,
}

// Vtable must enforce this behavior
unsafe impl Send for Bytes {}
unsafe impl Sync for Bytes {}
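To make the idea concrete outside of Bytes, here is a minimal sketch of a type that stores a raw pointer and opts back into Send. `RawBox` is a hypothetical type invented for this example, not something from the standard library or the bytes crate. The key point is the bound `T: Send` together with a comment explaining why the impl is sound.

```rust
use std::thread;

/// Hypothetical owner of a heap allocation, stored as a raw pointer.
struct RawBox<T> {
    ptr: *mut T,
}

impl<T> RawBox<T> {
    fn new(value: T) -> Self {
        // Box::into_raw hands ownership of the allocation to the raw pointer.
        Self { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from Box::into_raw and is only freed in Drop.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for RawBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from Box::into_raw and is released exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// SAFETY: RawBox<T> uniquely owns its T, so moving it to another thread
// is as safe as moving the T itself.
unsafe impl<T: Send> Send for RawBox<T> {}

fn main() {
    let boxed = RawBox::new(42u32);
    // Without the `unsafe impl Send` above, this spawn would not compile.
    let handle = thread::spawn(move || *boxed.get() + 1);
    assert_eq!(handle.join().unwrap(), 43);
}
```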

But be careful when implementing Send/Sync. If you can’t ensure the thread-safety of the data structure, wrongly implementing Send/Sync can lead to mysterious crashes that are difficult to replicate.

For example, the following code forcefully implements Send for Evil, which contains an Rc, a type that is not Send. By implementing Send, the code circumvents Rust’s concurrency safety checks and compiles (code):

use std::{cell::RefCell, rc::Rc, thread};

#[derive(Debug, Default, Clone)]
struct Evil {
    data: Rc<RefCell<usize>>,
}

// Forcefully implement `Send` for `Evil`, this will muddle `Rc`
unsafe impl Send for Evil {}

fn main() {
    let v = Evil::default();
    let v1 = v.clone();
    let v2 = v.clone();

    let t1 = thread::spawn(move || {
        let v3 = v.clone();
        let mut data = v3.data.borrow_mut();
        *data += 1;
        println!("v3: {:?}", data);
    });

    let t2 = thread::spawn(move || {
        let v4 = v1.clone();
        let mut data = v4.data.borrow_mut();
        *data += 1;
        println!("v4: {:?}", data);
    });

    t2.join().unwrap();
    t1.join().unwrap();

    let mut data = v2.data.borrow_mut();
    *data += 1;

    println!("v2: {:?}", data);
}

However, there’s a chance of crashing when it runs:

❯ cargo run --example rc_send
v4: 1
v3: 2
v2: 3

❯ cargo run --example rc_send
v4: 1
thread '<unnamed>' panicked at 'already borrowed: BorrowMutError', examples/rc_send.rs:18:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', examples/rc_send.rs:31:15

Therefore, it’s not advisable to carelessly implement Send/Sync.
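For comparison, a safe counterpart of Evil might look like the sketch below, which swaps Rc<RefCell<usize>> for Arc<Mutex<usize>>. The names `Good` and `increment` are hypothetical. Because Arc and Mutex are already Send and Sync, no unsafe impl is needed, and the result no longer depends on thread scheduling.

```rust
use std::{
    sync::{Arc, Mutex},
    thread,
};

#[derive(Debug, Default, Clone)]
struct Good {
    // Arc gives atomic reference counting; Mutex gives synchronized mutation.
    data: Arc<Mutex<usize>>,
}

/// Adds one to the shared counter and returns the new value.
fn increment(v: &Good) -> usize {
    let mut data = v.data.lock().unwrap();
    *data += 1;
    *data
}

fn main() {
    let v = Good::default();
    let v1 = v.clone();
    let v2 = v.clone();

    let t1 = thread::spawn(move || increment(&v));
    let t2 = thread::spawn(move || increment(&v1));

    t1.join().unwrap();
    t2.join().unwrap();

    // Both threads have finished, so this is always the third increment.
    println!("final: {}", increment(&v2));
}
```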

Since we mentioned unsafe trait just now, you might be curious: What traits would be considered unsafe? Besides Send/Sync, are there other unsafe traits? Of course, there are.

Any trait, as long as it’s declared as unsafe, is an unsafe trait. And a regular trait can also contain unsafe functions, as we can see in the following example (code):

// Implementers must guarantee the implementation is memory-safe
unsafe trait Foo {
    fn foo(&self);
}

trait Bar {
    // Callers are responsible for safety
    unsafe fn bar(&self);
}

struct Nonsense;

unsafe impl Foo for Nonsense {
    fn foo(&self) {
        println!("foo!");
    }
}

impl Bar for Nonsense {
    unsafe fn bar(&self) {
        println!("bar!");
    }
}

fn main() {
    let nonsense = Nonsense;
    // Callers don't need to worry about safety
    nonsense.foo();

    // Callers need to be responsible for safety
    unsafe { nonsense.bar() };
}

As you can see, an unsafe trait is a constraint for the implementer of the trait, telling them to be cautious and to ensure memory safety, so it requires the unsafe keyword when implemented.

But for the caller, the methods of an unsafe trait can be called normally, without any unsafe block, because safety is already guaranteed by the implementer. After all, if the implementer hasn’t ensured it, nothing the caller does can; this is exactly how we use Send/Sync.

An unsafe fn, by contrast, is a constraint on the caller of the function: it warns that careless use can cause memory safety problems. Calling an unsafe fn therefore requires an unsafe block, which reminds everyone to be careful.

Let’s look at another trait where both implementation and calling are unsafe: GlobalAlloc.

The following code that we’ve seen in the lecture on smart pointers allows us to implement our own memory allocator. As memory allocators can significantly impact memory safety, implementers need to ensure each implementation is memory-safe. Simultaneously, methods like alloc/dealloc, if not called correctly, can also cause memory safety issues, which is why they’re also unsafe:

use std::alloc::{GlobalAlloc, Layout, System};

struct MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let data = System.alloc(layout);
        eprintln!("ALLOC: {:p}, size {}", data, layout.size());
        data
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        eprintln!("FREE: {:p}, size {}", ptr, layout.size());
    }
}

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;

Okay, that’s enough about unsafe trait; if you want to learn more details, you can look at Rust RFC 2585. If you want to see the process of defining and implementing a complete unsafe trait, you can see BufMut.

Calling Existing Unsafe Functions #

Next, let’s talk about unsafe functions. Sometimes, you’ll notice that a function offered by the standard library or third-party libraries is marked as unsafe. For example, the transmute function we previously used to print the structure of HashMap:

use std::collections::HashMap;

fn main() {
    let map = HashMap::new();
    let mut map = explain("empty", map);

    map.insert(String::from("a"), 1);
    explain("added 1", map);
}

// At the time of writing, HashMap consists of two u64s (RandomState) followed by
// four usizes: bucket_mask, ctrl, growth_left, and items. This layout is an
// implementation detail, not a guarantee.
// We transmute to print, then transmute back.
fn explain<K, V>(name: &str, map: HashMap<K, V>) -> HashMap<K, V> {
    let arr: [usize; 6] = unsafe { std::mem::transmute(map) };
    println!(
        "{}: bucket_mask 0x{:x}, ctrl 0x{:x}, growth_left: {}, items: {}",
        name, arr[2], arr[3], arr[4], arr[5]
    );

    // Use of std::mem::transmute is unsafe, hence requiring an unsafe block
    unsafe { std::mem::transmute(arr) }
}

As mentioned earlier, to call an unsafe function, you need to wrap it in an unsafe block. This effectively alerts everyone to be mindful that there’s unsafe code here!

Another way to call unsafe functions is to define an unsafe fn and then call other unsafe functions within that unsafe fn.
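A minimal sketch of this second style follows. `shout` is a hypothetical helper that passes its safety requirement on to its own caller instead of discharging it. The inner unsafe block is not strictly required by the language, but it keeps the exact unsafe operation visible.

```rust
/// Hypothetical helper that upper-cases raw bytes.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8.
unsafe fn shout(bytes: &[u8]) -> String {
    // SAFETY: validity of `bytes` is part of this function's own contract,
    // so the obligation is forwarded to our caller.
    let s = unsafe { std::str::from_utf8_unchecked(bytes) };
    s.to_uppercase()
}

fn main() {
    let bytes = "hello".as_bytes();
    // SAFETY: `bytes` was just obtained from a &str, so it is valid UTF-8.
    let loud = unsafe { shout(bytes) };
    assert_eq!(loud, "HELLO");
}
```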

If you read some of the code from the standard library, you’ll notice that sometimes Rust provides both unsafe and safe versions of the same functionality. For example, converting &[u8] to a string (source):

// Safe version, validates data. If invalid, returns an error.
pub fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error> {
    run_utf8_validation(v)?;
    // SAFETY: Just ran validation.
    Ok(unsafe { from_utf8_unchecked(v) })
}

// Unsafe version, does not validate data. Caller must ensure &[u8] contains valid characters.
pub const unsafe fn from_utf8_unchecked(v: &[u8]) -> &str {
    // SAFETY: the caller must guarantee that the bytes `v` are valid UTF-8.
    // Also relies on `&str` and `&[u8]` having the same layout.
    unsafe { mem::transmute(v) }
}

The safe str::from_utf8() function, after conducting some checks, actually calls str::from_utf8_unchecked(). If we do not need to perform these checks, this call can be much more efficient (possibly by an order of magnitude) because the unsafe version is essentially just a type conversion.

So with two versions of an interface like this, which one should we call?

Unless you’re absolutely sure, always call the safe version. Don’t call the unsafe version for the sake of performance gain. If you’re certain that the &[u8] has been checked before, or it came from a &str and now you’re just converting it back, then you can call the unsafe version and indicate in a comment why it’s considered safe here.
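The sketch below illustrates the “it came from a &str” case. `first_word` is a hypothetical function written for this example. It works on bytes, then converts back with the unchecked version and records the justification in a SAFETY comment, while untrusted input still goes through the safe `std::str::from_utf8`.

```rust
/// Returns the text before the first ASCII space, or the whole string.
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    let end = bytes
        .iter()
        .position(|&b| b == b' ')
        .unwrap_or(bytes.len());
    // SAFETY: `bytes` comes from a &str, and we cut either at the end or at an
    // ASCII space. In UTF-8 a byte below 0x80 is always a complete character,
    // so the prefix is still valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(&bytes[..end]) }
}

fn main() {
    assert_eq!(first_word("héllo wörld"), "héllo");

    // Bytes of unknown origin go through the safe, validating version.
    let untrusted = [0xff, 0xfe];
    assert!(std::str::from_utf8(&untrusted).is_err());
}
```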

Dereferencing Raw Pointers #

That’s all for now on unsafe traits and unsafe functions; let’s look at raw pointers. Many times, if we need to do some special handling, we’ll convert obtained data structures into raw pointers, like the Bytes we saw earlier.

Raw pointers can be created without unsafe, because creating one performs no unsafe memory operation. Dereferencing a raw pointer, however, is potentially risky, so it must be wrapped in an unsafe block that makes this explicit to both the compiler and the reader.

Here’s a snippet that dereferences a raw pointer (code):

fn main() {
    let mut age = 18;

    // Immutable pointer
    let r1 = &age as *const i32;
    // Mutable pointer
    let r2 = &mut age as *mut i32;

    // Using raw pointers, we can bypass the immutable/mutable borrow rule.

    // However, dereferencing a pointer needs to be unsafe.
    unsafe {
        println!("r1: {}, r2: {}", *r1, *r2);
    }
}

fn immutable_mutable_cant_coexist() {
    let mut age = 18;
    let r1 = &age;
    // Compilation error
    let r2 = &mut age;

    println!("r1: {}, r2: {}", *r1, *r2);
}

We can see that with raw pointers, mutable and immutable pointers can coexist, unlike mutable and immutable references. This is because any memory operation with raw pointers, whether ptr::read/ptr::write, or dereferencing, is unsafe. So anyone reading or writing memory with raw pointers needs to be responsible for memory safety.
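As a small sketch of ptr::read and ptr::write, the hypothetical `bump` function below increments an integer through a raw pointer. The doc comment states what the caller must guarantee, which is the responsibility described above.

```rust
use std::ptr;

/// Hypothetical helper: increments the i32 behind `p` and returns the old value.
///
/// # Safety
/// `p` must be non-null, aligned, and point to an initialized i32 that nothing
/// else accesses for the duration of the call.
unsafe fn bump(p: *mut i32) -> i32 {
    // SAFETY: guaranteed by this function's contract.
    unsafe {
        let old = ptr::read(p);
        ptr::write(p, old + 1);
        old
    }
}

fn main() {
    let mut age = 18;
    let p = &mut age as *mut i32;
    // SAFETY: `p` points to the live local `age`, which is not touched
    // through any other path while `bump` runs.
    let old = unsafe { bump(p) };
    assert_eq!(old, 18);
    assert_eq!(age, 19);
}
```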

You might wonder why this example requires unsafe when there is no apparent unsafe memory operation. It’s true that here the raw pointers come from a trusted memory address and the code behaves safely, but the following code is not safe and will cause a segmentation fault (code):

fn main() {
    // Raw pointer points to a problematic address.
    let r1 = 0xdeadbeef as *mut u32;

    println!("so far so good!");

    unsafe {
        // Program crashes.
        *r1 += 1;
        println!("r1: {}", *r1);
    }
}

This is why when writing unsafe Rust code, we need to exercise extreme caution and add sufficient comments in the unsafe code to explain why we believe it is safe.
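One possible shape for such comments is sketched below, using a hypothetical `read_or` function. It uses the raw pointer method `as_ref`, which returns None for a null pointer. Note that a null check would not catch an address like 0xdeadbeef; validity of non-null pointers stays in the caller’s contract, which is why the function itself is still unsafe.

```rust
/// Hypothetical helper: reads the u32 behind `p`, or returns `fallback` if `p` is null.
///
/// # Safety
/// `p` must be null, or point to a valid, aligned, initialized u32 that stays
/// alive for the duration of the call.
unsafe fn read_or(p: *const u32, fallback: u32) -> u32 {
    // SAFETY: `as_ref` handles the null case; validity of a non-null `p`
    // is guaranteed by the caller, per the contract above.
    match unsafe { p.as_ref() } {
        Some(value) => *value,
        None => fallback,
    }
}

fn main() {
    let x = 7u32;
    // SAFETY: `&x` is a valid pointer to a live u32.
    assert_eq!(unsafe { read_or(&x, 0) }, 7);
    // SAFETY: a null pointer is explicitly allowed by the contract.
    assert_eq!(unsafe { read_or(std::ptr::null(), 9) }, 9);
}
```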

When using raw pointers, most operations are unsafe (marked with an exclamation mark in the figure below):

[Figure: Unsafe operations with raw pointers]

If you’re interested, you can consult the std::ptr documentation.

Using FFI #

The last scenario where unsafe can be used is FFI.

When Rust needs to leverage capabilities from other languages, the Rust compiler cannot guarantee the memory safety of that code, which is why interfaces that interact with other languages must use unsafe. For instance, calling libc to perform the malloc/free familiar from C (code):

use std::mem::transmute;

fn main() {
    let data = unsafe {
        let p = libc::malloc(8);
        let arr: &mut [u8; 8] = transmute(p);
        arr
    };

    data.copy_from_slice(&[1, 2, 3, 4, 5, 6, 7, 8]);

    println!("data: {:?}", data);

    unsafe { libc::free(transmute(data)) };
}

From the code, we can see that all calls to libc functions require an unsafe block. We’ll cover how Rust does FFI in more detail in the next lesson.

So far, we’ve discussed scenarios where using unsafe is appropriate. There are also cases where unsafe can be used but which I do not recommend: handling uninitialized data, accessing mutable static variables, or using unsafe to improve performance.

Although not recommended, these uses still appear in standard and third-party libraries, so even if we don’t write them ourselves, we should be able to understand them when we come across them.

Accessing or Modifying Mutable Static Variables #

First are mutable static variables. In previous lessons, we’ve encountered global static variables, as well as declaring complex static variables with [lazy_static](https://docs.rs/l