
Extra Meal: The Summit of Ignorance: Your Rust Learning FAQ Summarized #

Hello, I’m Chen Tian.

So far, we have covered a lot of Rust: basic syntax, memory management, ownership, lifetimes, and more. We also walked through three representative example projects to give you a sense of what Rust code looks like in close-to-real application scenarios.

Despite having covered so much, do you still feel that everything makes sense when you read it, but falls apart as soon as you start coding? Don’t worry. As the saying goes, a meal is eaten one bite at a time. No new subject is mastered overnight, so give it a little longer to sink in. You can also pat yourself on the back for completing so many check-ins; just keep it up.

In today’s extra meal, we’ll take a short break to adjust our learning pace and discuss some common questions in Rust development, hoping to clear up some of your confusion.

Ownership Issues #

Q: How should I handle creating a doubly linked list?

Rust’s standard library has LinkedList, which is an implementation of a doubly linked list. But before reaching for a linked list, first consider whether the same need can be met with a Vec or with the ring buffer VecDeque. Linked lists are very cache-unfriendly, so in most workloads they perform noticeably worse.

If you’re just curious about how to implement a doubly linked list, then you can use Rc/RefCell (see [Lecture 9]). For the list’s next pointer, you can use Rc; for the prev pointer, use Weak.

Weak is like a weakened version of Rc: it does not contribute to the strong reference count, so it does not keep the data alive. However, a Weak can be upgraded to an Rc when you need to use the data, and the upgrade fails if the data has already been dropped. If you have used reference-counted data structures in other languages, you should be familiar with Weak; it helps us break reference cycles. Interested friends can try to implement it themselves, and then compare it with this reference implementation.
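To make the Rc/Weak arrangement concrete, here is a minimal sketch of two linked nodes. The `Node` type and the helper functions (`new_node`, `link`, `next_value`, `prev_value`) are hypothetical names chosen for illustration; this is not the reference implementation mentioned above, nor the standard library's layout.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type, for illustration only.
struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // strong: keeps the following node alive
    prev: Option<Weak<RefCell<Node>>>, // weak: does not keep the previous node alive
}

fn new_node(value: i32) -> Rc<RefCell<Node>> {
    Rc::new(RefCell::new(Node { value, next: None, prev: None }))
}

// Link `second` directly after `first`.
fn link(first: &Rc<RefCell<Node>>, second: &Rc<RefCell<Node>>) {
    first.borrow_mut().next = Some(Rc::clone(second));
    second.borrow_mut().prev = Some(Rc::downgrade(first));
}

// Value of the following node, if there is one.
fn next_value(node: &Rc<RefCell<Node>>) -> Option<i32> {
    let next = node.borrow().next.clone()?;
    let value = next.borrow().value;
    Some(value)
}

// Value of the previous node, if it exists and is still alive.
fn prev_value(node: &Rc<RefCell<Node>>) -> Option<i32> {
    let prev = node.borrow().prev.as_ref()?.upgrade()?;
    let value = prev.borrow().value;
    Some(value)
}

fn main() {
    let a = new_node(1);
    let b = new_node(2);
    link(&a, &b);

    println!("next of a: {:?}", next_value(&a));
    println!("prev of b: {:?}", prev_value(&b));
    println!(
        "a: strong = {}, weak = {}",
        Rc::strong_count(&a),
        Rc::weak_count(&a)
    );

    // Dropping `a` frees the first node even though `b` still points back at it.
    drop(a);
    println!("prev of b after drop: {:?}", prev_value(&b));
}
```

Because the back pointer is a Weak, the two nodes never hold strong references to each other, so no cycle keeps them alive after their owners go away.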

You may wonder why the standard library’s LinkedList doesn’t use Rc/Weak. That’s because the standard library directly uses NonNull pointers and unsafe code.

Q: The compiler always tells me there’s a “use of moved value” error. How can I fix it?

This is an error that we often encounter when first learning Rust, which means you’re trying to access a variable that has had its ownership moved away.

For such errors, first determine whether the variable really needs to be moved to another scope. If it doesn’t, can you borrow it instead (see [Lecture 8])? If it really does need to be moved to another scope:

  1. If you need multiple owners to share the same data, use Rc/Arc with Cell/RefCell/Mutex/RwLock. (see [Lecture 9])
  2. If you don’t need multiple owners to share, consider implementing Clone or even Copy. (see [Lecture 7])
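The options above can be sketched side by side. The function names `total_len` and `consume` are made up for this example; the point is only to show borrowing, cloning, and shared ownership applied to the same value.

```rust
use std::rc::Rc;

// Borrowing: the caller keeps ownership, the function only reads.
fn total_len(items: &[String]) -> usize {
    items.iter().map(|s| s.len()).sum()
}

// Taking ownership: the argument is moved into the function.
fn consume(items: Vec<String>) -> usize {
    items.len()
}

fn main() {
    let names = vec!["Tyr".to_string(), "Alice".to_string()];

    // Option A: borrow instead of moving; `names` stays usable afterwards.
    println!("total length: {}", total_len(&names));

    // Option B: the callee needs its own copy, so clone before the move.
    println!("count: {}", consume(names.clone()));

    // Option C: several owners share the same data through Rc.
    // `names` is moved into the Rc here and must not be used directly again.
    let shared = Rc::new(names);
    let other = Rc::clone(&shared);
    println!(
        "owners: {}, first name: {}",
        Rc::strong_count(&shared),
        other[0]
    );
}
```

For data shared across threads, Arc takes the place of Rc, combined with Mutex or RwLock when the shared data must be mutated.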

Lifetime Issues #

Q: Why does the compiler always make it difficult for me when my function returns a reference?

When a function returns a reference, unless it’s a static reference, that reference must be derived from one of the function’s reference parameters. The parameter could be &self, &mut self, or &T/&mut T. We need to establish the correct relationship between the input and the return value, and this relationship has nothing to do with the internal implementation of the function; it depends only on the function’s signature.

Take for example the HashMap’s get() method:

pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq

We do not need to implement it, or know how it is implemented, to know what Option<&V> is related to. There are only two candidates: &self or k: &Q. Obviously, it’s &self, since the HashMap holds the data and k is just a key used to look something up inside it.

Why is there no need to use lifetime parameters here? Because of the rule we mentioned before: when &self/&mut self appears, the lifetime of the return value is associated with it. (see [Lecture 10]). This is a great rule because for most methods, if they return a reference, it’s likely referring to some data inside &self.
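A small sketch may help here. The `Registry` type and its methods are hypothetical; they only mirror the shape of HashMap’s get(). The first method relies on the elision rule, the second spells out the lifetimes that rule implies, and the free function `longer` shows a case with no &self, where the relationship has to be annotated by hand.

```rust
use std::collections::HashMap;

// Hypothetical wrapper type, used only to illustrate the signatures.
struct Registry {
    entries: HashMap<String, String>,
}

impl Registry {
    // Elided form: because `&self` is present, the returned reference
    // is tied to `self`, not to `key`.
    fn lookup(&self, key: &str) -> Option<&String> {
        self.entries.get(key)
    }

    // The same signature with the lifetimes written out.
    fn lookup_explicit<'a, 'b>(&'a self, key: &'b str) -> Option<&'a String> {
        self.entries.get(key)
    }
}

// No `&self` and two reference parameters: elision cannot choose between
// them, so the relationship must be stated in the signature.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let mut entries = HashMap::new();
    entries.insert("lang".to_string(), "Rust".to_string());
    let registry = Registry { entries };

    // The key goes out of scope at the end of the block, yet the result
    // stays valid because it borrows from `registry` only.
    let found = {
        let key = String::from("lang");
        registry.lookup(&key)
    };
    println!("{:?}", found);
    println!("{:?}", registry.lookup_explicit("missing"));
    println!("{}", longer("ab", "abc"));
}
```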

If you understand this relationship, it will be easier to handle lifetime errors that occur when the function returns a reference.

When you need to return data that was created or acquired during the function’s execution and is unrelated to the parameters, you can only return owned data, regardless of whether what you hold inside the function is an owned value or a reference. If it is a reference, this means calling clone() or to_owned() on it to obtain an owned copy.
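As a rough illustration, the two hypothetical functions below both produce their result inside the function body, so they return String rather than &str. In the second one, the function only holds a reference to a local value at the end, and clone() turns it into something that can leave the function.

```rust
// The uppercase text is created inside the function, so it is returned by value.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

// `candidates` is a local Vec; a reference into it cannot outlive the
// function, so the chosen element is cloned before returning.
fn longest_generated(prefix: &str) -> String {
    let candidates = vec![format!("{}-a", prefix), format!("{}-long", prefix)];

    let mut longest: &String = &candidates[0];
    for candidate in &candidates {
        if candidate.len() > longest.len() {
            longest = candidate;
        }
    }

    longest.clone()
}

fn main() {
    println!("{}", shout("tyr"));
    println!("{}", longest_generated("x"));
}
```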

Data Structure Issues #

Q: Why are Rust strings so confusing, with String, &String, &str all having different representations?

I must admit, this is a misleading question: it lumps unrelated things together and easily leads people astray.

First of all, any data structure T can have a reference pointing to it, &T. So the difference between String and &String and the difference between String and &str are two completely separate questions.

A better question is: why, if we have String, do we still need &str? Or, more generally: why do containers such as String and Vec, which hold contiguous data, still need the concept of slices?

Once the question is asked the right way, the answer is almost self-evident: slices are a very common data structure.

Those who have used Python would know:

s = "hello world"
slice1 = s[:5] # Can slice strings
slice2 = slice1[1:3] # Can re-slice slices
print(slice1, slice2) # Prints hello el

This is strikingly similar to Rust’s String slicing:

let s = "hello world".to_string();
let slice1 = &s[..5]; // Can slice strings
let slice2 = &slice1[1..3]; // Can re-slice slices
println!("{} {}", slice1, slice2); // Prints hello el

So &str can be a slice of a String, and it can also be a slice of another &str. It’s no different from &[T]: just a fat pointer, an address plus a length, pointing to a contiguous memory area.

You can think of it like this: slices are to Vec/String data just like a view is to a table in a database. We will go into more detail about Rust’s data structures later in the course.
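The practical consequence can be sketched as follows: a function that takes &str (or &[T]) accepts the whole container, a sub-slice, or a literal alike. The function names `count_vowels` and `sum` are made up for this example.

```rust
use std::mem::size_of;

// Taking `&str` lets callers pass a `&String`, a string literal, or a sub-slice.
fn count_vowels(text: &str) -> usize {
    text.chars().filter(|c| "aeiou".contains(*c)).count()
}

// The same idea for `Vec<T>` and `&[T]`.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let owned = String::from("hello world");
    let numbers = vec![1, 2, 3, 4];

    println!("{}", count_vowels(&owned)); // &String coerces to &str
    println!("{}", count_vowels(&owned[..5])); // a slice of the String
    println!("{}", count_vowels("literal")); // a &'static str
    println!("{}", sum(&numbers)); // &Vec<i32> coerces to &[i32]
    println!("{}", sum(&numbers[1..3])); // a slice of the Vec

    // A `&str` carries pointer + length; a `&String` is a plain pointer.
    println!(
        "&str: {} bytes, &String: {} bytes",
        size_of::<&str>(),
        size_of::<&String>()
    );
}
```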

Q: In the course’s example code, unwrap() is used a lot. Is this okay?

When we need to extract data from Option or Result, we can use unwrap(), which is why unwrap() appears in the example code.

If we are only writing some learning-oriented code, then unwrap() is acceptable. However, in a production environment, unless you can ensure that unwrap() will not cause panic!(), you should use pattern matching to handle data or use the error-handling ? operator. We will have a dedicated lecture later on to discuss Rust’s error handling.
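As a rough illustration of the alternatives just mentioned, the sketch below uses a hypothetical `parse_port` function. It shows the ? operator propagating an error, pattern matching on a Result, and a default value standing in for unwrap() on an Option.

```rust
use std::num::ParseIntError;

// `?` returns early with the error instead of panicking.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    let port = input.trim().parse::<u16>()?;
    Ok(port)
}

// Pattern matching handles both outcomes explicitly.
fn describe(input: &str) -> String {
    match parse_port(input) {
        Ok(port) => format!("port {}", port),
        Err(e) => format!("invalid port {:?}: {}", input, e),
    }
}

// For Option, a default value is often enough.
fn first_or_zero(values: &[i32]) -> i32 {
    values.first().copied().unwrap_or(0)
}

fn main() {
    println!("{}", describe(" 8080 "));
    println!("{}", describe("abc"));
    println!("{}", first_or_zero(&[]));
    println!("{}", first_or_zero(&[7, 8]));
}
```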

When can we be sure that unwrap() will not panic? When earlier logic already guarantees that the Option or Result holds a value (Some(T) or Ok(T)) at the point where unwrap() is called. For example:

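// Assume v is a Vec<T> and the enclosing function returns Option<T>
if v.is_empty() {
    return None;
}

// v is known to be non-empty here, so pop() returns Some and unwrap() cannot panic.
// Note that pop() removes the last element, not the first.
let last = v.pop().unwrap();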

Q: Why do standard library data structures like Rc/Vec use so much unsafe, but people always tell me unsafe is not good?

Good question. C language developers also think asm is not good, but many C libraries also make extensive use of asm.

The standard library’s responsibility is to implement the required functionalities in the most efficient way possible, even at the expense of readability, while ensuring safety. At the same time, it provides users of the standard library with an elegant, high-level abstraction that makes it possible to write beautiful code without dealing with the ugly aspects in most cases.

In Rust, the correctness and safety of unsafe code are the responsibility of the developer, not the compiler. The standard library developers have invested a great deal of time and testing to ensure this correctness and safety. When we write our own unsafe code, unless it is reviewed by experienced developers, we might overlook something, concurrency considerations for example, and end up with problematic code.

Therefore, unless necessary, it is advised not to write unsafe code. After all, most of the issues we deal with can be solved through good design, appropriate data structures, and algorithms.

Q: How do I declare global variables in Rust?

In [Lecture 3], we discussed const and static, both of which can be used to declare global variables. Note, however, that a static mut can only be read or written inside an unsafe block, because a mutable global could be modified from multiple threads at once, which is unsafe:

static mut COUNTER: u64 = 0; 

fn main() {
    COUNTER += 1; // Compilation will fail, the compiler tells you it needs to use unsafe
}

If you really want a writable global variable, you can wrap it in a Mutex, but a static needs a constant initializer, which makes non-trivial values such as a HashMap awkward to set up. In this case, you can use the lazy_static library. For example (see code):

use lazy_static::lazy_static;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

lazy_static! {
    static ref HASHMAP: Arc<Mutex<HashMap<u32, &'static str>>> = {
        let mut m = HashMap::new();
        m.insert(0, "foo");
        m.insert(1, "bar");
        m.insert(2, "baz");
        Arc::new(Mutex::new(m))
    };
}

fn main() {
    let mut map = HASHMAP.lock().unwrap();
    map.insert(3, "waz");

    println!("map: {:?}", map);
}
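For simple cases, a writable global can also be built from the standard library alone. The sketch below assumes Rust 1.63 or later, where Mutex::new can be called in a static initializer; the names `COUNTER`, `EVENTS`, and `record` are hypothetical. An atomic covers the plain counter case without any lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// An atomic integer can be mutated through a shared reference, so no `mut` is needed.
static COUNTER: AtomicU64 = AtomicU64::new(0);

// Assumes Rust 1.63+, where `Mutex::new` is a const fn. `Vec::new` is const as well.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

// Records an event and returns the updated counter value.
fn record(event: &'static str) -> u64 {
    // lock() only fails if another thread panicked while holding the lock.
    EVENTS.lock().unwrap().push(event);
    COUNTER.fetch_add(1, Ordering::SeqCst) + 1
}

fn main() {
    record("start");
    record("stop");

    println!("count: {}", COUNTER.load(Ordering::SeqCst));
    println!("events: {:?}", EVENTS.lock().unwrap());
}
```

A value that cannot be built in a constant expression, such as the HashMap above, still needs some form of lazy initialization.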

Debugging Tools #

Q: Generally, how do you debug applications in Rust?

Personally, I use tracing to log information, and for simpler example code, I use println!/dbg! to check the state of the data structures at a given moment. However, in my regular development, I rarely use a debugger to set breakpoints and step through.
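For the println!/dbg! style of inspection, a minimal sketch looks like this. The `Order` type and `total_items` function are invented for the example. dbg! writes the file, line, expression, and value to stderr and then returns the value, so it can be wrapped around an expression without changing the surrounding code.

```rust
// Hypothetical type; deriving Debug is what makes {:?} and dbg! usable.
#[derive(Debug)]
struct Order {
    id: u32,
    items: Vec<String>,
}

fn total_items(orders: &[Order]) -> usize {
    // dbg! prints the expression and its value, then hands the value back.
    dbg!(orders.iter().map(|o| o.items.len()).sum::<usize>())
}

fn main() {
    let orders = vec![
        Order { id: 1, items: vec!["pen".to_string()] },
        Order { id: 2, items: vec!["ink".to_string(), "paper".to_string()] },
    ];

    // {:#?} pretty-prints the whole structure across multiple lines.
    println!("{:#?}", orders[0]);
    println!("order {} has {} item(s)", orders[1].id, orders[1].items.len());

    let total = total_items(&orders);
    println!("total items: {}", total);
}
```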

Instead of spending time in a debugger, I prefer to spend more time on design: implement the code with sufficiently clear logs, and write appropriate unit tests to ensure its logical correctness. If you find yourself always needing debugging tools to understand the program’s state, that suggests the code is not well designed and is too complex.

When I was learning Rust, I would use debugging tools to view memory information. In later lessons, we will see these tools used when analyzing certain data structures.

In Rust, we can use rust-gdb or rust-lldb, which provide Rust-friendly pretty-printing. Both wrapper scripts are installed along with Rust, on top of the gdb or lldb available on your system. I personally prefer gdb, but rust-gdb works best on Linux. On OS X it has some issues, so I tend to switch to an Ubuntu virtual machine to use rust-gdb.

Other Questions #

Q: Why are binaries compiled by Rust so large? Why does Rust code run so slowly?

If you compile with cargo build, this is normal: it is a debug build, which carries a large amount of debugging information. You can use cargo build --release to compile an optimized version, which will be much smaller. Additionally, there are many ways to further reduce the size of the binary. If you are interested in this, you can refer to this document.

Without --release, the compiler applies no optimizations, so many Rust libraries run far slower than intended, sometimes even slower than comparable Node.js code. So when you deploy your code to production, you must use a release build.

Q: What Rust version is used for this course? Will it be updated with the 2021 edition?

Yes. Rust is a constantly evolving language, with a new version released every six weeks, often bringing new features. For example, const generics (see code):

#[derive(Debug)]
struct Packet<const N: usize> {
    data: [u8; N],
}

fn main() {
    let ip = Packet { data: [0u8; 20] };
    let udp = Packet { data: [0u8; 8] };

    println!("ip: {:?}, udp: {:?}", ip, udp);
}

And the newly released 1.55 supports half-open range patterns (see code):

fn main() {
    println!("{}", match_range(10001));
}

fn match_range(v: usize) -> &'static str {
    match v {
        0..=99 => "good",
        100..=9999 => "unbelievable",
        10000.. => "beyond expectation",
        _ => unreachable!(),
    }
}

In a little over a month, Rust will release the 2021 edition. Due to Rust’s good backward compatibility, I suggest keeping up with the latest version of Rust. Once the 2021 edition is released, I will update the codebase to the 2021 edition, and the corresponding code in the manuscript will also be updated.

Thinking Question #

Here’s a simple thinking question that brings together what we’ve learned so far. The code below has lifetime problems. Can you find the reason? (see code)

use std::str::Chars;

// Why is this wrong?
fn lifetime1() -> &str {
    let name = "Tyr".to_string();
    &name[1..]
}

// Why is this wrong?
fn lifetime2(name: String) -> &str {
    &name[1..]
}

// Why is this right?
fn lifetime3(name: &str) -> Chars {
    name.chars()
}

Feel free to answer in the comments section. I also welcome you to share your learning experiences so we can discuss and make progress together. See you in the next lesson, where we’ll get back to the main topic and talk about Rust’s type system!