16 Data Structure Vec Ttbox T Do You Really Understand the Collection Container

16 Data Structures: Vec_T_, &[T], Box_[T]_—Do You Really Understand Collection Containers? #

16 | Data Structures: Vec, &[T], Box<[T]>—Do You Really Understand Collection Containers?

Hello, I’m Chen Tian. Let’s learn about collection containers today.

Now we have encountered more and more data structures. Let me organize the main data structures in Rust from the dimensions of primitive types, container types, and system-related types. You can count which ones you have mastered.

As you can see, containers occupy a large territory in the data structure landscape.

When mentioning containers, what likely comes to mind first are arrays and lists that can be traversed, but actually any specific data encapsulated within a certain data structure makes it a container. For example, Option is a container that wraps an existing or non-existing T, while Cow is a container that encapsulates internal data B, which can be either borrowed or owned.

Among the subtypes of containers, we have introduced many that are created for specific purposes, including Box, Rc, Arc, RefCell, Option, and Result, which have not yet been discussed.

Today we will delve into another category, collection containers.

Collection Containers #

Collection containers, as the name suggests, are about collecting a series of data of the same type for unified processing, such as:

Familiar structures like strings String, arrays [T; n], lists Vec, and hash tables HashMap;
Slices, which are often used but not yet familiar to many;
Circular buffers VecDeque, doubly linked lists LinkedList, etc., which are used in other languages but not yet in Rust.

These collection containers share many characteristics, such as the ability to be traversed, perform map-reduce operations, convert from one type to another, and more.

We will select two typical categories of collection containers—slices and hash tables—to understand deeply. Understanding these two types of containers makes it much easier to learn about other collection containers. Today we will introduce slices and related containers first, and we will learn about hash tables in the next lecture.

What Exactly Are Slices? #

In Rust, a slice refers to a data structure that represents a group of data of the same type, with unspecified length, stored continuously in memory, denoted by [T]. Since the length is not determined, a slice is a DST (Dynamically Sized Type).

Slices usually appear in the definition of data structures and cannot be accessed directly. In practice, they mainly take the following forms:

&[T]: Indicates a read-only slice reference.
&mut [T]: Indicates a writable slice reference.
Box<[T]>: A slice allocated on the heap.

How should we understand slices? I’ll give you an analogy, slices are to specific data structures what views in a database are to tables. It’s like a tool that allows us to uniformly access types that have similar structures and behaviors with some differences.

Let’s look at the code below to help understand:

fn main() {
    let arr = [1, 2, 3, 4, 5];
    let vec = vec![1, 2, 3, 4, 5];
    let s1 = &arr[..2];
    let s2 = &vec[..2];
    println!("s1: {:?}, s2: {:?}", s1, s2);

    // Equality of &[T] and &[T] depends on the equality of length and content
    assert_eq!(s1, s2);
    // &[T] can be compared with Vec<T>/[T; n], taking length and content into account
    assert_eq!(&arr[..], vec);
    assert_eq!(&vec[..], arr);
}

For array and vector, although they are different data structures—one stored on the stack and the other on the heap—their slices are similar. Moreover, slices of the same part of data content, such as &arr[1..3] and &vec[1..3], are equivalent. Furthermore, slices can be directly compared with their corresponding data structures, which is due to their implementation of the PartialEq trait (source code reference).

The figure below shows the relationship between slices and data more clearly:

![Slices and Data Relationship](../image/c1e1c1909b761cfa3348115bb417d4d4.png)

In Rust, slices are usually used with references &[T], so many students get confused about the difference between &[T] and &Vec<T>. I have drawn a diagram to help you better understand their relationship:

![&Vec<T> versus &[T] Relationship](./image/c1e1c1909b761cfa3348115bb417d4d4.png)

When in use, for data types that support slicing, you can dereference it into a slice type as needed. For example, Vec<T> and [T; n] can be dereferenced to &[T], because Vec<T> implements Deref, and arrays are inherently dereferenced to &[T]. We can write a piece of code to verify this behavior (here is the code):

use std::fmt;
fn main() {
    let v = vec![1, 2, 3, 4];

    // Vec implements Deref, &Vec<T> is automatically dereferenced to &[T]
    print_slice(&v);
    // Directly &[T]
    print_slice(&v[..]);

    // &Vec<T> supports AsRef<[T]>
    print_slice1(&v);
    // &[T] supports AsRef<[T]>
    print_slice1(&v[..]);
    // Vec<T> also supports AsRef<[T]>
    print_slice1(v);

    let arr = [1, 2, 3, 4];
    // Arrays do not implement Deref, but their dereferencing is &[T]
    print_slice(&arr);
    print_slice(&arr[..]);
    print_slice1(&arr);
    print_slice1(&arr[..]);
    print_slice1(arr);
}

// Note the use of generic functions below
fn print_slice<T: fmt::Debug>(s: &[T]) {
    println!("{:?}", s);
}

fn print_slice1<T, U>(s: T)
where
    T: AsRef<[U]>,
    U: fmt::Debug,
{
    println!("{:?}", s.as_ref());
}

This also means that by dereferencing, these data structures related to slices will inherit all the capabilities of a slice, including binary_search, chunks, concat, contains, start_with, end_with, group_by, iter, join, sort, split, swap, and a series of rich functions. If you are interested, you can check out the slice documentation.

Slices and Iterators #

The iterator can be said to be the twin brother of the slice. While the slice is a view of the collection data, the iterator defines various access operations for collection data.

By using the iter() method on slices, we can generate an iterator to iterate over the slice.

We already saw the iterator trait in [Lecture 12] Rust Type Inference (using the collect method to form a new list from filtered data). The iterator trait has a lot of methods, but most of the time, we only need to define its associated type Item and next() method.

Item defines the data type that we take out of the iterator each time;
next() is the method that retrieves the next value from the iterator. When the next() method of an iterator returns None, it indicates that there is no more data in the iterator.

#[must_use = "iterators are lazy and do nothing unless consumed"]
pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    
    // A large number of default methods, including size_hint, count,
    // chain, zip, map, filter, for_each, skip, take_while, flat_map,
    // flatten, collect, partition, etc.
    ...
}

Let’s look at an example of using the iter() method on Vec and performing various map/filter/take operations. This style is common in functional programming languages and offers strong readability. Rust also supports this style of writing (code example):

fn main() {
    // Here, Vec<T> is dereferenced to &[T] when calling iter(), so `iter()` is accessible
    let result = vec![1, 2, 3, 4]
        .iter()
        .map(|v| v * v)
        .filter(|v| *v < 16)
        .take(1)
        .collect::<Vec<_>>();

    println!("{:?}", result);
}

It’s important to note that iterators in Rust are a lazy interface, meaning this block of code does not actually execute until it reaches collect; earlier parts are just generating new structures to accumulate the logic. You might wonder, how is this achieved?

In VS Code, if you use the rust-analyzer plugin, you can discover the secret:

![Iterator Method Expansion](../image/c1e1c1909b761cfa3348115bb417d4d4.png)

It turns out that most methods of Iterator return a data structure that implements Iterator, allowing for this continuous chaining. In Rust’s standard library, these data structures are known as Iterator Adapters. For example, the map method above returns a Map structure, which implements Iterator (source code).

The process is as follows (links are to source code):

When collect() executes, it actually tries to use FromIterator to construct a collection type from an iterator, which keeps calling next() to get the next data;
At this point, the Iterator is Take, which calls its own next(), which is to call Filter’s next();
Filter’s next() calls find() on its own internal iter, which is a Map, and find() uses try_fold()](https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#2312-2325), which will continue calling next(), also from Map’s next();
Map’s next() will call next() on its inner iter and then execute the map function, and at this time the internal iter comes from Vec.

Hence, only when collect() is reached that the code triggers the calls going down through the chain, and these calls may end at any time as needed. In this piece of code, we used take(1), which makes the entire call chain loop once and thereby fulfill the requirements of take(1) and all intermediate steps, so it only loops once.

You might be concerned: This functional programming style is neat, however, wouldn’t all these function calls be bad for performance? After all, one of the notorious issues with functional programming languages is poor performance.

This is no cause for concern in Rust; a vast use of optimizations like inlining ensures that this clear and friendly language expression performs nearly the same as a C-language for loop. If you are interested in performance comparisons, you can take a look at the reference section at the end.

Having covered what a slice is, we traditionally follow up with practical code usage. However, iterators are a critical feature, well-supported in almost every language, so if you have used them before, you should find them familiar. The majority of methods should be self-explanatory upon inspection. Therefore, I will not provide additional examples here; instead, you can consult Iterator documentation when faced with specific needs.

If the standard library’s capabilities do not meet your needs, you can check out itertools, a utility named after the itertools in Python and offering similar functionality. It provides a wealth of additional adapter features. Here’s a simple example (code):

use itertools::Itertools;

fn main() {
    let err_str = "bad happened";
    let input = vec![Ok(21), Err(err_str), Ok(7)];
    let it = input
        .into_iter()
        .filter_map_ok(|i| if i > 10 { Some(i * 2) } else { None });
    // The result should be: vec![Ok(42), Err(err_str)]
    println!("{:?}", it.collect::<Vec<_>>());
}

In practical development, we might need to aggregate a set of results from a group of Futures, comprising both successful results and failure error messages. To further filter/map the successful results, itertools’ filter_map_ok() comes in handy when standard Rust utilities fall short.

Special Slices: &str #

Having learned about regular slices &[T], let’s look at a special kind of slice: &str. Previously, we discussed that String is a special Vec, so slicing on String gives us a special structure &str.

Many people also often get confused about the difference between String, &String, and &str. We touched on this issue briefly in a previous extra lesson, and compared String and &str in the last lecture on smart pointers. If you understand the difference between &Vec<T> and &[T] discussed earlier, then the difference between &String and &str is the same:

![&String versus &str Relationship](../image/c1e1c1909b761cfa3348115bb417d4d4.png)

String is dereferenced to &str. The following code snippet verifies this (code sample):

use std::fmt;
fn main() {
    let s = String::from("hello");
    // &String is dereferenced to &str
    print_slice(&s);
    // &s[..] and s.as_str() are the same, both resulting in &str
    print_slice(&s[..]);

    // String supports AsRef<str>
    print_slice1(&s);
    print_slice1(&s[..]);
    print_slice1(s.clone());

    // String also implements AsRef<[u8]>, so the following code stands
    // Output is [104, 101, 108, 108, 111]
    print_slice2(&s);
    print_slice2(&s[..]);
    print_slice2(s);
}

fn print_slice(s: &str) {
    println!("{:?}", s);
}

fn print_slice1<T: AsRef<str>>(s: T) {
    println!("{:?}", s.as_ref());
}

fn print_slice2<T, U>(s: T)
where
    T: AsRef<[U]>,
    U: fmt::Debug,
{
    println!("{:?}", s.as_ref());
}

One might ask: What’s the relationship and difference between a list of characters and a string? Let’s directly write some code to see:

use std::iter::FromIterator;

fn main() {
    let arr = ['h', 'e', 'l', 'l', 'o'];
    let vec = vec!['h', 'e', 'l', 'l', 'o'];
    let s = String::from("hello");
    let s1 = &arr[1..3];
    let s2 = &vec[1..3];
    // &str itself is a special slice
    let s3 = &s[1..3];
    println!("s1: {:?}, s2: {:?}, s3: {:?}", s1, s2, s3);

    // Equality of &[char] and &[char] depends on the equality of length and content
    assert_eq!(s1, s2);
    // &[char] and &str cannot be directly compared; we convert s3 to Vec<char>
    assert_eq!(s2, s3.chars().collect::<Vec<_>>());
    // &[char] can be converted to String via an iterator, and String can be directly compared with &str
    assert_eq!(String::from_iter(s2), s3);
}

Character lists can be transformed into String through iterators, and String can be converted into character lists through the chars() function; if not transformed, they cannot be compared.

In the figure below, I compare arrays, lists, and strings with their slices side by side, which should help you understand their differences:

![Comparison of Arrays, Lists, Strings, and