Part 3 - Don't panic

This section will look at error handling in Rust, and also take a look at a few other things that you might have come accross already in more detail.

Errors (9)

Rust's error handling mechanisms are kind of unique. It draws a lot on functional languages here too, with the Result<T, E> and Option<T> types being used to represent possible errors that can reasonably be expected to be handled. Panics are the other error mechanism in Rust, and are used for unrecoverable errors that halt execution.

Panics (9.1)

When something goes very very wrong in your program, and you can't really do anything about it, you panic. Rust has the panic!() macro, which prints an error message, unwinds and cleans up the stack, and then exits.

fn main() {
    panic!("uh oh");
}

See how our error message ends up in the console? By default, no backtrace is shown when code panics, but we can set the environment variable RUST_BACKTRACE=1 to. On your own machine, create a rust program that you know will panic, such as indexing out of bounds in an array, and run it with the environment variable set. You'll get a lot more information, especially when you're in debug mode. Printing the backtrace will help you pin down what is causing your code to panic.

Results (9.2)

A lot of errors can be handled at runtime, which is what Result<T, E> is for.

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Result is generic over T, the type of the value returned upon success, and E, the type of the error value. Functions that return a Result expect you to handle possible error cases explicitly, whereas in other languages you may have to deal with exceptions. This makes error handling non-optional, meaning you're forced to write more robust code by design.

#![allow(unused)]
fn main() {
use std::{fs, io};
let f: Result<fs::File, io::Error> = fs::File::open("foo.txt");
println!("{f:?}"); //print the result
}

We can see in the above snippet that trying to open a non-existent file returns a result with either the file handle, or an I/O error. Since the code here is run on the Rust playground servers where foo.txt does not exist, the Err(io::Error) variant of the enum is printed, containing some information about the error.

If foo.txt did exist, we'd get an Ok() variant looking something like:

Ok(File { fd: 3, path: "/Users/Joey/code/rs118/foo.txt", read: true, write: false })

Remember that f would still be Ok(File), and we'd need to extract the contained file before we can use it. There are a few different ways to handle Results. Pattern matching with match or if let is a good option, as then we can destructure the type to get the values we want out of it, and deal with the error however we want.

#![allow(unused)]
fn main() {
use std::{fs, io};
let f: Result<fs::File, io::Error> = fs::File::open("foo.txt");
match f {
    Ok(file) => println!("File opened successfully!"),
    Err(_) => println!("Could not open file, continuing..."),
}
}

We could also use unwrap, which panics if the Result is an Err(E), and returns T if not. expect behaves the same, except we can specify an error message

use std::{fs, io};
fn main() {
    let f: Result<fs::File, io::Error> = fs::File::open("foo.txt");
    let file = f.expect("Could not open file");
}

If we don't want to handle the error ourselves, we can return propagate the Result error back up to the caller. If we wanted to do this manually, we would need something like:

fn maybe_func() -> Result<T, E>;

let x = match maybe_func() {
  Ok(x) => x,
  Err(e) => return Err(e),
};
// equivalent to
let x = maybe_func()?;

In functions that return a Result<T, E> or Option<T>, we can use the ? operator as a shorthand to bubble the error back up, where the return type is compatible.

use std::{fs, io};
fn open_my_file() -> Result<String, io::Error>  {
    let contents: String = fs::read_to_string("foo.txt")?;
    Ok(contents)
}

fn main() {
    let error = open_my_file().unwrap_err();
    println!("{error}");
}

Note that this only work if all the expressions you use ? on have the same return type as your function.

Alternatively, you can return Result<T, Box<dyn Error>>, which uses a neat trick with trait objects and boxes to allow you to return any error type that implements the Error trait. main can also return this type, allowing you to use ? within main.

use std::{error::Error, fs};

fn do_fallible_things() -> Result<(), Box<dyn Error>> {
    let _ten = u32::from_str_radix("A", 16)?;
    let _crab = String::from_utf8(vec![0xf0, 0x9f, 0xa6, 0x80])?;
    let _contents: String = fs::read_to_string("foo.txt")?;
    Ok(())
}

fn main() -> Result<(), Box<dyn Error>> {
    let _f = fs::File::create("new_file")?;
    do_fallible_things()
}

Best Practices (9.3)

Getting used to writing idiomatic error handling code in Rust can take a bit of practice. The general idea is to use the type system to your advantage where possible, and panic only where absolutely necessary.

unwrap and expect are handy when prototyping, before you decide on how you want to handle errors within your code. unwrap is also useful in cases where you have more information than the compiler, such as there being an invariant in your code that means the conditions for an error can never occur. panic should generally be used in cases where your code is in a bad state, and execution cannot continue under any circumstances.

Result should be used where failure is an expected possibility, and the caller can be reasonably expected to handle them. It is often useful to create custom types that can be returned within Result::Err(E), to describe the kinds of errors that may occur within your program. Encoding information within the type system makes your code more expressive and robust.

#[derive(Debug)]
enum MyError {
    FileOpenError,
    FileTooLarge,
    FileParseError,
    NumberTooSmallError,
}

fn do_stuff() -> Result<u64, MyError> {
    let contents = std::fs::read_to_string("data.txt").map_err(|_| MyError::FileOpenError)?;

    if contents.len() > 10 {
        return Err(MyError::FileTooLarge);
    }

    let num: u64 = contents.parse().map_err(|_| MyError::FileParseError)?;
    num.checked_sub(100).ok_or(MyError::NumberTooSmallError)
}

fn main() {
    println!("{:?}", do_stuff());
}

Note how I'm making use of some of the methods of Result which make working with errors a lot nicer.

  • map_err changes the Err type (technically transformed by a function/closure -- the |_| MyError::... ignore function input).
  • ok_or transforms an Option into a Result, you need to specify the error to give instead of None.

I want to shout out two crates here that make error handling much nicer. Anyhow provides a more ergonomic and flexible type for error handling. thiserror provides a #[derive] macro for the std::error::Error trait, allowing you to create custom error types much more easily. Have a look at some examples of using the two to see how to structure code that works with errors nicely.

Generics (10.1)

Generics allow for parametric polymorphism in Rust, letting us write type signatures that are generic over type variables. Generics can be used in struct/enum definitions:

struct Point<T> {
    x: T,
    y: T,
}

enum Option<T> {
    Some(T),
    None
}

And in function/method definitions:

fn id<T>(ty: T) -> T {
    ty
}

impl<T> Point<T> {
    fn new(x: T, y: T) -> Self {
        Self {x, y}
    }
}

At compilation, Rust creates a separate copy of the generic function for each type needed, a technique known as monomorphisation (unlike, for example, Java, which uses a single function that accepts any Object (type erasure)). Rust's approach is faster at runtime, at the cost of binary size and compilation time.

Generics are most powerful when used in combination with trait bounds, which leads me very nicely into...

Traits (10.2)

Traits define shared behaviour, and tell the compiler about what functionality a particular type has and shares with other types. They're very similar to interfaces/abstract classes in object-oriented languages, and very very similar to typeclasses in Haskell. Traits are defined as shown, with the function bodies omitted.

#![allow(unused)]
fn main() {
trait Printable {
    fn format(&self) -> String;
}
}

Giving a type a trait is much like implementing any other function, you useimpl <trait> for <type>.

#![allow(unused)]
fn main() {
trait Printable {
   fn format(&self) -> String;
}
struct Person { first_name: String, last_name: String }

impl Printable for Person {
    fn format(&self) -> String {
        format!("{} {}", self.first_name, self.last_name)
    }
}
}

The type can then be used as normal:

trait Printable {
   fn format(&self) -> String;
}
struct Person { first_name: String, last_name: String }

impl Printable for Person {
   fn format(&self) -> String {
       format!("{} {}", self.first_name, self.last_name)
   }
}


fn main() {
    let named = Person {
        first_name: String::from("Triz"),
        last_name: String::from("Luss"),
    };
    println!("{}", named.format());
}

Traits can be used to apply bounds to type variables, allowing only use of types that implements the trait. The syntax is f<T: Trait1 + Trait2>(a: T, b: T):

#![allow(unused)]
fn main() {
fn bold_fmt<T : Printable>(original: T) -> String {
    format!("**{}**", original.format())
}
}

If there are lots of trait bounds, you might also see them written using a where clause:

impl<Si, Item, U, Fut, F, E> Sink<U> for With<Si, Item, U, Fut, F>
where
    Si: Sink<Item>,
    F: FnMut(U) -> Fut,
    Fut: Future<Output = Result<Item, E>>,
    E: From<Si::Error>,
{
    //trait impl for Sink<U>
}

(This is an actual trait implementation from the futures crate that provides tools for asynchronous programming. This code is implementing the generic trait Sink for the struct With, that has 5 generic type parameters. Don't worry about the complexity here, I just wanted to show fully what the syntax looked like)

This pattern of a .format seems awfully useful, and luckily enough Rust provides a Display trait to print a struct. println!() calls fmt automatically on any Display types when using {} as the placeholder:

use std::fmt;

struct Person { first_name: String, last_name: String }

impl fmt::Display for Person {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} {}", self.first_name, self.last_name)
    }
}

fn main() {
    let named = Person {
        first_name: String::from("Triz"),
        last_name: String::from("Luss"),
    };
    println!("{}", named);
}

Display is meant for user-facing representation of a type, but there also exists Debug that is intended to fully describe a type instance to the programmer (like repr in Python). This uses the {:?} placeholder.

Writing these, especially Debug, can get very tedious, so Rust allows you to derive some traits, including Debug, PartialEq, and Hash. This is just like Haskell's deriving.

#[derive(Debug, PartialEq)]
struct Person {
    first_name: String,
    last_name: String
}

fn main() {
    let named = Person {
        first_name: String::from("Triz"),
        last_name: String::from("Luss"),
    };
    
    // Try making these match
    let named2 = Person {
        first_name: String::from("Joris"),
        last_name: String::from("Bohnson"),
    };

    println!("{:?} {}", named, named == named2);
}

We can also use trait objects to allow us to return any type implementing a trait. These are often used in combination with Boxes, as trait objects are resolved at runtime, so we don't know their size statically and they have to be stored on the heap.

trait Printable {
   fn format(&self) -> String;
}
struct Person { first_name: String, last_name: String }

impl Printable for Person {
   fn format(&self) -> String {
       format!("{} {}", self.first_name, self.last_name)
   }
}

struct Animal { species: String, name: String }

impl Printable for Animal {
    fn format(&self) -> String {
        format!(" a {} named {}", self.species, self.name)
    }
}

fn resident_of_10_downing(is_cat: bool) -> Box<dyn Printable> {
    if is_cat {
        Box::new(Animal{  species: String::from("Cat"), name: String::from("Larry") })
    } else {
        Box::new(Person{ first_name: String::from("Triz"), last_name: String::from("Luss") })
    }
}

fn main() {
    println!("{}", resident_of_10_downing(false).format());
}

Traits and generics form the backbone of the type system, and there's an awful lot you can do with them. A few examples of interesting things to research:

  • Conditionally implementing methods by bounding impl blocks
  • Associated types
  • Trait objects & object safety
  • std::marker::PhantomData
  • Const generics
  • Deref coercion
  • Function traits (Fn vs fn)
  • The Sized trait
  • Interior mutability
  • GATs

References & Lifetimes (10.3)

One of Rust's selling points is that it ensures, at compile time, that all references and borrows are always valid. The compiler does this using lifetimes, which is the scope for which a reference is valid. Most of the time the compiler can do this for us, but sometimes we have to step in and help it out.

This is very unique to Rust and will feel weird to start with, but it'll really cement the ideas about borrow checking and ownership.

Recall how the borrow checker works: it makes sure that all of our references are valid, and prevents us from using references after the values have been dropped. Every reference in Rust has a lifetime. Below, r has a lifetime of 'a and x has a lifetime of 'b.

#![allow(unused)]
fn main() {
{
    let r;                // ---------+-- 'a
                          //          |
    {                     //          |
        let x = 5;        // -+-- 'b  |
        r = &x;           //  |       |
    }                     // -+       |
                          //          |
    println!("r: {}", r); //          |
}                         // ---------+
}

Have a look at that compile error. We can see that r borrows x, but x is dropped while it's still borrowed, which is not allowed.

Let's look at an example: a function that takes two strings and returns the longest one:

#![allow(unused)]
fn main() {
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
}

Rust isn't quite sure how to infer lifetimes here, so we need to help it out by adding generic lifetimes parameters. Just take a minute to appreciate that compiler error too, which tells us to do exactly what I'm about to explain now.

Lifetime annotations in function signatures tell the compiler which input references the output reference comes from, describing how input and output lifetimes relate to each other. We introduce lifetime names with the same syntax as generics, and then can use the annotations on references to tell the compiler what the lifetime of the reference is.

#![allow(unused)]
fn main() {
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
}

The output reference has the same lifetime as the two input references: 'a. The annotations go in the function signature, and become part of the contract of the function, much like types are. Anyone calling the function must uphold the contract, meaning the two input references must have the same lifetime. We are guaranteeing the returned reference is alive for at least 'a. The generic lifetime becomes the shortest of x and y's lifetimes (which is the maximum time both are alive). Take a look at the following error:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

fn main() {
    let string1 = String::from("long string is long");
    let result;
    {
        let string2 = String::from("xyz");
        result = longest(string1.as_str(), string2.as_str());
    }   // string2 dropped, last point both string1 and string2 are live
    println!("The longest string is {}", result);
}   // string1 dropped

result could either be string1 or string2. The compiler cannot guarantee that result lasts longer than either string it could reference: if result referenced string2, the target memory will be dropped before the println!, making result invalid.

When retuning a reference from a function, the lifetime of the return type needs to match the lifetime of one of the parameters:

#![allow(unused)]
fn main() {
fn longest<'a>(x: &str, y: &str) -> &'a str {
    let result = String::from("really long string");
    result.as_str()
}
}

Have a look at the error. Even though we annotated the lifetime, we're trying to return a reference to new memory, which will be dropped when the function ends. In these cases, it is better to use an owning type (String) instead of a reference.

Lifetimes in Structs

What if you wanted to create a struct that contained a reference? The tuple struct below contains a reference to the first word in a string.

#![allow(unused)]
fn main() {
struct Word<'a>(&'a str);

impl<'a> Word<'a> {
    pub fn new(input: &'a str) -> Self {
        let first_word = input.split_ascii_whitespace().next().unwrap();
        Self(first_word)
    }
}
}

Just like with functions, we are tying the struct's lifetime to the lifetime of a referenced string (input). In this case, Word would contain invalid data once input is dropped, so cannot outlive it. The example below won't compile, take a look at the error message.

struct Word<'a>(&'a str);

impl<'a> Word<'a> {
    pub fn new(input: &'a str) -> Self {
        let first_word = input.split_ascii_whitespace().next().unwrap();
        Self(first_word)
    }
}

fn main() {
    let fst: Word;
    {
        let sentence = "Oh boy, I sure do love lifetimes!".to_owned();
        fst = Word::new(&sentence);
    }   // sentence dropped
    println!("First word is: {}", fst.0);
}

fst references sentence, so fst cannot outlive sentence. Lifetimes are hard, so don't worry if you don't get it all straight away. The best way to get the hang of them, as with anything, is to use them, so try writing some code that needs lifetime annotations, or adding annotations to existing code.

Closures (13.1)

Closures are anonymous functions, similar to lambdas in other languages. What makes closures special in Rust is that they close over their environment, allowing you to use variables from outer scopes in which they are defined. The arguments to a closure go within pipes: |args| and the expression following the pipes is the function body. Like variable types, you don't need to specify types as they're inferred, but you can if you like.

#![allow(unused)]
fn main() {
let increment = |x: u32| -> u32 { x+1 };
let also_increment = |x| x+1;
let four: i32 = also_increment(3);
}

The example above just shows a basic anonymous function. Consider the one below.

fn main() {
    let pi: f64 = 3.14;
    let area = |r| pi * r * r;

    let radii = [1.0, 2.5, 4.1];
    let areas = radii.map(area);
    println!("{areas:?}");
}

The closure is referencing a variable from local scope (pi) within it's function body, and we can still do all the things with it that we usually can with anonymous functions. We then pass the closure to map, which maps an array of circle radii to their areas by applying the closure to each element. Closures can be passed around to/from functions, bound to variables, stored in structs, etc, like any other value.

Closures capture variables in the least restrictive way they are allowed, in order: by reference &T, by mutable reference &mut T, by value T. Variables are captured between closure definition and the closure's last use and unsurprisingly follows the borrowing rules from before. Meaning, if a closure mutably references a variable, that variable cannot be borrowed again until after the closure's last use. If a variable is required by value and isn't Copyable, the closure cannot be run more than once, as the variable is consumed. These types have corresponding traits.

fn main() {
    let mut idx = 0;
    let zeros = [0; 5];
    println!("{:?}", zeros.map(|_| { idx += 1; idx }));
}

The Rust By Example section on closure capturing gives good examples of these rules.

Iterators (13.2)

Iterators are used to process sequences of items. When usually you'd have to use loops to process sequences of data, iterators allow common patterns such as maps, filters and folds to be expressed much more concisely. Many languges have analagous concepts: Haskell's list, Python's Generator, or Java's Iterator and Stream. Iterators can be created from any type that implements IntoIterator, such as slices, vectors and arrays.

#![allow(unused)]
fn main() {
let v1 = vec![1, 2, 3];

let v1_iter = v1.iter();

for val in v1_iter {
    println!("Got: {}", val);
}
}

for loops in Rust work using iterators. All iterators implement the Iterator trait, which requires the next() method: fn next(&mut self) -> Option<Item>. It will return the next item in the iterator each time it is called, until it returns None after the end of the iterator.

Iterators have a number of methods, generally grouped into consumers and adaptors. Consumers process the iterator and return some aggregated value, whereas adaptors convert the iterator into another iterator. Iterators in Rust are lazy, so no computation is done until they are consumed.

fn main() {
    let v1 = vec![4, 7, 12, -9, 18, 1];

    //consumers
    let total: i64 = v1.iter().sum();
    let min = v1.iter().min();
    let max = v1.iter().max();
    let product = v1.iter().fold(0, |acc, x| acc * x);

    //producers
    let evens = v1.iter().filter(|x| *x % 2 == 0);
    let squares = v1.iter().map(|x| x * x);

    //use the iterator in a for loop
    //zip zips two iterator together
    for (i, i_squared) in v1.iter().zip(squares){
        println!("{i} squared is {i_squared}");
    }
}

Take a look at the Iterator trait and std::iter to see what iterators can do.

Iterators are at least as performant as the equivalent loops, and often allow to express computation much more cleanly and efficiently. Choosing for loops vs chains of adaptors for more complex operations is often personal preference, but it is best to know both wayse. Two methods are shown below to calculate the dot product of two slices. Which do you think is better?

fn dot_1(x: &[i64], y: &[i64]) -> i64 {
    assert_eq!(x.len(), y.len());
    //simple indexing for loop
    let mut sum = 0;
    for i in 0..x.len() {
        sum += x[i] * y[i];
    }
    sum
}

fn dot_2(x: &[i64], y: &[i64]) -> i64 {
    assert_eq!(x.len(), y.len());
    x.iter()
        // zip combines two lists into pairs:
        // [1,2,3] zipped with [4,5,6] is [(1, 4), (2, 5), (3, 6)]
        .zip(y.iter())
        // fold accumulates the result (sum) over each element in the iterator.
        .fold(0, | sum, (xi, yi) | sum + xi * yi )
}

fn main() {
    let x = vec![4, 7, 12, -9, 18, 1];
    let y = vec![1, 0, -8, 63, 72, -11];

    let d1 = dot_1(&x, &y);
    let d2 = dot_2(&x, &y);

    println!("{d1} == {d2}");
}

Further Reading

Well, thanks for making is this far. Hopefully you consider yourself a confident Rustacean by now (at least in theory, doing the projects should solidify this). If you want to learn more, the list below contains a collection of Rust things that I think are excellent.