Part 2 - Fighting the Borrow Checker

Today we're gonna cover ownership and the borrow checker. This is where it will start to get confusing, but eventually it'll click and you'll see why all your C code doesn't work.

Ownership (4.1)

Rust does not have a garbage collector like Java/Python/C#/Go. Instead, it allows you to manually manage memory, but within a set of constraints enforced by the compiler:

  • Every value in Rust has a variable that is its owner
  • There can only be one owner at a time
  • When the owning variable goes out of scope, the value will be dropped

If you're not familiar with the stack and the heap, read this for a quick overview.

Scope in Rust works broadly the way you'd expect: a variable is valid from the point at which it is declared until the end of the current scope.

fn main() { // s is not valid here, it's not declared yet
    let s = "string"; // s is valid from here down

} // this scope ends here, s is no longer valid

I'm going to use the String type as an example, which hopefully you have already seen. String is more complex than types we've seen or written so far, because it represents a mutable string that may vary in length. This is as opposed to string literals, like s in the example above.

The values of string literals are hardcoded into the program, meaning if you were to compile the program below, you'd find the actual characters in the executable (you can use the strings command to verify this).

fn main(){
    println!("Hello from your binary file!");
}

String, instead, stores the string data on the heap, which allows us to work with text that is unknown at compile time. String can be mutated:

fn main() {
    let mut s = String::from("hello");

    s.push_str(", world!"); // push_str() appends a literal to a String

    println!("{s}"); // This will print `hello, world!`
} // s goes out of scope here

Since data on the heap is being used, we need to:

  • Allocate that memory when we create the string
  • Modify the buffer as we grow the string
  • Free the memory when we're done with it

Rust is a big fan of zero-cost abstraction, meaning that lots of these details that would ordinarily have to be done manually in C/C++, are abstracted away behind String's implementation details. The zero-cost bit means it's just as fast as doing it manually, so we get all this simplicity for free with no performance cost.

  • Memory is allocated by the call to String::from() (String::new() creates an empty string, which doesn't allocate until data is first pushed to it)
  • The buffer is modified behind the scenes when we modify the string's size using String::push_str()
  • The memory is automatically freed when the owning variable goes out of scope

This last bit is really key. Using the example above, we can see that s goes out of scope at the closing brace. s is the variable that owns the string, so as the owner goes out of scope, the memory is freed. This is done using a special function called drop, which Rust calls for us and is implemented automatically for most types.

(This concept will be familiar to you if you know about the concept of Resource Acquisition Is Initialization (RAII) in C++)
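
To see the drop-at-end-of-scope rule in action, here's a rough sketch. The Tracked type and the drop_order function are made up for this example, and Rc/RefCell (which we haven't covered) are only there so each drop can be written to a shared log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records its name in a shared log when dropped.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    // Rust calls this automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which things happened.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Tracked { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Tracked { name: "inner", log: Rc::clone(&log) };
        } // _inner goes out of scope and is dropped here
        log.borrow_mut().push("between");
    } // _outer goes out of scope and is dropped here
    let result = log.borrow().clone();
    result
}

fn main() {
    println!("{:?}", drop_order());
}
```

The inner value is dropped at its closing brace, before the "between" entry is logged, and the outer value is dropped last.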

Moves

What do you think this code does?

#![allow(unused)]
fn main() {
let x = 5;
let y = x;
}

How about this?

#![allow(unused)]
fn main() {
let s1 = String::from("hello");
let s2 = s1;
}

The first example binds the value 5 to the variable x, and then the second line makes a copy of the value in x and binds it to y.

The second example does not do this.

String is a Struct made up of 3 parts:

  • A number representing its length
  • A number representing the capacity of its internal buffer
  • A pointer to the buffer where the data is.

(For those of you not familiar with pointers, a pointer is just a number representing an address in memory. It is a variable that says something like "hello, the data you're looking for isn't here, but it can be found at this memory address!". The idea is that you then go and look at that memory address on the heap to go get your data.)

Our string looks a little something like this (the left values are on the stack, the heap to the right):

[Figure: s1 on the stack holds ptr, len 5 and capacity 5; ptr points to the heap buffer holding h, e, l, l, o at indices 0 to 4]

So to copy our data, we could copy both the heap and stack data, giving us a view in memory like this:

[Figure: s1 and s2 on the stack, each with its own ptr, len 5 and capacity 5, and each pointing to its own separate heap buffer holding h, e, l, l, o]

This is also not what happens (imagine if your string was very very long, this would be a very expensive operation).

So what we could do is copy only the data on the stack (the three numbers), giving us a view of the two variables that looks like this:

[Figure: s1 and s2 on the stack, each with ptr, len 5 and capacity 5, both pointing to the same heap buffer holding h, e, l, l, o]

This is still dangerous. Remember that when variables go out of scope, they are dropped and their memory is freed. If both s1 and s2 were valid, each would try to free the same heap buffer when it went out of scope. This is a double free, a serious error that can corrupt the heap.

What actually happens is that when you assign s1 to s2, the String is moved into s2, and the original variable s1 is invalidated.

[Figure: s1 and s2 on the stack, each with ptr, len 5 and capacity 5, and a single heap buffer holding h, e, l, l, o; s1 has been invalidated, so only s2 is a valid owner]

A shallow copy is made, copying only the stack data and not the heap data. To avoid the issue of multiple pointers pointing to the same data, the old variable is invalidated. Try to run this:

#![allow(unused)]
fn main() {
  let s1 = String::from("hello");
  let s2 = s1;

  println!("{}, world!", s1);
}

You can't, because s1 was invalidated when you moved it into s2, and s2 became the new owner.
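
If you want to convince yourself that a move only copies the stack data, here's a small sketch. move_reuses_buffer is a made-up helper name; as_ptr() gives the address of the String's heap buffer, so we can compare it before and after the move.

```rust
// Hypothetical helper: returns true if the new owner points at the
// same heap buffer that the old owner did.
fn move_reuses_buffer() -> bool {
    let s1 = String::from("hello");
    let before = s1.as_ptr(); // address of the heap buffer
    let s2 = s1;              // move: only ptr, len and capacity are copied
    // s1 can no longer be used here, s2 is the owner now
    before == s2.as_ptr()
}

fn main() {
    println!("same buffer after move: {}", move_reuses_buffer());
}
```

The heap data never moved or got duplicated; only ownership of it changed hands.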

Copy and Clone

Of course, this wasn't the case for our first example with the integers:

#![allow(unused)]
fn main() {
let x = 5;
let y = x;

println!("x = {}, y = {}", x, y);
}

Both variables are still valid. This is because, for certain types which:

  • are small/cheap to copy,
  • have a known size at compile time,
  • and live on the stack

Rust disregards the whole move semantics thing and just makes copies all over the place, because it is basically free to do so. This is done via the Copy trait, a trait that tells the compiler that the type is free to copy. If a type is Copy, then you don't have to worry about move semantics. The following types are Copy by default:

  • All integral types, such as u32
  • bool (true/false)
  • Floats f64 and f32
  • chars
  • Tuples, but only if they contain types that are also Copy.
    • (u32,f64) is Copy but (char, String) is not

There are other ways to make copies of data too. The Clone trait can be implemented on any type, which gives it a clone() method. clone() may be called explicitly to make a deep copy of your data. The following code clones the data in s1 into s2, meaning both variables own separate, but identical, copies of the data.

#![allow(unused)]
fn main() {
let s1 = String::from("hello");
let s2 = s1.clone();

println!("s1 = {}, s2 = {}", s1, s2);
}

  • Clone is used to allow types to be explicitly copied when needed, and signals that the copy may be an expensive operation
  • Copy is used to tell the compiler that types may be copied for free, and that move semantics may be disregarded for this type

To learn how to add these traits to your types, see Derivable Traits.

Don't worry too much about traits for now, we'll get into the details later. They're basically typeclasses/interfaces/abstract base classes. Traits define some behaviour in an abstract way (such as the clone function), and then types can implement the behaviour.
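
As a taste of what deriving these traits looks like, here's a minimal sketch. Point and Label are made-up types for illustration; the derive attribute asks the compiler to generate the trait implementations for us.

```rust
// Hypothetical type: every field is Copy, so the struct can be Copy too.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical type: it contains a String, so it can be Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Label {
    text: String,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied, so p1 is still valid
    println!("{p1:?} and {p2:?}");

    let l1 = Label { text: String::from("hi") };
    let l2 = l1.clone(); // explicit deep copy, l1 is still valid
    println!("{l1:?} and {l2:?}");
}
```

Writing let l2 = l1; without the clone() would move l1, exactly like the String example earlier.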

Ownership and Functions

Passing values to functions follows the same rules: the value will either move or copy, just like assigning to a variable.

fn main() {
    let s = String::from("hello");  // s comes into scope

    takes_ownership(s);             // s's value moves into the function...
                                    // ... and so is no longer valid here

    let x = 5;                      // x comes into scope

    makes_copy(x);                  // x would move into the function,
                                    // but i32 is Copy, so it's okay to still
                                    // use x afterward

} // Here, x goes out of scope, then s. But because s's value was moved, nothing
  // special happens.

fn takes_ownership(some_string: String) { // some_string comes into scope
    println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing
  // memory is freed.

fn makes_copy(some_integer: i32) { // some_integer comes into scope
    println!("{}", some_integer);
} // Here, some_integer goes out of scope. Nothing special happens.

We can't use s after takes_ownership() is called, as it is moved into the function, and then dropped when the function's scope ends.

Returning a value from a function transfers ownership too. Assigning the result of a function call to a variable makes that variable the new owner:

fn main() {
    let s1 = gives_ownership();         // gives_ownership moves its return
                                        // value into s1

    let s2 = String::from("hello");     // s2 comes into scope

    let s3 = takes_and_gives_back(s2);  // s2 is moved into
                                        // takes_and_gives_back, which also
                                        // moves its return value into s3
} // Here, s3 goes out of scope and is dropped. s2 was moved, so nothing
  // happens. s1 goes out of scope and is dropped.

fn gives_ownership() -> String {             // gives_ownership will move its
                                             // return value into the function
                                             // that calls it

    let some_string = String::from("yours"); // some_string comes into scope

    some_string                              // some_string is returned and
                                             // moves out to the calling
                                             // function
}

// This function takes a String and returns one
fn takes_and_gives_back(a_string: String) -> String { // a_string comes into
                                                      // scope

    a_string  // a_string is returned and moves out to the calling function
}
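
One consequence of these rules: if a function takes a String by value and the caller still wants it afterwards, the function has to hand it back. A rough sketch of that pattern (length_and_give_back is a made-up name for this example):

```rust
// Hypothetical function: takes ownership of the String, then returns it
// alongside the computed length so the caller can keep using it.
fn length_and_give_back(s: String) -> (String, usize) {
    let length = s.len();
    (s, length)
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = length_and_give_back(s1); // s1 is moved in, s2 comes back out
    println!("The length of '{s2}' is {len}.");
}
```

This works, but passing ownership in and out for every call gets tedious quickly.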

References and Borrowing (4.2)

What if we want to use a value in a function, without having to pass ownership around?


fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(s1);

    // error, s1 has been moved into calculate_length,
    // then dropped when it went out of scope
    println!("The length of '{}' is {}.", s1, len);

}

fn calculate_length(s: String) -> usize {
    s.len()
}

This is where borrowing comes in. References allow us to temporarily borrow values from their owner.


fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(&s1);

    // calculate_length only *borrowed* the string,
    // ownership remains with s1
    println!("The length of '{}' is {}.", s1, len);

}

fn calculate_length(s: &String) -> usize {
    s.len()
}

The ampersand & in front of String denotes that it is a reference type: the parameter s in the function does not own the String, it owns a reference to it. References are similar to pointers, except they come with a few more restrictions and static guarantees.

  • & in front of a type name denotes that the type is a reference to a value of that type, and does not have ownership
  • & in front of a value creates a reference to that value (see the call site in the listing above, &s1)

[Figure: s on the stack holds a ptr to s1; s1 on the stack holds ptr, len 5 and capacity 5, pointing to the heap buffer holding h, e, l, l, o]

When you create a reference, you are borrowing the value.

So, using that, try this below:

#[allow(unused_mut)]
fn main() {
    let mut s = String::from("hello");

    add_world(&s);
}

fn add_world(some_string: &String) {
    some_string.push_str(", world");
}

Look at that compile error. You've tried to modify a value that you do not own, which is not allowed. When you create a reference with &, you are not allowed to modify the value behind it: references are immutable by default.

Mutable References

I lied, you can modify values that you don't own, but you need a special kind of reference: &mut, the mutable reference:

fn main() {
    let mut s = String::from("hello");

    add_world(&mut s);

}

fn add_world(some_string: &mut String) {
    some_string.push_str(", world");
}

This works, because we told the function it could accept an &mut String, and then created one using &mut s. How about this?

fn main() {
    let mut s = String::from("hello");

    let ref_1 = &mut s;
    let ref_2 = &mut s;
    add_world(ref_1);
    add_exclamation(ref_2);
}

fn add_world(some_string: &mut String) {
    some_string.push_str(", world");
}

fn add_exclamation(some_string: &mut String){
    some_string.push_str("!");
}

I think the compiler is pretty clear here: we can only have one mutable reference to a value in use at a time. This is really annoying because it makes shared mutable state really hard, but for good reason. Shared mutable state is generally regarded as really bad, as it introduces loads of bugs. With this rule enforced, data races are non-existent, and there's no mutable pointer aliasing at all. Rust will just straight up refuse to compile any of these bugs, which is very kind of it. Compare this to C, which doesn't give a shit how dumb you are, and will be even dumber in response.
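
For comparison, here's a sketch of one way to get both pushes past the compiler: create each mutable reference right where it's needed, so the first borrow has ended before the second begins. build_greeting is a made-up wrapper name so the result can be returned.

```rust
fn add_world(some_string: &mut String) {
    some_string.push_str(", world");
}

fn add_exclamation(some_string: &mut String) {
    some_string.push_str("!");
}

// Hypothetical wrapper for this example.
fn build_greeting() -> String {
    let mut s = String::from("hello");
    add_world(&mut s);       // first mutable borrow starts and ends on this line
    add_exclamation(&mut s); // so a second mutable borrow is fine here
    s
}

fn main() {
    println!("{}", build_greeting());
}
```

Only one mutable reference is ever live at a time, so the rule is satisfied.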

You also cannot combine mutable and immutable references:

#![allow(unused)]
fn main() {
  let mut s = String::from("hello");

  let r1 = &s; // no problem
  let r2 = &s; // no problem
  let r3 = &mut s; // BIG PROBLEM

  println!("{}, {}, and {}", r1, r2, r3);
}

You can have either:

  • any number of immutable references or
  • one mutable reference.

Remember that this all depends on scope though: a borrow lasts from where the reference is created until the last time it is used, which may be before the end of the enclosing scope:

#![allow(unused)]
fn main() {
let mut s = String::from("hello");

let r1 = &s; // no problem
let r2 = &s; // no problem
println!("{} and {}", r1, r2);
// variables r1 and r2 will not be used after this point

let r3 = &mut s; // no problem
println!("{}", r3);
}

Rust will also not allow you to create a reference that outlives the data it refers to:

fn main() {
    let reference_to_nothing = dangle();
}

fn dangle() -> &String {
    let s = String::from("hello");
    &s
}

This code won't compile, because s goes out of scope at the end of dangle, so the string that it owns is dropped. This means that if you were to access the reference_to_nothing that dangle returns, it would be precisely that: a reference to something that no longer exists. &s would be a dangling reference, and Rust doesn't allow this.
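
The usual way out is to return the String itself rather than a reference to it, so ownership moves out to the caller. A minimal sketch (no_dangle is just an illustrative name):

```rust
// Returns the String by value: ownership moves to the caller,
// so nothing is dropped at the end of this function.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}

fn main() {
    let owned = no_dangle();
    println!("{owned}");
}
```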

The concept of lifetimes deals with when and where references are valid, but that can get really tricky so we'll save that for another time.

Borrowing in Functions

Ownership and Functions above discusses moving and copying with function arguments, but most of the time functions will want to only borrow their arguments. Function arguments can be taken by reference, by mutable reference, or by value:

fn by_value(s: String) {
    println!("{s}");
    // string is dropped here
}

fn by_mut_ref(s: &mut String) {
    s.push_str("!!");
}

fn by_ref(s: &String) {
    // cannot modify string!
    // comment this line out to make by_ref work
    s.push_str(" world");

    // can read string to print it
    println!("{s}");
}

#[allow(unused_mut)]
fn main() {
    // convert our &str to a String
    let mut s = "hello".to_owned();
    by_ref(&s);
    by_mut_ref(&mut s);
    by_value(s);
    // by_value above took ownership of s, so cannot use after
    // comment out this line
    println!("{s}");

}

Play with the example above to see where you can and can't modify the string.

When writing a function, consider if you need to take the argument by value (move) or by reference. Prefer immutable references; if you need to modify the value, use a mutable reference. If you really need to take ownership of the value, then pass by value.

Of course, when working with Copy types, you can always pass by value as the compiler will make copies for you and you don't need to worry about borrows.

Slices (4.3)

Slices are a special kind of reference that gives you a view into a contiguous sequence of elements. Slices allow you to refer to parts of strings, arrays, or Vecs. Think of them as references with a length. Say we want to take a string, and return the first word in it:

  • We could return some indices, but this would be annoying to work with
  • We could create a copy, but this would be expensive (what if the word is really long?)
  • Or, we could use a slice, to return a reference to a part of the string:

fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    //some fancy for loop/iterator stuff
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {
    let mut s = String::from("hello world");

    let word: &str = first_word(&s);

    // s.clear(); // Error if uncommented

    println!("the first word is: {}", word);
}

The compiler will ensure that our slice of the string remains valid, and doesn't get mutated while we're trying to use it. Uncommenting s.clear() fails as it creates a mutable reference to s (it borrows s mutably), which then invalidates our immutable slice in word, meaning we can no longer use it past that point.

&str

&str is the type of string slices, but it is also the type of string literals. Remember how string literals were stored inside the program binary somewhere? Well slices allow us to immutably refer to those.

#![allow(unused)]
fn main() {
let s: &str = "Hello, World!";
let s1: &str = &s[1..3];
}

We don't own the string, but we can slice it immutably.

More detail on String and &str is available in The Book. Rust uses UTF-8 encoding, so 1 byte != 1 character, and there's a fair amount of complexity associated with this.
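
To make the "1 byte != 1 character" point concrete, here's a small sketch. byte_and_char_counts is a made-up helper; len() counts bytes, while chars().count() counts Unicode scalar values.

```rust
// Hypothetical helper: returns (number of bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "\u{e9}" is e with an acute accent, which takes 2 bytes in UTF-8
    let (bytes, chars) = byte_and_char_counts("h\u{e9}llo");
    println!("{bytes} bytes, {chars} chars");
}
```

This is why string slice ranges are byte offsets, and why slicing in the middle of a multi-byte character panics.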

Another neat thing about the &str type worth mentioning is that anywhere that accepts it will also automatically accept an &String (recall the important difference between the two). Basically, String implements the Deref trait, which allows the compiler to perform deref coercion. Don't worry about it too much for now, the main takeaway I want you to have is that: if you ever want a function to accept a string reference, you should probably use &str over &String.
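
A quick sketch of why &str is the more flexible parameter type. shout is a made-up function for this example; the same function accepts a borrowed String, a literal, and a slice.

```rust
// Hypothetical function taking a string slice.
fn shout(s: &str) -> String {
    s.to_uppercase() + "!"
}

fn main() {
    let owned = String::from("hello");

    println!("{}", shout(&owned));       // &String is coerced to &str
    println!("{}", shout("world"));      // a literal is already a &str
    println!("{}", shout(&owned[0..2])); // so is a slice of a String
}
```

Had shout taken &String instead, the second and third calls would not compile.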

Array slices

You can also slice arrays:

#![allow(unused)]
fn main() {
let a: [i32; 5] = [1, 2, 3, 4, 5];

let slice_1: &[i32] = &a[1..3];
let slice_2: &[i32] = &a[4..5];
println!("Slice 1: {slice_1:?}");
println!("Slice 2: {slice_2:?}");
}
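
Slices are also handy as function parameters, since one function can then work on a whole array or any part of it. A minimal sketch (sum is a made-up helper for this example):

```rust
// Hypothetical helper: adds up whatever elements the slice refers to.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += v;
    }
    total
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];

    println!("whole array: {}", sum(&a));       // &[i32; 5] coerces to &[i32]
    println!("middle part: {}", sum(&a[1..3])); // elements at index 1 and 2
}
```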

Vecs (8.1)

Vec is Rust's equivalent of:

  • Python's List
  • Java's ArrayList
  • C++'s std::vector

It's a contiguous, heap-allocated, growable, homogeneous, linear collection of elements.

fn main() {
    let mut v1: Vec<i32> = Vec::new(); // a new, empty Vec
    let v2: Vec<&str> = Vec::from(["hello", "world"]); // create from an array

    // we can push
    v1.push(5);
    v1.push(6);
    v1.push(7);
    v1.push(8);

    // and pop
    // as the vector may be empty, this returns an Option
    let maybe_tail: Option<i32> = v1.pop();

    match maybe_tail {
        Some(tail) => println!("Tail is {tail}"),
        None => println!("Empty Vec!!"),
    }
}

We can also index and slice into vectors.

#![allow(unused)]
fn main() {
let mut v = vec![1, 2, 3, 4, 5];
let third: &i32 = &v[2];

println!("The third element is {third}");

let mut_slice: &mut [i32] = &mut v[3..5];
mut_slice[1] +=1;

println!("The new last element is {}", mut_slice[1]);
}

Note that indexing can fail at runtime: it will panic if the index is out of bounds. We can avoid the panic by using the get method, which returns an Option.

fn main() {
    let v = vec![1, 2, 3, 4, 5];

    // will panic when uncommented!
    //let does_not_exist = &v[100];

    match v.get(100) {
        Some(i) => println!("Element at index 100 is {i}"),
        None => println!("Looks like your index is out of bounds there buddy"),
    }

    match v.get(2) {
        Some(i) => println!("Element at index 2 is {i}"),
        None => println!("Looks like your index is out of bounds there buddy"),
    }
}

If you wanted to iterate over a vector, you might think of something like this:

#![allow(unused)]
fn main() {
let v = vec![100, 32, 57];
for i in 0..v.len() {
    print!("{} ", v[i]); // index into the vector to get each element
}
}

However, we can use iterators to do better:

#![allow(unused)]
fn main() {
let v = vec![100, 32, 57];
for elem in &v {
    print!("{elem} ");
}
}

This iterates over an immutable reference to v, hence the &v. If we want to mutate the elements while iterating, we need a mutable iterator, which is created as you'd expect:

#![allow(unused)]
fn main() {
let mut v = vec![100, 32, 57];
for elem in &mut v {
    *elem += 1;
    print!("{elem} ");
}
}

Note how we have to use the dereference operator * on elem. Dereferencing is how we access the value behind a reference. This is often done implicitly for us, but sometimes we have to do it ourselves, like here.
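
The same pattern works inside a function that borrows the elements mutably. A rough sketch (increment_all is a made-up name; it takes a mutable slice, so it accepts a &mut Vec<i32> as well):

```rust
// Hypothetical helper: adds 1 to every element in place.
fn increment_all(values: &mut [i32]) {
    for elem in values.iter_mut() {
        // elem is a &mut i32, so we dereference it to change the value behind it
        *elem += 1;
    }
}

fn main() {
    let mut v = vec![100, 32, 57];
    increment_all(&mut v);
    println!("{v:?}");
}
```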

Methods, Associated Functions & impls (5.3 & 10.2)

Methods are just like functions, but associated with a type. You've been using methods this whole time, by calling functions using . syntax. We can define methods on our own types using impl blocks, which are used to implement a function for a type.

struct Rectangle {
    width: f32,
    height: f32,
}

impl Rectangle {
    fn area(&self) -> f32 {
        self.width * self.height
    }
}

fn main() {
    let r = Rectangle {
        width: 2.0,
        height: 3.5
    };
    println!("The area of the rectangle is {}", r.area())
}

The area method can then be called as you'd expect: r.area(), where r is any Rectangle. Note how area takes a special parameter called self. self is the first parameter to any method, and can capture by reference (&self), by mutable reference (&mut self), or by value (self).
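
Here's a sketch of all three flavours of self on one type. scale and into_dimensions are made-up methods added for this example; only area comes from the listing above.

```rust
struct Rectangle {
    width: f32,
    height: f32,
}

impl Rectangle {
    // &self: read-only access to the rectangle
    fn area(&self) -> f32 {
        self.width * self.height
    }

    // &mut self: may modify the rectangle in place (hypothetical method)
    fn scale(&mut self, factor: f32) {
        self.width *= factor;
        self.height *= factor;
    }

    // self: takes ownership, so the rectangle is consumed (hypothetical method)
    fn into_dimensions(self) -> (f32, f32) {
        (self.width, self.height)
    }
}

fn main() {
    let mut r = Rectangle { width: 2.0, height: 3.5 };
    println!("area: {}", r.area());

    r.scale(2.0); // needs r to be declared mut

    let (w, h) = r.into_dimensions(); // r is moved here and can't be used again
    println!("{w} x {h}");
}
```

The same borrowing rules from earlier apply: the kind of self a method takes decides what the caller can still do with the value afterwards.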

We can also use impl blocks to create associated functions, which are similar to methods, but don't capture self. These are the equivalent of static methods from many languages.

struct Rectangle {
    width: f32,
    height: f32,
}

impl Rectangle {
    fn area(&self) -> f32 {
        self.width * self.height
    }
}

// Note these impl blocks can be separate but do not need to be
impl Rectangle {
    fn new(size: (f32, f32)) -> Self {
        Rectangle {width: size.0, height: size.1}
    }
}

fn main() {
    let r = Rectangle::new((2.0, 3.5));
    println!("The area of the rectangle is {}", r.area())
}

new is a common associated function which is used to provide a constructor shorthand. We call it using :: syntax: Rectangle::new().

Note Self is the return type (distinct from the parameter self): Self is simply an alias for the type of whatever the impl block applies to (Rectangle, in this case).

Methods are just fancy associated functions: r.area() is equivalent to calling Rectangle::area(&r).
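
A quick sketch showing the two call styles side by side, reusing the Rectangle type from above:

```rust
struct Rectangle {
    width: f32,
    height: f32,
}

impl Rectangle {
    fn area(&self) -> f32 {
        self.width * self.height
    }
}

fn main() {
    let r = Rectangle { width: 2.0, height: 3.5 };

    let via_method = r.area();              // method call syntax, borrows r for us
    let via_function = Rectangle::area(&r); // same function, reference passed explicitly

    println!("{via_method} == {via_function}");
}
```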

Traits

The other use for impl blocks is to implement a trait for a type. Traits are used to define shared behaviour, and types can implement traits to share functionality. They're very similar to Java's Interfaces or Python/C++'s Abstract base classes, and almost identical to Haskell's typeclasses. The idea is you implement a trait on a type by filling in some function definitions, and then you get a bunch of other functionality for free, or can use that type in certain places.

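// The two types we implement the trait for
struct Rectangle {
    width: f32,
    height: f32,
}

struct Circle {
    radius: f32,
}

trait Shape {
    /// Returns the area of the shape
    fn area(&self) -> f32;
}

impl Shape for Rectangle {
    fn area(&self) -> f32 {
        self.width * self.height
    }
}

impl Shape for Circle {
    fn area(&self) -> f32 {
        // use the standard library's constant rather than a hand-typed 3.14
        self.radius * self.radius * std::f32::consts::PI
    }
}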
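
As a rough idea of what "use that type in certain places" means, here's a sketch of a function that accepts anything implementing Shape. describe is a made-up function, and the &impl Shape parameter syntax is something we haven't covered yet, so treat this as a preview rather than the full story.

```rust
trait Shape {
    /// Returns the area of the shape
    fn area(&self) -> f32;
}

struct Rectangle {
    width: f32,
    height: f32,
}

struct Circle {
    radius: f32,
}

impl Shape for Rectangle {
    fn area(&self) -> f32 {
        self.width * self.height
    }
}

impl Shape for Circle {
    fn area(&self) -> f32 {
        self.radius * self.radius * std::f32::consts::PI
    }
}

// Hypothetical function: works with any type that implements Shape.
fn describe(shape: &impl Shape) -> String {
    format!("area = {:.2}", shape.area())
}

fn main() {
    println!("{}", describe(&Rectangle { width: 2.0, height: 3.5 }));
    println!("{}", describe(&Circle { radius: 1.0 }));
}
```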

Traits are one of the more complex bits of Rust's type system, and we'll go into more detail next time.