@ -6,13 +6,15 @@ Futures abstract over *computation*. They describe the "what", independent of th
## Send and Sync
Luckily, concurrent Rust already has two well-known and effective concepts abstracting over sharing between concurrent parts of a program: Send and Sync. Notably, both the Send and Sync traits abstract over *strategies* of concurrent work, compose neatly, and don't prescribe an implementation.
Luckily, concurrent Rust already has two well-known and effective concepts abstracting over sharing between concurrent parts of a program: `Send` and `Sync`. Notably, both the `Send` and `Sync` traits abstract over *strategies* of concurrent work, compose neatly, and don't prescribe an implementation.
As a quick summary, `Send` abstracts over passing data in a computation over to another concurrent computation (let's call it the receiver), losing access to it on the sender side. In many programming languages, this strategy is commonly implemented, but missing support from the language side expects you to enforce the "losing access" behaviour yourself. This is a regular source of bugs: senders keeping handles to sent things around and maybe even working with them after sending. Rust mitigates this problem by making this behaviour known. Types can be `Send` or not (by implementing the appropriate marker trait), allowing or disallowing sending them around, and the ownership and borrowing rules prevent subsequent access.
As a quick summary:
Note how we avoided any word like *"thread"*, but instead opted for "computation". The full power of `Send` (and subsequently also `Sync`) is that they relieve you of the burden of knowing *what* shares. At the point of implementation, you only need to know which method of sharing is appropriate for the type at hand. This keeps reasoning local and is not influenced by whatever implementation the user of that type later uses.
- `Send` abstracts over *passing data* in a computation to another concurrent computation (let's call it the receiver), losing access to it on the sender side. In many programming languages, this strategy is commonly implemented, but missing support from the language side expects you to enforce the "losing access" behaviour yourself. This is a regular source of bugs: senders keeping handles to sent things around and maybe even working with them after sending. Rust mitigates this problem by making this behaviour known. Types can be `Send` or not (by implementing the appropriate marker trait), allowing or disallowing sending them around, and the ownership and borrowing rules prevent subsequent access.
`Sync` is about sharing data between two concurrent parts of a program. This is another common pattern: as writing to a memory location or reading while another party is writing is inherently unsafe, this access needs to be moderated through synchronisation.[^1] There are many common ways for two parties to agree on not using the same part in memory at the same time, for example mutexes and spinlocks. Again, Rust gives you the option of (safely!) not caring. Rust gives you the ability to express that something *needs* synchronisation while not being specific about the *how*.
- `Sync` is about *sharing data* between two concurrent parts of a program. This is another common pattern: as writing to a memory location or reading while another party is writing is inherently unsafe, this access needs to be moderated through synchronisation.[^1] There are many common ways for two parties to agree on not using the same part in memory at the same time, for example mutexes and spinlocks. Again, Rust gives you the option of (safely!) not caring. Rust gives you the ability to express that something *needs* synchronisation while not being specific about the *how*.
Note how we avoided any word like *"thread"*, but instead opted for "computation". The full power of `Send` and `Sync` is that they relieve you of the burden of knowing *what* shares. At the point of implementation, you only need to know which method of sharing is appropriate for the type at hand. This keeps reasoning local and is not influenced by whatever implementation the user of that type later uses.
`Send` and `Sync` can be composed in interesting fashions, but that's beyond the scope here. You can find examples in the [Rust Book][rust-book-sync].
@ -30,12 +32,12 @@ While computation is a subject to write a whole [book](https://computationbook.c
## Deferring computation
As mentioned above `Send` and `Sync` are about data. But programs are not only about data, they also talk about *computing* the data. And that's what [`Futures`][futures] do. We are going to have a close look at how that works in the next chapter. Let's look at what Futures allow us to express, in English. Futures go from this plan:
As mentioned above,`Send` and `Sync` are about data. But programs are not only about data, they also talk about *computing* the data. And that's what [`Futures`][futures] do. We are going to have a close look at how that works in the next chapter. Let's look at what Futures allow us to express, in English. Futures go from this plan:
- Do X
- If X succeeds, do Y
- If X succeeded, do Y
towards
towards:
- Start doing X
- Once X succeeds, start doing Y
@ -48,67 +50,81 @@ Remember the talk about "deferred computation" in the intro? That's all it is. I
Let's have a look at a simple function, specifically the return value:
You can call that at any time, so you are in full control on when you call it. But here's the problem: the moment you call it, you transfer control to the called function until it returns a value - eventually.
Note that this return value talks about the past. The past has a drawback: all decisions have been made. It has an advantage: the outcome is visible. We can unwrap the results of the program's past computation, and then decide what to do with it.
But we wanted to abstract over *computation* and let someone else choose how to run it. That's fundamentally incompatible with looking at the results of previous computation all the time. So, let's find a type that *describes* a computation without running it. Let's look at the function again:
You can call that at any time, so you are in full control of when you call it. But here's the problem: the moment you call it, you transfer control to the called function. It returns a value.
Note that this return value talks about the past. The past has a drawback: all decisions have been made. It has an advantage: the outcome is visible. We can unwrap the presents of program past and then decide what to do with it.
Speaking in terms of time, we can only take action *before* calling the function or *after* the function returned. This is not desirable, as it takes from us the ability to do something *while* it runs. When working with parallel code, this would take from us the ability to start a parallel task while the first runs (because we gave away control).
But here's a problem: we wanted to abstract over *computation* to be allowed to let someone else choose how to run it. That's fundamentally incompatible with looking at the results of previous computation all the time. So, let's find a type that describes a computation without running it. Let's look at the function again:
This is the moment where we could reach for [threads](https://en.wikipedia.org/wiki/Thread_). But threads are a very specific concurrency primitive and we said that we are searching for an abstraction.
What we are searching for is something that represents ongoing work towards a result in the future. Whenever we say "something" in Rust, we almost always mean a trait. Let's start with an incomplete definition of the `Future` trait:
Speaking in terms of time, we can only take action *before* calling the function or *after* the function returned. This is not desirable, as it takes from us the ability to do something *while* it runs. When working with parallel code, this would take from us the ability to start a parallel task while the first runs (because we gave away control).
```rust
trait Future {
type Output;
This is the moment where we could reach for [threads](https://en.wikipedia.org/wiki/Thread_). But threads are a very specific concurrency primitive and we said that we are searching for an abstraction.
What we are searching is something that represents ongoing work towards a result in the future. Whenever we say `something` in Rust, we almost always mean a trait. Let's start with an incomplete definition of the `Future` trait:
- It provides a function called `poll`, which allows us to check on the state of the current computation.
- (Ignore `Pin` and `Context` for now, you don't need them for high-level understanding.)
Ignore `Pin` and `Context` for now, you don't need them for high-level understanding. Looking at it closely, we see the following: it is generic over the `Output`. It provides a function called `poll`, which allows us to check on the state of the current computation.
Every call to `poll()` can result in one of these two cases:
1. The future is done, `poll` will return [`Poll::Ready`](https://doc.rust-lang.org/std/task/enum.Poll.html#variant.Ready)
2. The future has not finished executing, it will return [`Poll::Pending`](https://doc.rust-lang.org/std/task/enum.Poll.html#variant.Pending)
1. The computation is done, `poll` will return [`Poll::Ready`](https://doc.rust-lang.org/std/task/enum.Poll.html#variant.Ready)
2. The computation has not finished executing, it will return [`Poll::Pending`](https://doc.rust-lang.org/std/task/enum.Poll.html#variant.Pending)
This allows us to externally check if a `Future` has finished doing its work, or is finally done and can give us the value. The most simple way (but not efficient) would be to just constantly poll futures in a loop. There's optimisations here, and this is what a good runtime does for you.
Note that calling `poll` after case 1 happened may result in confusing behaviour. See the [Future docs](https://doc.rust-lang.org/std/future/trait.Future.html) for details.
This allows us to externally check if a `Future`still has unfinished work, or is finally done and can give us the value. The most simple (but not efficient) way would be to just constantly poll futures in a loop. There are optimisations possible, and this is what a good runtime does for you.
Note that calling `poll` again after case 1 happened may result in confusing behaviour. See the [futures-docs](https://doc.rust-lang.org/std/future/trait.Future.html) for details.
## Async
While the `Future` trait has existed in Rust for a while, it was inconvenient to build and describe them. For this, Rust now has a special syntax: `async`. The example from above, implemented in `async-std`, would look like this:
While the `Future` trait has existed in Rust for a while, it was inconvenient to build and describe them. For this, Rust now has a special syntax: `async`. The example from above, implemented with `async-std`, would look like this:
Amazingly little difference, right? All we did is label the function `async` and insert 2 special commands: `.await`.
This function sets up a deferred computation. When this function is called, it will produce a `Future<Output=Result<String, io::Error>>` instead of immediately returning a `Result<String, io::Error>`. (Or, more precisely, generate a type for you that implements `Future<Output=Result<String, io::Error>>`.)
This `async`function sets up a deferred computation. When this function is called, it will produce a `Future<Output=Result<String, io::Error>>` instead of immediately returning a `Result<String, io::Error>`. (Or, more precisely, generate a type for you that implements `Future<Output=Result<String, io::Error>>`.)
## What does `.await` do?
The `.await` postfix does exactly what it says on the tin: the moment you use it, the code will wait until the requested action (e.g. opening a file or reading all data in it) is finished. `.await?` is not special, it's just the application of the `?` operator to the result of `.await`. So, what is gained over the initial code example? We’re getting futures and then immediately waiting for them?
The `.await` postfix does exactly what it says on the tin: the moment you use it, the code will wait until the requested action (e.g. opening a file or reading all data in it) is finished. The `.await?` is not special, it's just the application of the `?` operator to the result of `.await`. So, what is gained over the initial code example? We're getting futures and then immediately waiting for them?
The `.await` points act as a marker. Here, the code will wait for a `Future` to produce its value. How will a future finish? You don’t need to care! The marker allows the code later *executing* this piece of code (usually called the “runtime”) when it can take some time to care about all the other things it has to do. It will come back to this point when the operation you are doing in the background is done. This is why this style of programming is also called *evented programming*. We are waiting for *things to happen* (e.g. a file to be opened) and then react (by starting to read).
The `.await` points act as a marker. Here, the code will wait for a `Future` to produce its value. How will a future finish? You don't need to care! The marker allows the component (usually called the “runtime”) in charge of *executing* this piece of code to take care of all the other things it has to do while the computation finishes. It will come back to this point when the operation you are doing in the background is done. This is why this style of programming is also called *evented programming*. We are waiting for *things to happen* (e.g. a file to be opened) and then react (by starting to read).
When executing 2 or more of these functions at the same time, our runtime system is then able to fill the wait time with handling *all the other events* currently going on.
Now that we know what Futures are, we now want to run them!
Now that we know what Futures are, we want to run them!
In `async-std`, the [`tasks`](https://docs.rs/async-std/latest/async_std/task/index.html) module is responsible for this. The simplest way is using the `block_on` function:
In `async-std`, the [`tasks`][tasks] module is responsible for this. The simplest way is using the `block_on` function:
```rust
use async_std::fs::File;
@ -29,7 +29,7 @@ fn main() {
}
```
This asks the runtime baked into `async_std` to execute the code that reads a file. Let’s go one by one, though, inside to outside.
This asks the runtime baked into `async_std` to execute the code that reads a file. Let's go one by one, though, inside to outside.
```rust
async {
@ -43,7 +43,7 @@ async {
This is an `async`*block*. Async blocks are necessary to call `async` functions, and will instruct the compiler to include all the relevant instructions to do so. In Rust, all blocks return a value and `async` blocks happen to return a value of the kind `Future`.
But let’s get to the interesting part:
But let's get to the interesting part:
```rust
@ -51,24 +51,26 @@ task::spawn(async { })
```
`spawn` takes a Future and starts running it on a `Task`. It returns a `JoinHandle`. Futures in Rust are sometimes called *cold* Futures. You need something that starts running them. To run a Future, there may be some additional bookkeeping required, e.g. if it's running or finished, where it is being placed in memory and what the current state is. This bookkeeping part is abstracted away in a `Task`. A `Task` is similar to a `Thread`, with some minor differences: it will be scheduled by the program instead of the operating system kernel and if it encounters a point where it needs to wait, the program itself is responsible for waking it up again. We’ll talk a little bit about that later. An `async_std` task can also have a name and an ID, just like a thread.
`spawn` takes a `Future` and starts running it on a `Task`. It returns a `JoinHandle`. Futures in Rust are sometimes called *cold* Futures. You need something that starts running them. To run a Future, there may be some additional bookkeeping required, e.g. whether it's running or finished, where it is being placed in memory and what the current state is. This bookkeeping part is abstracted away in a `Task`.
For now, it is enough to know that once you `spawn` a task, it will continue running in the background. The `JoinHandle` in itself is a future that will finish once the `Task` has run to conclusion. Much like with `threads` and the `join` function, we can now call `block_on` on the handle to *block* the program (or the calling thread, to be specific) to wait for it to finish.
A `Task` is similar to a `Thread`, with some minor differences: it will be scheduled by the program instead of the operating system kernel, and if it encounters a point where it needs to wait, the program itself is responsible for waking it up again. We'll talk a little bit about that later. An `async_std` task can also have a name and an ID, just like a thread.
For now, it is enough to know that once you have `spawn`ed a task, it will continue running in the background. The `JoinHandle` is itself a future that will finish once the `Task` has run to conclusion. Much like with `threads` and the `join` function, we can now call `block_on` on the handle to *block* the program (or the calling thread, to be specific) and wait for it to finish.
## Tasks in `async_std`
Tasks in `async_std` are one of the core abstractions. Much like Rust’s `thread`s, they provide some practical functionality over the raw concept. `Tasks` have a relationship to the runtime, but they are in themselves separate. `async_std` tasks have a number of desirable properties:
Tasks in `async_std` are one of the core abstractions. Much like Rust's `thread`s, they provide some practical functionality over the raw concept. `Tasks` have a relationship to the runtime, but they are in themselves separate. `async_std` tasks have a number of desirable properties:
- They are allocated in one single allocation
- All tasks have a *backchannel*, which allows them to propagate results and errors to the spawning task through the `JoinHandle`
- The carry desirable metadata for debugging
- The carry useful metadata for debugging
- They support task local storage
`async_std`'s task API handles setup and teardown of a backing runtime for you and doesn’t rely on a runtime being started.
`async_std`s task api handles setup and teardown of a backing runtime for you and doesn't rely on a runtime being explicitly started.
## Blocking
`Task`s are assumed to run _concurrently_, potentially by sharing a thread of execution. This means that operations blocking an _operating system thread_, such as `std::thread::sleep` or I/O function from Rust's stdlib will _stop execution of all tasks sharing this thread_. Other libraries (such as database drivers) have similar behaviour. Note that _blocking the current thread_ is not in and of itself bad behaviour, just something that does not mix well with the concurrent execution model of `async-std`. Essentially, never do this:
`Task`s are assumed to run _concurrently_, potentially by sharing a thread of execution. This means that operations blocking an _operating system thread_, such as `std::thread::sleep` or io function from Rust's `std` library will _stop execution of all tasks sharing this thread_. Other libraries (such as database drivers) have similar behaviour. Note that _blocking the current thread_ is not in and by itself bad behaviour, just something that does not mix well with the concurrent execution model of `async-std`. Essentially, never do this:
```rust
fn main() {
@ -79,13 +81,13 @@ fn main() {
}
```
If you want to mix operation kinds, consider putting such operations on a `thread`.
If you want to mix operation kinds, consider putting such blocking operations on a separate`thread`.
## Errors and panics
`Task`s report errors through normal channels: If they are fallible, their `Output` should be of the kind `Result<T,E>`.
Tasks report errors through normal patterns: If they are fallible, their `Output` should be of kind `Result<T,E>`.
In case of `panic`, behaviour differs depending on if there's a reasonable part that addresses the `panic`. If not, the program _aborts_.
In case of `panic`, behaviour differs depending on whether there's a reasonable part that addresses the `panic`. If not, the program _aborts_.
In practice, that means that `block_on` propagates panics to the blocking component:
@ -126,4 +128,6 @@ That might seem odd at first, but the other option would be to silently ignore p
`async_std` comes with a useful `Task` type that works with an API similar to `std::thread`. It covers error and panic behaviour in a structured and defined way.
Tasks are separate concurrent units and sometimes they need to communicate. That’s where `Stream`s come in.
Tasks are separate concurrent units and sometimes they need to communicate. That's where `Stream`s come in.