Rust by Example (Korean Edition)
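Rust is a modern systems programming language focusing on safety, speed, and concurrency. It accomplishes these goals by being memory safe without using garbage collection.
This document is a collection of runnable examples that illustrate various Rust concepts and standard libraries. To get even more out of these examples, install Rust locally and check out the official docs. If you are curious, you can also look at the source of this document.
(Translator's note: the Korean translation you are reading is being developed here.)
Now let's begin!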
- Hello World - Start with a traditional Hello World program.
- Primitives - Learn about signed integers, unsigned integers and other primitives.
- Custom Types - struct and enum.
- Variable Bindings - mutable bindings, scope, shadowing.
- Types - Learn about changing and defining types.
- Flow of Control - if/else, for, and others.
- Functions - Learn about Methods, Closures and Higher Order Functions.
- Modules - Organize code using modules.
- Crates - A crate is a compilation unit in Rust. Learn to create a library.
- Cargo - Go through some basic features of the official Rust package management tool.
- Attributes - An attribute is metadata applied to some module, crate or item.
- Generics - Learn about writing a function or data type which can work for multiple types of arguments.
- Scoping rules - Scopes play an important part in ownership, borrowing, and lifetimes.
- Traits - A trait is a collection of methods defined for an unknown type: Self.
- Error handling - Learn the Rust way of handling failures.
- Std library types - Learn about some custom types provided by the std library.
- Std misc - More custom types for file handling, threads.
- Testing - All sorts of testing in Rust.
- Meta - Documentation, Benchmarking.
Hello World
This is the source code of the traditional Hello World program.
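A minimal sketch of such a program follows. The greeting helper is a hypothetical addition, used only so the string can be checked separately from the console output.

```rust
// Returns the classic greeting; `greeting` is a hypothetical helper.
fn greeting() -> &'static str {
    "Hello World!"
}

// `main` runs when the compiled binary is executed.
fn main() {
    // `println!` is a macro: it prints the formatted text plus a newline.
    println!("{}", greeting());
}
```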
println! is a macro that prints text to the console.
A binary can be generated using the Rust compiler, rustc.
$ rustc hello.rs
rustc will produce a hello binary that can be executed.
$ ./hello
Hello World!
Activity
Click "Run" in the program box at the top of the page to see the expected output. Next, add a new line with a second println! macro so that the output shows:
Hello World!
I'm a Rustacean!
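Comments
Every program needs comments, and Rust supports a few different varieties:
- Regular comments, which are ignored by the compiler:
// Line comments, which go to the end of the line.
/* Block comments, which go to the closing delimiter. */
- Doc comments, which are used to generate library documentation:
/// Generates docs for the item that follows.
//! Generates docs for the enclosing item.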
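A small sketch showing each comment style in context; the math module and add function are hypothetical names chosen for illustration.

```rust
mod math {
    //! Inner doc comment (`//!`): documents the enclosing `math` module.

    /// Outer doc comment (`///`): documents the `add` function below.
    pub fn add(a: i32, b: i32) -> i32 {
        // Line comment: ignored by the compiler up to the end of the line.
        a + /* block comment: ignored between the delimiters */ b
    }
}

fn main() {
    println!("2 + 3 = {}", math::add(2, 3));
}
```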
See also:
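Formatted print
Printing is handled by a series of macros defined in std::fmt, some of which are:
- format!: write formatted text to a String.
- print!: same as format!, but the text is printed to the console (io::stdout).
- println!: same as print!, but a newline is appended.
- eprint!: same as print!, but the text is printed to the standard error stream (io::stderr).
- eprintln!: same as eprint!, but a newline is appended.
All of these parse text in the same way. As a plus, Rust checks formatting correctness at compile time.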
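A sketch of common format-string features (positional and named arguments, width, precision); the argument values are arbitrary examples.

```rust
fn main() {
    // `{}` is replaced by the arguments in order; `format!` returns a String.
    let days = format!("{} days", 31);
    println!("{}", days);

    // Positional arguments can be referenced, and reused, by index.
    println!("{0}, this is {1}. {1}, this is {0}", "Alice", "Bob");

    // Named arguments.
    println!("{subject} {verb} {object}", object = "the lazy dog", subject = "the quick brown fox", verb = "jumps over");

    // Width and precision: right-align in 5 columns, round to 2 decimals.
    println!("[{:>5}] [{:.2}]", 7, 2.718281);

    // `eprintln!` writes to standard error instead of standard output.
    eprintln!("this line goes to stderr");

    // A mismatch between markers and arguments fails to compile, e.g.:
    // println!("{} {}", 1);
}
```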
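std::fmt contains many traits which govern the display of text. The base forms of two important ones are:
- fmt::Debug: uses the {:?} marker. Formats text for debugging purposes.
- fmt::Display: uses the {} marker. Formats text in a more elegant, user-friendly fashion.
Standard library types can be printed with fmt::Display because the std library already implements it for them.
Printing custom types requires more steps.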
Implementing the fmt::Display trait automatically implements the ToString trait, which allows the type to be converted to a String.
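A minimal sketch, assuming a hypothetical Celsius newtype: implementing fmt::Display is enough to get to_string for free.

```rust
use std::fmt;

// A hypothetical tuple struct used only for illustration.
struct Celsius(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}°C", self.0)
    }
}

fn main() {
    let t = Celsius(21.5);
    println!("{}", t);
    // `to_string` comes from the standard blanket `ToString` impl for `T: Display`.
    let s: String = t.to_string();
    println!("as a String: {}", s);
}
```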
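Activities
- Fix the two issues in the code above (marked FIXME) so that it runs without error.
- Use the decimal-precision feature of the println! macro to print: Pi is roughly 3.142. For this exercise, define the value of pi as let pi = 3.141592. (Hint: see the Precision section of the std::fmt documentation.)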
See also:
std::fmt, macros, struct, and traits
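Debug
All types which want to use the std::fmt formatting traits require an implementation to be printable. Implementations are provided for types in the std library; all other types must implement them somehow.
The fmt::Debug trait makes this very straightforward: any type can derive (automatically create) the fmt::Debug implementation, as long as its fields implement fmt::Debug too. All std library types are also printable with {:?}.
So fmt::Debug definitely makes a type printable, but sacrifices some elegance.
Rust also provides "pretty printing" with {:#?}.
To control how a type is displayed, fmt::Display must be implemented manually.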
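A sketch of derived Debug output in compact and pretty form; Structure and Person are hypothetical types for illustration.

```rust
// `derive(Debug)` generates an `fmt::Debug` implementation automatically.
#[derive(Debug)]
struct Structure(i32);

#[derive(Debug)]
struct Person<'a> {
    name: &'a str,
    age: u8,
}

fn main() {
    // Standard library types are already printable with `{:?}`.
    println!("{:?}", vec![1, 2, 3]);
    // Derived output mirrors how the value is written in Rust source.
    println!("{:?}", Structure(3));

    let peter = Person { name: "Peter", age: 27 };
    // `{:?}` prints on one line; `{:#?}` pretty-prints over several lines.
    println!("{:?}", peter);
    println!("{:#?}", peter);
}
```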
See also:
attributes, derive, std::fmt, and struct
Display
fmt::Debug hardly looks compact and clean, so it is often advantageous to customize the output appearance. This is done by manually implementing fmt::Display, which uses the {} print marker. Implementing it looks like this:
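A minimal sketch of a manual fmt::Display implementation, assuming a hypothetical MinMax tuple struct; Debug is derived alongside it for comparison.

```rust
use std::fmt;

// A hypothetical structure holding two numbers.
#[derive(Debug)]
struct MinMax(i64, i64);

impl fmt::Display for MinMax {
    // `write!` writes formatted text into the formatter `f`, much like `format!`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.0, self.1)
    }
}

fn main() {
    let minmax = MinMax(0, 14);
    println!("Display: {}", minmax);
    println!("Debug: {:?}", minmax);
}
```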
fmt::Display may be cleaner than fmt::Debug, but this presents a problem for the std library: how should ambiguous types be displayed? For example, if the std library implemented a single style for all Vec<T>, what style should it be? Would it be either of these two?
- Vec<path>: /:/etc:/home/username:/bin (split on :)
- Vec<number>: 1,2,3 (split on ,)
No, because there is no ideal style for all types, and the std library doesn't presume to dictate one. fmt::Display is not implemented for Vec<T> or for any other generic containers; fmt::Debug must be used for these generic cases.
This is not a problem, though, because fmt::Display can be implemented for any new container type which is not generic.
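As a sketch of that idea, a hypothetical non-generic SearchPath wrapper can choose the ':'-separated style for itself, while the inner Vec is still printed with Debug.

```rust
use std::fmt;

// Hypothetical non-generic container: a list of directories, like $PATH.
struct SearchPath(Vec<String>);

impl fmt::Display for SearchPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // This type knows its own convention: entries are separated by ':'.
        write!(f, "{}", self.0.join(":"))
    }
}

fn main() {
    let path = SearchPath(vec!["/".to_string(), "/etc".to_string(), "/bin".to_string()]);
    // Display, as implemented above.
    println!("{}", path);
    // The inner Vec<String> only implements Debug.
    println!("{:?}", path.0);
}
```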
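Implementing fmt::Display does not implement fmt::Binary, so a type with only a Display implementation cannot be printed with {:b}. std::fmt has many such traits, and each requires its own implementation. This is detailed further in the std::fmt documentation.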
Activity
After checking the output of the example above, use the Point2D struct as a guide to add a Complex struct to the example. When printed in the same way, the output should be:
Display: 3.3 + 7.2i
Debug: Complex { real: 3.3, imag: 7.2 }
See also:
derive, std::fmt, macros, struct, trait, and use
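Testcase: List
Implementing fmt::Display for a structure where the elements must each be handled sequentially is tricky. The problem is that each write! generates a fmt::Result, and proper handling requires dealing with all of the results. Rust provides the ? operator for exactly this purpose.
Using ? on write! looks like this: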
// Try `write!` to see if it errors. If it errors, return
// the error. Otherwise continue.
write!(f, "{}", value)?;
With ? available, implementing fmt::Display for a struct that wraps a Vec is straightforward:
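A sketch of that implementation, assuming a tuple struct List wrapping a Vec<i32> (Display cannot be implemented for Vec itself, since neither the trait nor the type is local to the crate):

```rust
use std::fmt;

// A tuple struct wrapping a Vec so that Display can be implemented for it.
struct List(Vec<i32>);

impl fmt::Display for List {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let vec = &self.0;
        // Each `?` returns early if writing fails.
        write!(f, "[")?;
        for (count, v) in vec.iter().enumerate() {
            // Add a separator before every element except the first.
            if count != 0 {
                write!(f, ", ")?;
            }
            write!(f, "{}", v)?;
        }
        // The last write's fmt::Result is returned directly.
        write!(f, "]")
    }
}

fn main() {
    let v = List(vec![1, 2, 3]);
    println!("{}", v);
}
```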
Activity
Try changing the program so that the index of each element in the vector is also printed. The new output should look like this:
[0: 1, 1: 2, 2: 3]
See also:
for, ref, Result, struct, ?, and vec!
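Formatting
We've seen that formatting is specified via a format string:
format!("{}", foo) -> "3735928559"
format!("0x{:X}", foo) -> "0xDEADBEEF"
format!("0o{:o}", foo) -> "0o33653337357"
The same variable (foo) can be formatted differently depending on which argument type is used: X vs o vs unspecified.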
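A quick sketch checking the outputs listed above, plus zero-padded binary and hex output using a width such as :08b or :02X:

```rust
fn main() {
    // 3735928559 does not fit in an i32, so an unsigned 32-bit type is used.
    let foo: u32 = 3735928559;
    println!("{}", foo); // Display
    println!("0x{:X}", foo); // UpperHex
    println!("0o{:o}", foo); // Octal
    println!("{:08b}", 5u8); // Binary, zero-padded to width 8
    println!("{:02X}", 10u8); // UpperHex, zero-padded to width 2
}
```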
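This formatting functionality is implemented via traits, and there is one trait for each argument type. The most common formatting trait is Display, which handles cases where the argument type is left unspecified: {}, for instance.
A full list of formatting traits and their argument types can be found in the std::fmt documentation.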
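Activity
Add an implementation of the fmt::Display trait for the Color struct above so that the output displays as:
RGB (128, 255, 90) 0x80FF5A
RGB (0, 3, 254) 0x0003FE
RGB (0, 0, 0) 0x000000
Two hints if you get stuck:
- You may need to list each color more than once.
- You can pad with zeros to a width of 2 with :02.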
See also:
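Primitives
Rust provides access to a wide variety of primitives. A sample includes:
Scalar Types
- Signed integers: i8, i16, i32, i64, i128 and isize (pointer size)
- Unsigned integers: u8, u16, u32, u64, u128 and usize (pointer size)
- Floating point: f32, f64
- char: Unicode scalar values like 'a', 'α' and '∞' (4 bytes each)
- bool: either true or false
- The unit type (), whose only possible value is an empty tuple: ()
Despite the value of a unit type being a tuple, it is not considered a compound type because it does not contain multiple values.
Compound Types
- Arrays like [1, 2, 3]
- Tuples like (1, true)
Variables can always be type annotated. Numbers may additionally be annotated via a suffix. Unannotated integers default to i32 and floats to f64.
Note that Rust can also infer types from context.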
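A sketch of annotations, suffixes, defaults and inference on some of the primitives listed above; the variable names are arbitrary.

```rust
use std::mem::size_of;

fn main() {
    // Explicit annotation.
    let logical: bool = true;
    let a_float: f64 = 1.0;
    // Suffix annotation.
    let an_integer = 5i32;
    // Defaults when nothing else constrains the type.
    let default_float = 3.0; // f64
    let default_integer = 7; // i32
    // The type can also be inferred from later use: this becomes an i64.
    let mut inferred_type = 12;
    inferred_type = 4294967296i64;
    // The unit type has exactly one value, `()`.
    let unit = ();
    // Compound types: an array and a tuple.
    let array = [1, 2, 3];
    let tuple = (1, true);
    println!(
        "{} {} {} {} {} {} {:?} {:?} {:?}",
        logical, a_float, an_integer, default_float, default_integer, inferred_type, unit, array, tuple
    );
    println!("char: {} bytes, usize: {} bytes", size_of::<char>(), size_of::<usize>());
}
```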
See also:
the std library, mut, inference, and shadowing
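Literals and operators
Integers 1, floats 1.2, characters 'a', strings "abc", booleans true and the unit type () can be expressed using literals.
Integers can, alternatively, be expressed using hexadecimal, octal or binary notation by prefixing the literal with 0x, 0o or 0b respectively.
Underscores can be inserted in numeric literals to improve readability, e.g. 1_000 is the same as 1000, and 0.000_001 is the same as 0.000001.
The type of a literal can be stated explicitly with a suffix: u32 marks an unsigned 32-bit integer, and i32 a signed 32-bit integer.
The operators available and their precedence in Rust are similar to those of other C-like languages.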
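A sketch of literal prefixes, underscores and a few operators; the printed expressions are arbitrary examples.

```rust
fn main() {
    // Integer arithmetic with suffixed literals (1u32 - 2 would overflow, so i32 is used).
    println!("1 + 2 = {}", 1u32 + 2);
    println!("1 - 2 = {}", 1i32 - 2);
    // Short-circuiting boolean logic.
    println!("true AND false is {}", true && false);
    // Bitwise operations on binary literals, printed as 4 binary digits.
    println!("0011 AND 0101 is {:04b}", 0b0011u32 & 0b0101);
    println!("0011 XOR 0101 is {:04b}", 0b0011u32 ^ 0b0101);
    println!("1 << 5 is {}", 1u32 << 5);
    // Underscores only improve readability.
    println!("One million is written as {}", 1_000_000u32);
}
```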
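Tuples
A tuple is a collection of values of different types. Tuples are constructed using parentheses (), and each tuple itself is a value with type signature (T1, T2, ...), where T1, T2 are the types of its members. Functions can use tuples to return multiple values, as tuples can hold any number of values.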
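A sketch of tuples as arguments and return values; reverse is the helper the activity below refers to, written here under the assumption that it swaps an (i32, bool) pair.

```rust
// Tuples can be used as function arguments and as return values.
fn reverse(pair: (i32, bool)) -> (bool, i32) {
    // `let` can bind the members of a tuple to variables.
    let (int_param, bool_param) = pair;
    (bool_param, int_param)
}

fn main() {
    let long_tuple = (1u8, 2u16, 'a', true);
    // Members are accessed by index.
    println!("first: {}, third: {}", long_tuple.0, long_tuple.2);
    println!("reversed pair: {:?}", reverse((1, true)));
    // A one-element tuple needs a trailing comma to differ from a parenthesized value.
    println!("one-element tuple: {:?}", (5u32,));
}
```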
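Activity
- Recap: Add the fmt::Display trait to the Matrix struct in the example above, so that if you switch from printing the debug format {:?} to the display format {}, you see the following output:
( 1.1 1.2 )
( 2.1 2.2 )
You may want to refer back to the earlier Display example.
- Add a transpose function using the reverse function as a template, which accepts a matrix as an argument and returns a matrix in which two elements have been swapped. For example:
println!("Matrix:\n{}", matrix);
println!("Transpose:\n{}", transpose(matrix));
results in the output:
Matrix:
( 1.1 1.2 )
( 2.1 2.2 )
Transpose:
( 1.1 2.1 )
( 1.2 2.2 )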
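Arrays and Slices
An array is a collection of objects of the same type T, stored in contiguous memory. Arrays are created using brackets [], and their length, which is known at compile time, is part of their type signature [T; length].
Slices are similar to arrays, but their length is not known at compile time. Instead, a slice is a two-word object: the first word is a pointer to the data, and the second word is the length of the slice. The word size is the same as usize and is determined by the processor architecture, e.g. 64 bits on an x86-64. Slices can be used to borrow a section of an array and have the type signature &[T].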
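A sketch of arrays and slices; analyze_slice is a hypothetical helper that borrows a slice and works for arrays of any length.

```rust
use std::mem;

// Borrows a slice; returns its first element and its length.
fn analyze_slice(slice: &[i32]) -> (i32, usize) {
    (slice[0], slice.len())
}

fn main() {
    // Fixed-size array: the length is part of the type.
    let xs: [i32; 5] = [1, 2, 3, 4, 5];
    // All 500 elements initialized to the same value.
    let ys: [i32; 500] = [0; 500];
    println!("xs occupies {} bytes", mem::size_of_val(&xs));
    // Arrays coerce to slices automatically.
    println!("{:?}", analyze_slice(&xs));
    // A slice of a section of an array: start index inclusive, end index exclusive.
    println!("{:?}", analyze_slice(&ys[1..4]));
}
```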
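Custom Types
Rust custom data types are formed mainly through the two keywords:
- struct: define a structure
- enum: define an enumeration
Constants can also be created via the const and static keywords.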
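Structures
There are three types of structures ("structs") that can be created using the struct keyword:
- Tuple structs, which are, basically, named tuples.
- The classic C structs.
- Unit structs, which are field-less and are useful for generics.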
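A sketch of the three kinds of struct; Unit, Pair, Point and Rectangle are illustrative names (the activity below assumes Point and Rectangle types of roughly this shape).

```rust
// A unit struct: no fields.
struct Unit;

// A tuple struct: a named tuple.
struct Pair(i32, f32);

// A C-style struct with named fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

// Structs can be fields of other structs.
#[derive(Debug)]
struct Rectangle {
    top_left: Point,
    bottom_right: Point,
}

fn main() {
    let _unit = Unit;
    // Destructure a tuple struct with `let`.
    let Pair(integer, decimal) = Pair(1, 0.1);
    println!("pair contains {} and {}", integer, decimal);

    let point = Point { x: 10.3, y: 0.4 };
    // Struct update syntax copies the remaining fields from `point`.
    let bottom_right = Point { x: 5.2, ..point };
    let rect = Rectangle { top_left: point, bottom_right };
    println!("{:?}", rect);
}
```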
Activity
- Add a function rect_area which calculates the area of a Rectangle (try using nested destructuring).
- Add a function square which takes a Point and an f32 as arguments, and returns a Rectangle with its lower left corner on the point, and a width and height corresponding to the f32.
See also
attributes and destructuring
Enums
The enum keyword allows the creation of a type which may be one of a few
different variants. Any variant which is valid as a struct is also valid in
an enum.
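A sketch of an enum mixing unit-like, tuple-like and struct-like variants; WebEvent and inspect are hypothetical names.

```rust
// Variants can be unit-like, tuple-like, or C-struct-like.
enum WebEvent {
    PageLoad,
    KeyPress(char),
    Click { x: i64, y: i64 },
}

// Describes an event; `match` must cover every variant.
fn inspect(event: WebEvent) -> String {
    match event {
        WebEvent::PageLoad => "page loaded".to_string(),
        WebEvent::KeyPress(c) => format!("pressed '{}'", c),
        WebEvent::Click { x, y } => format!("clicked at x={}, y={}", x, y),
    }
}

fn main() {
    let events = vec![WebEvent::PageLoad, WebEvent::KeyPress('x'), WebEvent::Click { x: 20, y: 80 }];
    for e in events {
        println!("{}", inspect(e));
    }
}
```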
Type aliases
If you use a type alias, you can refer to each enum variant via its alias. This might be useful if the enum's name is too long or too generic, and you want to rename it.
The most common place you'll see this is in impl
blocks using the Self
alias.
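A sketch of both uses, assuming a deliberately long, hypothetical enum name:

```rust
enum VeryVerboseEnumOfThingsToDoWithNumbers {
    Add,
    Subtract,
}

// A shorter alias for the long enum name.
type Operations = VeryVerboseEnumOfThingsToDoWithNumbers;

impl VeryVerboseEnumOfThingsToDoWithNumbers {
    fn run(&self, x: i32, y: i32) -> i32 {
        // Inside `impl`, `Self` is an alias for the implementing type.
        match self {
            Self::Add => x + y,
            Self::Subtract => x - y,
        }
    }
}

fn main() {
    // Variants can be named through the alias.
    let op = Operations::Add;
    println!("{}", op.run(2, 3));
}
```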
To learn more about enums and type aliases, you can read the stabilization report from when this feature was stabilized into Rust.
See also:
match, fn, and String, "Type alias enum variants" RFC
use
The use
declaration can be used so manual scoping isn't needed:
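A sketch, assuming a hypothetical Stage enum whose variants are imported with use so they can be named without the Stage:: prefix (the crate:: path form assumes the 2018 edition or later):

```rust
enum Stage {
    Beginner,
    Advanced,
}

// Bring the variants into scope; no `Stage::` prefix is needed below.
use crate::Stage::{Advanced, Beginner};

fn describe(stage: Stage) -> &'static str {
    match stage {
        Beginner => "beginner",
        Advanced => "advanced",
    }
}

fn main() {
    println!("{}", describe(Beginner));
    println!("{}", describe(Advanced));
}
```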
See also:
C-like
enum can also be used as C-like enums.
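A sketch of implicit and explicit discriminants; Number and Color are hypothetical enums.

```rust
// Implicit discriminants start at 0 and count up.
enum Number {
    Zero,
    One,
    Two,
}

// Explicit discriminants.
enum Color {
    Red = 0xff0000,
    Green = 0x00ff00,
    Blue = 0x0000ff,
}

fn main() {
    // `as` casts a C-like enum to its integer discriminant.
    println!("zero is {}", Number::Zero as i32);
    println!("one is {}", Number::One as i32);
    println!("roses are #{:06x}", Color::Red as i32);
    println!("violets are #{:06x}", Color::Blue as i32);
}
```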
See also:
Testcase: linked-list
A common use for enums
is to create a linked-list:
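A sketch of a singly linked list built from an enum; the method names new, prepend, len and stringify are illustrative choices.

```rust
// Each node holds a value and a boxed pointer to the rest of the list.
enum List {
    Cons(u32, Box<List>),
    Nil,
}

impl List {
    // An empty list.
    fn new() -> List {
        List::Nil
    }

    // Consume the list and return it with `elem` at the front.
    fn prepend(self, elem: u32) -> List {
        List::Cons(elem, Box::new(self))
    }

    // Length, computed recursively.
    fn len(&self) -> u32 {
        match self {
            List::Cons(_, tail) => 1 + tail.len(),
            List::Nil => 0,
        }
    }

    // Render the list as a comma-separated string ending in "Nil".
    fn stringify(&self) -> String {
        match self {
            List::Cons(head, tail) => format!("{}, {}", head, tail.stringify()),
            List::Nil => "Nil".to_string(),
        }
    }
}

fn main() {
    let list = List::new().prepend(1).prepend(2).prepend(3);
    println!("length {}: {}", list.len(), list.stringify());
}
```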
See also:
constants
Rust has two different types of constants which can be declared in any scope including global. Both require explicit type annotation:
- const: An unchangeable value (the common case).
- static: A possibly mutable variable with 'static lifetime. The static lifetime is inferred and does not have to be specified. Accessing or modifying a mutable static variable is unsafe.
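A sketch of a global const and a global static; LANGUAGE, THRESHOLD and is_big are hypothetical names.

```rust
// Globals are declared outside all other scopes.
static LANGUAGE: &str = "Rust";
const THRESHOLD: i32 = 10;

fn is_big(n: i32) -> bool {
    // Constants can be read from any function.
    n > THRESHOLD
}

fn main() {
    println!("This is {}", LANGUAGE);
    println!("16 is {}", if is_big(16) { "big" } else { "small" });
    // Error if uncommented: a `const` cannot be assigned to.
    // THRESHOLD = 5;
}
```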
See also:
The const/static RFC, 'static lifetime
Variable Bindings
Rust provides type safety via static typing. Variable bindings can be type annotated when declared. However, in most cases, the compiler will be able to infer the type of the variable from the context, heavily reducing the annotation burden.
Values (like literals) can be bound to variables, using the let
binding.
Mutability
Variable bindings are immutable by default, but this can be overridden using
the mut
modifier.
The compiler will throw a detailed diagnostic about mutability errors.
Scope and Shadowing
Variable bindings have a scope, and are constrained to live in a block. A
block is a collection of statements enclosed by braces {}
.
Also, variable shadowing is allowed.
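A sketch covering mut, block scope and shadowing; scoped_values is a hypothetical helper that returns what each binding held.

```rust
// Returns the values seen at each step so the effect of scoping can be checked.
fn scoped_values() -> (i32, i32, &'static str) {
    let x = 5;
    let inner = {
        // Shadows the outer `x`, but only until the end of this block.
        let x = x * 2;
        x
    };
    // Outside the block, the original binding is visible again.
    let outer = x;
    // Shadowing in the same scope; the new binding may even change the type.
    let x = "five";
    (inner, outer, x)
}

fn main() {
    // Bindings are immutable by default; `mut` permits reassignment.
    let mut counter = 1;
    counter += 1;
    println!("counter = {}, values = {:?}", counter, scoped_values());
}
```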
Declare first
It's possible to declare variable bindings first, and initialize them later. However, this form is seldom used, as it may lead to the use of uninitialized variables.
The compiler forbids use of uninitialized variables, as this would lead to undefined behavior.
Freezing
When data is bound by the same name immutably, it also freezes. Frozen data can't be modified until the immutable binding goes out of scope:
Types
Rust provides several mechanisms to change or define the type of primitive and user defined types. The following sections cover:
- Casting between primitive types
- Specifying the desired type of literals
- Using type inference
- Aliasing types
Casting
Rust provides no implicit type conversion (coercion) between primitive types.
But, explicit type conversion (casting) can be performed using the as
keyword.
Rules for converting between integral types follow C conventions generally, except in cases where C has undefined behavior. The behavior of all casts between integral types is well defined in Rust.
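A sketch of explicit casts with as; the commented results follow Rust's defined semantics (integer narrowing keeps the low bits, and float-to-integer casts saturate since Rust 1.45).

```rust
fn main() {
    let decimal = 65.4321_f32;
    // No implicit coercion: `as` is required. Truncates toward zero: 65.
    let integer = decimal as u8;
    // Only u8 can be cast to char: 65 becomes 'A'.
    let character = integer as char;
    println!("{} -> {} -> {}", decimal, integer, character);

    // Narrowing integer casts keep the low bits (wrap modulo 2^8 here).
    println!("1000 as u8 is {}", 1000i32 as u8); // 232
    println!("-1 as u8 is {}", -1i32 as u8); // 255

    // Float-to-integer casts saturate at the target type's bounds.
    println!("300.0 as u8 is {}", 300.0_f32 as u8); // 255
    println!("-100.0 as u8 is {}", -100.0_f32 as u8); // 0
}
```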
Literals
Numeric literals can be type annotated by adding the type as a suffix. As an example,
to specify that the literal 42 should have the type i32, write 42i32.
The type of unsuffixed numeric literals will depend on how they are used. If no
constraint exists, the compiler will use i32
for integers, and f64
for
floating-point numbers.
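A sketch of suffixed and unsuffixed literals, using std::mem::size_of_val to report each value's size in bytes:

```rust
fn main() {
    // Suffixed literals: the type is fixed at initialization.
    let x = 1u8;
    let y = 2u32;
    let z = 3f32;
    // Unsuffixed literals with no other constraint fall back to i32 and f64.
    let i = 1;
    let f = 1.0;
    // `size_of_val` returns the size of a value in bytes.
    println!("x: {} byte(s)", std::mem::size_of_val(&x));
    println!("y: {} byte(s)", std::mem::size_of_val(&y));
    println!("z: {} byte(s)", std::mem::size_of_val(&z));
    println!("i: {} byte(s)", std::mem::size_of_val(&i));
    println!("f: {} byte(s)", std::mem::size_of_val(&f));
}
```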
There are some concepts used in the previous code that haven't been explained yet; here's a brief explanation for impatient readers:
std::mem::size_of_val is a function, but it is called with its full path. Code can be split in logical units called modules. In this case, the size_of_val function is defined in the mem module, and the mem module is defined in the std crate. For more details, see modules and crates.
Inference
The type inference engine is pretty smart. It does more than looking at the type of the value expression during an initialization. It also looks at how the variable is used afterwards to infer its type. Here's an advanced example of type inference:
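A sketch of that kind of inference: the vector's element type is only determined by a later push.

```rust
fn main() {
    // Because of the suffix, the compiler knows `elem` has type u8.
    let elem = 5u8;
    // An empty vector: its element type is not known yet (`Vec<_>`).
    let mut vec = Vec::new();
    // Pushing `elem` lets the compiler infer `vec: Vec<u8>`.
    vec.push(elem);
    println!("{:?}", vec);
}
```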
No type annotation of variables was needed, the compiler is happy and so is the programmer!
Aliasing
The type statement can be used to give a new name to an existing type. Types
must have UpperCamelCase names, or the compiler will raise a warning. The
exception to this rule are the primitive types: usize, f32, etc.
The main use of aliases is to reduce boilerplate; for example, the io::Result<T>
type is an alias for the Result<T, io::Error> type.
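A sketch of aliases; NanoSecond, Inch and U64 are illustrative names, and checked is a hypothetical function using the std::io::Result alias.

```rust
// Aliases introduce a new name, not a new type.
type NanoSecond = u64;
type Inch = u64;
type U64 = u64;

// `std::io::Result<T>` is shorthand for `Result<T, std::io::Error>`.
fn checked() -> std::io::Result<u8> {
    Ok(1)
}

fn main() {
    let nanoseconds: NanoSecond = 5 as U64;
    let inches: Inch = 2 as U64;
    // Aliases add no type safety: mixing units compiles without complaint.
    println!("{} nanoseconds + {} inches = {} unit?", nanoseconds, inches, nanoseconds + inches);
    println!("{:?}", checked());
}
```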
See also:
Conversion
Primitive types can be converted to each other through casting.
Rust addresses conversion between custom types (i.e., struct
and enum
)
by the use of traits. The generic
conversions will use the From
and Into
traits. However there are more
specific ones for the more common cases, in particular when converting to and
from Strings.
From
and Into
The From
and Into
traits are inherently linked, and this is actually part of their
implementation. If you are able to convert type A from type B, then it
should be easy to believe that we should be able to convert type B to type A.
From
The From
trait allows for a type to define how to create itself from another
type, hence providing a very simple mechanism for converting between several
types. There are numerous implementations of this trait within the standard
library for conversion of primitive and common types.
For example we can easily convert a str
into a String
We can do similar for defining a conversion for our own type.
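A sketch of both conversions, assuming a hypothetical Number wrapper around an i32:

```rust
#[derive(Debug, PartialEq)]
struct Number {
    value: i32,
}

// Defines how to build a `Number` from an `i32`.
impl From<i32> for Number {
    fn from(item: i32) -> Self {
        Number { value: item }
    }
}

fn main() {
    // Standard library conversion from `&str` to `String`.
    let my_str = "hello";
    let my_string = String::from(my_str);
    // Our own conversion.
    let num = Number::from(30);
    println!("{} {:?}", my_string, num);
}
```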
Into
The Into
trait is simply the reciprocal of the From
trait. That is, if you
have implemented the From
trait for your type, Into
will call it when
necessary.
Using the Into
trait will typically require specification of the type to
convert into as the compiler is unable to determine this most of the time.
However this is a small trade-off considering we get the functionality for free.
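A sketch reusing the same hypothetical Number type: only From is implemented, yet into works, provided the target type is spelled out.

```rust
#[derive(Debug, PartialEq)]
struct Number {
    value: i32,
}

impl From<i32> for Number {
    fn from(item: i32) -> Self {
        Number { value: item }
    }
}

fn main() {
    let int = 5i32;
    // `Into<Number> for i32` comes from the standard blanket impl over `From`.
    // The annotation tells the compiler which type to convert into.
    let num: Number = int.into();
    println!("{:?}", num);
}
```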
TryFrom
and TryInto
Similar to From
and Into
, TryFrom
and TryInto
are
generic traits for converting between types. Unlike From
/Into
, the
TryFrom
/TryInto
traits are used for fallible conversions, and as such,
return Results.
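A sketch assuming a hypothetical EvenNumber type that can only be built from even integers:

```rust
// These imports are needed on editions before 2021; in 2021 the traits are in the prelude.
use std::convert::TryFrom;
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct EvenNumber(i32);

impl TryFrom<i32> for EvenNumber {
    type Error = ();

    // Succeeds only for even input.
    fn try_from(value: i32) -> Result<Self, Self::Error> {
        if value % 2 == 0 {
            Ok(EvenNumber(value))
        } else {
            Err(())
        }
    }
}

fn main() {
    println!("{:?}", EvenNumber::try_from(8));
    // `try_into` is available through the blanket impl; the target type is annotated.
    let result: Result<EvenNumber, ()> = 5i32.try_into();
    println!("{:?}", result);
}
```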
To and from Strings
Converting to String
To convert any type to a String
is as simple as implementing the ToString
trait for the type. Rather than doing so directly, you should implement the
fmt::Display
trait which automagically provides ToString
and
also allows printing the type as discussed in the section on print!
.
Parsing a String
One of the more common types to convert a string into is a number. The idiomatic
approach to this is to use the parse
function and either to arrange for
type inference or to specify the type to parse using the 'turbofish' syntax.
Both alternatives are shown in the following example.
This will convert the string into the type specified so long as the FromStr
trait is implemented for that type. This is implemented for numerous types
within the standard library. To obtain this functionality on a user defined type
simply implement the FromStr
trait for that type.
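A sketch of both parse styles, plus FromStr for a hypothetical Point type parsed from text such as "3, 4" (the error type String is an assumption made for simplicity):

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl FromStr for Point {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split on the first comma; `?` converts the &str error into a String.
        let (x, y) = s.split_once(',').ok_or("missing comma")?;
        let x = x.trim().parse::<i32>().map_err(|e| e.to_string())?;
        let y = y.trim().parse::<i32>().map_err(|e| e.to_string())?;
        Ok(Point { x, y })
    }
}

fn main() {
    // Type inference picks the target type from the annotation.
    let parsed: i32 = "5".parse().unwrap();
    // The turbofish names the target type explicitly.
    let turbo_parsed = "10".parse::<i32>().unwrap();
    println!("Sum: {}", parsed + turbo_parsed);
    // Implementing FromStr makes `parse` work for our own type.
    let p: Point = "3, 4".parse().unwrap();
    println!("{:?}", p);
}
```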
Expressions
A Rust program is (mostly) made up of a series of statements:
fn main() {
// statement
// statement
// statement
}
There are a few kinds of statements in Rust. The most common two are declaring
a variable binding, and using a ;
with an expression:
fn main() {
// variable binding
let x = 5;
// expression;
x;
x + 1;
15;
}
Blocks are expressions too, so they can be used as values in
assignments. The last expression in the block will be assigned to the
place expression such as a local variable. However, if the last expression of the block ends with a
semicolon, the return value will be ()
.
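A sketch of both cases; polynomial is a hypothetical helper.

```rust
// Evaluates x^3 + x^2 + x using a block expression.
fn polynomial(x: u32) -> u32 {
    // The block's final expression (no semicolon) becomes the value of `y`.
    let y = {
        let x_squared = x * x;
        let x_cube = x_squared * x;
        x_cube + x_squared + x
    };
    y
}

fn main() {
    let x = 5u32;
    // With a trailing semicolon the value is discarded and the block yields `()`.
    let z = {
        2 * x;
    };
    println!("y is {}, z is {:?}", polynomial(x), z);
}
```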
Flow of Control
An integral part of any programming language is the ability to modify control flow:
if/else, for, and others. Let's talk about them in Rust.
if/else
Branching with if-else is similar to other languages. Unlike many of them,
the boolean condition doesn't need to be surrounded by parentheses, and each
condition is followed by a block. if-else conditionals are expressions,
and all branches must return the same type.
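A sketch of if-else used both as a statement and as an expression; halve_or_grow is a hypothetical function.

```rust
// Each branch of this `if`-`else` expression yields an i32.
fn halve_or_grow(n: i32) -> i32 {
    if n < 10 && n > -10 {
        // Small numbers are increased tenfold.
        10 * n
    } else {
        // Large numbers are halved (integer division truncates toward zero).
        n / 2
    }
}

fn main() {
    let n = 5;
    // No parentheses are needed around the condition.
    if n < 0 {
        print!("{} is negative", n);
    } else if n > 0 {
        print!("{} is positive", n);
    } else {
        print!("{} is zero", n);
    }
    println!(", and becomes {}", halve_or_grow(n));
}
```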
loop
Rust provides a loop
keyword to indicate an infinite loop.
The break
statement can be used to exit a loop at any time, whereas the
continue
statement can be used to skip the rest of the iteration and start a
new one.
Nesting and labels
It's possible to break
or continue
outer loops when dealing with nested
loops. In these cases, the loops must be annotated with some 'label
, and the
label must be passed to the break
/continue
statement.
Returning from loops
One of the uses of a loop
is to retry an operation until it succeeds. If the
operation returns a value though, you might need to pass it to the rest of the
code: put it after the break
, and it will be returned by the loop
expression.
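A sketch covering the three loop features above (continue and break, a labeled break, and break with a value); all function names are hypothetical.

```rust
// Counts odd numbers in 1..=limit using `loop`, `continue` and `break`.
fn count_odds(limit: u32) -> u32 {
    let mut n = 0;
    let mut odds = 0;
    loop {
        n += 1;
        if n > limit {
            break;
        }
        if n % 2 == 0 {
            // Skip the rest of this iteration for even numbers.
            continue;
        }
        odds += 1;
    }
    odds
}

// Returns the first (i, j) with i * j == target, exiting both loops via a label.
fn find_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 1..10 {
        for j in 1..10 {
            if i * j == target {
                found = Some((i, j));
                break 'outer;
            }
        }
    }
    found
}

// `break value` makes the `loop` expression evaluate to `value`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut n = 1;
    loop {
        if n > limit {
            break n;
        }
        n *= 2;
    }
}

fn main() {
    println!("{} {:?} {}", count_odds(10), find_pair(12), first_power_of_two_above(100));
}
```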
while
The while
keyword can be used to run a loop while a condition is true.
Let's write the infamous FizzBuzz using a while
loop.
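A sketch of FizzBuzz with while; collecting the lines into a Vec instead of printing them directly is an assumption made so the result can be checked.

```rust
// FizzBuzz for 1..=limit with a `while` loop.
fn fizzbuzz_while(limit: u32) -> Vec<String> {
    let mut out = Vec::new();
    let mut n = 1;
    // Keep looping while the condition holds.
    while n <= limit {
        if n % 15 == 0 {
            out.push("fizzbuzz".to_string());
        } else if n % 3 == 0 {
            out.push("fizz".to_string());
        } else if n % 5 == 0 {
            out.push("buzz".to_string());
        } else {
            out.push(n.to_string());
        }
        n += 1;
    }
    out
}

fn main() {
    for line in fizzbuzz_while(15) {
        println!("{}", line);
    }
}
```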
for loops
for and range
The for in
construct can be used to iterate through an Iterator
.
One of the easiest ways to create an iterator is to use the range
notation a..b
. This yields values from a
(inclusive) to b
(exclusive) in steps of one.
Let's write FizzBuzz using for
instead of while
.
Alternatively, a..=b
can be used for a range that is inclusive on both ends.
The above can be written as:
for and iterators
The for in
construct is able to interact with an Iterator
in several ways.
As discussed in the section on the Iterator trait, by default the for
loop will apply the into_iter
function to the collection. However, this is
not the only means of converting collections into iterators.
into_iter
, iter
and iter_mut
all handle the conversion of a collection
into an iterator in different ways, by providing different views on the data
within.
iter
- This borrows each element of the collection through each iteration. Thus leaving the collection untouched and available for reuse after the loop.
into_iter
- This consumes the collection so that on each iteration the exact data is provided. Once the collection has been consumed it is no longer available for reuse as it has been 'moved' within the loop.
iter_mut
- This mutably borrows each element of the collection, allowing for the collection to be modified in place.
In the above snippets note the type of match
branch, that is the key
difference in the types of iteration. The difference in type then of course
implies differing actions that are able to be performed.
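A sketch of the three conversions; the helper functions are hypothetical, and each comment notes the item type the loop sees.

```rust
// `iter` yields `&&str` here; the slice is only borrowed.
fn count_rustaceans(names: &[&str]) -> usize {
    let mut count = 0;
    for name in names.iter() {
        // Dereference once to match on the `&str`.
        match *name {
            "Ferris" => count += 1,
            _ => {}
        }
    }
    count
}

// `iter_mut` yields `&mut i32`, so each element can be updated in place.
fn scale_in_place(v: &mut [i32], factor: i32) {
    for x in v.iter_mut() {
        *x *= factor;
    }
}

// `into_iter` yields owned `i32` values and consumes the vector.
fn total(v: Vec<i32>) -> i32 {
    let mut sum = 0;
    for x in v.into_iter() {
        sum += x;
    }
    sum
}

fn main() {
    let names = vec!["Bob", "Frank", "Ferris"];
    println!("{} rustacean(s) among {:?}", count_rustaceans(&names), names);
    let mut scores = vec![1, 2, 3];
    scale_in_place(&mut scores, 10);
    // `scores` is moved into `total` and cannot be used afterwards.
    println!("total: {}", total(scores));
}
```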
See also:
match
Rust provides pattern matching via the match
keyword, which can be used like
a C switch
.
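A sketch of match on integers; describe is a hypothetical function.

```rust
// Arms are tried top to bottom; the first matching arm wins.
fn describe(number: u32) -> &'static str {
    match number {
        // A single value.
        1 => "one",
        // Several values.
        2 | 3 | 5 | 7 | 11 => "a small prime",
        // An inclusive range.
        13..=19 => "a teen",
        // Every other case; `match` must be exhaustive.
        _ => "something else",
    }
}

fn main() {
    for n in [1, 7, 15, 20].iter() {
        println!("{} is {}", n, describe(*n));
    }
}
```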
Destructuring
A match
block can destructure items in a variety of ways.
tuples
Tuples can be destructured in a match
as follows:
See also:
enums
An enum
is destructured similarly:
See also:
#[allow(...)], color models and enum
pointers/ref
For pointers, a distinction needs to be made between destructuring
and dereferencing as they are different concepts which are used
differently from a language like C
.
- Dereferencing uses *
- Destructuring uses &, ref, and ref mut
See also:
structs
Similarly, a struct
can be destructured as shown:
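A sketch of destructuring tuples, references and structs; Foo, describe_foo and describe_triple are hypothetical names.

```rust
struct Foo {
    x: (u32, u32),
    y: u32,
}

// Destructures a struct, including a nested tuple field.
fn describe_foo(foo: &Foo) -> String {
    match foo {
        Foo { x: (1, b), y } => format!("first of x is 1, b = {}, y = {}", b, y),
        // `..` ignores the remaining fields.
        Foo { y: 2, .. } => "y is 2".to_string(),
        Foo { x, .. } => format!("x = {:?}", x),
    }
}

// Destructures a tuple; `..` ignores the rest of the positions.
fn describe_triple(t: (i32, i32, i32)) -> String {
    match t {
        (0, y, z) => format!("first is 0, then {} and {}", y, z),
        (1, ..) => "first is 1".to_string(),
        _ => "something else".to_string(),
    }
}

fn main() {
    // `&` in a pattern destructures a reference; `*` dereferences explicitly.
    let reference = &4;
    let &val = reference;
    println!("{} == {}", val, *reference);

    // `ref` borrows the matched value instead of moving it.
    let owned = String::from("kept");
    let ref borrowed = owned;
    println!("{} is still usable as {}", borrowed, owned);

    println!("{}", describe_triple((0, -2, 3)));
    println!("{}", describe_foo(&Foo { x: (1, 2), y: 3 }));
}
```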
See also:
Guards
A match
guard can be added to filter the arm.
See also:
Binding
Indirectly accessing a variable makes it impossible to branch and use that
variable without re-binding. match
provides the @
sigil for binding values to
names:
You can also use binding to "destructure" enum
variants, such as Option
:
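A sketch of guards and @ bindings; classify and describe_option are hypothetical functions.

```rust
fn classify(age: u32) -> String {
    match age {
        0 => "not born yet".to_string(),
        // `@` binds the value matched by the range to `n`.
        n @ 1..=12 => format!("a child of age {}", n),
        n @ 13..=19 => format!("a teen of age {}", n),
        // A guard adds a condition the pattern alone cannot express.
        n if n % 10 == 0 => format!("a round-number age: {}", n),
        n => format!("an adult of age {}", n),
    }
}

// `@` also works inside enum variants such as `Option`.
fn describe_option(opt: Option<u32>) -> String {
    match opt {
        Some(n @ 42) => format!("The Answer: {}", n),
        Some(n) => format!("not interesting: {}", n),
        None => "nothing".to_string(),
    }
}

fn main() {
    println!("{}", classify(15));
    println!("{}", describe_option(Some(42)));
}
```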
See also:
if let
For some use cases, when matching enums, match
is awkward. For example:
if let
is cleaner for this use case and in addition allows various
failure options to be specified:
In the same way, if let
can be used to match any enum value:
Another benefit is that if let
allows us to match non-parameterized enum variants. This is true even in cases where the enum doesn't implement or derive PartialEq
. In such cases if Foo::Bar == a
would fail to compile, because instances of the enum cannot be equated, however if let
will continue to work.
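A sketch of that behavior with a hypothetical Foo enum that does not derive PartialEq:

```rust
// No `PartialEq`, so `Foo::Bar == a` would not compile.
enum Foo {
    Bar,
    Baz,
    Qux(u32),
}

// `if let` matches a single pattern and works without `PartialEq`.
fn is_bar(a: &Foo) -> bool {
    if let Foo::Bar = a {
        true
    } else {
        false
    }
}

// `if let` can also bind the inner value, with `else` as the fallback.
fn qux_value(a: &Foo) -> Option<u32> {
    if let Foo::Qux(value) = a {
        Some(*value)
    } else {
        None
    }
}

fn main() {
    let number = Some(7);
    if let Some(i) = number {
        println!("Matched {:?}!", i);
    }
    println!("{} {} {:?}", is_bar(&Foo::Bar), is_bar(&Foo::Baz), qux_value(&Foo::Qux(100)));
}
```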
Would you like a challenge? Fix the following example to use if let
:
See also:
while let
Similar to if let
, while let
can make awkward match
sequences
more tolerable. Consider the following sequence that increments i
:
Using while let
makes this sequence much nicer:
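A sketch of the same idea applied to a different, hypothetical task, draining a stack, since Vec::pop returns an Option:

```rust
// Pops values off a stack until it is empty.
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // `pop` returns `Option<i32>`; the loop ends at the first `None`.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

fn main() {
    println!("{:?}", drain_stack(vec![1, 2, 3]));
}
```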
See also:
Functions
Functions are declared using the fn
keyword. Its arguments are type
annotated, just like variables, and, if the function returns a value, the
return type must be specified after an arrow ->
.
The final expression in the function will be used as return value.
Alternatively, the return
statement can be used to return a value earlier
from within the function, even from inside loops or if
statements.
Let's rewrite FizzBuzz using functions!
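A sketch of FizzBuzz split into functions; the function names are illustrative, and fizzbuzz returns a String rather than printing so it can be checked.

```rust
// Returns true if `lhs` is divisible by `rhs`; an early `return` handles zero.
fn is_divisible_by(lhs: u32, rhs: u32) -> bool {
    if rhs == 0 {
        return false;
    }
    // The final expression is the return value.
    lhs % rhs == 0
}

// Returns the FizzBuzz word for `n`.
fn fizzbuzz(n: u32) -> String {
    if is_divisible_by(n, 15) {
        "fizzbuzz".to_string()
    } else if is_divisible_by(n, 3) {
        "fizz".to_string()
    } else if is_divisible_by(n, 5) {
        "buzz".to_string()
    } else {
        n.to_string()
    }
}

// A function that returns `()` can omit the return type.
fn fizzbuzz_to(n: u32) {
    for i in 1..=n {
        println!("{}", fizzbuzz(i));
    }
}

fn main() {
    fizzbuzz_to(15);
}
```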
Methods
Methods are functions attached to objects. These methods have access to the
data of the object and its other methods via the self
keyword. Methods are
defined under an impl
block.
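A sketch of an associated function and two methods on a hypothetical Point type:

```rust
#[derive(Debug, Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // An associated function (no `self`), often used as a constructor.
    fn origin() -> Point {
        Point { x: 0.0, y: 0.0 }
    }

    // A method: `&self` borrows the instance immutably.
    fn distance_to(&self, other: &Point) -> f64 {
        ((self.x - other.x).powi(2) + (self.y - other.y).powi(2)).sqrt()
    }

    // `&mut self` borrows the instance mutably.
    fn translate(&mut self, dx: f64, dy: f64) {
        self.x += dx;
        self.y += dy;
    }
}

fn main() {
    let mut p = Point { x: 3.0, y: 4.0 };
    println!("distance to origin: {}", p.distance_to(&Point::origin()));
    p.translate(1.0, 1.0);
    println!("{:?}", p);
}
```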
Closures
Closures are functions that can capture the enclosing environment. For example, a closure that captures the x variable:
|val| val + x
The syntax and capabilities of closures make them very convenient for on the fly usage. Calling a closure is exactly like calling a function. However, both input and return types can be inferred and input variable names must be specified.
Other characteristics of closures include:
- using || instead of () around input variables.
- optional body delimitation ({}) for a single expression (mandatory otherwise).
- the ability to capture the outer environment variables.
Capturing
Closures are inherently flexible and will do what the functionality requires to make the closure work without annotation. This allows capturing to flexibly adapt to the use case, sometimes moving and sometimes borrowing. Closures can capture variables:
- by reference:
&T
- by mutable reference:
&mut T
- by value:
T
They preferentially capture variables by reference and only go lower when required.
Using move
before vertical pipes forces closure
to take ownership of captured variables:
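A sketch of capture by reference, by mutable reference, and by value with move; count_calls is a hypothetical helper.

```rust
// Counts calls to a closure that captures `count` by mutable reference.
fn count_calls(times: u32) -> u32 {
    let mut count = 0;
    // Modifying `count` forces a `&mut` capture, so `inc` must be declared `mut`.
    let mut inc = || count += 1;
    for _ in 0..times {
        inc();
    }
    count
}

fn main() {
    let x = 10;
    // Parameter and return types are inferred; `x` is captured by reference.
    let add_x = |val| val + x;
    println!("{}", add_x(5));
    println!("{}", count_calls(3));

    let data = vec![1, 2, 3];
    // `move` transfers ownership of `data` into the closure.
    let contains = move |needle| data.contains(&needle);
    println!("{}", contains(2));
    // Error if uncommented: `data` was moved into the closure.
    // println!("{:?}", data);
}
```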
See also:
Box
and std::mem::drop
As input parameters
While Rust chooses how to capture variables on the fly mostly without type
annotation, this ambiguity is not allowed when writing functions. When
taking a closure as an input parameter, the closure's complete type must be
annotated using one of a few traits
. In order of decreasing restriction,
they are:
- Fn: the closure captures by reference (&T)
- FnMut: the closure captures by mutable reference (&mut T)
- FnOnce: the closure captures by value (T)
On a variable-by-variable basis, the compiler will capture variables in the least restrictive manner possible.
For instance, consider a parameter annotated as FnOnce
. This specifies
that the closure may capture by &T
, &mut T
, or T
, but the compiler
will ultimately choose based on how the captured variables are used in the
closure.
This is because if a move is possible, then any type of borrow should also
be possible. Note that the reverse is not true. If the parameter is
annotated as Fn
, then capturing variables by &mut T
or T
are not
allowed.
In the following example, try swapping the usage of Fn
, FnMut
, and
FnOnce
to see what happens:
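A sketch of one function per trait bound; call_fn, call_fn_mut and call_fn_once are hypothetical names.

```rust
// Accepts closures that only read their captures.
fn call_fn<F: Fn() -> String>(f: F) -> String {
    f()
}

// Accepts closures that may mutate their captures; called twice here.
fn call_fn_mut<F: FnMut()>(mut f: F) {
    f();
    f();
}

// Accepts closures that may consume their captures; callable only once.
fn call_fn_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f()
}

fn main() {
    let greeting = String::from("hello");
    // Only reads `greeting`: satisfies `Fn`.
    println!("{}", call_fn(|| greeting.clone()));

    let mut counter = 0;
    // Mutates `counter`: needs at least `FnMut`.
    call_fn_mut(|| counter += 1);
    println!("counter = {}", counter);

    let data = vec![1, 2, 3];
    // Returns `data` by value, moving it out: only `FnOnce`.
    let owned = call_fn_once(move || data);
    println!("{:?}", owned);
}
```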
See also:
std::mem::drop, Fn, FnMut, Generics, where and FnOnce
Type anonymity
Closures succinctly capture variables from enclosing scopes. Does this have any consequences? It surely does. Observe how using a closure as a function parameter requires generics, which is necessary because of how they are defined:
When a closure is defined, the compiler implicitly creates a new
anonymous structure to store the captured variables inside, meanwhile
implementing the functionality via one of the traits
: Fn
, FnMut
, or
FnOnce
for this unknown type. This type is assigned to the variable which
is stored until calling.
Since this new type is of unknown type, any usage in a function will require
generics. However, an unbounded type parameter <T>
would still be ambiguous
and not be allowed. Thus, bounding by one of the traits
: Fn
, FnMut
, or
FnOnce
(which it implements) is sufficient to specify its type.
See also:
A thorough analysis, Fn, FnMut, and FnOnce
Input functions
Since closures may be used as arguments, you might wonder if the same can be said about functions. And indeed they can! If you declare a function that takes a closure as parameter, then any function that satisfies the trait bound of that closure can be passed as a parameter.
As an additional note, the Fn
, FnMut
, and FnOnce
traits
dictate how
a closure captures variables from the enclosing scope.
See also:
As output parameters
Closures as input parameters are possible, so returning closures as
output parameters should also be possible. However, anonymous
closure types are, by definition, unknown, so we have to use
impl Trait
to return them.
The valid traits for returning a closure are:
Fn
FnMut
FnOnce
Beyond this, the move
keyword must be used, which signals that all captures
occur by value. This is required because any captures by reference would be
dropped as soon as the function exited, leaving invalid references in the
closure.
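A sketch of returning each kind of closure with impl Trait; the constructor names are hypothetical.

```rust
// The closure's concrete type is anonymous, so `impl Trait` names it by its trait.
fn create_adder(n: i32) -> impl Fn(i32) -> i32 {
    // `move` copies `n` into the closure so it outlives this function.
    move |x| x + n
}

// The returned closure owns and mutates its own counter.
fn create_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

// The returned closure gives away its captured String, so it can run only once.
fn create_once(greeting: String) -> impl FnOnce() -> String {
    move || greeting
}

fn main() {
    let add5 = create_adder(5);
    let mut counter = create_counter();
    let once = create_once("hello".to_string());
    println!("{} {} {} {}", add5(10), counter(), counter(), once());
}
```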
See also:
Fn, FnMut, Generics and impl Trait.
Examples in std
This section contains a few examples of using closures from the std
library.
Iterator::any
Iterator::any
is a function which when passed an iterator, will return
true
if any element satisfies the predicate. Otherwise false
. Its
signature:
pub trait Iterator {
// The type being iterated over.
type Item;
// `any` takes `&mut self` meaning the caller may be borrowed
// and modified, but not consumed.
fn any<F>(&mut self, f: F) -> bool where
// `FnMut` meaning any captured variable may at most be
// modified, not consumed. `Self::Item` states it takes
// arguments to the closure by value.
F: FnMut(Self::Item) -> bool {}
}
See also:
Searching through iterators
Iterator::find
is a function which iterates over an iterator and searches for the
first value which satisfies some condition. If none of the values satisfy the
condition, it returns None
. Its signature:
pub trait Iterator {
// The type being iterated over.
type Item;
// `find` takes `&mut self` meaning the caller may be borrowed
// and modified, but not consumed.
fn find<P>(&mut self, predicate: P) -> Option<Self::Item> where
// `FnMut` meaning any captured variable may at most be
// modified, not consumed. `&Self::Item` states it takes
// arguments to the closure by reference.
P: FnMut(&Self::Item) -> bool {}
}
Iterator::find
gives you a reference to the item. But if you want the index of the
item, use Iterator::position
.
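A sketch wrapping any, find and position in small hypothetical helpers; the closure patterns reflect the different item types each method passes.

```rust
// `iter()` yields `&i32`; `any` passes each item by value, so the pattern is `&x`.
fn contains(v: &[i32], target: i32) -> bool {
    v.iter().any(|&x| x == target)
}

// `find` passes a reference to each item, here `&&i32`, hence `&&x`.
fn first_above(v: &[i32], limit: i32) -> Option<&i32> {
    v.iter().find(|&&x| x > limit)
}

// `position` returns the index of the first match rather than the item.
fn first_even_index(v: &[i32]) -> Option<usize> {
    v.iter().position(|&x| x % 2 == 0)
}

fn main() {
    let v = vec![1, 9, 3, 3, 13, 2];
    println!("{} {:?} {:?}", contains(&v, 13), first_above(&v, 5), first_even_index(&v));
}
```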
See also:
std::iter::Iterator::rposition
Higher Order Functions
Rust provides Higher Order Functions (HOF). These are functions that take one or more functions and/or produce a more useful function. HOFs and lazy iterators give Rust its functional flavor.
Option and Iterator implement their fair share of HOFs.
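A sketch comparing an iterator pipeline with an equivalent loop: both sum the odd squares below an upper bound. The function names are hypothetical.

```rust
fn is_odd(n: u32) -> bool {
    n % 2 == 1
}

// Functional style: lazy adaptors chained together.
fn sum_of_odd_squares(upper: u32) -> u32 {
    (0..)
        .map(|n| n * n) // all natural numbers squared
        .take_while(|&n_squared| n_squared < upper) // stop at the upper bound
        .filter(|&n_squared| is_odd(n_squared)) // keep the odd ones
        .sum()
}

// Imperative style: the same computation with an explicit loop.
fn sum_of_odd_squares_loop(upper: u32) -> u32 {
    let mut acc = 0;
    for n in 0.. {
        let n_squared = n * n;
        if n_squared >= upper {
            break;
        }
        if is_odd(n_squared) {
            acc += n_squared;
        }
    }
    acc
}

fn main() {
    println!("{} {}", sum_of_odd_squares(1000), sum_of_odd_squares_loop(1000));
}
```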
Diverging functions
Diverging functions never return. They are marked using !
, which is an empty type.
As opposed to all the other types, this one cannot be instantiated, because the
set of all possible values this type can have is empty. Note that, it is
different from the ()
type, which has exactly one possible value.
For example, this function returns as usual, although there is no information in the return value.
fn some_fn() {
()
}
fn main() {
let a: () = some_fn();
println!("This function returns and you can see this line.")
}
As opposed to this function, which will never return the control back to the caller.
#![feature(never_type)] // nightly-only: using `!` as a variable's type is not yet stable
fn main() {
let x: ! = panic!("This call never returns.");
println!("You will never see this line!");
}
Although this might seem like an abstract concept, it is in fact very useful and
often handy. The main advantage of this type is that it can be cast to any other
one and therefore used at places where an exact type is required, for instance
in match
branches. This allows us to write code like this:
fn main() {
fn sum_odd_numbers(up_to: u32) -> u32 {
let mut acc = 0;
for i in 0..up_to {
// Notice that the return type of this match expression must be u32
// because of the type of the "addition" variable.
let addition: u32 = match i%2 == 1 {
// The "i" variable is of type u32, which is perfectly fine.
true => i,
// On the other hand, the "continue" expression does not return
// u32, but it is still fine, because it never returns and therefore
// does not violate the type requirements of the match expression.
false => continue,
};
acc += addition;
}
acc
}
println!("Sum of odd numbers up to 9 (excluding): {}", sum_odd_numbers(9));
}
It is also the return type of functions that loop forever (e.g. loop {}) like
network servers, or of functions that terminate the process (e.g. exit()).
Modules
Rust provides a powerful module system that can be used to hierarchically split code in logical units (modules), and manage visibility (public/private) between them.
A module is a collection of items: functions, structs, traits, impl
blocks,
and even other modules.
Visibility
By default, the items in a module have private visibility, but this can be
overridden with the pub
modifier. Only the public items of a module can be
accessed from outside the module scope.
Struct visibility
Structs have an extra level of visibility with their fields. The visibility
defaults to private, and can be overridden with the pub
modifier. This
visibility only matters when a struct is accessed from outside the module
where it is defined, and has the goal of hiding information (encapsulation).
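A sketch of private and public items and fields; the module my and the type Boxed are hypothetical.

```rust
mod my {
    // Private by default: only visible inside `my`.
    fn private_function() -> &'static str {
        "called `my::private_function()`"
    }

    // `pub` makes the item visible outside the module.
    pub fn indirect_access() -> &'static str {
        // Items in the same module can use private items.
        private_function()
    }

    // A public struct with one public and one private field.
    pub struct Boxed {
        pub contents: u32,
        secret: u32,
    }

    impl Boxed {
        // A public constructor is needed to set the private field from outside.
        pub fn new(contents: u32) -> Boxed {
            Boxed { contents, secret: contents * 2 }
        }

        pub fn secret(&self) -> u32 {
            self.secret
        }
    }
}

fn main() {
    println!("{}", my::indirect_access());
    let b = my::Boxed::new(21);
    // `b.contents` is public; accessing `b.secret` directly would not compile here.
    println!("{} {}", b.contents, b.secret());
    // Error if uncommented: `private_function` is private.
    // my::private_function();
}
```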
See also:
The use
declaration
The use
declaration can be used to bind a full path to a new name, for easier
access. It is often used like this:
You can use the as
keyword to bind imports to a different name:
super
and self
The super
and self
keywords can be used in the path to remove ambiguity
when accessing items and to prevent unnecessary hardcoding of paths.
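A sketch of use ... as together with self and super paths; the modules cool and my are hypothetical, and the crate:: path prefix assumes the 2018 edition or later.

```rust
fn function() -> &'static str {
    "called `function()`"
}

mod cool {
    pub fn function() -> &'static str {
        "called `cool::function()`"
    }
}

mod my {
    fn function() -> &'static str {
        "called `my::function()`"
    }

    pub fn indirect_call() -> [&'static str; 3] {
        [
            // `self` refers to the current module, `my`.
            self::function(),
            // `super` refers to the parent module, here the crate root.
            super::function(),
            // A path can start at `super` and descend into a sibling module.
            super::cool::function(),
        ]
    }
}

// `use ... as` binds the path to a new name.
use crate::cool::function as cool_function;

fn main() {
    println!("{:?}", my::indirect_call());
    println!("{}", cool_function());
}
```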
File hierarchy
Modules can be mapped to a file/directory hierarchy. Let's break down the visibility example in files:
$ tree .
.
|-- my
| |-- inaccessible.rs
| |-- mod.rs
| `-- nested.rs
`-- split.rs
In split.rs
:
// This declaration will look for a file named `my.rs` or `my/mod.rs` and will
// insert its contents inside a module named `my` under this scope
mod my;
fn function() {
println!("called `function()`");
}
fn main() {
my::function();
function();
my::indirect_access();
my::nested::function();
}
In my/mod.rs
:
// Similarly `mod inaccessible` and `mod nested` will locate the `nested.rs`
// and `inaccessible.rs` files and insert them here under their respective
// modules
mod inaccessible;
pub mod nested;
pub fn function() {
println!("called `my::function()`");
}
fn private_function() {
println!("called `my::private_function()`");
}
pub fn indirect_access() {
print!("called `my::indirect_access()`, that\n> ");
private_function();
}
In my/nested.rs
:
pub fn function() {
println!("called `my::nested::function()`");
}
#[allow(dead_code)]
fn private_function() {
println!("called `my::nested::private_function()`");
}
In my/inaccessible.rs
:
#[allow(dead_code)]
pub fn public_function() {
println!("called `my::inaccessible::public_function()`");
}
Let's check that things still work as before:
$ rustc split.rs && ./split
called `my::function()`
called `function()`
called `my::indirect_access()`, that
> called `my::private_function()`
called `my::nested::function()`
Crates
A crate is a compilation unit in Rust. Whenever rustc some_file.rs
is called,
some_file.rs
is treated as the crate file. If some_file.rs
has mod
declarations in it, then the contents of the module files would be inserted in
places where mod
declarations in the crate file are found, before running
the compiler over it. In other words, modules do not get compiled
individually, only crates get compiled.
A crate can be compiled into a binary or into a library. By default, rustc
will produce a binary from a crate. This behavior can be overridden by passing
lib to the --crate-type flag.
Creating a Library
Let's create a library, and then see how to link it to another crate.
pub fn public_function() {
    println!("called rary's `public_function()`");
}
fn private_function() {
    println!("called rary's `private_function()`");
}
pub fn indirect_access() {
    print!("called rary's `indirect_access()`, that\n> ");
    private_function();
}
$ rustc --crate-type=lib rary.rs
$ ls lib*
library.rlib
Libraries get prefixed with "lib", and by default they get named after their crate file, but this default name can be overridden by passing the --crate-name option to rustc or by using the crate_name attribute.
Using a Library
To link a crate to this new library you may use rustc's --extern flag. All of its items will then be imported under a module named the same as the library. This module generally behaves the same way as any other module.
// extern crate rary; // May be required for Rust 2015 edition or earlier
fn main() {
    rary::public_function();
    // Error! `private_function` is private
    //rary::private_function();
    rary::indirect_access();
}
# Where library.rlib is the path to the compiled library, assumed here to be
# in the same directory:
$ rustc executable.rs --extern rary=library.rlib --edition=2018 && ./executable
called rary's `public_function()`
called rary's `indirect_access()`, that
> called rary's `private_function()`
Cargo
cargo is the official Rust package management tool. It has lots of really useful features to improve code quality and developer velocity! These include
- Dependency management and integration with crates.io (the official Rust package registry)
- Awareness of unit tests
- Awareness of benchmarks
This chapter will go through some quick basics, but you can find the comprehensive docs in The Cargo Book.
Dependencies
Most programs have dependencies on some libraries. If you have ever managed dependencies by hand, you know how much of a pain this can be. Luckily, the Rust ecosystem comes standard with cargo! cargo can manage dependencies for a project.
To create a new Rust project:
# A binary
cargo new foo
# OR A library
cargo new --lib foo
For the rest of this chapter, let's assume we are making a binary, rather than a library, but all of the concepts are the same.
After the above commands, you should see a file hierarchy like this:
foo
├── Cargo.toml
└── src
└── main.rs
The main.rs is the root source file for your new project; nothing new there. The Cargo.toml is the config file for cargo for this project (foo). If you look inside it, you should see something like this:
[package]
name = "foo"
version = "0.1.0"
authors = ["mark"]
[dependencies]
The name field under [package] determines the name of the project. This is used by crates.io if you publish the crate (more later). It is also the name of the output binary when you compile.
The version field is a crate version number using Semantic Versioning.
The authors field is a list of authors used when publishing the crate.
The [dependencies] section lets you add dependencies for your project.
For example, suppose that we want our program to have a great CLI. You can find lots of great packages on crates.io (the official Rust package registry). One popular choice is clap. As of this writing, the most recent published version of clap is 2.27.1. To add a dependency to our program, we can simply add the following to our Cargo.toml under [dependencies]: clap = "2.27.1". On the 2015 edition you would also write extern crate clap in main.rs; from the 2018 edition onward that line is no longer needed. And that's it! You can start using clap in your program.
cargo also supports other types of dependencies. Here is just a small sampling:
[package]
name = "foo"
version = "0.1.0"
authors = ["mark"]
[dependencies]
clap = "2.27.1" # from crates.io
rand = { git = "https://github.com/rust-lang-nursery/rand" } # from online repo
bar = { path = "../bar" } # from a path in the local filesystem
cargo is more than a dependency manager. All of the available configuration options are listed in the format specification of Cargo.toml.
To build our project we can execute cargo build anywhere in the project directory (including subdirectories!). We can also do cargo run to build and run. Notice that these commands will resolve all dependencies, download crates if needed, and build everything, including your crate. (Note that it only rebuilds what it has not already built, similar to make.)
Voila! That's all there is to it!
Conventions
In the previous chapter, we saw the following directory hierarchy:
foo
├── Cargo.toml
└── src
└── main.rs
Suppose that we wanted to have two binaries in the same project, though. What then?
It turns out that cargo supports this. The default binary is built from src/main.rs and, as we saw before, is named after the package (foo here), but you can add additional binaries by placing them in a src/bin/ directory:
foo
├── Cargo.toml
└── src
├── main.rs
└── bin
└── my_other_bin.rs
To tell cargo to compile or run this binary as opposed to the default or other binaries, we just pass cargo the --bin my_other_bin flag, where my_other_bin is the name of the binary we want to work with.
In addition to extra binaries, cargo supports more features such as benchmarks, tests, and examples.
In the next chapter, we will look more closely at tests.
Testing
As we know, testing is integral to any piece of software! Rust has first-class support for unit and integration testing (see this chapter in TRPL).
From the testing chapters linked above, we see how to write unit tests and integration tests. Organizationally, we can place unit tests in the modules they test and integration tests in their own tests/ directory:
foo
├── Cargo.toml
├── src
│ └── main.rs
└── tests
├── my_test.rs
└── my_other_test.rs
Each file in tests is a separate integration test.
cargo naturally provides an easy way to run all of your tests!
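A minimal sketch of what unit tests like the ones in the output below might look like, assuming a binary crate whose src/main.rs defines two hypothetical functions, foo and foo_bar. The tests sit in a #[cfg(test)] module next to the code they exercise. Inside a module like this, cargo test would report them as tests::test_foo and tests::test_foo_bar; test_bar and test_baz from the output would be written the same way.

```rust
// src/main.rs (hypothetical contents)
fn foo() -> u32 {
    1
}

fn foo_bar() -> u32 {
    foo() + 1
}

fn main() {
    println!("foo = {}, foo_bar = {}", foo(), foo_bar());
}

// This module is compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    // Bring the functions under test into scope.
    use super::*;

    #[test]
    fn test_foo() {
        assert_eq!(foo(), 1);
    }

    #[test]
    fn test_foo_bar() {
        assert_eq!(foo_bar(), 2);
    }
}
```

Because the filter is a substring match, cargo test test_foo selects both test_foo and test_foo_bar.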
$ cargo test
You should see output like this:
$ cargo test
Compiling blah v0.1.0 (file:///nobackup/blah)
Finished dev [unoptimized + debuginfo] target(s) in 0.89 secs
Running target/debug/deps/blah-d3b32b97275ec472
running 4 tests
test test_bar ... ok
test test_baz ... ok
test test_foo_bar ... ok
test test_foo ... ok
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
You can also run tests whose name matches a pattern:
$ cargo test test_foo
Compiling blah v0.1.0 (file:///nobackup/blah)
Finished dev [unoptimized + debuginfo] target(s) in 0.35 secs
Running target/debug/deps/blah-d3b32b97275ec472
running 2 tests
test test_foo ... ok
test test_foo_bar ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out
One word of caution: Cargo may run multiple tests concurrently, so make sure that they don't race with each other. For example, if they all output to a file, you should make them write to different files.
Build Scripts
Sometimes a normal build from cargo is not enough. Perhaps your crate needs some prerequisites before cargo will successfully compile, things like code generation, or some native code that needs to be compiled. To solve this problem we have build scripts that Cargo can run.
A build script can be added to your package by specifying it in the Cargo.toml as follows:
[package]
...
build = "build.rs"
Otherwise Cargo will look for a build.rs file in the project directory by default.
How to use a build script
The build script is simply another Rust file that will be compiled and invoked prior to compiling anything else in the package. Hence it can be used to fulfill prerequisites of your crate.
Cargo provides the script with inputs via environment variables (listed in the Cargo documentation) that the script can read.
The script provides output via stdout. All lines printed are written to target/debug/build/<pkg>/output. Further, lines prefixed with cargo: will be interpreted by Cargo directly and hence can be used to define parameters for the package's compilation.
For further details and examples, see the Cargo specification.
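A minimal build.rs sketch, assuming the goal is to generate a small Rust source file; the file name generated.rs and the constant GREETING are hypothetical. Cargo sets the OUT_DIR environment variable for build scripts; the temporary-directory fallback is only there so the sketch can also run on its own outside Cargo.

```rust
// build.rs (sketch)
use std::env;
use std::fs;
use std::path::PathBuf;

// Writes a generated source file and returns its path.
fn generate() -> PathBuf {
    // Cargo provides OUT_DIR; fall back to the temp dir when run standalone.
    let out_dir = env::var("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| env::temp_dir());
    let dest = out_dir.join("generated.rs");
    fs::write(&dest, "pub const GREETING: &str = \"hello from build.rs\";\n")
        .expect("failed to write generated file");
    dest
}

fn main() {
    let dest = generate();
    // Lines starting with `cargo:` are instructions for Cargo: here, only
    // rerun this script when build.rs itself changes.
    println!("cargo:rerun-if-changed=build.rs");
    // Any other line simply ends up in the output file.
    println!("generated {}", dest.display());
}
```

The crate could then pull the generated code in with include!(concat!(env!("OUT_DIR"), "/generated.rs")).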
Attributes
An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:
- conditional compilation of code
- set crate name, version and type (binary or library)
- disable lints (warnings)
- enable compiler features (macros, glob imports, etc.)
- link to a foreign library
- mark functions as unit tests
- mark functions that will be part of a benchmark
When attributes apply to a whole crate, their syntax is #![crate_attribute], and when they apply to a module or item, the syntax is #[item_attribute] (notice the missing bang !).
Attributes can take arguments with different syntaxes:
#[attribute = "value"]
#[attribute(key = "value")]
#[attribute(value)]
Attributes can have multiple values and can be separated over multiple lines, too:
#[attribute(value, value2)]
#[attribute(value, value2, value3,
value4, value5)]
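As an illustration, here is one real attribute for each of the syntaxes above, using the standard doc, derive, allow and cfg attributes; the Point type and the helper functions are hypothetical.

```rust
// `#[attribute = "value"]`: doc comments desugar to `#[doc = "..."]`.
#[doc = "A 2D point."]
// `#[attribute(value, value2, ...)]`: several derives at once.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// `#[attribute(value)]`: silence the `dead_code` lint for this item only.
#[allow(dead_code)]
fn unused_helper() {}

// `#[attribute(key = "value")]`: compiled only when targeting Linux.
#[cfg(target_os = "linux")]
fn platform() -> &'static str {
    "linux"
}

#[cfg(not(target_os = "linux"))]
fn platform() -> &'static str {
    "not linux"
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // `Copy`, so `p` is still usable afterwards
    assert_eq!(p, q);
    println!("{:?} on {}", p, platform());
}
```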
dead_code
The compiler provides a dead_code lint that will warn about unused functions. An attribute can be used to disable the lint.
Note that in real programs, you should eliminate dead code. In these examples we'll allow dead code in some places because of the interactive nature of the examples.
Crates
The crate_type attribute can be used to tell the compiler whether a crate is a binary or a library (and even which type of library), and the crate_name attribute can be used to set the name of the crate.
However, it is important to note that both the crate_type and crate_name attributes have no effect whatsoever when using Cargo, the Rust package manager. Since Cargo is used for the majority of Rust projects, this means real-world uses of crate_type and crate_name are relatively limited.
When the crate_type attribute is used, we no longer need to pass the --crate-type flag to rustc.
$ rustc lib.rs
$ ls lib*
library.rlib
cfg
Configuration conditional checks are possible through two different operators:
- the cfg attribute: #[cfg(...)] in attribute position
- the cfg! macro: cfg!(...) in boolean expressions
While the former enables conditional compilation, the latter conditionally evaluates to true or false literals allowing for checks at run-time. Both utilize identical argument syntax.
See also:
the reference, cfg!, and macros.
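A short sketch contrasting the two forms, using the built-in target_os conditional; the function name are_you_on_linux is only illustrative.

```rust
// Compiled in only when the target OS is Linux.
#[cfg(target_os = "linux")]
fn are_you_on_linux() -> bool {
    true
}

// Compiled in only when the target OS is *not* Linux.
#[cfg(not(target_os = "linux"))]
fn are_you_on_linux() -> bool {
    false
}

fn main() {
    // `cfg!` expands to a plain `true` or `false` literal; both branches
    // below are still compiled and type-checked.
    if cfg!(target_os = "linux") {
        println!("Yes. It's definitely linux!");
    } else {
        println!("Yes. It's definitely *not* linux!");
    }
    println!("are_you_on_linux() = {}", are_you_on_linux());
}
```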
Custom
Some conditionals like target_os are implicitly provided by rustc, but custom conditionals must be passed to rustc using the --cfg flag.
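A minimal sketch of custom.rs, assuming the custom conditional is named some_condition as in the command below. This version adds a cfg(not(...)) fallback so it also builds without the flag; remove the fallback and a build without --cfg some_condition fails, because conditional_function no longer exists.

```rust
// Only compiled when rustc is invoked with `--cfg some_condition`.
#[cfg(some_condition)]
fn conditional_function() -> &'static str {
    "condition met!"
}

// Fallback added for this sketch so it also builds without the flag.
#[cfg(not(some_condition))]
fn conditional_function() -> &'static str {
    "condition not met"
}

fn main() {
    println!("{}", conditional_function());
}
```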
Try to run this to see what happens without the custom cfg flag.
With the custom cfg flag:
$ rustc --cfg some_condition custom.rs && ./custom
condition met!
Generics
Generics is the topic of generalizing types and functionalities to broader cases. This is extremely useful for reducing code duplication in many ways, but can call for rather involved syntax. Namely, being generic requires taking great care to specify over which types a generic type is actually considered valid. The simplest and most common use of generics is for type parameters.
A type parameter is specified as generic by the use of angle brackets and upper camel case: <Aaa, Bbb, ...>. "Generic type parameters" are typically represented as <T>. In Rust, "generic" also describes anything that accepts one or more generic type parameters <T>. Any type specified as a generic type parameter is generic, and everything else is concrete (non-generic).
For example, defining a generic function named foo that takes an argument arg of any type T:
fn foo<T>(arg: T) { ... }
Because T has been specified as a generic type parameter using <T>, it is considered generic when used here as (arg: T). This is the case even if T has previously been defined as a struct.
This example shows some of the syntax in action:
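A small sketch of that syntax (the type names A, Single and SingleGen are illustrative): a concrete struct next to a generic one, with T given explicitly in one place and inferred in others.

```rust
// A concrete type `A`.
#[derive(Debug)]
struct A;

// `Single` uses `A` concretely, so `Single` is not generic.
#[derive(Debug)]
struct Single(A);

// The `<T>` after the name makes `SingleGen` generic over any type `T`.
#[derive(Debug)]
struct SingleGen<T>(T);

fn main() {
    // `Single` is concrete and explicitly takes an `A`.
    let s = Single(A);

    // `T` is given explicitly here...
    let char_gen: SingleGen<char> = SingleGen('a');

    // ...and inferred here.
    let a_gen = SingleGen(A); // T = A
    let int_gen = SingleGen(6); // T = i32

    println!("{:?} {:?} {:?} {:?}", s, char_gen, a_gen, int_gen);
}
```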
Functions
The same set of rules can be applied to functions: a type T becomes generic when preceded by <T>.
Using generic functions sometimes requires explicitly specifying type parameters. This may be the case if the function is called where the return type is generic, or if the compiler doesn't have enough information to infer the necessary type parameters.
A function call with explicitly specified type parameters looks like: fun::<A, B, ...>().
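A sketch with two hypothetical functions, describe and zero, plus the standard library's str::parse. zero shows the case where T appears only in the return type, so the turbofish (or a type annotation) is required.

```rust
use std::fmt::Debug;

// Generic function: `T` must be printable with `{:?}`.
fn describe<T: Debug>(value: T) -> String {
    format!("{:?}", value)
}

// `T` appears only in the return type, so the caller must pin it down.
fn zero<T: Default>() -> T {
    T::default()
}

fn main() {
    // Inferred: `T = i32`.
    println!("{}", describe(5));
    // Explicit: the turbofish picks `T = f64`.
    println!("{}", describe::<f64>(5.0));
    // `zero()` alone would be ambiguous; the turbofish resolves it.
    let z = zero::<u8>();
    // Standard-library case: `parse` is generic over its return type.
    let n = "42".parse::<i32>().unwrap();
    println!("{} {}", z, n);
}
```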
Implementation
Similar to functions, implementations require care to remain generic.
See also:
functions returning references, impl, and struct
Traits
Of course traits can also be generic. Here we define one which reimplements the Drop trait as a generic method to drop itself and an input.
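A sketch of such a trait, here called DoubleDrop (a hypothetical name). To make the drops observable it adds a Tracked type whose Drop implementation bumps a global counter; the counter exists only for demonstration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped (demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// A trait generic over the caller's type `T`: consume `self` and one `T`.
trait DoubleDrop<T> {
    fn double_drop(self, _: T);
}

// Blanket implementation for every type `U` and every argument type `T`.
impl<T, U> DoubleDrop<T> for U {
    // Both arguments are taken by value, so they are moved in and
    // dropped by the time the method returns.
    fn double_drop(self, _: T) {}
}

fn main() {
    let empty = Tracked;
    let null = Tracked;
    empty.double_drop(null);
    // empty; // Error: use of moved value `empty`
    println!("dropped: {}", DROPS.load(Ordering::SeqCst));
}
```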
Bounds
When working with generics, the type parameters often must use traits as bounds to stipulate what functionality a type implements. For example, the following example uses the trait Display to print and so it requires T to be bound by Display; that is, T must implement Display.
use std::fmt::Display;
// Define a function `printer` that takes a generic type `T` which
// must implement trait `Display`.
fn printer<T: Display>(t: T) {
    println!("{}", t);
}
Bounding restricts the generic to types that conform to the bounds. That is:
struct S<T: Display>(T);
// Error! `Vec<T>` does not implement `Display`. This
// specialization will fail.
let s = S(vec![1]);
Another effect of bounding is that generic instances are allowed to access the methods of traits specified in the bounds. For example:
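A sketch with a hypothetical HasArea trait and Rectangle type: each bound unlocks the corresponding trait's methods inside the generic function.

```rust
use std::fmt::Debug;

trait HasArea {
    fn area(&self) -> f64;
}

#[derive(Debug)]
struct Rectangle {
    length: f64,
    height: f64,
}

impl HasArea for Rectangle {
    fn area(&self) -> f64 {
        self.length * self.height
    }
}

// The `T: Debug` bound is what allows `{:?}` on `t`.
fn print_debug<T: Debug>(t: &T) {
    println!("{:?}", t);
}

// The `T: HasArea` bound is what allows calling `t.area()`.
fn area<T: HasArea>(t: &T) -> f64 {
    t.area()
}

fn main() {
    let rectangle = Rectangle { length: 3.0, height: 4.0 };
    print_debug(&rectangle);
    println!("Area: {}", area(&rectangle));
}
```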
As an additional note, where clauses can also be used to apply bounds in some cases to be more expressive.
Testcase: empty bounds
A consequence of how bounds work is that even if a trait doesn't include any functionality, you can still use it as a bound. Eq and Copy are examples of such traits from the std library.
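A sketch with two hypothetical marker traits, Red and Blue, that declare no methods at all yet still restrict which types the functions accept.

```rust
struct Cardinal;
struct BlueJay;
struct Turkey;

// Marker traits: no methods at all.
trait Red {}
trait Blue {}

impl Red for Cardinal {}
impl Blue for BlueJay {}

// Only callable with types that implement the (empty) trait;
// the bound itself is what matters.
fn red<T: Red>(_: &T) -> &'static str {
    "red"
}

fn blue<T: Blue>(_: &T) -> &'static str {
    "blue"
}

fn main() {
    println!("A cardinal is {}", red(&Cardinal));
    println!("A blue jay is {}", blue(&BlueJay));
    let _turkey = Turkey;
    // red(&_turkey); // Error: `Turkey` does not implement `Red`
}
```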
See also:
std::cmp::Eq, std::marker::Copy, and traits
Multiple bounds
Multiple bounds can be applied with a +. Like normal, different types are separated with ,.
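A sketch of both forms with hypothetical function names; the functions return the formatted text so the result is easy to check.

```rust
use std::fmt::{Debug, Display};

// `T` must implement both `Debug` and `Display`.
fn compare_prints<T: Debug + Display>(t: &T) -> String {
    format!("Debug: `{:?}`, Display: `{}`", t, t)
}

// Two type parameters, each with its own bound, separated by `,`.
fn compare_types<T: Debug, U: Debug>(t: &T, u: &U) -> String {
    format!("t: `{:?}`, u: `{:?}`", t, u)
}

fn main() {
    let string = "words";
    let array = [1, 2, 3];
    let vec = vec![1, 2, 3];

    println!("{}", compare_prints(&string));
    // compare_prints(&array); // Error: arrays don't implement `Display`
    println!("{}", compare_types(&array, &vec));
}
```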
Where clauses
A bound can also be expressed using a where clause immediately before the opening {, rather than at the type's first mention. Additionally, where clauses can apply bounds to arbitrary types, rather than just to type parameters.
Some cases that a where clause is useful:
- When specifying generic types and bounds separately is clearer:
impl<A: TraitB + TraitC, D: TraitE + TraitF> MyTrait<A, D> for YourType {}
// Expressing bounds with a `where` clause
impl<A, D> MyTrait<A, D> for YourType where
    A: TraitB + TraitC,
    D: TraitE + TraitF {}
- When using a where clause is more expressive than using normal syntax. The impl in this example cannot be directly expressed without a where clause:
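A sketch of that second case, using a hypothetical PrintInOption trait: the bound constrains Option<T> rather than T, which the inline <T: ...> syntax cannot express.

```rust
use std::fmt::Debug;

trait PrintInOption {
    fn print_in_option(self) -> String;
}

// The bound is on `Option<T>`, not on `T` itself. The inline
// `impl<T: Bound>` form can only constrain `T`, so a `where`
// clause is needed to state this directly.
impl<T> PrintInOption for T
where
    Option<T>: Debug,
{
    // `Some(self)` is what gets formatted, so `Option<T>` must be `Debug`.
    fn print_in_option(self) -> String {
        format!("{:?}", Some(self))
    }
}

fn main() {
    let vec = vec![1, 2, 3];
    println!("{}", vec.print_in_option());
}
```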
New Type Idiom
The newtype idiom gives compile time guarantees that the right type of value is supplied to a program.
For example, an age verification function that checks age in years must be given a value of type Years.
Uncomment the last print statement to observe that the type supplied must be Years.
To obtain the newtype's value as the base type, you may use tuple syntax like so:
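A sketch with hypothetical Years and Days newtypes around i64. The commented-out call shows the kind of mix-up the compiler rejects, and the last lines show tuple syntax and destructuring for getting the inner value back.

```rust
struct Years(i64);
struct Days(i64);

impl Years {
    pub fn to_days(&self) -> Days {
        Days(self.0 * 365)
    }
}

impl Days {
    // Truncates partial years.
    pub fn to_years(&self) -> Years {
        Years(self.0 / 365)
    }
}

// Only a `Years` is accepted; a bare `i64` or a `Days` is a type error.
fn old_enough(age: &Years) -> bool {
    age.0 >= 18
}

fn main() {
    let age = Years(5);
    let age_days = age.to_days();
    println!("Old enough {}", old_enough(&age));
    println!("Old enough {}", old_enough(&age_days.to_years()));
    // println!("Old enough {}", old_enough(&age_days)); // Error: expected `Years`

    // Tuple syntax (`.0`) or destructuring gets at the inner value.
    let years_as_primitive: i64 = age.0;
    let Years(destructured) = age;
    println!("{} {}", years_as_primitive, destructured);
}
```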
Associated items
"Associated Items" refers to a set of rules pertaining to items of various types. It is an extension to trait generics, and allows traits to internally define new items.
One such item is called an associated type, providing simpler usage patterns when the trait is generic over its container type.
The Problem
A trait that is generic over its container type has type specification requirements: users of the trait must specify all of its generic types.
In the example below, the Contains trait allows the use of the generic types A and B. The trait is then implemented for the Container type, specifying i32 for A and B so that it can be used with fn difference().
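A sketch of that setup; the method bodies are illustrative. Note how fn difference() has to carry A and B even though C already determines them.

```rust
struct Container(i32, i32);

// A trait generic over `A` and `B`.
trait Contains<A, B> {
    fn contains(&self, _: &A, _: &B) -> bool;
    fn first(&self) -> i32;
    fn last(&self) -> i32;
}

impl Contains<i32, i32> for Container {
    // True if the stored numbers are equal.
    fn contains(&self, number_1: &i32, number_2: &i32) -> bool {
        (&self.0 == number_1) && (&self.1 == number_2)
    }
    fn first(&self) -> i32 {
        self.0
    }
    fn last(&self) -> i32 {
        self.1
    }
}

// `C` already determines `A` and `B`, yet all three must be spelled out.
fn difference<A, B, C>(container: &C) -> i32
where
    C: Contains<A, B>,
{
    container.last() - container.first()
}

fn main() {
    let container = Container(3, 10);
    println!("contains: {}", container.contains(&3, &10));
    println!("difference: {}", difference(&container));
}
```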
Because Contains is generic, we are forced to explicitly state all of the generic types for fn difference(). In practice, we want a way to express that A and B are determined by the input C. As you will see in the next section, associated types provide exactly that capability.
Associated types
The use of "Associated types" improves the overall readability of code by moving inner types locally into a trait as output types. Syntax for the trait definition is as follows:
Note that functions that use the trait Contains are no longer required to express A or B at all:
// Without using associated types
fn difference<A, B, C>(container: &C) -> i32 where
    C: Contains<A, B> { ... }
// Using associated types
fn difference<C: Contains>(container: &C) -> i32 { ... }
Let's rewrite the example from the previous section using associated types:
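The same sketch rewritten with associated types A and B; only the trait definition, the impl and the signature of difference change.

```rust
struct Container(i32, i32);

trait Contains {
    // Associated types: the implementor chooses them, callers don't.
    type A;
    type B;

    fn contains(&self, _: &Self::A, _: &Self::B) -> bool;
    fn first(&self) -> i32;
    fn last(&self) -> i32;
}

impl Contains for Container {
    type A = i32;
    type B = i32;

    fn contains(&self, number_1: &i32, number_2: &i32) -> bool {
        (&self.0 == number_1) && (&self.1 == number_2)
    }
    fn first(&self) -> i32 {
        self.0
    }
    fn last(&self) -> i32 {
        self.1
    }
}

// No more `A` and `B` type parameters.
fn difference<C: Contains>(container: &C) -> i32 {
    container.last() - container.first()
}

fn main() {
    let container = Container(3, 10);
    println!("contains: {}", container.contains(&3, &10));
    println!("difference: {}", difference(&container));
}
```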
Phantom type parameters
A phantom type parameter is one that doesn't show up at runtime, but is checked statically (and only) at compile time.
Data types can use extra generic type parameters to act as markers or to perform type checking at compile time. These extra parameters hold no storage values, and have no runtime behavior.
In the following example, we combine std::marker::PhantomData with the phantom type parameter concept to create tuples containing different data types.
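A sketch with hypothetical PhantomTuple and PhantomStruct types: the phantom parameter B changes the type but adds no storage.

```rust
use std::marker::PhantomData;

// Generic over `A` (stored) and `B` (phantom: no storage).
#[derive(PartialEq, Debug)]
struct PhantomTuple<A, B>(A, PhantomData<B>);

#[derive(PartialEq, Debug)]
struct PhantomStruct<A, B> {
    first: A,
    phantom: PhantomData<B>,
}

fn main() {
    // Same stored type (`char`), different phantom types.
    let t1: PhantomTuple<char, f32> = PhantomTuple('Q', PhantomData);
    let t2: PhantomTuple<char, f64> = PhantomTuple('Q', PhantomData);
    let s1: PhantomStruct<char, f32> = PhantomStruct { first: 'Q', phantom: PhantomData };

    // println!("{}", t1 == t2); // Error: mismatched types (f32 vs f64)
    println!("{:?} {:?} {:?}", t1, t2, s1);
}
```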
See also:
Derive, struct, and TupleStructs
Testcase: unit clarification
A useful method of unit conversions can be examined by implementing Add with a phantom type parameter. The Add trait is examined below:
// This construction would impose: `Self + RHS = Output`
// where RHS defaults to Self if not specified in the implementation.
pub trait Add<RHS = Self> {
    type Output;
    fn add(self, rhs: RHS) -> Self::Output;
}
// `Output` must be `T<U>` so that `T<U> + T<U> = T<U>`.
impl<U> Add for T<U> {
    type Output = T<U>;
    ...
}
The whole implementation:
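A sketch of the whole idea, with hypothetical unit markers Inch and Mm declared as empty enums (so they can never be instantiated) and a Length<Unit> type whose Add implementation only accepts matching units.

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Unit marker types, never instantiated.
#[derive(Debug, Clone, Copy)]
enum Inch {}
#[derive(Debug, Clone, Copy)]
enum Mm {}

// `Length` stores an `f64`; `Unit` exists only at the type level.
#[derive(Debug, Clone, Copy)]
struct Length<Unit>(f64, PhantomData<Unit>);

// Adding two lengths is only allowed when their units match.
impl<Unit> Add for Length<Unit> {
    type Output = Length<Unit>;

    fn add(self, rhs: Length<Unit>) -> Length<Unit> {
        Length(self.0 + rhs.0, PhantomData)
    }
}

fn main() {
    let one_foot: Length<Inch> = Length(12.0, PhantomData);
    let one_meter: Length<Mm> = Length(1000.0, PhantomData);

    // `Copy` lets us reuse the values after `+` consumes them.
    let two_feet = one_foot + one_foot;
    let two_meters = one_meter + one_meter;
    println!("{:?} in, {:?} mm", two_feet.0, two_meters.0);

    // let mixed = one_foot + one_meter; // Error: mismatched types
}
```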
See also:
Borrowing (&), Bounds (X: Y), enum, impl & self, Overloading, ref, Traits (X for Y), and TupleStructs.
Scoping rules
Scopes play an important part in ownership, borrowing, and lifetimes. That is, they indicate to the compiler when borrows are valid, when resources can be freed, and when variables are created or destroyed.
RAII
Variables in Rust do more than just hold data in the stack: they also own resources, e.g. Box<T> owns memory in the heap. Rust enforces RAII (Resource Acquisition Is Initialization), so whenever an object goes out of scope, its destructor is called and its owned resources are freed.
This behavior shields against resource leak bugs, so you'll rarely have to free memory manually or worry about leaking it (leaks are still possible in safe Rust, for example through Rc reference cycles or std::mem::forget). Here's a quick showcase:
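A small sketch (the function names are illustrative): every Box below is freed automatically when its owner goes out of scope, including the thousand created in the loop.

```rust
// Allocate an integer on the heap. The `Box` owns that memory.
fn create_box() {
    let _box1 = Box::new(3i32);
    // `_box1` goes out of scope here and its heap memory is freed.
}

// Sums 0..n, allocating (and freeing) one `Box` per iteration.
fn sum_boxed(n: i64) -> i64 {
    let mut total = 0;
    for i in 0..n {
        let boxed = Box::new(i);
        total += *boxed;
        // `boxed` is dropped at the end of each iteration.
    }
    total
}

fn main() {
    let _box2 = Box::new(5i32);
    {
        let _box3 = Box::new(4i32);
        // `_box3` is freed here, at the end of the inner scope.
    }
    // Many allocations, no manual `free`.
    for _ in 0u32..1_000 {
        create_box();
    }
    println!("{}", sum_boxed(4));
    // `_box2` is freed here, at the end of `main`.
}
```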
Of course, we can double check for memory errors using valgrind:
$ rustc raii.rs && valgrind ./raii
==26873== Memcheck, a memory error detector
==26873== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26873== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==26873== Command: ./raii
==26873==
==26873==
==26873== HEAP SUMMARY:
==26873== in use at exit: 0 bytes in 0 blocks
==26873== total heap usage: 1,013 allocs, 1,013 frees, 8,696 bytes allocated
==26873==
==26873== All heap blocks were freed -- no leaks are possible
==26873==
==26873== For counts of detected and suppressed errors, rerun with: -v
==26873== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
No leaks here!
Destructor
The notion of a destructor in Rust is provided through the Drop trait. The destructor is called when the resource goes out of scope. This trait is not required to be implemented for every type; only implement it for your type if you require its own destructor logic.
Run the below example to see how the Drop trait works. When the variable in the main function goes out of scope the custom destructor will be invoked.
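A sketch with a hypothetical ToDrop type. Besides printing, its destructor increments a shared counter (an addition for this sketch) so the moment it runs can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct ToDrop {
    name: &'static str,
    // Shared counter, added only so the effect of `drop` can be observed.
    drops: Rc<Cell<u32>>,
}

impl Drop for ToDrop {
    // Runs automatically when a `ToDrop` goes out of scope.
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
        println!("ToDrop `{}` is being dropped", self.name);
    }
}

fn main() {
    let drops = Rc::new(Cell::new(0));
    {
        let _x = ToDrop { name: "x", drops: Rc::clone(&drops) };
        println!("Made a ToDrop!");
    } // `_x` goes out of scope here, so its destructor runs.
    println!("destructors run so far: {}", drops.get());
}
```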
Ownership and moves
Because variables are in charge of freeing their own resources, resources can only have one owner. This also prevents resources from being freed more than once. Note that not all variables own resources (e.g. references).
When doing assignments (let x = y) or passing function arguments by value (foo(x)), the ownership of the resources is transferred. In Rust-speak, this is known as a move.
After moving resources, the previous owner can no longer be used. This avoids creating dangling pointers.
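A sketch of both cases (destroy_box is a hypothetical helper): copying a stack integer leaves both bindings usable, while moving a Box makes the old binding unusable. The commented-out lines are the ones the compiler rejects.

```rust
// Takes ownership of the heap-allocated integer and returns its value.
fn destroy_box(c: Box<i32>) -> i32 {
    *c
    // `c` is dropped (and its memory freed) when this function returns.
}

fn main() {
    // Stack-allocated integers are `Copy`: nothing is moved.
    let x = 5u32;
    let y = x;
    println!("x is {}, and y is {}", x, y);

    // `a` owns a heap-allocated integer.
    let a = Box::new(5i32);
    // Move: ownership of the box goes from `a` to `b`.
    let b = a;
    // println!("a contains: {}", a); // Error: borrow of moved value `a`

    // Passing by value moves `b` into the function.
    println!("destroyed a box holding {}", destroy_box(b));
    // println!("b contains: {}", b); // Error: `b` was moved
}
```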
Mutability
Mutability of data can be changed when ownership is transferred.
Partial moves
Pattern bindings can have by-move and by-reference bindings at the same time, which is used in destructuring. Using these patterns will result in a partial move of the variable, which means that part of the variable is moved while other parts stay. In this case, the parent variable cannot be used afterwards as a whole. However, parts of it that are referenced and not moved can still be used.
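A sketch with a hypothetical Person struct: name is moved out by the pattern, age is only borrowed with ref, so person.age stays usable while person as a whole does not.

```rust
#[derive(Debug)]
struct Person {
    name: String,
    age: Box<u8>,
}

// Destructures `person`, moving `name` out and borrowing `age`.
fn take_name(person: Person) -> (String, u8) {
    // `name` is moved out of `person`; `ref age` only borrows the field.
    let Person { name, ref age } = person;
    println!("The person's age is {}", age);

    // println!("{:?}", person); // Error: borrow of partially moved value

    // The field that was not moved can still be used.
    (name, *person.age)
}

fn main() {
    let person = Person { name: String::from("Alice"), age: Box::new(20) };
    let (name, age) = take_name(person);
    println!("{} is {}", name, age);
}
```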
Borrowing
Most of the time, we'd like to access data without taking ownership over it. To accomplish this, Rust uses a borrowing mechanism. Instead of passing objects by value (T), objects can be passed by reference (&T).
The compiler statically guarantees (via its borrow checker) that references always point to valid objects. That is, while references to an object exist, the object cannot be destroyed.
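A sketch with two hypothetical helpers, one that takes ownership and one that only borrows. The commented-out line is rejected because the box cannot be moved while a borrow of it is still in use.

```rust
// Takes ownership of a box and returns its value.
fn eat_box_i32(boxed_i32: Box<i32>) -> i32 {
    *boxed_i32
}

// Only borrows an `i32`; the caller keeps ownership.
fn borrow_i32(borrowed_i32: &i32) -> i32 {
    *borrowed_i32
}

fn main() {
    let boxed_i32 = Box::new(5_i32);
    let stacked_i32 = 6_i32;

    // `&Box<i32>` coerces to `&i32`; ownership is not transferred.
    borrow_i32(&boxed_i32);
    borrow_i32(&stacked_i32);

    {
        let ref_to_i32: &i32 = &boxed_i32;
        // eat_box_i32(boxed_i32); // Error: cannot move out while borrowed
        borrow_i32(ref_to_i32); // the borrow is still in use here
    }

    // No borrows remain, so the box can now be moved.
    println!("ate {}", eat_box_i32(boxed_i32));
}
```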
Mutability
Mutable data can be mutably borrowed using &mut T. This is called a mutable reference and gives read/write access to the borrower.
In contrast, &T borrows the data via an immutable reference, and the borrower can read the data but not modify it:
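A sketch with a hypothetical Book type, made Copy so a mutable copy is easy to create.

```rust
#[derive(Clone, Copy)]
struct Book {
    author: &'static str,
    title: &'static str,
    year: u32,
}

// Takes a reference to a book: read-only access.
fn borrow_book(book: &Book) -> String {
    format!("{} by {}, {} edition", book.title, book.author, book.year)
}

// Takes a mutable reference: read/write access.
fn new_edition(book: &mut Book) {
    book.year = 2014;
}

fn main() {
    // `immutabook` is immutable.
    let immutabook = Book { author: "Douglas Hofstadter", title: "Gödel, Escher, Bach", year: 1979 };
    // `mutabook` is a mutable copy of it.
    let mut mutabook = immutabook;

    println!("{}", borrow_book(&immutabook));
    // Immutably borrowing a mutable object is fine too.
    println!("{}", borrow_book(&mutabook));

    new_edition(&mut mutabook);
    println!("{}", borrow_book(&mutabook));

    // new_edition(&mut immutabook); // Error: cannot borrow as mutable
}
```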
Aliasing
Data can be immutably borrowed any number of times, but while immutably borrowed, the original data can't be mutably borrowed. On the other hand, only one mutable borrow is allowed at a time. The original data can be borrowed again only after the mutable reference has been used for the last time.
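A sketch of these rules with a hypothetical Point type. The borrow checker tracks where each reference is last used, so the mutable borrow is accepted once the immutable ones are no longer needed.

```rust
struct Point {
    x: i32,
    y: i32,
    z: i32,
}

fn main() {
    let mut point = Point { x: 0, y: 0, z: 0 };

    // Any number of immutable borrows may coexist.
    let borrowed_point = &point;
    let another_borrow = &point;
    println!("({}, {}, {})", borrowed_point.x, another_borrow.y, point.z);

    // The immutable borrows are not used after this point,
    // so a mutable borrow is now allowed.
    let mutable_borrow = &mut point;
    mutable_borrow.x = 5;
    mutable_borrow.y = 2;
    mutable_borrow.z = 1;
    // let y = &point.y; // Error: `mutable_borrow` is still used below
    println!("({}, {}, {})", mutable_borrow.x, mutable_borrow.y, mutable_borrow.z);

    // The mutable borrow has ended, so immutable borrows are fine again.
    let new_borrowed_point = &point;
    println!("({}, {}, {})", new_borrowed_point.x, new_borrowed_point.y, new_borrowed_point.z);
}
```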
The ref pattern
When doing pattern matching or destructuring via the let binding, the ref keyword can be used to take references to the fields of a struct/tuple. The example below shows a few instances where this can be useful:
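A sketch of ref and ref mut on a char, on a whole struct, and on a single field (the Point type is hypothetical).

```rust
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let c = 'Q';
    // A `ref` borrow on the left side is equivalent to `&` on the right.
    let ref ref_c1 = c;
    let ref_c2 = &c;
    println!("ref_c1 equals ref_c2: {}", *ref_c1 == *ref_c2);

    let point = Point { x: 0, y: 0 };
    // `ref` is also valid when destructuring a struct.
    let Point { x: ref ref_to_x, y: _ } = point;
    println!("x = {}", *ref_to_x);

    // A mutable copy of `point`.
    let mut mutable_point = point;
    {
        // `ref mut` takes a mutable reference to the `y` field.
        let Point { x: _, y: ref mut mut_ref_to_y } = mutable_point;
        *mut_ref_to_y = 1;
    }
    println!("point is ({}, {})", point.x, point.y);
    println!("mutable_point is ({}, {})", mutable_point.x, mutable_point.y);
}
```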
Lifetimes
A lifetime is a construct the compiler (or more specifically, its borrow checker) uses to ensure all borrows are valid. Specifically, a variable's lifetime begins when it is created and ends when it is destroyed. While lifetimes and scopes are often referred to together, they are not the same.
Take, for example, the case where we borrow a variable via &. The borrow has a lifetime that is determined by where it is declared. As a result, the borrow is valid as long as it ends before the lender is destroyed. However, the scope of the borrow is determined by where the reference is used.
In the following example and in the rest of this section, we will see how lifetimes relate to scopes, as well as how the two differ.
Note that no names or types are assigned to label lifetimes. This restricts how lifetimes will be able to be used as we will see.
Explicit annotation
The borrow checker uses explicit lifetime annotations to determine how long references should be valid. In cases where lifetimes are not elided, Rust requires explicit annotations to determine what the lifetime of a reference should be. The syntax for explicitly annotating a lifetime uses an apostrophe character as follows:
foo<'a>
// `foo` has a lifetime parameter `'a`
Similar to closures, using lifetimes requires generics.
Additionally, this lifetime syntax indicates that the lifetime of foo may not exceed that of 'a. Explicit annotation of a type has the form &'a T where 'a has already been introduced.
In cases with multiple lifetimes, the syntax is similar:
foo<'a, 'b>
// `foo` has lifetime parameters `'a` and `'b`
In this case, the lifetime of foo cannot exceed that of either 'a or 'b.
See the following example for explicit lifetime annotation in use:
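A sketch of explicit annotation with two independent lifetimes (the function names are illustrative). The commented-out line shows why a lifetime chosen by the caller cannot be satisfied by a local value.

```rust
// `print_refs` takes two references with (possibly) different lifetimes
// `'a` and `'b`; both must outlive the call.
fn print_refs<'a, 'b>(x: &'a i32, y: &'b i32) -> String {
    format!("x is {} and y is {}", x, y)
}

// A lifetime parameter but no arguments: the caller picks `'a`, and a
// reference to local data could never live that long.
fn failed_borrow<'a>() {
    let _x = 12;
    // let y: &'a i32 = &_x; // Error: `_x` does not live long enough
}

fn main() {
    let (four, nine) = (4, 9);
    // The borrows `&four` and `&nine` must outlive the call.
    println!("{}", print_refs(&four, &nine));
    failed_borrow();
}
```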
Note that elision implicitly annotates lifetimes and so is different from the explicit annotation shown here.
Functions
Ignoring elision, function signatures with lifetimes have a few constraints:
- any reference must have an annotated lifetime.
- any reference being returned must have the same lifetime as an input or be static.
Additionally, note that returning references without input is banned if it would result in returning references to invalid data. The following example shows off some valid forms of functions with lifetimes:
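A sketch of a few valid forms (function names illustrative), plus a commented-out invalid one that would return a reference to a value dropped inside the function.

```rust
// One input reference with lifetime `'a`, which must live at least as
// long as the function call.
fn print_one<'a>(x: &'a i32) -> String {
    format!("`print_one`: x is {}", x)
}

// Mutable references can be annotated too.
fn add_one<'a>(x: &'a mut i32) {
    *x += 1;
}

// Returning one of the inputs: the output's lifetime is tied to `x`.
fn pass_x<'a, 'b>(x: &'a i32, _: &'b i32) -> &'a i32 {
    x
}

// Invalid: the `String` is dropped when the function returns, so no
// reference to it can be handed back.
// fn invalid_output<'a>() -> &'a String { &String::from("foo") }

fn main() {
    let x = 7;
    let y = 9;
    println!("{}", print_one(&x));
    let z = pass_x(&x, &y);
    println!("{}", print_one(z));
    let mut t = 3;
    add_one(&mut t);
    println!("{}", print_one(&t));
}
```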
Methods
Methods are annotated similarly to functions:
Structs
Annotation of lifetimes in structures are also similar to functions:
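A sketch of lifetimes in a tuple struct, a struct with named fields, and an enum (all type names illustrative).

```rust
// A tuple struct holding a reference that must outlive `Borrowed`.
#[derive(Debug)]
struct Borrowed<'a>(&'a i32);

// Both references must outlive the structure.
#[derive(Debug)]
struct NamedBorrowed<'a> {
    x: &'a i32,
    y: &'a i32,
}

// An enum that holds either an `i32` or a reference to one.
#[derive(Debug)]
enum Either<'a> {
    Num(i32),
    Ref(&'a i32),
}

fn main() {
    let x = 18;
    let y = 15;

    let single = Borrowed(&x);
    let double = NamedBorrowed { x: &x, y: &y };
    let reference = Either::Ref(&x);
    let number = Either::Num(y);

    println!("{:?} {:?} {:?} {:?}", single, double, reference, number);
}
```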
Traits
Annotation of lifetimes in trait methods is basically similar to functions. Note that impl may have annotation of lifetimes too.
Bounds
Just like generic types can be bounded, lifetimes (themselves generic) use bounds as well. The : character has a slightly different meaning here, but + is the same. Note how the following read:
- T: 'a: All references in T must outlive lifetime 'a.
- T: Trait + 'a: Type T must implement trait Trait and all references in T must outlive 'a.
The example below shows the above syntax in action used after keyword where:
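A sketch with a hypothetical Ref<'a, T> wrapper and two helper functions, with the bounds written in where clauses.

```rust
use std::fmt::Debug;

// `Ref` holds a reference to a generic `T`; `T: 'a` means any
// references *inside* `T` must outlive `'a`.
#[derive(Debug)]
struct Ref<'a, T: 'a>(&'a T);

// A generic function with its bound in a `where` clause.
fn print<T>(t: T) -> String
where
    T: Debug,
{
    format!("`print`: t is {:?}", t)
}

// `T` must implement `Debug` and all references in `T` must outlive `'a`;
// `'a` itself must outlive the function call.
fn print_ref<'a, T>(t: &'a T) -> String
where
    T: Debug + 'a,
{
    format!("`print_ref`: t is {:?}", t)
}

fn main() {
    let x = 7;
    let ref_x = Ref(&x);
    println!("{}", print_ref(&ref_x));
    println!("{}", print(ref_x));
}
```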
See also:
generics, bounds in generics, and multiple bounds in generics
Coercion
A longer lifetime can be coerced into a shorter one so that it works inside a scope it normally wouldn't work in. This comes in the form of inferred coercion by the Rust compiler, and also in the form of declaring a lifetime difference:
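A sketch of both forms (function names illustrative): implicit coercion of two arguments to a common shorter lifetime, and an explicit 'a: 'b relationship.

```rust
// Rust infers a lifetime that is as short as possible; both references
// are then coerced to that lifetime.
fn multiply<'a>(first: &'a i32, second: &'a i32) -> i32 {
    first * second
}

// `<'a: 'b, 'b>` reads as "lifetime `'a` is at least as long as `'b`".
// A `&'a i32` is accepted and returned as a `&'b i32` through coercion.
fn choose_first<'a: 'b, 'b>(first: &'a i32, _: &'b i32) -> &'b i32 {
    first
}

fn main() {
    let first = 2; // longer lifetime
    {
        let second = 3; // shorter lifetime
        println!("The product is {}", multiply(&first, &second));
        println!("{} is the first", choose_first(&first, &second));
    }
}
```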
Static
Rust has a few reserved lifetime names. One of those is 'static. You might encounter it in two situations: as a reference lifetime (e.g. &'static str) and as part of a trait bound (e.g. T: 'static).
Both are related but subtly different, and this is a common source of confusion when learning Rust. Each situation is covered below.
Reference lifetime
As a reference lifetime 'static indicates that the data pointed to by the reference lives for the entire lifetime of the running program. It can still be coerced to a shorter lifetime.
There are two ways to make a variable with 'static lifetime, and both are stored in the read-only memory of the binary:
- Make a constant with the static declaration.
- Make a string literal which has type: &'static str.
See the following example for a display of each method:
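A sketch of both methods (NUM and coerce_static are illustrative names).

```rust
// A constant with `'static` lifetime, stored in the binary.
static NUM: i32 = 18;

// Returns a reference to `NUM`, with its `'static` lifetime coerced to
// the lifetime of the input argument.
fn coerce_static<'a>(_: &'a i32) -> &'a i32 {
    &NUM
}

fn main() {
    {
        // A string literal has type `&'static str`.
        let static_string: &'static str = "I'm in read-only memory";
        println!("static_string: {}", static_string);
        // The binding goes out of scope, but the data it points to remains.
    }

    {
        let lifetime_num = 9;
        let coerced_static = coerce_static(&lifetime_num);
        println!("coerced_static: {}", coerced_static);
    }

    println!("NUM: {} stays accessible!", NUM);
}
```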
Trait bound
As a trait bound, it means the type does not contain any non-static references. For example, the receiver can hold on to the type for as long as they want and it will never become invalid until they drop it.
It's important to understand this means that any owned data always passes a 'static lifetime bound, but a reference to that owned data generally does not:
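A sketch of such a bound with a hypothetical print_it function; uncommenting the last call, print_it(&i), produces an error like the one shown below.

```rust
use std::fmt::Debug;

// `T: 'static` means `T` holds no non-`'static` references.
fn print_it<T: Debug + 'static>(input: T) -> String {
    format!("'static value passed in is: {:?}", input)
}

fn main() {
    // `i` is owned and contains no references, so `i32: 'static` holds.
    let i = 5;
    println!("{}", print_it(i));

    // A string literal is a `&'static str`, which also passes.
    println!("{}", print_it("hello"));

    // `&i` only lives as long as `i`, so this is rejected (E0597):
    // print_it(&i);
}
```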
The compiler will tell you:
error[E0597]: `i` does not live long enough
--> src/lib.rs:15:15
|
15 | print_it(&i);
| ---------^^--
| | |
| | borrowed value does not live long enough
| argument requires that `i` is borrowed for `'static`
16 | }
| - `i` dropped here while still borrowed
Elision
Some lifetime patterns are overwhelmingly common and so the borrow checker will allow you to omit them to save typing and to improve readability. This is known as elision. Elision exists in Rust solely because these patterns are common.
The following code shows a few examples of elision. For a more comprehensive description of elision, see lifetime elision in the book.
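A few elided signatures next to their fully annotated equivalents (function names illustrative); the compiler treats each pair as the same signature.

```rust
// Elided: one input reference, no output reference.
fn elided_input(x: &i32) -> String {
    format!("`elided_input`: {}", x)
}

// The same signature with the lifetime written out.
fn annotated_input<'a>(x: &'a i32) -> String {
    format!("`annotated_input`: {}", x)
}

// Elided: with a single input reference, the output gets its lifetime.
fn elided_pass(x: &i32) -> &i32 {
    x
}

// The same signature with the lifetimes written out.
fn annotated_pass<'a>(x: &'a i32) -> &'a i32 {
    x
}

fn main() {
    let x = 3;
    println!("{}", elided_input(&x));
    println!("{}", annotated_input(&x));
    println!("`elided_pass`: {}", elided_pass(&x));
    println!("`annotated_pass`: {}", annotated_pass(&x));
}
```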
Traits
A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait.
Traits can be implemented for any data type. In the example below, we define Animal, a group of methods. The Animal trait is then implemented for the Sheep data type, allowing the use of methods from Animal with a Sheep.
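A sketch of that example with Animal and Sheep; the method bodies, including the default talk method, are illustrative.

```rust
struct Sheep {
    naked: bool,
    name: &'static str,
}

trait Animal {
    // Associated function signature; `Self` refers to the implementor type.
    fn new(name: &'static str) -> Self;

    // Method signatures.
    fn name(&self) -> &'static str;
    fn noise(&self) -> &'static str;

    // A default method that calls other methods of the same trait.
    fn talk(&self) -> String {
        format!("{} says {}", self.name(), self.noise())
    }
}

impl Sheep {
    // `Sheep`'s own method; not part of the trait.
    fn shear(&mut self) {
        self.naked = true;
    }
}

impl Animal for Sheep {
    fn new(name: &'static str) -> Sheep {
        Sheep { name, naked: false }
    }
    fn name(&self) -> &'static str {
        self.name
    }
    fn noise(&self) -> &'static str {
        if self.naked { "baaaaah?" } else { "baaaaah!" }
    }
}

fn main() {
    // The type annotation picks which `new` to call.
    let mut dolly: Sheep = Animal::new("Dolly");
    println!("{}", dolly.talk());
    dolly.shear();
    println!("{}", dolly.talk());
}
```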
Derive
The compiler is capable of providing basic implementations for some traits via the #[derive] attribute. These traits can still be manually implemented if a more complex behavior is required.
The following is a list of derivable traits:
- Comparison traits: Eq, PartialEq, Ord, PartialOrd.
- Clone, to create T from &T via a copy.
- Copy, to give a type 'copy semantics' instead of 'move semantics'.
- Hash, to compute a hash from &T.
- Default, to create an empty instance of a data type.
- Debug, to format a value using the {:?} formatter.
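A sketch using a few of these derives on two hypothetical unit-of-length types.

```rust
// `Centimeters` can be compared, copied and defaulted via derives.
#[derive(PartialEq, PartialOrd, Debug, Clone, Copy, Default)]
struct Centimeters(f64);

// `Inches` can be printed with `{:?}`.
#[derive(Debug)]
struct Inches(i32);

impl Inches {
    fn to_centimeters(&self) -> Centimeters {
        let &Inches(inches) = self;
        Centimeters(inches as f64 * 2.54)
    }
}

fn main() {
    let foot = Inches(12);
    println!("One foot equals {:?}", foot);

    let meter = Centimeters(100.0);
    // The derived `PartialOrd` provides `<`.
    let cmp = if foot.to_centimeters() < meter { "smaller" } else { "bigger" };
    println!("One foot is {} than one meter.", cmp);

    // The derived `Default` gives `Centimeters(0.0)`.
    println!("{:?}", Centimeters::default());
}
```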
Returning Traits with dyn
The Rust compiler needs to know how much space every function's return type requires. This means all your functions have to return a concrete type. Unlike other languages, if you have a trait like Animal, you can't write a function that returns Animal, because its different implementations will need different amounts of memory.
However, there's an easy workaround. Instead of returning a trait object directly, our functions return a Box which contains some Animal. A box is just a reference to some memory in the heap. Because a reference has a statically-known size, and the compiler can guarantee it points to a heap-allocated Animal, we can return a trait from our function!
Rust tries to be as explicit as possible whenever it allocates memory on the heap. So if your function returns a pointer-to-trait-on-heap in this way, you need to write the return type with the dyn keyword, e.g. Box<dyn Animal>.
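A sketch with hypothetical Sheep and Cow types. The caller passes a number in instead of using a random number generator, so the sketch needs no external crate.

```rust
struct Sheep {}
struct Cow {}

trait Animal {
    fn noise(&self) -> &'static str;
}

impl Animal for Sheep {
    fn noise(&self) -> &'static str {
        "baaaaah!"
    }
}

impl Animal for Cow {
    fn noise(&self) -> &'static str {
        "moooooo!"
    }
}

// Which `Animal` is returned is decided at run time, so the value is
// boxed and the return type is written with `dyn`.
fn random_animal(random_number: f64) -> Box<dyn Animal> {
    if random_number < 0.5 {
        Box::new(Sheep {})
    } else {
        Box::new(Cow {})
    }
}

fn main() {
    let random_number = 0.234;
    let animal = random_animal(random_number);
    println!("You've randomly chosen an animal, and it says {}", animal.noise());
}
```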
Operator Overloading
In Rust, many of the operators can be overloaded via traits. That is, some operators can be used to accomplish different tasks based on their input arguments. This is possible because operators are syntactic sugar for method calls. For example, the + operator in a + b calls the add method (as in a.add(b)). This add method is part of the Add trait. Hence, the + operator can be used by any implementor of the Add trait.
A list of the traits, such as Add, that overload operators can be found in core::ops.
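A sketch with hypothetical Foo, Bar and FooBar types, implementing Add<Bar> for Foo so that Foo + Bar produces a FooBar.

```rust
use std::ops::Add;

#[derive(Debug, PartialEq)]
struct Foo;
#[derive(Debug, PartialEq)]
struct Bar;

#[derive(Debug, PartialEq)]
struct FooBar;

// `Foo + Bar = FooBar`: `Bar` is the right-hand side type (`Rhs`).
impl Add<Bar> for Foo {
    type Output = FooBar;

    fn add(self, _rhs: Bar) -> FooBar {
        FooBar
    }
}

fn main() {
    // `Foo + Bar` is sugar for `Foo.add(Bar)`.
    println!("Foo + Bar = {:?}", Foo + Bar);
    // Bar + Foo would not compile: no `Add<Foo>` is implemented for `Bar`.
}
```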
Drop
The Drop trait only has one method: drop, which is called automatically when an object goes out of scope. The main use of the Drop trait is to free the resources that the implementor instance owns.
Box, Vec, String, File, and Process are some examples of types that implement the Drop trait to free resources. The Drop trait can also be manually implemented for any custom data type.
The following example adds a print to console to the drop function to announce when it is called.
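A sketch with a hypothetical Droppable type. Besides printing, drop records each name in a shared log (an addition for this sketch), which makes two rules visible: an explicit drop(x) runs the destructor early, and the remaining locals are dropped in reverse order of declaration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Droppable {
    name: &'static str,
    // Shared log recording the order in which values are dropped.
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Droppable {
    fn drop(&mut self) {
        println!("> Dropping {}", self.name);
        self.log.borrow_mut().push(self.name);
    }
}

fn demo(log: &Rc<RefCell<Vec<&'static str>>>) {
    let a = Droppable { name: "a", log: Rc::clone(log) };
    let _b = Droppable { name: "b", log: Rc::clone(log) };
    let _c = Droppable { name: "c", log: Rc::clone(log) };
    // `drop` (std::mem::drop, in the prelude) runs the destructor early.
    drop(a);
    // `_c` and then `_b` are dropped here, in reverse declaration order.
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    demo(&log);
    println!("{:?}", log.borrow());
}
```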
Iterators
The Iterator trait is used to implement iterators over collections such as arrays. The trait requires only a method to be defined for the next element, which may be manually defined in an impl block or automatically defined (as in arrays and ranges).
As a point of convenience for common situations, the for construct turns some collections into iterators using the .into_iter() method.
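A sketch of a hand-written iterator using a hypothetical Fibonacci type; only next is implemented, and adapters such as take come from the trait.

```rust
struct Fibonacci {
    curr: u32,
    next: u32,
}

// Only `next` has to be written; `take`, `skip`, `sum`, ... come for free.
impl Iterator for Fibonacci {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        let current = self.curr;
        self.curr = self.next;
        self.next = current + self.next;
        // This iterator never ends on its own, so it always returns `Some`.
        Some(current)
    }
}

fn fibonacci() -> Fibonacci {
    Fibonacci { curr: 0, next: 1 }
}

fn main() {
    // `for` calls `.into_iter()` on the range.
    for i in 0..3 {
        println!("> {}", i);
    }
    // `take(5)` stops after five items.
    for n in fibonacci().take(5) {
        println!("> {}", n);
    }
}
```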
impl Trait
If your function returns a type that implements MyTrait, you can write its return type as -> impl MyTrait. This can help simplify your type signatures quite a lot!
More importantly, some Rust types can't be written out. For example, every closure has its own unnamed concrete type. Before impl Trait syntax, you had to allocate on the heap in order to return a closure. But now you can do it all statically, like this:
You can also use impl Trait to return an iterator that uses map or filter closures! This makes using map and filter easier. Because closure types don't have names, you can't write out an explicit return type if your function returns iterators with closures. But with impl Trait you can do this easily:
Clone
When dealing with resources, the default behavior is to transfer them during assignments or function calls. However, sometimes we need to make a copy of the resource as well.
The Clone trait helps us do exactly this. Most commonly, we can use the .clone() method defined by the Clone trait.
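A sketch contrasting Copy and Clone with two hypothetical types.

```rust
// A unit struct without resources; `Copy` makes assignment copy it.
#[derive(Debug, Clone, Copy)]
struct Unit;

// A tuple struct owning heap resources: it can be cloned but not copied.
#[derive(Clone, Debug)]
struct Pair(Box<i32>, Box<i32>);

fn main() {
    let unit = Unit;
    // Copied, not moved: `unit` is still usable.
    let copied_unit = unit;
    println!("original: {:?}, copy: {:?}", unit, copied_unit);

    let pair = Pair(Box::new(1), Box::new(2));
    // `.clone()` makes an independent copy with its own heap allocations.
    let cloned_pair = pair.clone();
    // Moving and dropping `pair` leaves `cloned_pair` unaffected.
    let moved_pair = pair;
    drop(moved_pair);
    println!("clone: {:?}", cloned_pair);
}
```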
Supertraits
Rust doesn't have "inheritance", but you can define a trait as being a superset of another trait. For example:
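A sketch with hypothetical Person, Student, Programmer and CompSciStudent traits, and one type, Alice, that implements all of them.

```rust
trait Person {
    fn name(&self) -> String;
}

// `Student` is a subtrait of `Person`: implementing `Student`
// requires implementing `Person` as well.
trait Student: Person {
    fn university(&self) -> String;
}

trait Programmer {
    fn fav_language(&self) -> String;
}

// A subtrait of both `Programmer` and `Student`.
trait CompSciStudent: Programmer + Student {
    fn git_username(&self) -> String;
}

// Supertrait methods are callable through `&dyn CompSciStudent`.
fn comp_sci_student_greeting(student: &dyn CompSciStudent) -> String {
    format!(
        "My name is {} and I attend {}. My favorite language is {}. My Git username is {}",
        student.name(),
        student.university(),
        student.fav_language(),
        student.git_username()
    )
}

// A hypothetical type implementing all four traits.
struct Alice;

impl Person for Alice {
    fn name(&self) -> String { "Alice".to_string() }
}
impl Student for Alice {
    fn university(&self) -> String { "MIT".to_string() }
}
impl Programmer for Alice {
    fn fav_language(&self) -> String { "Rust".to_string() }
}
impl CompSciStudent for Alice {
    fn git_username(&self) -> String { "alice".to_string() }
}

fn main() {
    println!("{}", comp_sci_student_greeting(&Alice));
}
```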
See also:
The Rust Programming Language chapter on supertraits
Disambiguating overlapping traits
A type can implement many different traits. What if two traits both require a method with the same name? For example, many traits might have a method named get(). They might even have different return types!
Good news: because each trait implementation gets its own impl block, it's clear which trait's get method you're implementing.
What about when it comes time to call those methods? To disambiguate between them, we have to use Fully Qualified Syntax.
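A sketch with two hypothetical traits that both define get, implemented for one Form type.

```rust
trait UsernameWidget {
    // Get the selected username out of this widget.
    fn get(&self) -> String;
}

trait AgeWidget {
    // Get the selected age out of this widget.
    fn get(&self) -> u8;
}

// A form with both a username widget and an age widget.
struct Form {
    username: String,
    age: u8,
}

impl UsernameWidget for Form {
    fn get(&self) -> String {
        self.username.clone()
    }
}

impl AgeWidget for Form {
    fn get(&self) -> u8 {
        self.age
    }
}

fn main() {
    let form = Form { username: "rustacean".to_owned(), age: 28 };

    // form.get(); // Error: multiple applicable items in scope

    // Fully qualified syntax names the trait explicitly.
    let username = <Form as UsernameWidget>::get(&form);
    let age = <Form as AgeWidget>::get(&form);
    println!("{} is {}", username, age);
}
```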
See also:
The Rust Programming Language chapter on Fully Qualified syntax
macro_rules!
Rust provides a powerful macro system that allows metaprogramming. As you've seen in previous chapters, macros look like functions, except that their name ends with a bang !, but instead of generating a function call, macros are expanded into source code that gets compiled with the rest of the program. However, unlike macros in C and other languages, Rust macros are expanded into abstract syntax trees, rather than string preprocessing, so you don't get unexpected precedence bugs.
Macros are created using the macro_rules! macro.
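A minimal sketch of a macro defined with macro_rules!. Unlike a macro that only prints, this hypothetical say_hello! expands to a String expression so its result can be checked.

```rust
// A macro named `say_hello` that takes no arguments.
macro_rules! say_hello {
    // `()` matches an invocation with no arguments.
    () => {
        // The invocation is replaced by this expression.
        String::from("Hello!")
    };
}

fn main() {
    // `say_hello!()` expands to `String::from("Hello!")` at compile time.
    println!("{}", say_hello!());
}
```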
So why are macros useful?
-
Don't repeat yourself. There are many cases where you may need similar functionality in multiple places but with different types. Often, writing a macro is a useful way to avoid repeating code. (More on this later)
-
Domain-specific languages. Macros allow you to define special syntax for a specific purpose. (More on this later)
-
Variadic interfaces. Sometimes you want to define an interface that takes a variable number of arguments. An example is println!, which can take any number of arguments, depending on the format string. (More on this later)
Syntax
In the following subsections, we will show how to define macros in Rust. There are three basic ideas: patterns and designators, overloading, and repetition.
Designators
The arguments of a macro are prefixed by a dollar sign $ and type annotated with a designator:
These are some of the available designators:
- block
- expr is used for expressions
- ident is used for variable/function names
- item
- literal is used for literal constants
- pat (pattern)
- path
- stmt (statement)
- tt (token tree)
- ty (type)
- vis (visibility qualifier)
For a complete list, see the Rust Reference.
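The sketch below exercises three of the designators, using hypothetical macro names: ident to name a generated function, ty to pass a type, and block to pass a { ... } block.

```rust
// `ident`: the argument becomes the name of a new function.
macro_rules! create_function {
    ($func_name:ident) => {
        fn $func_name() -> &'static str {
            // `stringify!` turns the identifier into a string literal.
            stringify!($func_name)
        }
    };
}

create_function!(foo);
create_function!(bar);

// `ty`: the argument is a type; here we ask for its `Default` value.
macro_rules! default_of {
    ($t:ty) => {
        <$t>::default()
    };
}

// `block`: the argument is a `{ ... }` block, evaluated twice.
macro_rules! twice {
    ($b:block) => {
        ($b, $b)
    };
}

fn main() {
    println!("{} {}", foo(), bar());
    let zero: u8 = default_of!(u8);
    println!("{}", zero);
    println!("{:?}", twice!({ 1 + 1 }));
}
```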
Overload
Macros can be overloaded to accept different combinations of arguments. In that regard, macro_rules! can work similarly to a match block:
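In this sketch, the hypothetical check! macro has two arms. The literal tokens and and or in the matcher decide which arm applies, much like patterns in a match.

```rust
// Arms are tried in order, like the arms of a `match`.
macro_rules! check {
    // The literal token `and` selects this arm.
    ($left:expr; and $right:expr) => {
        $left && $right
    };
    // The literal token `or` selects this arm.
    ($left:expr; or $right:expr) => {
        $left || $right
    };
}

fn main() {
    println!("{}", check!(1i32 + 1 == 2i32; and 2i32 * 2 == 4i32));
    println!("{}", check!(true; or false));
}
```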
Repeat
Macros can use + in the argument list to indicate that an argument may repeat at least once, or * to indicate that the argument may repeat zero or more times.
In the following example, surrounding the matcher with $(...),+ will match one or more expressions, separated by commas.
Also note that the semicolon is optional on the last case.
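Here is a sketch of a repeating macro, named find_min! for illustration. It returns the smallest of one or more expressions by recursing on the tail of the list. The final arm has no trailing semicolon.

```rust
macro_rules! find_min {
    // Base case: a single expression.
    ($x:expr) => ($x);
    // `$x` followed by one or more comma-separated expressions.
    ($x:expr, $($y:expr),+) => (
        // Recurse on the tail.
        std::cmp::min($x, find_min!($($y),+))
    )
}

fn main() {
    println!("{}", find_min!(1));
    println!("{}", find_min!(1 + 2, 2));
    println!("{}", find_min!(5, 2 * 3, 4));
}
```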
DRY (Don't Repeat Yourself)
Macros allow writing DRY code by factoring out the common parts of functions and/or test suites. Here is an example that implements and tests the +=, *= and -= operators on Vec<T>:
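A minimal sketch of the macro half of this idea follows. The macro name op_assign! and the element-wise semantics are assumptions made for illustration. One macro generates three functions, each applying an operator trait pairwise. A #[cfg(test)] module exercising them is left out here.

```rust
use std::ops::{Add, Mul, Sub};

// Generates `fn $func(xs, ys)` that applies `$bound::$method` element-wise.
macro_rules! op_assign {
    ($func:ident, $bound:ident, $method:ident) => {
        fn $func<T: $bound<T, Output = T> + Copy>(xs: &mut [T], ys: &[T]) {
            assert_eq!(xs.len(), ys.len(), "{}: dimension mismatch", stringify!($func));
            for (x, y) in xs.iter_mut().zip(ys.iter()) {
                *x = $bound::$method(*x, *y);
            }
        }
    };
}

op_assign!(add_assign, Add, add);
op_assign!(mul_assign, Mul, mul);
op_assign!(sub_assign, Sub, sub);

fn main() {
    let mut xs = vec![1, 2, 3];
    add_assign(&mut xs, &[1, 1, 1]);
    mul_assign(&mut xs, &[2, 2, 2]);
    sub_assign(&mut xs, &[1, 1, 1]);
    println!("{:?}", xs);
}
```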
$ rustc --test dry.rs && ./dry
running 3 tests
test test::mul_assign ... ok
test test::add_assign ... ok
test test::sub_assign ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured
Domain Specific Languages (DSLs)
A DSL is a mini "language" embedded in a Rust macro. It is completely valid Rust because the macro system expands into normal Rust constructs, but it looks like a small language. This allows you to define concise or intuitive syntax for some special functionality (within bounds).
Suppose that I want to define a little calculator API. I would like to supply an expression and have the output printed to console.
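A sketch of such a calculator macro follows. Here, eval is a keyword of our tiny language, not of Rust. Making the macro also return the computed value is an addition for this sketch so that callers can use it. Note the double braces in the expansion: the outer pair belongs to macro_rules!, and the inner pair is a block expression.

```rust
macro_rules! calculate {
    // `eval` is a literal token of our little language.
    (eval $e:expr) => {{
        // Force the result to be an unsigned integer.
        let val: usize = $e;
        println!("{} = {}", stringify!{$e}, val);
        // Hand the value back to the caller (an addition for this sketch).
        val
    }};
}

fn main() {
    let _sum = calculate!(eval 1 + 2);
    let _product = calculate!(eval (1 + 2) * (3 / 4));
}
```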
Output:
1 + 2 = 3
(1 + 2) * (3 / 4) = 0
This was a very simple example, but much more complex interfaces have been developed, such as lazy_static or clap.
Also, note the two pairs of braces in the macro. The outer ones are part of the syntax of macro_rules!, in addition to () or [].
Variadic Interfaces
A variadic interface takes an arbitrary number of arguments. For example, println! can take an arbitrary number of arguments, as determined by the format string.
We can extend our calculate! macro from the previous section to be variadic:
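One way to sketch this uses a $(...),+ repetition in both the matcher and the expansion rather than recursion. That choice, and returning the results as a Vec<usize>, are assumptions of this sketch.

```rust
macro_rules! calculate {
    // One or more comma-separated `eval <expr>` items.
    ($(eval $e:expr),+) => {{
        let mut results = Vec::new();
        // The expansion repeats once per captured expression.
        $(
            let val: usize = $e;
            println!("{} = {}", stringify!{$e}, val);
            results.push(val);
        )+
        results
    }};
}

fn main() {
    let _results = calculate!(eval 1 + 2, eval 3 + 4, eval (2 * 3) + 1);
}
```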
Output:
1 + 2 = 3
3 + 4 = 7
(2 * 3) + 1 = 7
Error handling
Error handling is the process of handling the possibility of failure. For example, failing to read a file and then continuing to use that bad input would clearly be problematic. Noticing and explicitly managing those errors saves the rest of the program from various pitfalls.
There are various ways to deal with errors in Rust, which are described in the following subchapters. They all have more or less subtle differences and different use cases. As a rule of thumb:
An explicit panic is mainly useful for tests and dealing with unrecoverable errors. For prototyping it can be useful, for example when dealing with functions that haven't been implemented yet, but in those cases the more descriptive unimplemented is better. In tests panic is a reasonable way to explicitly fail.
The Option type is for when a value is optional or when the lack of a value is not an error condition. For example, the parent of a directory: / and C: don't have one. When dealing with Options, unwrap is fine for prototyping and for cases where it's absolutely certain that there is a value. However, expect is more useful, since it lets you specify an error message in case something goes wrong anyway.
When there is a chance that things do go wrong and the caller has to deal with the problem, use Result. You can unwrap and expect them as well (please don't do that unless it's a test or quick prototype).
For a more rigorous discussion of error handling, refer to the error handling section in the official book.
panic
The simplest error handling mechanism we will see is panic. It prints an error message, starts unwinding the stack, and usually exits the program. Here, we explicitly call panic on our error condition:
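Here is a sketch that matches the gift scenario described in the next subsection. The function give_royal is hypothetical. The snake call is left commented out so that the program runs to completion.

```rust
fn give_royal(gift: &str) {
    // Royals hate snakes, so stop the program if one is given.
    if gift == "snake" {
        panic!("AAAaaaaa!!!!");
    }
    println!("I love {}s!!!!!", gift);
}

fn main() {
    give_royal("teddy bear");
    // Uncomment to see the panic message and the program exit early:
    // give_royal("snake");
}
```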
Option & unwrap
In the last example, we showed that we can induce program failure at will. We told our program to panic if the royal received an inappropriate gift: a snake. But what if the royal expected a gift and didn't receive one? That case would be just as bad, so it needs to be handled!
We could test this against the empty string ("") as we do with a snake. Since we're using Rust, let's instead have the compiler point out cases where there's no gift.
An enum called Option<T> in the std library is used when absence is a possibility. It manifests itself as one of two "options":
- Some(T): An element of type T was found
- None: No element was found
These cases can either be explicitly handled via match or implicitly with unwrap. Implicit handling will either return the inner element or panic.
Note that it's possible to manually customize panic with expect, but unwrap otherwise leaves us with a less meaningful output than explicit handling. In the following example, explicit handling yields a more controlled result while retaining the option to panic if desired.
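The sketch below contrasts the two styles using hypothetical functions. give_commoner handles every case with match, including None. give_royal relies on unwrap, which panics with a generic message when there is no gift.

```rust
// Explicit handling: every case, including `None`, gets its own arm.
fn give_commoner(gift: Option<&str>) -> String {
    match gift {
        Some("snake") => "Yuck! I'm putting this snake back in the forest.".to_string(),
        Some(inner) => format!("{}? How nice.", inner),
        None => "No gift? Oh well.".to_string(),
    }
}

// Implicit handling: `unwrap` panics on `None`.
fn give_royal(gift: Option<&str>) -> String {
    let inside = gift.unwrap();
    if inside == "snake" {
        panic!("AAAaaaaa!!!!");
    }
    format!("I love {}s!!!!!", inside)
}

fn main() {
    println!("{}", give_commoner(Some("teddy bear")));
    println!("{}", give_commoner(None));
    println!("{}", give_royal(Some("bird")));
    // `give_royal(None)` would panic inside `unwrap`.
}
```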
Unpacking options with ?
You can unpack Options by using match statements, but it's often easier to use the ? operator. If x is an Option, then evaluating x? will return the underlying value if x is Some, otherwise it will terminate whatever function is being executed and return None.
You can chain many ?s together to make your code much more readable.
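Here is a small sketch with a hypothetical helper. Two ?s are chained on one line. If either lookup yields None, the whole function returns None.

```rust
// Returns the upper-cased first letter of the first word, or `None`
// if any step finds nothing (hypothetical helper).
fn first_initial(text: &str) -> Option<char> {
    // Each `?` unwraps a `Some` or returns `None` from the function.
    let first_char = text.split_whitespace().next()?.chars().next()?;
    Some(first_char.to_ascii_uppercase())
}

fn main() {
    println!("{:?}", first_initial("hello world"));
    println!("{:?}", first_initial("   "));
}
```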
Combinators: map
match is a valid method for handling Options. However, you may eventually find heavy usage tedious, especially with operations only valid with an input. In these cases, combinators can be used to manage control flow in a modular fashion.
Option has a built-in method called map(), a combinator for the simple mapping of Some -> Some and None -> None. Multiple map() calls can be chained together for even more flexibility.
In the following example, process() replaces all functions previous to it while staying compact.
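A sketch of such a food pipeline follows. The types and function names are assumptions. peel uses an explicit match, chop does the same job with map, and process chains all the steps with map.

```rust
#[derive(Debug, PartialEq)]
enum Food {
    Apple,
    Carrot,
}

#[derive(Debug, PartialEq)]
struct Peeled(Food);
#[derive(Debug, PartialEq)]
struct Chopped(Food);
#[derive(Debug, PartialEq)]
struct Cooked(Food);

// Explicit `match`: works, but every step repeats the `None => None` arm.
fn peel(food: Option<Food>) -> Option<Peeled> {
    match food {
        Some(food) => Some(Peeled(food)),
        None => None,
    }
}

// The same shape written with `map`.
fn chop(peeled: Option<Peeled>) -> Option<Chopped> {
    peeled.map(|Peeled(food)| Chopped(food))
}

// All steps as one chain of `map` calls.
fn process(food: Option<Food>) -> Option<Cooked> {
    food.map(Peeled)
        .map(|Peeled(f)| Chopped(f))
        .map(|Chopped(f)| Cooked(f))
}

fn main() {
    println!("{:?}", chop(peel(Some(Food::Carrot))));
    println!("{:?}", process(Some(Food::Apple)));
    println!("{:?}", process(None));
}
```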
See also:
closures, Option, Option::map()
Combinators: and_then
map() was described as a chainable way to simplify match statements. However, using map() on a function that returns an Option<T> results in the nested Option<Option<T>>. Chaining multiple calls together can then become confusing. That's where another combinator called and_then(), known in some languages as flatmap, comes in.
and_then() calls its function input with the wrapped value and returns the result. If the Option is None, then it returns None instead.
In the following example, cookable_v2() results in an Option<Food>. Using map() instead of and_then() would have given an Option<Option<Food>>, which is an invalid type for eat().
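Here is a sketch along those lines. The Food variants and helper names are assumptions. have_ingredients already returns an Option, so and_then keeps the result flat.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Food {
    CordonBleu,
    Steak,
    Sushi,
}

// We have no ingredients for sushi.
fn have_ingredients(food: Food) -> Option<Food> {
    match food {
        Food::Sushi => None,
        _ => Some(food),
    }
}

// We have no recipe for cordon bleu.
fn have_recipe(food: Food) -> Option<Food> {
    match food {
        Food::CordonBleu => None,
        _ => Some(food),
    }
}

// `and_then` passes the inner value to a function that itself returns an
// `Option`, so the result is `Option<Food>`, not `Option<Option<Food>>`.
fn cookable_v2(food: Food) -> Option<Food> {
    have_recipe(food).and_then(have_ingredients)
}

fn eat(food: Food) -> String {
    match cookable_v2(food) {
        Some(food) => format!("Yay! We get to eat {:?}.", food),
        None => format!("Oh no. We can't cook {:?}.", food),
    }
}

fn main() {
    for food in [Food::CordonBleu, Food::Steak, Food::Sushi].iter() {
        println!("{}", eat(*food));
    }
}
```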
See also:
closures, Option, and Option::and_then()
Result
Result is a richer version of the Option type that describes possible error instead of possible absence. That is, Result<T, E> could have one of two outcomes:
- Ok(T): An element T was found
- Err(E): An error was found with element E
By convention, the expected outcome is Ok while the unexpected outcome is Err.
Like Option, Result has many methods associated with it. unwrap(), for example, either yields the element T or panics. For case handling, there are many combinators between Result and Option that overlap.
In working with Rust, you will likely encounter methods that return the Result type, such as the parse() method. It might not always be possible to parse a string into the other type, so parse() returns a Result indicating possible failure.
Let's see what happens when we successfully and unsuccessfully parse() a string:
In the unsuccessful case, parse() leaves us with an error for unwrap() to panic on. Additionally, the panic exits our program and provides an unpleasant error message.
To improve the quality of our error message, we should be more specific about the return type and consider explicitly handling the error.
Using Result in main
The Result type can also be the return type of the main function if specified explicitly. Typically the main function will be of the form:
fn main() {
println!("Hello World!");
}
However, main is also able to have a return type of Result. If an error occurs within the main function, it will return an error code and print a debug representation of the error (using the Debug trait). The following example shows such a scenario and touches on aspects covered in the following section.
map for Result
Panicking in the previous example's multiply does not make for robust code. Generally, we want to return the error to the caller so it can decide what is the right way to respond to errors.
We first need to know what kind of error type we are dealing with. To determine the Err type, we look to parse(), which is implemented with the FromStr trait for i32. As a result, the Err type is specified as ParseIntError.
In the example below, the straightforward match statement leads to code that is overall more cumbersome.
Luckily, Option's map, and_then, and many other combinators are also implemented for Result. The documentation for Result contains a complete listing.
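The sketch below puts the two styles side by side for a multiply function over two strings. The name multiply_match is hypothetical. Both versions return the ParseIntError to the caller instead of panicking.

```rust
use std::num::ParseIntError;

// Nested `match` version: every failure must be forwarded by hand.
fn multiply_match(first: &str, second: &str) -> Result<i32, ParseIntError> {
    match first.parse::<i32>() {
        Ok(first) => match second.parse::<i32>() {
            Ok(second) => Ok(first * second),
            Err(e) => Err(e),
        },
        Err(e) => Err(e),
    }
}

// Combinator version: `and_then` chains the second parse, `map` multiplies.
fn multiply(first: &str, second: &str) -> Result<i32, ParseIntError> {
    first
        .parse::<i32>()
        .and_then(|first| second.parse::<i32>().map(|second| first * second))
}

fn main() {
    println!("{:?}", multiply("10", "2"));
    println!("{:?}", multiply("t", "2"));
}
```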
aliases for Result
How about when we want to reuse a specific Result type many times? Recall that Rust allows us to create aliases. Conveniently, we can define one for the specific Result in question.
At a module level, creating aliases can be particularly helpful. Errors found in a specific module often have the same Err type, so a single alias can succinctly define all associated Results. This is so useful that the std library even supplies one: io::Result!
Here's a quick example to show off the syntax:
See also:
Early returns
In the previous example, we explicitly handled the errors using combinators. Another way to deal with this case analysis is to use a combination of match statements and early returns.
That is, we can simply stop executing the function and return the error if one occurs. For some, this form of code can be easier to both read and write. Consider this version of the previous example, rewritten using early returns:
At this point, we've learned to explicitly handle errors using combinators and early returns. While we generally want to avoid panicking, explicitly handling all of our errors is cumbersome.
In the next section, we'll introduce ? for the cases where we simply need to unwrap without possibly inducing panic.
Introducing ?
Sometimes we just want the simplicity of unwrap without the possibility of a panic. Until now, unwrap has forced us to nest deeper and deeper when what we really wanted was to get the variable out. This is exactly the purpose of ?.
Upon finding an Err, there are two valid actions to take:
- panic!, which we already decided to try to avoid if possible
- return, because an Err means it cannot be handled
? is almost exactly equivalent to an unwrap which returns instead of panicking on Errs. Let's see how we can simplify the earlier example that used combinators:
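Here is a sketch of the same multiply function written with ?. Each ? either yields the Ok value or returns the Err from multiply immediately. The print helper is hypothetical.

```rust
use std::num::ParseIntError;

// Each `?` yields the `Ok` value or returns the `Err` early.
fn multiply(first: &str, second: &str) -> Result<i32, ParseIntError> {
    let first = first.parse::<i32>()?;
    let second = second.parse::<i32>()?;
    Ok(first * second)
}

fn print(result: Result<i32, ParseIntError>) {
    match result {
        Ok(n) => println!("n is {}", n),
        Err(e) => println!("Error: {}", e),
    }
}

fn main() {
    print(multiply("10", "2"));
    print(multiply("t", "2"));
}
```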
The try! macro
Before there was ?, the same functionality was achieved with the try! macro. The ? operator is now recommended, but you may still find try! when looking at older code. The same multiply function from the previous example would look like this using try!:
See re-enter ? for more details.
Multiple error types
The previous examples have always been very convenient: Results interact with other Results and Options interact with other Options.
Sometimes an Option needs to interact with a Result, or a Result<T, Error1> needs to interact with a Result<T, Error2>. In those cases, we want to manage our different error types in a way that makes them composable and easy to interact with.
In the following code, two instances of unwrap generate different error types. Vec::first returns an Option, while parse::<i32> returns a Result<i32, ParseIntError>:
Over the next sections, we'll see several strategies for handling these kinds of problems.
Pulling Results out of Options
The most basic way of handling mixed error types is to just embed them in each other.
There are times when we'll want to stop processing on errors (like with ?) but keep going when the Option is None. A couple of combinators come in handy to swap the Result and Option.
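A sketch of one such swap follows, using Option::transpose. The double_first helper is hypothetical. An empty vector gives Ok(None), while a bad number gives an Err. An equivalent spelling without transpose is opt.map_or(Ok(None), |r| r.map(Some)).

```rust
use std::num::ParseIntError;

// `None` for an empty vector is fine; a parse failure is an error.
fn double_first(vec: Vec<&str>) -> Result<Option<i32>, ParseIntError> {
    // This is an `Option<Result<i32, ParseIntError>>` ...
    let opt = vec.first().map(|first| first.parse::<i32>().map(|n| 2 * n));
    // ... swapped into a `Result<Option<i32>, ParseIntError>`.
    opt.transpose()
}

fn main() {
    println!("{:?}", double_first(vec!["42", "93", "18"]));
    println!("{:?}", double_first(vec![]));
    println!("{:?}", double_first(vec!["tofu", "93"]));
}
```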
Defining an error type
Sometimes it simplifies the code to mask all of the different errors with a single type of error. We'll show this with a custom error.
Rust allows us to define our own error types. In general, a "good" error type:
- Represents different errors with the same type
- Presents nice error messages to the user
- Is easy to compare with other types
  - Good: Err(EmptyVec)
  - Bad: Err("Please use a vector with at least one element".to_owned())
- Can hold information about the error
  - Good: Err(BadChar(c, position))
  - Bad: Err("+ cannot be used here".to_owned())
- Composes well with other errors
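Here is a minimal sketch of a single custom error, DoubleError, which is a hypothetical name. It masks both failure causes, compares easily thanks to PartialEq, and prints a readable message via Display. It carries no information about which cause occurred.

```rust
use std::fmt;

// One error type standing in for every failure cause.
#[derive(Debug, Clone, PartialEq)]
struct DoubleError;

// How the error is shown to users.
impl fmt::Display for DoubleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid first item to double")
    }
}

fn double_first(vec: Vec<&str>) -> Result<i32, DoubleError> {
    vec.first()
        // An empty vector becomes our error type...
        .ok_or(DoubleError)
        .and_then(|s| {
            s.parse::<i32>()
                // ...and so does a parse failure.
                .map_err(|_| DoubleError)
                .map(|i| 2 * i)
        })
}

fn main() {
    match double_first(vec!["tofu"]) {
        Ok(n) => println!("The first doubled is {}", n),
        Err(e) => println!("Error: {}", e),
    }
}
```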
Boxing errors
A way to write simple code while preserving the original errors is to Box them. The drawback is that the underlying error type is only known at runtime and not statically determined.
The stdlib helps in boxing our errors by having Box implement conversion from any type that implements the Error trait into the trait object Box<dyn Error>, via From.
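In this sketch, EmptyVec and the Result alias are hypothetical names. Both the custom EmptyVec error and the library's ParseIntError are converted into Box<dyn Error> with .into(), which relies on that From implementation.

```rust
use std::error;
use std::fmt;

// Any error type can be boxed into this alias's error type.
type Result<T> = std::result::Result<T, Box<dyn error::Error>>;

#[derive(Debug, Clone)]
struct EmptyVec;

impl fmt::Display for EmptyVec {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid first item to double")
    }
}

// Implementing `Error` is what lets `EmptyVec` become a `Box<dyn Error>`.
impl error::Error for EmptyVec {}

fn double_first(vec: Vec<&str>) -> Result<i32> {
    vec.first()
        // `into` uses the `From` impl to box `EmptyVec`.
        .ok_or_else(|| EmptyVec.into())
        .and_then(|s| {
            s.parse::<i32>()
                // `ParseIntError` is boxed the same way.
                .map_err(|e| e.into())
                .map(|i| 2 * i)
        })
}

fn main() {
    for input in vec![vec!["42"], vec![], vec!["tofu"]] {
        match double_first(input) {
            Ok(n) => println!("The first doubled is {}", n),
            Err(e) => println!("Error: {}", e),
        }
    }
}
```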
See also:
Dynamic dispatch and Error trait
Other uses of ?
Notice in the previous example that our immediate reaction to calling parse is to map the error from a library error into a boxed error:
.and_then(|s| s.parse::<i32>()
.map_err(|e| e.into())
Since this is a simple and common operation, it would be convenient if it could be elided. Alas, because and_then is not sufficiently flexible, it cannot. However, we can instead use ?.
? was previously explained as either unwrap or return Err(err). This is only mostly true. It actually means unwrap or return Err(From::from(err)). Since From::from is a conversion utility between different types, this means that if you ? where the error is convertible to the return type, it will convert automatically.
Here, we rewrite the previous example using ?. As a result, the map_err will go away when From::from is implemented for our error type:
This is actually fairly clean now. Compared with the original panic, it is very similar to replacing the unwrap calls with ?, except that the return types are Result. As a result, they must be destructured at the top level.
See also:
From::from and ?
Wrapping errors
An alternative to boxing errors is to wrap them in your own error type.
This adds a bit more boilerplate for handling errors and might not be needed in all applications. There are some libraries that can take care of the boilerplate for you.
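A sketch of the wrapping approach follows. DoubleError and its variants are hypothetical names. The Parse variant keeps the original ParseIntError, source() exposes it, and a From impl lets ? do the wrapping.

```rust
use std::error;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
enum DoubleError {
    EmptyVec,
    // Wraps the underlying parse error instead of discarding it.
    Parse(ParseIntError),
}

impl fmt::Display for DoubleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DoubleError::EmptyVec => write!(f, "please use a vector with at least one element"),
            // The wrapped error is reported through `source`, not repeated here.
            DoubleError::Parse(..) => write!(f, "the provided string could not be parsed as int"),
        }
    }
}

impl error::Error for DoubleError {
    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
        match self {
            DoubleError::EmptyVec => None,
            // The `&ParseIntError` coerces to a `&dyn Error`.
            DoubleError::Parse(e) => Some(e),
        }
    }
}

// Lets `?` convert a `ParseIntError` into a `DoubleError` automatically.
impl From<ParseIntError> for DoubleError {
    fn from(err: ParseIntError) -> DoubleError {
        DoubleError::Parse(err)
    }
}

fn double_first(vec: Vec<&str>) -> Result<i32, DoubleError> {
    let first = vec.first().ok_or(DoubleError::EmptyVec)?;
    // `?` calls `From::from` on a parse error.
    let parsed = first.parse::<i32>()?;
    Ok(2 * parsed)
}

fn main() {
    match double_first(vec!["tofu"]) {
        Ok(n) => println!("The first doubled is {}", n),
        Err(e) => println!("Error: {}", e),
    }
}
```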
See also:
From::from and Enums
Iterating over Results
An Iterator::map operation might fail, for example:
Let's step through strategies for handling this.
Ignore the failed items with filter_map()
filter_map calls a function and filters out the results that are None.
Fail the entire operation with collect()
Result implements FromIterator so that a vector of results (Vec<Result<T, E>>) can be turned into a result with a vector (Result<Vec<T>, E>). Once a Result::Err is found, the iteration will terminate.
This same technique can be used with Option.
Collect all valid values and failures with partition()
When you look at the results, you'll note that everything is still wrapped in Result. A little more boilerplate is needed for this.
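Here is a sketch of the three strategies applied to the same input, with hypothetical function names: filter_map drops failures, collect stops at the first failure, and partition keeps both sides and then unwraps each.

```rust
use std::num::ParseIntError;

// Strategy 1: `filter_map` keeps only the successful parses.
fn parse_valid(strings: &[&str]) -> Vec<i32> {
    strings.iter().filter_map(|s| s.parse::<i32>().ok()).collect()
}

// Strategy 2: collecting into `Result<Vec<_>, _>` stops at the first error.
fn parse_all(strings: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    strings.iter().map(|s| s.parse::<i32>()).collect()
}

// Strategy 3: `partition` separates successes from failures; each side is
// then unwrapped (the extra boilerplate mentioned above).
fn split_results(strings: &[&str]) -> (Vec<i32>, Vec<ParseIntError>) {
    let (oks, errs): (Vec<_>, Vec<_>) = strings
        .iter()
        .map(|s| s.parse::<i32>())
        .partition(Result::is_ok);
    let numbers = oks.into_iter().map(Result::unwrap).collect();
    let errors = errs.into_iter().map(Result::unwrap_err).collect();
    (numbers, errors)
}

fn main() {
    let strings = ["tofu", "93", "18"];
    println!("{:?}", parse_valid(&strings));
    println!("{:?}", parse_all(&strings));
    println!("{:?}", split_results(&strings));
}
```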
Std library types
The std library provides many custom types which expand drastically on the primitives. Some of these include:
- growable Strings like: "hello world"
- growable vectors: [1, 2, 3]
- optional types: Option<i32>
- error handling types: Result<i32, i32>
- heap allocated pointers: Box<i32>
See also:
primitives and the std library
Box, stack and heap
All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a Box<T>. A box is a smart pointer to a heap allocated value of type T. When a box goes out of scope, its destructor is called, the inner object is destroyed, and the memory on the heap is freed.
Boxed values can be dereferenced using the * operator; this removes one layer of indirection.
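A small sketch using a hypothetical Point type follows. On the stack, a Box<Point> takes only pointer size. Dereferencing with * moves the value back out (here it is copied, since Point is Copy).

```rust
use std::mem;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    // Stack allocated: both `f64` fields live directly on the stack.
    let point = Point { x: 1.0, y: 2.0 };
    // Heap allocated: only the pointer lives on the stack.
    let boxed: Box<Point> = Box::new(point);
    println!("point occupies {} bytes on the stack", mem::size_of_val(&point));
    println!("boxed occupies {} bytes on the stack", mem::size_of_val(&boxed));
    // `*` removes one layer of indirection.
    let unboxed: Point = *boxed;
    println!("{:?}", unboxed);
}
```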
Vectors
Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time. A vector is represented using 3 parameters:
- pointer to the data
- length
- capacity
The capacity indicates how much memory is reserved for the vector. The vector can grow as long as the length is smaller than the capacity. When this threshold needs to be surpassed, the vector is reallocated with a larger capacity.
More Vec methods can be found under the std::vec module.
Strings
There are two types of strings in Rust: String and &str.
A String is stored as a vector of bytes (Vec<u8>), but guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.
&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.
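Here is a sketch with a hypothetical first_word helper. Because it takes &str, it accepts both a literal and a borrowed String. Note also that len counts bytes, not characters.

```rust
// Takes `&str`, so it accepts string literals and borrowed `String`s alike.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A string literal is a `&'static str` pointing into the binary.
    let literal: &str = "hello world";
    // A `String` owns a growable, heap-allocated UTF-8 buffer.
    let mut owned = String::from("héllo");
    owned.push_str(" wörld");
    // `&owned` coerces from `&String` to `&str`, a view into the buffer.
    println!("{} / {}", first_word(literal), first_word(&owned));
    // 'é' and 'ö' take two bytes each in UTF-8.
    println!("{} bytes, {} chars", owned.len(), owned.chars().count());
}
```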
More str/String methods can be found under the std::str and std::string modules.
Literals and escapes
There are multiple ways to write string literals with special characters in them. All result in a similar &str, so it's best to use the form that is the most convenient to write. Similarly, there are multiple ways to write byte string literals, which all result in &[u8; N].
Generally special characters are escaped with a backslash character: \. This way you can add any character to your string, even unprintable ones and ones that you don't know how to type. If you want a literal backslash, escape it with another one: \\
String or character literal delimiters occurring within a literal must be escaped: "\"", '\''.
Sometimes there are just too many characters that need to be escaped or it's just much more convenient to write a string out as-is. This is where raw string literals come into play.
Want a string that's not UTF-8? (Remember, str and String must be valid UTF-8.)
Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!
For conversions between character encodings check out the encoding crate.
A more detailed listing of the ways to write string literals and escape characters is given in the 'Tokens' chapter of the Rust Reference.
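The sketch below collects the forms discussed above into a few constants with hypothetical names: ordinary escapes, a raw string, a raw string with # delimiters, and a byte string holding a byte that is not valid UTF-8.

```rust
// Escapes: `\"` for a quote, `\\` for a backslash, `\u{..}` for any code point.
const ESCAPED: &str = "say \"hi\" \\ \u{211D}";
// Raw string: backslashes are not escapes.
const RAW: &str = r"C:\Users\rust";
// Raw string with `#` delimiters, so it may contain `"`.
const RAW_QUOTES: &str = r#"He said "hi""#;
// Byte string: a `&[u8; N]` that may hold bytes that are not valid UTF-8.
const BYTES: &[u8; 4] = b"ab\xff\n";

fn main() {
    println!("{}", ESCAPED);
    println!("{}", RAW);
    println!("{}", RAW_QUOTES);
    // Not valid UTF-8, so it cannot be viewed as a `&str`.
    println!("{:?}", std::str::from_utf8(BYTES));
}
```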
Option
Sometimes it's desirable to catch the failure of some parts of a program instead of calling panic!; this can be accomplished using the Option enum.
The Option<T> enum has two variants:
- None, to indicate failure or lack of value, and
- Some(value), a tuple struct that wraps a value with type T.
Result
We've seen that the Option enum can be used as a return value from functions that may fail, where None can be returned to indicate failure. However, sometimes it is important to express why an operation failed. To do this we have the Result enum.
The Result<T, E> enum has two variants:
- Ok(value), which indicates that the operation succeeded, and wraps the value returned by the operation. (value has type T)
- Err(why), which indicates that the operation failed, and wraps why, which (hopefully) explains the cause of the failure. (why has type E)
?
Chaining results using match can get pretty untidy; luckily, the ? operator can be used to make things pretty again. ? is used at the end of an expression returning a Result, and is equivalent to a match expression, where the Err(err) branch expands to an early return Err(From::from(err)), and the Ok(ok) branch expands to an ok expression.
Be sure to check the documentation, as there are many methods to map/compose Result.
panic!
The panic! macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.
Since we are dealing with programs with only one thread, panic! will cause the program to report the panic message and exit.
Let's check that panic! doesn't leak memory.
$ rustc panic.rs && valgrind ./panic
==4401== Memcheck, a memory error detector
==4401== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4401== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==4401== Command: ./panic
==4401==
thread '<main>' panicked at 'division by zero', panic.rs:5
==4401==
==4401== HEAP SUMMARY:
==4401== in use at exit: 0 bytes in 0 blocks
==4401== total heap usage: 18 allocs, 18 frees, 1,648 bytes allocated
==4401==
==4401== All heap blocks were freed -- no leaks are possible
==4401==
==4401== For counts of detected and suppressed errors, rerun with: -v
==4401== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
HashMap
Where vectors store values by an integer index, HashMaps store values by key. HashMap keys can be booleans, integers, strings, or any other type that implements the Eq and Hash traits. More on this in the next section.
Like vectors, HashMaps are growable, but HashMaps can also be shrunk (for example with shrink_to_fit) when they have excess space. You can create a HashMap with a certain starting capacity using HashMap::with_capacity(usize), or use HashMap::new() to get a HashMap with a default initial capacity (recommended).
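Here is a small sketch of basic HashMap use, built around a hypothetical phone book. Inserting an existing key replaces its value, get looks up by key, and iteration order is not specified.

```rust
use std::collections::HashMap;

// Builds a small phone book keyed by name (hypothetical data).
fn build_contacts() -> HashMap<&'static str, &'static str> {
    let mut contacts = HashMap::new();
    contacts.insert("Daniel", "798-1364");
    contacts.insert("Ashley", "645-7689");
    // Inserting an existing key replaces the value (and returns the old one).
    contacts.insert("Daniel", "164-6743");
    contacts
}

fn main() {
    let mut contacts = build_contacts();
    match contacts.get("Daniel") {
        Some(number) => println!("Calling Daniel: {}", number),
        None => println!("Don't have Daniel's number."),
    }
    contacts.remove("Ashley");
    // Iteration order is arbitrary.
    for (name, number) in contacts.iter() {
        println!("{}: {}", name, number);
    }
}
```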
For more information on how hashing and hash maps (sometimes called hash tables) work, have a look at the Hash Table article on Wikipedia.
Alternate/custom key types
Any type that implements the Eq and Hash traits can be a key in HashMap. This includes:
- bool (though not very useful since there are only two possible keys)
- all integer types, such as i32, u64 and usize
- String and &str (protip: you can have a HashMap keyed by String and call .get() with an &str)
Note that f32 and f64 do not implement Hash, likely because floating-point precision errors would make using them as hashmap keys horribly error-prone.
All collection classes implement Eq and Hash if their contained type also respectively implements Eq and Hash. For example, Vec<T> will implement Hash if T implements Hash.
You can easily implement Eq and Hash for a custom type with just one line:
#[derive(PartialEq, Eq, Hash)]
The compiler will do the rest. If you want more control over the details, you can implement Eq and/or Hash yourself. This guide will not cover the specifics of implementing Hash.
To play around with using a struct in HashMap, let's try making a very simple user logon system:
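Here is a toy sketch of such a system, with all names and data hypothetical. Deriving PartialEq, Eq and Hash is enough to make the Account struct usable as a key, and a matching username/password pair looks up the account info.

```rust
use std::collections::HashMap;

// Deriving `Eq` and `Hash` is all it takes to use `Account` as a key.
#[derive(PartialEq, Eq, Hash)]
struct Account {
    username: String,
    password: String,
}

struct AccountInfo {
    name: String,
    email: String,
}

// Returns "name <email>" if the username/password pair is a known key.
fn try_logon(accounts: &HashMap<Account, AccountInfo>, username: &str, password: &str) -> Option<String> {
    let logon = Account {
        username: username.to_string(),
        password: password.to_string(),
    };
    accounts
        .get(&logon)
        .map(|info| format!("{} <{}>", info.name, info.email))
}

fn main() {
    let mut accounts = HashMap::new();
    accounts.insert(
        Account { username: "j.everyman".to_string(), password: "password123".to_string() },
        AccountInfo { name: "John Everyman".to_string(), email: "j.everyman@email.com".to_string() },
    );
    println!("{:?}", try_logon(&accounts, "j.everyman", "psasword123"));
    println!("{:?}", try_logon(&accounts, "j.everyman", "password123"));
}
```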
HashSet
Consider a HashSet as a HashMap where we just care about the keys (HashSet<T> is, in actuality, just a wrapper around HashMap<T, ()>).
"What's the point of that?" you ask. "I could just store the keys in a Vec."
A HashSet's unique feature is that it is guaranteed to not have duplicate elements. That's the contract that any set collection fulfills. HashSet is just one implementation. (see also: BTreeSet)
If you insert a value that is already present in the HashSet (i.e. the new value is equal to the existing one and they both have the same hash), then the set is left unchanged and insert returns false. Use replace if you want the new value to take the old one's place.
This is great for when you never want more than one of something, or when you want to know if you've already got something.
But sets can do more than that.
Sets have 4 primary operations (all of the following calls return an iterator):
- union: get all the unique elements in both sets.
- difference: get all the elements that are in the first set but not the second.
- intersection: get all the elements that are in both sets.
- symmetric_difference: get all the elements that are in one set or the other, but not both.
Try all of these in the following example:
(Examples are adapted from the documentation.)
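Here is a compact sketch of the four operations on two small integer sets. The sorted helper is hypothetical and exists only because set iteration order is arbitrary.

```rust
use std::collections::HashSet;

// Collects `&i32` items into a sorted `Vec`, since set order is arbitrary.
fn sorted<'a>(items: impl Iterator<Item = &'a i32>) -> Vec<i32> {
    let mut v: Vec<i32> = items.copied().collect();
    v.sort();
    v
}

fn main() {
    let mut a: HashSet<i32> = vec![1, 2, 3].into_iter().collect();
    let b: HashSet<i32> = vec![2, 3, 4].into_iter().collect();
    // `insert` returns `false` when the value is already present.
    println!("inserted 2 again? {}", a.insert(2));
    println!("union: {:?}", sorted(a.union(&b)));
    println!("difference: {:?}", sorted(a.difference(&b)));
    println!("intersection: {:?}", sorted(a.intersection(&b)));
    println!("symmetric difference: {:?}", sorted(a.symmetric_difference(&b)));
}
```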
Rc
When multiple ownership is needed, Rc (Reference Counting) can be used. Rc keeps track of the number of references, that is, the number of owners of the value wrapped inside an Rc.
The reference count of an Rc increases by 1 whenever an Rc is cloned, and decreases by 1 whenever one cloned Rc is dropped out of scope. When an Rc's reference count becomes zero, which means there are no owners remaining, both the Rc and the value are dropped.
Cloning an Rc never performs a deep copy. Cloning creates just another pointer to the wrapped value, and increments the count.
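This sketch, built around a hypothetical counts helper, observes the strong count before, during, and after a clone's lifetime. Cloning only copies the pointer, and dropping the clone brings the count back down.

```rust
use std::rc::Rc;

// Returns the strong count before, during, and after a clone's lifetime.
fn counts() -> (usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let before = Rc::strong_count(&a);
    let during = {
        // Cloning copies the pointer and bumps the count; the String is not copied.
        // `_b` (unlike `_`) keeps the clone alive until the end of this block.
        let _b = Rc::clone(&a);
        Rc::strong_count(&a)
    }; // `_b` is dropped here, decrementing the count.
    let after = Rc::strong_count(&a);
    (before, during, after)
}

fn main() {
    println!("{:?}", counts());
}
```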
See also:
std::rc and std::sync::Arc.
Arc
When shared ownership between threads is needed, Arc (Atomically Reference Counted) can be used. Via its Clone implementation, this struct creates a new reference pointer to the same value on the heap while increasing the reference counter. Because it shares ownership between threads, the value is dropped only when the last reference pointer to it goes out of scope.
Std misc
Many other types are provided by the std library to support things such as:
- Threads
- Channels
- File I/O
These expand beyond what the primitives provide.
See also:
primitives and the std library
Threads
Rust provides a mechanism for spawning native OS threads via the spawn function; the argument of this function is a moving closure. These threads will be scheduled by the OS.
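Here is a minimal sketch. The constant NTHREADS and the function spawn_squares are hypothetical names. Each move closure owns its own copy of i, and joining the handles in spawn order returns the results in that order.

```rust
use std::thread;

const NTHREADS: u32 = 10;

// Spawns one OS thread per index and collects each thread's result.
fn spawn_squares() -> Vec<u32> {
    let mut children = Vec::new();
    for i in 0..NTHREADS {
        // `move` gives the closure its own copy of `i`.
        children.push(thread::spawn(move || i * i));
    }
    // `join` waits for each thread; results come back in spawn order.
    children.into_iter().map(|child| child.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", spawn_squares());
}
```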
Testcase: map-reduce
Rust makes it very easy to parallelise data processing, without many of the headaches traditionally associated with such an attempt.
The standard library provides great threading primitives out of the box. These, combined with Rust's concept of Ownership and aliasing rules, automatically prevent data races.
The aliasing rules (one writable reference XOR many readable references) automatically prevent you from manipulating state that is visible to other threads. (Where synchronisation is needed, there are synchronisation primitives like Mutexes or Channels.)
In this example, we will calculate the sum of all digits in a block of numbers. We will do this by parcelling out chunks of the block into different threads. Each thread will sum its tiny block of digits, and subsequently we will sum the intermediate sums produced by each thread.
Note that, although we're passing references across thread boundaries, Rust understands that we're only passing read-only references, and that thus no unsafety or data races can occur. Because we're move-ing the data segments into the thread, Rust will also ensure the data is kept alive until the threads exit, so no dangling pointers occur.
Assignments
It is not wise to let our number of threads depend on user inputted data. What if the user decides to insert a lot of spaces? Do we really want to spawn 2,000 threads? Modify the program so that the data is always chunked into a limited number of chunks, defined by a static constant at the beginning of the program.
See also:
- Threads
- vectors and iterators
- closures, move semantics and move closures
- destructuring assignments
- turbofish notation to help type inference
- unwrap vs. expect
- enumerate
Channels
Rust provides asynchronous channels for communication between threads. Channels allow a unidirectional flow of information between two end-points: the Sender and the Receiver.
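Here is a sketch using std::sync::mpsc; collect_ids is a hypothetical name. Each worker thread sends its id through its own clone of the Sender, and the main thread receives exactly one message per worker. Arrival order is not guaranteed, so the ids are sorted before returning.

```rust
use std::sync::mpsc;
use std::thread;

// Each worker sends its id through its own clone of the `Sender`.
fn collect_ids(n: i32) -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for id in 0..n {
        let thread_tx = tx.clone();
        handles.push(thread::spawn(move || {
            // `send` fails only if the `Receiver` has been dropped.
            thread_tx.send(id).unwrap();
        }));
    }
    // `recv` blocks until a message arrives.
    let mut ids: Vec<i32> = (0..n).map(|_| rx.recv().unwrap()).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    ids.sort();
    ids
}

fn main() {
    println!("{:?}", collect_ids(3));
}
```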
Path
The Path struct (std::path::Path) represents file paths in the underlying filesystem. How a path is parsed is platform-specific: POSIX conventions apply on UNIX-like systems, and Windows conventions (drive prefixes, \ separators) apply on Windows.
A Path can be created from an OsStr, and provides several methods to get information from the file/directory the path points to.
Note that a Path is not internally represented as a UTF-8 string, but instead is stored as a vector of bytes (Vec<u8>). Therefore, converting a Path to a &str is not free and may fail (an Option is returned).
Be sure to check out other Path methods and the Metadata struct.
See also:
File I/O
The File struct represents a file that has been opened (it wraps a file descriptor), and gives read and/or write access to the underlying file.
Since many things can go wrong when doing file I/O, all the File methods return the io::Result<T> type, which is an alias for Result<T, io::Error>.
This makes the failure of all I/O operations explicit. Thanks to this, the programmer can see all the failure paths, and is encouraged to handle them in a proactive manner.
open
The open static method can be used to open a file in read-only mode.
A File owns a resource, the file descriptor, and takes care of closing the file when it is dropped.
Here's the expected successful output:
$ echo "Hello World!" > hello.txt
$ rustc open.rs && ./open
hello.txt contains:
Hello World!
(You are encouraged to test the previous example under different failure conditions: hello.txt doesn't exist, or hello.txt is not readable, etc.)
create
The create static method opens a file in write-only mode. If the file already exists, the old content is destroyed. Otherwise, a new file is created.
static LOREM_IPSUM: &str =
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
";
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
fn main() {
let path = Path::new("lorem_ipsum.txt");
let display = path.display();
// Open a file in write-only mode, returns `io::Result<File>`
let mut file = match File::create(&path) {
Err(why) => panic!("couldn't create {}: {}", display, why),
Ok(file) => file,
};
// Write the `LOREM_IPSUM` string to `file`, returns `io::Result<()>`
match file.write_all(LOREM_IPSUM.as_bytes()) {
Err(why) => panic!("couldn't write to {}: {}", display, why),
Ok(_) => println!("successfully wrote to {}", display),
}
}
Here's the expected successful output:
$ rustc create.rs && ./create
successfully wrote to lorem_ipsum.txt
$ cat lorem_ipsum.txt
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
(As in the previous example, you are encouraged to test this example under failure conditions.)
There is also an OpenOptions struct that can be used to configure how a file is opened.
read_lines
The method lines() returns an iterator over the lines of a file.
File::open expects a generic, AsRef<Path>. That's what read_lines() expects as input.
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
// File hosts must exist in current path before this produces output
if let Ok(lines) = read_lines("./hosts") {
// Consumes the iterator, which yields an io::Result<String> per line
for line in lines {
if let Ok(ip) = line {
println!("{}", ip);
}
}
}
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
let file = File::open(filename)?;
Ok(io::BufReader::new(file).lines())
}
Running this program simply prints the lines individually.
$ echo -e "127.0.0.1\n192.168.0.1\n" > hosts
$ rustc read_lines.rs && ./read_lines
127.0.0.1
192.168.0.1
This process is more efficient than creating a String in memory, especially when working with larger files.
Child processes
The process::Output struct represents the output of a finished child process, and the process::Command struct is a process builder.
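The program for this section is not reproduced here. A minimal sketch, assuming a rustc binary is available on the PATH, runs rustc --version through process::Command and inspects the resulting process::Output; the run helper is a hypothetical name introduced for illustration.

```rust
use std::io;
use std::process::{Command, Output};

// Hypothetical helper: runs `program` with `args` to completion and
// returns its exit status together with the captured stdout and stderr.
fn run(program: &str, args: &[&str]) -> io::Result<Output> {
    Command::new(program).args(args).output()
}

fn main() {
    match run("rustc", &["--version"]) {
        // Spawning failed, e.g. because the program was not found.
        Err(e) => panic!("failed to execute process: {}", e),
        Ok(output) => {
            if output.status.success() {
                let s = String::from_utf8_lossy(&output.stdout);
                print!("rustc succeeded and stdout was:\n{}", s);
            } else {
                let s = String::from_utf8_lossy(&output.stderr);
                print!("rustc failed and stderr was:\n{}", s);
            }
        }
    }
}
```

Note that output() only returns Err when the process could not be started at all; a program that starts and then exits with a failure code still yields Ok, with the failure reported through output.status.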
(You are encouraged to try the previous example with an incorrect flag passed to rustc.)
Pipes
The std::process::Child struct represents a running child process, and exposes the stdin, stdout and stderr handles for interaction with the underlying process via pipes.
use std::io::prelude::*;
use std::process::{Command, Stdio};
static PANGRAM: &'static str =
"the quick brown fox jumped over the lazy dog\n";
fn main() {
// Spawn the `wc` command
let process = match Command::new("wc")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn() {
Err(why) => panic!("couldn't spawn wc: {}", why),
Ok(process) => process,
};
// Write a string to the `stdin` of `wc`.
//
// `stdin` has type `Option<ChildStdin>`, but since we know this instance
// must have one, we can directly `unwrap` it.
match process.stdin.unwrap().write_all(PANGRAM.as_bytes()) {
Err(why) => panic!("couldn't write to wc stdin: {}", why),
Ok(_) => println!("sent pangram to wc"),
}
// Because `stdin` does not live after the above calls, it is dropped,
// and the pipe is closed.
//
// This is very important, otherwise `wc` wouldn't start processing the
// input we just sent.
// The `stdout` field also has type `Option<ChildStdout>` so must be unwrapped.
let mut s = String::new();
match process.stdout.unwrap().read_to_string(&mut s) {
Err(why) => panic!("couldn't read wc stdout: {}", why),
Ok(_) => print!("wc responded with:\n{}", s),
}
}
Wait
If you'd like to wait for a process::Child to finish, you must call Child::wait, which will return a process::ExitStatus.
use std::process::Command;
fn main() {
let mut child = Command::new("sleep").arg("5").spawn().unwrap();
let _result = child.wait().unwrap();
println!("reached end of main");
}
$ rustc wait.rs && ./wait
# `wait` keeps running for 5 seconds until the `sleep 5` command finishes
reached end of main
Filesystem Operations
The std::fs
module contains several functions that deal with the filesystem.
use std::fs;
use std::fs::{File, OpenOptions};
use std::io;
use std::io::prelude::*;
use std::os::unix;
use std::path::Path;
// A simple implementation of `% cat path`
fn cat(path: &Path) -> io::Result<String> {
let mut f = File::open(path)?;
let mut s = String::new();
match f.read_to_string(&mut s) {
Ok(_) => Ok(s),
Err(e) => Err(e),
}
}
// A simple implementation of `% echo s > path`
fn echo(s: &str, path: &Path) -> io::Result<()> {
let mut f = File::create(path)?;
f.write_all(s.as_bytes())
}
// A simple implementation of `% touch path` (ignores existing files)
fn touch(path: &Path) -> io::Result<()> {
match OpenOptions::new().create(true).write(true).open(path) {
Ok(_) => Ok(()),
Err(e) => Err(e),
}
}
fn main() {
println!("`mkdir a`");
// Create a directory, returns `io::Result<()>`
match fs::create_dir("a") {
Err(why) => println!("! {:?}", why.kind()),
Ok(_) => {},
}
println!("`echo hello > a/b.txt`");
// The previous match can be simplified using the `unwrap_or_else` method
echo("hello", &Path::new("a/b.txt")).unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
println!("`mkdir -p a/c/d`");
// Recursively create a directory, returns `io::Result<()>`
fs::create_dir_all("a/c/d").unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
println!("`touch a/c/e.txt`");
touch(&Path::new("a/c/e.txt")).unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
println!("`ln -s ../b.txt a/c/b.txt`");
// Create a symbolic link, returns `io::Result<()>`
if cfg!(target_family = "unix") {
unix::fs::symlink("../b.txt", "a/c/b.txt").unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
}
println!("`cat a/c/b.txt`");
match cat(&Path::new("a/c/b.txt")) {
Err(why) => println!("! {:?}", why.kind()),
Ok(s) => println!("> {}", s),
}
println!("`ls a`");
// Read the contents of a directory, returns `io::Result<ReadDir>`
// (an iterator over `io::Result<DirEntry>`)
match fs::read_dir("a") {
Err(why) => println!("! {:?}", why.kind()),
Ok(paths) => for path in paths {
println!("> {:?}", path.unwrap().path());
},
}
println!("`rm a/c/e.txt`");
// Remove a file, returns `io::Result<()>`
fs::remove_file("a/c/e.txt").unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
println!("`rmdir a/c/d`");
// Remove an empty directory, returns `io::Result<()>`
fs::remove_dir("a/c/d").unwrap_or_else(|why| {
println!("! {:?}", why.kind());
});
}
Here's the expected successful output:
$ rustc fs.rs && ./fs
`mkdir a`
`echo hello > a/b.txt`
`mkdir -p a/c/d`
`touch a/c/e.txt`
`ln -s ../b.txt a/c/b.txt`
`cat a/c/b.txt`
> hello
`ls a`
> "a/b.txt"
> "a/c"
`rm a/c/e.txt`
`rmdir a/c/d`
And the final state of the a directory is:
$ tree a
a
|-- b.txt
`-- c
    `-- b.txt -> ../b.txt
1 directory, 2 files
An alternative way to define the function cat is with ? notation:
fn cat(path: &Path) -> io::Result<String> {
let mut f = File::open(path)?;
let mut s = String::new();
f.read_to_string(&mut s)?;
Ok(s)
}
Program arguments
Standard Library
The command line arguments can be accessed using std::env::args, which returns an iterator that yields a String for each argument:
$ ./args 1 2 3
My path is ./args.
I got 3 arguments: ["1", "2", "3"].
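The source of the program behind this transcript is missing. A minimal sketch that prints the same two lines might look like this; the describe helper is a hypothetical name, split out so the formatting can be checked without real command-line arguments.

```rust
use std::env;

// Hypothetical helper: builds the two report lines from a full argument list
// (the program path first, then the user-supplied arguments).
fn describe(args: &[String]) -> Vec<String> {
    // The list can be empty on unusual platforms, so avoid indexing blindly.
    let (path, rest) = match args.split_first() {
        Some((first, rest)) => (first.as_str(), rest),
        None => ("<unknown>", args),
    };
    vec![
        format!("My path is {}.", path),
        format!("I got {} arguments: {:?}.", rest.len(), rest),
    ]
}

fn main() {
    // `env::args()` yields one `String` per argument, the program path first.
    let args: Vec<String> = env::args().collect();
    for line in describe(&args) {
        println!("{}", line);
    }
}
```

env::args panics if an argument is not valid Unicode; std::env::args_os yields OsString values instead when that matters.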
Crates
Alternatively, there are numerous crates that can provide extra functionality when creating command-line applications. The Rust Cookbook exhibits best practices on how to use one of the more popular command line argument crates, clap.
Argument parsing
Matching can be used to parse simple arguments:
$ ./match_args Rust
This is not the answer.
$ ./match_args 42
This is the answer!
$ ./match_args do something
error: second argument not an integer
usage:
match_args <string>
    Check whether given string is the answer.
match_args {increase|decrease} <integer>
    Increase or decrease given integer by one.
$ ./match_args do 42
error: invalid command
usage:
match_args <string>
    Check whether given string is the answer.
match_args {increase|decrease} <integer>
    Increase or decrease given integer by one.
$ ./match_args increase 42
43
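The matching program itself is not shown above. The sketch below reproduces the transcript's behaviour under a few assumptions: run and usage are hypothetical helper names, errors go to stderr, and the message for an unexpected number of arguments is invented, since the transcript does not show that case.

```rust
use std::env;

// The usage text from the transcript above.
fn usage() -> String {
    [
        "usage:",
        "match_args <string>",
        "    Check whether given string is the answer.",
        "match_args {increase|decrease} <integer>",
        "    Increase or decrease given integer by one.",
    ]
    .join("\n")
}

// Hypothetical helper: interprets the user-supplied arguments (program name
// already removed) and returns the text to print or an error message.
fn run(args: &[String]) -> Result<String, String> {
    match args.len() {
        // One argument: is it the answer?
        1 => match args[0].parse::<i32>() {
            Ok(42) => Ok("This is the answer!".to_string()),
            _ => Ok("This is not the answer.".to_string()),
        },
        // Two arguments: the integer is checked first, then the command.
        2 => {
            let number: i32 = args[1]
                .parse()
                .map_err(|_| "second argument not an integer".to_string())?;
            match args[0].as_str() {
                "increase" => Ok((number + 1).to_string()),
                "decrease" => Ok((number - 1).to_string()),
                _ => Err("invalid command".to_string()),
            }
        }
        // Hypothetical message: this case is not part of the transcript.
        _ => Err("expected one or two arguments".to_string()),
    }
}

fn main() {
    // Skip the program name; only the user-supplied arguments matter.
    let args: Vec<String> = env::args().skip(1).collect();
    match run(&args) {
        Ok(text) => println!("{}", text),
        Err(msg) => {
            eprintln!("error: {}", msg);
            eprintln!("{}", usage());
        }
    }
}
```

Parsing the integer before looking at the command is what makes ./match_args do something report "second argument not an integer" rather than "invalid command".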
Foreign Function Interface
Rust provides a Foreign Function Interface (FFI) to C libraries. Foreign functions must be declared inside an extern block annotated with a #[link] attribute containing the name of the foreign library.
use std::fmt;
// this extern block links to the libm library
#[link(name = "m")]
extern "C" {
// this is a foreign function
// that computes the square root of a single precision complex number
fn csqrtf(z: Complex) -> Complex;
// and this one computes the cosine of a single precision complex number
fn ccosf(z: Complex) -> Complex;
}
// Since calling foreign functions is considered unsafe,
// it's common to write safe wrappers around them.
fn cos(z: Complex) -> Complex {
unsafe { ccosf(z) }
}
fn main() {
// z = -1 + 0i
let z = Complex { re: -1., im: 0. };
// calling a foreign function is an unsafe operation
let z_sqrt = unsafe { csqrtf(z) };
println!("the square root of {:?} is {:?}", z, z_sqrt);
// calling safe API wrapped around unsafe operation
println!("cos({:?}) = {:?}", z, cos(z));
}
// Minimal implementation of single precision complex numbers
#[repr(C)]
#[derive(Clone, Copy)]
struct Complex {
re: f32,
im: f32,
}
impl fmt::Debug for Complex {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
if self.im < 0. {
write!(f, "{}-{}i", self.re, -self.im)
} else {
write!(f, "{}+{}i", self.re, self.im)
}
}
}
Testing
Rust is a programming language that cares a lot about correctness and it includes support for writing software tests within the language itself.
Testing comes in three styles:
- Unit testing.
- Doc testing.
- Integration testing.
Rust also supports specifying additional dependencies that are only used for tests; see Development dependencies.
See Also
- The Book chapter on testing
- API Guidelines on doc-testing
Unit testing
Tests are Rust functions that verify that the non-test code is functioning in the expected manner. The bodies of test functions typically perform some setup, run the code we want to test, then assert whether the results are what we expect.
Most unit tests go into a tests mod with the #[cfg(test)] attribute. Test functions are marked with the #[test] attribute.
Tests fail when something in the test function panics. There are some helper macros:
- assert!(expression) - panics if expression evaluates to false.
- assert_eq!(left, right) and assert_ne!(left, right) - testing left and right expressions for equality and inequality respectively.
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
// This is a really bad adding function, its purpose is to fail in this
// example.
#[allow(dead_code)]
fn bad_add(a: i32, b: i32) -> i32 {
a - b
}
#[cfg(test)]
mod tests {
// Note this useful idiom: importing names from outer (for mod tests) scope.
use super::*;
#[test]
fn test_add() {
assert_eq!(add(1, 2), 3);
}
#[test]
fn test_bad_add() {
// This assert would fire and the test will fail.
// Note that private functions can be tested too!
assert_eq!(bad_add(1, 2), 3);
}
}
Tests can be run with cargo test.
$ cargo test
running 2 tests
test tests::test_bad_add ... FAILED
test tests::test_add ... ok
failures:
---- tests::test_bad_add stdout ----
thread 'tests::test_bad_add' panicked at 'assertion failed: `(left == right)`
left: `-1`,
right: `3`', src/lib.rs:21:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
failures:
tests::test_bad_add
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
Tests and ?
None of the previous unit test examples had a return type. But in Rust 2018, your unit tests can return Result<(), E> (where the error type E implements Debug), which lets you use ? in them! This can make them much more concise.
See "The Edition Guide" for more details.
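No example accompanies this section, so here is a small sketch. The sqrt function is a hypothetical function under test; the point is that test_sqrt returns Result<(), String>, so it can use ? where it would otherwise need unwrap.

```rust
// Hypothetical function under test: returns an error for negative input.
fn sqrt(number: f64) -> Result<f64, String> {
    if number >= 0.0 {
        Ok(number.sqrt())
    } else {
        Err("negative floats don't have square roots".to_owned())
    }
}

fn main() {
    println!("{:?}", sqrt(4.0));
}

#[cfg(test)]
mod tests {
    use super::*;

    // Returning `Result<(), String>` lets the body use `?`: if `sqrt`
    // returns `Err`, the test fails and the error is reported.
    #[test]
    fn test_sqrt() -> Result<(), String> {
        let x = 4.0;
        assert_eq!(sqrt(x)?.powi(2), x);
        Ok(())
    }
}
```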
Testing panics
To check functions that should panic under certain circumstances, use the attribute #[should_panic]. This attribute accepts an optional parameter expected = with the text of the panic message; the test then passes only if the panic message contains that text. If your function can panic in multiple ways, this helps make sure your test is testing the correct panic.
pub fn divide_non_zero_result(a: u32, b: u32) -> u32 {
if b == 0 {
panic!("Divide-by-zero error");
} else if a < b {
panic!("Divide result is zero");
}
a / b
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_divide() {
assert_eq!(divide_non_zero_result(10, 2), 5);
}
#[test]
#[should_panic]
fn test_any_panic() {
divide_non_zero_result(1, 0);
}
#[test]
#[should_panic(expected = "Divide result is zero")]
fn test_specific_panic() {
divide_non_zero_result(1, 10);
}
}
Running these tests gives us:
$ cargo test
running 3 tests
test tests::test_any_panic ... ok
test tests::test_divide ... ok
test tests::test_specific_panic ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests tmp-test-should-panic
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running specific tests
To run a specific test, pass its name to the cargo test command.
$ cargo test test_any_panic
running 1 test
test tests::test_any_panic ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out
Doc-tests tmp-test-should-panic
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
To run multiple tests, specify part of a test name that matches all the tests that should be run.
$ cargo test panic
running 2 tests
test tests::test_any_panic ... ok
test tests::test_specific_panic ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out
Doc-tests tmp-test-should-panic
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Ignoring tests
Tests can be marked with the #[ignore] attribute to exclude them from a normal test run. To run the ignored tests, use the command cargo test -- --ignored.
$ cargo test
running 3 tests
test tests::ignored_test ... ignored
test tests::test_add ... ok
test tests::test_add_hundred ... ok
test result: ok. 2 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out
Doc-tests tmp-ignore
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
$ cargo test -- --ignored
running 1 test
test tests::ignored_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests tmp-ignore
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
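The source behind these transcripts is not shown. A sketch that defines tests with the same three names, assuming a library function add; the test bodies are illustrative guesses, not the original code:

```rust
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    println!("{}", add(2, 2));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(2, 2), 4);
    }

    #[test]
    fn test_add_hundred() {
        assert_eq!(add(100, 2), 102);
        assert_eq!(add(2, 100), 102);
    }

    // Reported as "ignored" by a plain `cargo test`;
    // executed only by `cargo test -- --ignored`.
    #[test]
    #[ignore]
    fn ignored_test() {
        assert_eq!(add(0, 0), 0);
    }
}
```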
Documentation testing
The primary way of documenting a Rust project is through annotating the source code. Documentation comments are written in Markdown and support code blocks in them. Rust cares about correctness, so these code blocks are compiled and used as tests.
/// First line is a short summary describing function.
///
/// The next lines present detailed documentation. Code blocks start with
/// triple backquotes and have implicit `fn main()` inside
/// and `extern crate <cratename>`. Assume we're testing `doccomments` crate:
///
/// ```
/// let result = doccomments::add(2, 3);
/// assert_eq!(result, 5);
/// ```
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
/// Usually doc comments may include sections "Examples", "Panics" and "Failures".
///
/// The next function divides two numbers.
///
/// # Examples
///
/// ```
/// let result = doccomments::div(10, 2);
/// assert_eq!(result, 5);
/// ```
///
/// # Panics
///
/// The function panics if the second argument is zero.
///
/// ```rust,should_panic
/// // panics on division by zero
/// doccomments::div(10, 0);
/// ```
pub fn div(a: i32, b: i32) -> i32 {
if b == 0 {
panic!("Divide-by-zero error");
}
a / b
}
Tests can be run with cargo test:
$ cargo test
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests doccomments
running 3 tests
test src/lib.rs - add (line 7) ... ok
test src/lib.rs - div (line 21) ... ok
test src/lib.rs - div (line 31) ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Motivation behind documentation tests
The main purpose of documentation tests is to serve as examples that exercise the functionality, which is one of the most important guidelines. It allows using examples from docs as complete code snippets. But using ? makes compilation fail, since main returns unit (). The ability to hide some source lines from documentation comes to the rescue: one may write fn try_main() -> Result<(), ErrorType>, hide it, and unwrap it in a hidden main. Sounds complicated? Here's an example:
/// Using hidden `try_main` in doc tests.
///
/// ```
/// # // hidden lines start with `#` symbol, but they're still compilable!
/// # fn try_main() -> Result<(), String> { // line that wraps the body shown in doc
/// let res = doccomments::try_div(10, 2)?;
/// # Ok(()) // returning from try_main
/// # }
/// # fn main() { // starting main that'll unwrap()
/// # try_main().unwrap(); // calling try_main and unwrapping
/// # // so that test will panic in case of error
/// # }
/// ```
pub fn try_div(a: i32, b: i32) -> Result<i32, String> {
if b == 0 {
Err(String::from("Divide-by-zero"))
} else {
Ok(a / b)
}
}
See Also
- RFC505 on documentation style
- API Guidelines on documentation guidelines
Integration testing
Unit tests are testing one module in isolation at a time: they're small and can test private code. Integration tests are external to your crate and use only its public interface in the same way any other code would. Their purpose is to test that many parts of your library work correctly together.
Cargo looks for integration tests in the tests directory next to src.
File src/lib.rs:
// Define this in a crate called `adder`.
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
File with test: tests/integration_test.rs:
#[test]
fn test_add() {
assert_eq!(adder::add(3, 2), 5);
}
Running tests with the cargo test command:
$ cargo test
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/integration_test-bcd60824f5fbfe19
running 1 test
test test_add ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Each Rust source file in the tests directory is compiled as a separate crate. One way of sharing some code between integration tests is to make a module with public functions, then import and use it within tests.
File tests/common.rs:
pub fn setup() {
// some setup code, like creating required files/directories, starting
// servers, etc.
}
File with test: tests/integration_test.rs:
// importing common module.
mod common;
#[test]
fn test_add() {
// using common code.
common::setup();
assert_eq!(adder::add(3, 2), 5);
}
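Modules with common code follow the ordinary module rules, so it's ok to create the common module as tests/common/mod.rs. Files in subdirectories of tests are not compiled as separate test crates, so this layout also keeps cargo test from listing common as a test crate with 0 tests.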
Development dependencies
Sometimes there is a need to have dependencies for tests (or examples, or benchmarks) only. Such dependencies are added to Cargo.toml in the [dev-dependencies] section. These dependencies are not propagated to other packages which depend on this package.
One such example is using a crate that extends the standard assert! macros.
File Cargo.toml:
# standard crate data is left out
[dev-dependencies]
pretty_assertions = "0.4.0"
File src/lib.rs:
// externing crate for test-only use
#[cfg(test)]
#[macro_use]
extern crate pretty_assertions;
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_add() {
assert_eq!(add(2, 3), 5);
}
}
See Also
Cargo docs on specifying dependencies.
Unsafe Operations
As an introduction to this section, to borrow from the official docs, "one should try to minimize the amount of unsafe code in a code base." With that in mind, let's get started! Unsafe annotations in Rust are used to bypass protections put in place by the compiler; specifically, there are four primary things that unsafe is used for:
- dereferencing raw pointers
- calling functions or methods which are unsafe (including calling a function over FFI, see a previous chapter of the book)
- accessing or modifying static mutable variables
- implementing unsafe traits
Raw Pointers
Raw pointers (*const T and *mut T) and references &T function similarly, but references are always safe because they are guaranteed to point to valid data due to the borrow checker. Dereferencing a raw pointer can only be done through an unsafe block.
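A minimal sketch of the rule above: creating a raw pointer is safe, and only the dereference needs an unsafe block. The read_through_raw function is a hypothetical helper that wraps the unsafe step behind a safe signature.

```rust
// Hypothetical helper: reads a value through a raw pointer.
fn read_through_raw(value: &u32) -> u32 {
    // Creating a raw pointer is safe; the reference coerces to `*const u32`.
    let raw_p: *const u32 = value;

    // Only the dereference needs `unsafe`. It is sound here because `raw_p`
    // was made from a live reference, so it is non-null, aligned, and valid.
    unsafe { *raw_p }
}

fn main() {
    let raw_p: *const u32 = &10;

    // Writing `*raw_p` outside an `unsafe` block would not compile.
    unsafe {
        assert!(*raw_p == 10);
    }

    println!("{}", read_through_raw(&42));
}
```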
Calling Unsafe Functions
Some functions can be declared as unsafe, meaning it is the programmer's responsibility to ensure correctness instead of the compiler's. One example of this is std::slice::from_raw_parts, which will create a slice given a pointer to the first element and a length.
For slice::from_raw_parts, one of the assumptions which must be upheld is that the pointer passed in points to valid memory and that the memory pointed to is of the correct type. If these invariants aren't upheld, then the program's behaviour is undefined and there is no knowing what will happen.
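To make those invariants concrete, the sketch below rebuilds a slice from a pointer and length taken from an existing slice, which is one situation where they are known to hold. The rebuild function is a hypothetical name used only for illustration.

```rust
use std::slice;

// Hypothetical helper: rebuilds a slice from its raw parts.
fn rebuild(values: &[u32]) -> &[u32] {
    let pointer: *const u32 = values.as_ptr();
    let length = values.len();
    // SAFETY: `pointer` and `length` come from a live `&[u32]`, so the memory
    // is valid and of the correct type, and the returned slice borrows
    // `values`, so it cannot outlive the underlying data.
    unsafe { slice::from_raw_parts(pointer, length) }
}

fn main() {
    let some_vector: Vec<u32> = vec![1, 2, 3, 4];
    let my_slice = rebuild(&some_vector);
    assert_eq!(some_vector.as_slice(), my_slice);
    println!("{:?}", my_slice);
}
```

Tying the output lifetime to the input through the function signature is what keeps this use sound; from_raw_parts on its own would happily return a slice with any lifetime.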
Compatibility
The Rust language is evolving rapidly, and because of this certain compatibility issues can arise, despite efforts to ensure forwards-compatibility wherever possible.
Raw identifiers
Rust, like many programming languages, has the concept of "keywords". These identifiers mean something to the language, and so you cannot use them in places like variable names, function names, and other places. Raw identifiers let you use keywords where they would not normally be allowed. This is particularly useful when Rust introduces new keywords, and a library using an older edition of Rust has a variable or function with the same name as a keyword introduced in a newer edition.
For example, consider a crate foo compiled with the 2015 edition of Rust that exports a function named try. This keyword is reserved for a new feature in the 2018 edition, so without raw identifiers, we would have no way to name the function.
extern crate foo;
fn main() {
foo::try();
}
You'll get this error:
error: expected identifier, found keyword `try`
 --> src/main.rs:4:4
  |
4 | foo::try();
  |      ^^^ expected identifier, found keyword
You can write this with a raw identifier:
extern crate foo;
fn main() {
foo::r#try();
}
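The example above needs an external crate foo, so it cannot be compiled on its own. A self-contained sketch of the same mechanism defines a function whose name is the keyword match; the function itself is hypothetical.

```rust
// Hypothetical function: `match` is a keyword, so it can only be used as a
// function name when written as the raw identifier `r#match`.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    // The `r#` prefix is required at every use site as well.
    assert!(r#match("foo", "foobar"));
    println!("{}", r#match("baz", "foobar"));
}
```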
Meta
Some topics aren't exactly relevant to how you program but provide you tooling or infrastructure support which just makes things better for everyone. These topics include:
- Documentation: Generate library documentation for users via the included rustdoc.
- Playpen: Integrate the Rust Playpen (also known as the Rust Playground) in your documentation.
Documentation
Use cargo doc to build documentation in target/doc.
Use cargo test to run all tests (including documentation tests), and cargo test --doc to only run documentation tests.
These commands will appropriately invoke rustdoc (and rustc) as required.
Doc comments
Doc comments are very useful for big projects that require documentation. When running rustdoc, these are the comments that get compiled into documentation. They are denoted by a ///, and support Markdown.
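The original example for this section is missing. A minimal sketch of /// comments on a hypothetical Point type shows where they attach: to the struct, to each field, and to the method, with Markdown headings and lists inside. (An inner //! comment would instead document the enclosing item, such as the crate.)

```rust
/// A point on a 2D grid.
///
/// Doc comments are Markdown, so they can use **emphasis**,
/// `inline code`, lists, and headings.
pub struct Point {
    /// Horizontal coordinate.
    pub x: i32,
    /// Vertical coordinate.
    pub y: i32,
}

impl Point {
    /// Returns the Manhattan distance between `self` and `other`.
    ///
    /// # Arguments
    ///
    /// * `other` - the point to measure the distance to
    pub fn manhattan(&self, other: &Point) -> u32 {
        // `abs_diff` returns the absolute difference as an unsigned value.
        self.x.abs_diff(other.x) + self.y.abs_diff(other.y)
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 4, y: -2 };
    println!("{}", a.manhattan(&b));
}
```

Running cargo doc on a crate containing this code renders each comment on the page for the item it precedes.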
To run the tests, first build the code as a library, then tell rustdoc where to find the library so it can link it into each doctest program:
$ rustc doc.rs --crate-type lib
$ rustdoc --test --extern doc="libdoc.rlib" doc.rs
Doc attributes
Below are a few examples of the most common #[doc] attributes used with rustdoc.
inline
Used to inline docs, instead of linking out to a separate page.
#[doc(inline)]
pub use bar::Bar;
/// bar docs
mod bar {
/// the docs for Bar
pub struct Bar;
}
no_inline
Used to prevent the docs from being inlined, so that the re-export links out to the item's own documentation page instead.
// Example from libcore/prelude
#[doc(no_inline)]
pub use crate::mem::drop;
hidden
Using #[doc(hidden)] tells rustdoc not to include an item in the documentation:
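The code for this attribute is missing here. A minimal sketch, using hypothetical functions visible and helper: helper stays public and callable but is left out of the generated documentation.

```rust
/// Part of the public, documented API.
pub fn visible() -> u8 {
    helper()
}

// `#[doc(hidden)]` keeps this item out of rustdoc's output. It is still
// public and callable; it just does not appear in the generated docs.
#[doc(hidden)]
pub fn helper() -> u8 {
    42
}

fn main() {
    println!("{}", visible());
}
```

Hiding an item only affects the documentation; it does not change visibility, so other crates can still call helper.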
For documentation, rustdoc is widely used by the community. It's what is used to generate the std library docs.
See also:
- The Rust Book: Making Useful Documentation Comments
- The rustdoc Book
- The Reference: Doc comments
- RFC 1574: API Documentation Conventions
- RFC 1946: Relative links to other items from doc comments (intra-rustdoc links)
- Is there any documentation style guide for comments? (reddit)
Playpen
The Rust Playpen is a way to experiment with Rust code through a web interface. This project is now commonly referred to as Rust Playground.
Using it with mdbook
In mdbook, you can make code examples playable and editable.
This allows the reader both to run your code sample and to modify and tweak it. The key is adding the word editable to your code fence block, separated by a comma.
```rust,editable
//...place your code here
```
Additionally, you can add ignore if you want mdbook to skip your code when it builds and tests.
```rust,editable,ignore
//...place your code here
```
Using it with docs
You may have noticed in some of the official Rust docs a button that says "Run", which opens the code sample up in a new tab in Rust Playground. This feature is enabled if you use the #[doc] attribute called html_playground_url.
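A minimal sketch of enabling that button. It assumes the public Playground at https://play.rust-lang.org/ as the target; the add_one function is a hypothetical placeholder for the crate's documented items.

```rust
// Crate-level (inner) attribute, so it must sit at the top of lib.rs or
// main.rs. rustdoc then adds a "Run" button to the crate's code examples
// that opens them on the given Playground instance.
#![doc(html_playground_url = "https://play.rust-lang.org/")]

/// Adds one to `x`.
pub fn add_one(x: i32) -> i32 {
    x + 1
}

fn main() {
    println!("{}", add_one(41));
}
```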