Data types - Rust for C-Programmers

5.4 Data Types

Rust is statically typed, meaning the type of every variable must be known at compile time. It is also strongly typed: there are no implicit conversions between numeric types, so even converting an integer to a float requires an explicit as cast (or a conversion function). This catches many errors early.
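As a minimal sketch of what the explicit-conversion rule looks like in practice (the helper name average is hypothetical, chosen only for illustration):

```rust
// Hypothetical helper: averages two integers as a floating-point value.
fn average(a: i32, b: i32) -> f64 {
    // f64::from is a lossless conversion; every i32 fits exactly in an f64.
    (f64::from(a) + f64::from(b)) / 2.0
}

fn main() {
    let count: i32 = 7;

    // let ratio: f64 = count / 2.0; // would not compile: i32 cannot be divided by a float
    let ratio: f64 = count as f64 / 2.0; // explicit integer-to-float cast

    // Float-to-integer casts truncate toward zero.
    let truncated: i32 = 3.9_f64 as i32;

    println!("ratio = {}, truncated = {}, average = {}", ratio, truncated, average(3, 4));
}
```

Unlike C, where `count / 2.0` silently promotes `count` to `double`, the conversion here is always visible in the source.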

Rust’s data types fall into several categories. Here we cover scalar and basic compound types.

5.4.1 Scalar Types

Scalar types represent single values.

Integers: Fixed-size signed (i8, i16, i32, i64, i128) and unsigned (u8, u16, u32, u64, u128) types. The number indicates the bit width. An integer literal with no suffix and no other type constraint defaults to i32.
Pointer-Sized Integers: Signed isize and unsigned usize. Their size matches the target architecture’s pointer width (e.g., 32 bits on 32-bit targets, 64 bits on 64-bit targets). usize is crucial for indexing arrays and collections, representing memory sizes, and pointer arithmetic.
Floating-Point Numbers: f32 (single-precision) and f64 (double-precision), adhering to the IEEE 754 standard. The default is f64, as modern CPUs handle scalar f64 arithmetic at roughly the same speed as f32, and it offers higher precision.
Booleans: bool, with possible values true and false. A bool occupies 1 byte in memory.
Characters: char, representing a single Unicode scalar value (from U+0000 to U+D7FF and U+E000 to U+10FFFF). Note that a char is 4 bytes in size, unlike C’s char which is usually 1 byte and often represents ASCII or extended ASCII.
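The sizes and defaults listed above can be checked directly. The following sketch uses std::mem::size_of and size_of_val from the standard library; the variable names are illustrative only:

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // Fixed-width integers occupy exactly the number of bits in their name.
    assert_eq!(size_of::<i8>(), 1);
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(size_of::<i128>(), 16);

    // Pointer-sized integers match the width of a pointer on the target.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());

    // bool is one byte, char is four bytes.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(size_of::<char>(), 4);

    // Unsuffixed literals fall back to i32 and f64.
    let n = 42;
    let x = 2.5;
    assert_eq!(size_of_val(&n), 4);
    assert_eq!(size_of_val(&x), 8);

    // A char stores a Unicode scalar value; surrogate code points are rejected.
    let e_acute = '\u{e9}';
    assert_eq!(e_acute as u32, 0xE9);
    assert_eq!(e_acute.len_utf8(), 2); // 2 bytes when encoded as UTF-8 in a string
    assert!(char::from_u32(0xD800).is_none());

    println!("all size checks passed");
}
```

Note the distinction in the last group: a char value is always 4 bytes, but the same character inside a String or &str is stored as 1 to 4 bytes of UTF-8.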

Scalar Type Summary Table:

Rust Type	Size (bits)	Range / Representation	C Equivalent (`<stdint.h>`)	Notes
`i8`	8	-128 to 127	`int8_t`	Signed 8-bit
`u8`	8	0 to 255	`uint8_t`	Unsigned 8-bit (often used for byte data)
`i16`	16	-32,768 to 32,767	`int16_t`	Signed 16-bit
`u16`	16	0 to 65,535	`uint16_t`	Unsigned 16-bit
`i32`	32	-2,147,483,648 to 2,147,483,647	`int32_t`	Default integer type
`u32`	32	0 to 4,294,967,295	`uint32_t`	Unsigned 32-bit
`i64`	64	Approx. -9.2e18 to 9.2e18	`int64_t`	Signed 64-bit
`u64`	64	0 to approx. 1.8e19	`uint64_t`	Unsigned 64-bit
`i128`	128	Approx. -1.7e38 to 1.7e38	`__int128_t` (compiler ext.)	Signed 128-bit
`u128`	128	0 to approx. 3.4e38	`__uint128_t` (compiler ext.)	Unsigned 128-bit
`isize`	Arch-dependent (32/64)	Arch-dependent	`intptr_t`	Signed pointer-sized integer
`usize`	Arch-dependent (32/64)	Arch-dependent	`uintptr_t`, `size_t`	Unsigned pointer-sized, used for indexing
`f32`	32 (IEEE 754)	Single-precision float	`float`	Single-precision
`f64`	64 (IEEE 754)	Double-precision float	`double`	Default float type
`bool`	8	`true` or `false`	`_Bool` / `bool` (`<stdbool.h>`)	Boolean value (always 1 byte)
`char`	32	Unicode Scalar Value (U+0000..U+10FFFF, excl. surrogates)	`char32_t` (C11 `<uchar.h>`), `wchar_t` (width varies)	Represents a Unicode character (4 bytes)

5.4.2 Compound Types

Compound types group multiple values into one type. Rust has two primitive compound types: tuples and arrays.

Tuple

A tuple is an ordered, fixed-size collection of values where each element can have a different type. Tuples are useful for grouping related data without the formality of defining a struct.

Syntax: Types are written (T1, T2, ..., Tn), and values are (v1, v2, ..., vn).
Fixed Size: The number of elements is fixed at compile time.
Heterogeneous: Elements can have different types.
Access: Use a period (.) followed by a zero-based literal numeric index (e.g., tup.0, tup.1). This index must be known at compile time (it cannot be a variable). Attempting to access a non-existent index results in a compile-time error.

fn main() {
    // A tuple with an i32, f64, and u8
    let tup: (i32, f64, u8) = (500, 6.4, 1);

    // Access elements using period and index (0-based)
    let five_hundred = tup.0;
    let six_point_four = tup.1;
    let one = tup.2;
    println!("Tuple elements: {}, {}, {}", five_hundred, six_point_four, one);

    // Tuple elements must be accessed with literal indices (0, 1, 2, ...).
    // You cannot use a variable index like tup[i] or tup.variable_index.
    // const IDX: usize = 1;
    // let element = tup.IDX; // Compile Error

    // Tuples can be mutable if declared with 'mut'
    let mut mutable_tup = (10, "hello");
    mutable_tup.0 = 20; // OK
    println!("Mutable tuple: {:?}", mutable_tup);

    // Destructuring: Extract values into separate variables
    let (x, y, z) = tup; // Assigns tup.0 to x, tup.1 to y, tup.2 to z
    println!("Destructured: x={}, y={}, z={}", x, y, z);
}

Unit Type (): An empty tuple () is called the “unit type”. It represents the absence of a meaningful value. Functions that don’t explicitly return anything implicitly return (). A block whose last expression is terminated by a semicolon also evaluates to ().
Singleton Tuple: A tuple with one element requires a trailing comma to distinguish it from a parenthesized expression: (50,) is a tuple, (50) is just the integer 50.
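A short sketch of the unit type and the singleton-tuple syntax; the function log_message is a hypothetical example, not part of any library:

```rust
// No `->` in the signature, so the return type is the unit type ().
fn log_message(msg: &str) {
    println!("{}", msg);
}

fn main() {
    // The call produces the only value of type (), which is also written ().
    let unit: () = log_message("hello");
    assert_eq!(unit, ());

    // The unit type carries no data and occupies no memory.
    assert_eq!(std::mem::size_of::<()>(), 0);

    // The trailing comma makes this a one-element tuple.
    let single: (i32,) = (50,);
    // Without the comma it is just a parenthesized integer
    // (the compiler warns about the unnecessary parentheses).
    let plain: i32 = (50);
    assert_eq!(single.0, plain);
}
```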

Accessing tuple fields by index (e.g., tup.0) is extremely efficient. The compiler calculates the exact memory offset at compile time, resulting in a direct memory access with no runtime overhead, similar in performance to accessing struct fields in C.

Tuples are good for returning multiple values from a function or when you need a simple, anonymous grouping of data. For more complex data with meaningful field names, use a struct.
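For instance, where C would typically return one value and write the other through an out-pointer, a Rust function can return both in a tuple. A minimal sketch, with div_rem as a hypothetical helper name:

```rust
// Hypothetical helper: returns quotient and remainder together.
// Like C, Rust's integer division truncates toward zero.
fn div_rem(dividend: i32, divisor: i32) -> (i32, i32) {
    (dividend / divisor, dividend % divisor)
}

fn main() {
    // Destructure the returned tuple directly into two variables.
    let (q, r) = div_rem(17, 5);
    println!("17 / 5 = {} remainder {}", q, r);

    // Or keep the tuple and access fields by index.
    let result = div_rem(-7, 2);
    println!("quotient = {}, remainder = {}", result.0, result.1);
}
```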

Array

An array is a fixed-size collection where every element must have the same type. Arrays are stored contiguously in memory on the stack (unless part of a heap-allocated structure).

Syntax: Type is [T; N] where T is the element type and N is the compile-time constant length. Value is [v1, v2, ..., vN].
Fixed Size: Length N must be known at compile time and cannot change.
Homogeneous: All elements must be of type T.
Initialization:
- List all elements: let a: [i32; 3] = [1, 2, 3];
- Initialize all elements to the same value: let b = [0; 5]; // Creates [0, 0, 0, 0, 0]
Access: Use square brackets [] with a usize index. Access is bounds-checked at runtime; out-of-bounds access causes a panic.

fn main() {
    // Array of 5 integers
    let numbers: [i32; 5] = [1, 2, 3, 4, 5];

    // Type and length can often be inferred
    let inferred_numbers = [10, 20, 30]; // Inferred as [i32; 3]

    // Initialize with a default value
    let zeros = [0u8; 10]; // Array of 10 bytes, all zero

    // Access elements (0-based index, must be usize)
    let first = numbers[0];
    let third = numbers[2];
    println!("First: {}, Third: {}", first, third);

    // Index must be usize
    let idx: usize = 1;
    println!("Element at index {}: {}", idx, numbers[idx]);

    // let invalid_idx: i32 = 1;
    // println!("{}", numbers[invalid_idx]); // Compile Error: index must be usize

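    // Bounds checking: a constant out-of-range index like the one below is
    // rejected at compile time (deny-by-default `unconditional_panic` lint).
    // An out-of-range index computed at runtime would panic instead.
    // println!("Out of bounds: {}", numbers[10]);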

    // Arrays can be mutable
    let mut mutable_array = [1, 1, 1];
    mutable_array[1] = 2;
    println!("Mutable array: {:?}", mutable_array);

    // Get length
    println!("Length of numbers: {}", numbers.len()); // 5
}

Memory: Arrays are typically stack-allocated (if declared locally) and provide efficient, cache-friendly access due to contiguous storage.
Copy Trait: If the element type T implements the Copy trait (like primitive numbers, bool, char), then the array type [T; N] also implements Copy.
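The practical effect of the Copy rule is that assigning or passing such an array duplicates it, much like assigning a C struct that contains an array. A small sketch (the function sum is illustrative):

```rust
// Takes the array by value; for a Copy array the caller keeps its original.
fn sum(values: [i32; 3]) -> i32 {
    values.iter().sum()
}

fn main() {
    let a = [1, 2, 3];
    let b = a;          // bitwise copy; `a` stays valid
    let total = sum(a); // passes another copy
    println!("a = {:?}, b = {:?}, total = {}", a, b, total);

    // String is not Copy, so an array of Strings is moved instead of copied.
    let names = [String::from("x"), String::from("y")];
    let moved = names;
    // println!("{:?}", names); // compile error: use of moved value
    println!("moved = {:?}", moved);
}
```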

Array element access (array[index]) using a runtime variable index is typically very fast. It involves a simple calculation to find the element’s memory address (base + index * size). Crucially, safe Rust precedes this access with a runtime bounds check (index < array.len()) to ensure memory safety, preventing buffer overflows common in C. While this check adds a minimal runtime overhead compared to C’s unchecked access, it provides a vital safety guarantee.

However, if the index is a compile-time constant (e.g., array[2] or an index defined via const), the compiler can evaluate the bounds check statically. A constant index that is out of range is rejected with a compile-time error. If the constant index is within bounds, the optimizer will usually eliminate the runtime check entirely, and the access compiles down to a direct memory operation with a known offset, making it as efficient as accessing a tuple or struct field.
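When an out-of-range index is an expected condition and not a bug, the panic can be avoided with the get method (available on arrays through their slice view), which returns an Option. The sketch below assumes a hypothetical helper get_or; iterating over the array is another common way to sidestep per-element index checks:

```rust
// Hypothetical helper: returns the element at `index`, or `fallback` if out of range.
fn get_or(data: &[i32; 5], index: usize, fallback: i32) -> i32 {
    match data.get(index) {
        Some(value) => *value, // index was in bounds
        None => fallback,      // index was out of bounds; no panic
    }
}

fn main() {
    let numbers = [10, 20, 30, 40, 50];

    println!("index 2 -> {}", get_or(&numbers, 2, -1));
    println!("index 9 -> {}", get_or(&numbers, 9, -1));

    // Iteration visits each element without an explicit index.
    let mut total = 0;
    for n in numbers.iter() {
        total += n;
    }
    println!("total = {}", total);
}
```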

Use arrays when you know the exact number of elements at compile time and need a simple, fixed-size sequence. For dynamically sized collections, use Vec<T> (vector) from the standard library (covered later).

Multidimensional Arrays

You can create multidimensional arrays in Rust by nesting array declarations. For example, a 2x3 matrix (2 rows, 3 columns) can be represented as an array of 2 elements, where each element is an array of 3 integers:

fn main() {
    let matrix: [[i32; 3]; 2] = [ // Type: array of 2 elements, each [i32; 3]
        [1, 2, 3], // Row 0: An array of 3 i32s
        [4, 5, 6], // Row 1: An array of 3 i32s
    ];

    // Accessing element at row 1, column 2 (0-based index)
    let element = matrix[1][2]; // Accesses the value 6
    println!("Element at [1][2]: {}", element);

    // You can also modify elements if the matrix is mutable
    let mut mutable_matrix = matrix;
    // Copies the original matrix (since [i32; 3] and [[i32; 3]; 2] are Copy)
    mutable_matrix[0][1] = 20; // Change element at row 0, column 1 to 20
    println!("Modified matrix[0][1]: {}", mutable_matrix[0][1]); // Prints 20
    println!("Original matrix[0][1]: {}", matrix[0][1]);
    // Prints 2 (original is unchanged)
}

This demonstrates creating an array of arrays. Accessing elements uses chained indexing (matrix[row][column]), and standard bounds checking applies at each level.
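As in C, a nested array is one contiguous block stored row by row. The following sketch iterates over the rows without indexing; row_sums is a hypothetical helper written for this example:

```rust
use std::mem::size_of;

// Hypothetical helper: sums each row of a 2x3 matrix.
fn row_sums(matrix: &[[i32; 3]; 2]) -> [i32; 2] {
    let mut sums = [0; 2];
    for (r, row) in matrix.iter().enumerate() {
        // `row` is a reference to one inner [i32; 3]
        sums[r] = row.iter().sum();
    }
    sums
}

fn main() {
    let matrix = [[1, 2, 3], [4, 5, 6]];
    println!("row sums: {:?}", row_sums(&matrix));

    // Six i32 values with no padding or row pointers in between.
    assert_eq!(size_of::<[[i32; 3]; 2]>(), 6 * size_of::<i32>());
}
```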

5.4.3 References

As introduced in Chapter 2, Rust provides references—safe, managed pointers that allow indirect access to data stored elsewhere in memory. Much like pointers in C, references contain the memory address of a value, enabling one level of indirection.

References in Rust come in two forms: immutable and mutable. They make it possible to temporarily access data without taking ownership or creating a copy, which is particularly efficient when passing values to functions.

To create a reference, Rust uses the & symbol for immutable access and &mut for mutable access. The dereferencing operator * can be used to access the value behind a reference, though Rust often applies dereferencing automatically when needed. In principle, it’s possible to create references to references (e.g., &&value), introducing multiple levels of indirection, but this is seldom required in practice.
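A brief sketch of these operators in use; the function increment is illustrative only:

```rust
// Takes a mutable reference and modifies the caller's value through it.
fn increment(counter: &mut i32) {
    *counter += 1; // explicit dereference to reach the i32
}

fn main() {
    let mut value = 10;

    {
        let r: &i32 = &value; // immutable reference
        let rr: &&i32 = &r;   // reference to a reference
        assert_eq!(*r, 10);
        assert_eq!(**rr, 10); // two levels of indirection, two dereferences
    }

    increment(&mut value); // mutable reference
    assert_eq!(value, 11);

    // Method calls dereference automatically.
    let text = String::from("hello");
    let t = &text;
    assert_eq!(t.len(), 5);    // automatic
    assert_eq!((*t).len(), 5); // equivalent explicit form

    println!("value = {}", value);
}
```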

Rust also supports raw pointers, which can be used within unsafe blocks for low-level operations that are not checked by the compiler.
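For comparison with C pointers, here is a minimal sketch of a raw pointer. Creating the pointer is safe; only dereferencing it requires unsafe. The function name bump_via_raw is hypothetical:

```rust
// Hypothetical helper: increments a value through a raw pointer.
fn bump_via_raw(target: &mut i32) {
    // A mutable reference coerces to a raw pointer; this step is safe.
    let p: *mut i32 = target;

    // Dereferencing is unsafe: the compiler no longer verifies validity.
    // Here the pointer is known to be valid because it was just derived
    // from a live mutable reference.
    unsafe {
        *p += 1;
    }
}

fn main() {
    let mut x = 5;
    bump_via_raw(&mut x);
    println!("x = {}", x);
}
```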

Chapter 6 will explore references more thoroughly as part of the discussion on Ownership, Borrowing, and Memory Management.

The following example demonstrates a function that takes a mutable reference to a fixed-size array and squares each element in place:

fn square_elements(arr: &mut [i32; 5]) {
    for i in 0..arr.len() {
        arr[i] *= arr[i];
    }
}

fn main() {
    let mut numbers = [1, 2, 3, 4, 5];
    square_elements(&mut numbers);
    println!("{:?}", numbers); // [1, 4, 9, 16, 25]
}

The function modifies the original array by working directly on its elements through a mutable reference. This avoids the overhead of copying data into and out of the function.

5.4.4 Stack vs. Heap Allocation (Brief Overview)

By default, local variables holding scalar types, tuples, and arrays are allocated on the stack. Stack allocation is very fast because it involves just adjusting a pointer. The size of stack-allocated data must be known at compile time.

Data whose size might change or is not known until runtime (like the contents of a Vec<T> or String) is typically allocated on the heap. Heap allocation is more flexible but involves more overhead (finding free space, bookkeeping).
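The difference can be sketched as follows. The helper collect_bytes is hypothetical; Vec and Box are the standard library's heap-allocating types:

```rust
// Hypothetical helper: builds a Vec whose length is only known at runtime.
fn collect_bytes(n: u8) -> Vec<u8> {
    let mut bytes = Vec::new(); // the Vec handle lives on the stack
    for b in 0..n {
        bytes.push(b);          // element storage grows on the heap
    }
    bytes
}

fn main() {
    // Fixed size, known at compile time: stored in main's stack frame.
    let on_stack: [u8; 4] = [0, 1, 2, 3];

    // Variable size: contents are heap-allocated.
    let on_heap = collect_bytes(4);
    assert_eq!(on_heap, on_stack);

    // Box places a fixed-size value on the heap explicitly.
    // The array is Copy, so `on_stack` remains usable afterwards.
    let boxed: Box<[u8; 4]> = Box::new(on_stack);
    assert_eq!(*boxed, on_stack);

    println!("stack: {:?}, heap: {:?}, boxed: {:?}", on_stack, on_heap, boxed);
}
```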

We will explore stack, heap, ownership, and borrowing—concepts central to Rust’s memory management—in detail in later chapters. For now, understand that primitive types like those discussed here are usually stack-allocated when used as local variables.

5.4.5 A Note on Sub-Range Types

Coming from languages like Ada, Pascal, or Nim, you might be familiar with defining integer types restricted to a specific sub-range, such as type Month = 1..12;. Rust does not have direct, built-in syntax for creating such custom integer types where the range constraint is automatically enforced by the type system on all assignments and operations. This generally aligns with Rust’s philosophy of providing powerful, composable building blocks (like structs and enums) rather than adding numerous specialized types to the language core.

When you need to ensure a number consistently stays within a specific range in Rust, idiomatic approaches include:

The Newtype Pattern: This involves defining a simple struct that wraps a primitive integer (e.g., struct Month(u8);). You then implement associated functions (like Month::new(value: u8)) that perform validation upon creation, typically returning an Option<Month> or Result<Month, Error>. This ensures that if you have a value of type Month, its internal value is guaranteed to be within the valid range (e.g., 1-12). We will explore this useful pattern in more detail in the chapter on structs.
Enums: For small, fixed sets of discrete values (like days of the week or specific error codes), defining an enum is often the clearest and safest approach, providing strong compile-time guarantees.
Runtime Assertions: In internal functions or performance-sensitive code where the overhead of the Newtype pattern isn’t desired, you might use a standard integer type and add checks using assert! or debug_assert! to validate the range at critical points.
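A minimal sketch of the newtype approach for the Month example mentioned above. The method names new and get are one possible design, not a standard API:

```rust
// Wraps a u8 that is only ever constructed with a value in 1..=12.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Month(u8);

impl Month {
    // Validating constructor: returns None for out-of-range input.
    fn new(value: u8) -> Option<Month> {
        if (1..=12).contains(&value) {
            Some(Month(value))
        } else {
            None
        }
    }

    // Read-only access to the wrapped value.
    fn get(self) -> u8 {
        self.0
    }
}

fn main() {
    match Month::new(12) {
        Some(m) => println!("valid month: {}", m.get()),
        None => println!("invalid month"),
    }

    // Out-of-range values never become a Month.
    assert!(Month::new(0).is_none());
    assert!(Month::new(13).is_none());
}
```

The guarantee holds as long as code outside the defining module cannot touch the inner field directly, which is the case when the struct lives in its own module and the field is left private.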

Interestingly, while Rust lacks general integer sub-range types, the language and standard library do heavily utilize the concept of value restriction – particularly non-nullness or non-zero-ness – to enhance safety and enable crucial optimizations:

References & Box: Rust’s references (&T, &mut T) and the smart pointer Box<T> are guaranteed by the type system (in safe code) to never be null.
NonNull and NonZero: The standard library provides explicit types like std::ptr::NonNull<T> (for raw pointers) and the non-zero integer types in std::num (e.g., NonZeroU8, NonZeroIsize; since Rust 1.79 these are aliases of the generic NonZero<T>). These types encapsulate a value that is guaranteed not to be zero (or null). This guarantee allows for significant memory layout optimizations; for example, Option<NonZeroU8> takes up only 1 byte of memory, the same as u8, because the “None” variant can safely reuse the zero representation.
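The layout optimization described above can be observed with std::mem::size_of. A short sketch:

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // None reuses the forbidden zero bit pattern, so no extra tag byte is needed.
    assert_eq!(size_of::<Option<NonZeroU8>>(), size_of::<u8>());

    // References and Box are never null, so Option around them is still pointer-sized.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    // A plain u8 has no spare bit pattern, so Option<u8> needs additional space.
    println!("Option<u8> occupies {} bytes", size_of::<Option<u8>>());

    // The constructor enforces the restriction at runtime.
    assert!(NonZeroU8::new(0).is_none());
    assert_eq!(NonZeroU8::new(7).map(|n| n.get()), Some(7));
}
```

In C terms, Option<&T> has the same representation as a nullable pointer, but the type system forces the null case to be handled before the pointer is used.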

So, while you won’t find a direct equivalent to type Day = 1..31;, Rust provides patterns to achieve similar guarantees and leverages specific range restrictions (like non-zero) where they offer substantial benefits.