8.11 Function Inlining

Inlining is a compiler optimization where the code of a called function is inserted directly at the call site, rather than performing an actual function call (which involves passing arguments, setting up a stack frame, jumping, and returning). Rust’s compiler performs inlining automatically in optimized builds such as cargo build --release, based on heuristics (function size, call frequency, optimization level, etc.). Most of this work is done by the LLVM backend, although rustc can also inline at the MIR level before handing code to LLVM.
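As a hand-written illustration (the function names here are hypothetical), `sum_of_squares_inlined` below is roughly what an optimizing build can turn `sum_of_squares` into once `square` is inlined. The compiler works on its intermediate representation rather than on Rust source, so this is a sketch of the effect, not the compiler’s literal output.

```rust
/// A tiny helper: an obvious inlining candidate.
fn square(x: i32) -> i32 {
    x * x
}

/// As written: two calls to `square`.
fn sum_of_squares(a: i32, b: i32) -> i32 {
    square(a) + square(b)
}

/// Roughly what remains after `square` is inlined: no calls, only arithmetic.
fn sum_of_squares_inlined(a: i32, b: i32) -> i32 {
    a * a + b * b
}
```

Both versions compute the same value; inlining changes only how the work is carried out, never the result.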

Benefits of Inlining: The direct benefit is removing the overhead of the function call itself. Often more important, making the function’s body visible within the caller’s context unlocks further optimizations:

  • Constant Propagation: If arguments passed to the inlined function are compile-time constants, the compiler can often simplify the inlined code significantly.
  • Dead Code Elimination: Conditional branches within the inlined function might become constant, allowing the compiler to remove unreachable code.
  • Specialization: Generic functions are already monomorphized, i.e. compiled separately for each concrete type they are used with. When such a copy, or a function taking a closure, is inlined, the optimizer sees the concrete types and the closure body together with the caller’s code and can optimize them as a single unit, often producing code as fast as a hand-written specialized version. (We will see more about closures and optimization in a later chapter.)
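The following sketch (with hypothetical names) shows where constant propagation and dead code elimination come from. Once `scale` is inlined into `triple_saturating`, `factor` and `saturate` are known constants, so an optimizing build can fold the multiplier into the instruction and drop the `wrapping_mul` branch entirely. The results are identical either way; the difference is visible only in the generated code, for example with `rustc -O --emit asm` or on Compiler Explorer.

```rust
/// Hypothetical helper: multiplies `x` by `factor`, either saturating at
/// `u32::MAX` or wrapping around on overflow.
#[inline]
fn scale(x: u32, factor: u32, saturate: bool) -> u32 {
    if saturate {
        x.saturating_mul(factor)
    } else {
        x.wrapping_mul(factor)
    }
}

/// After inlining, `factor == 3` and `saturate == true` are constants here,
/// so the `else` branch of `scale` becomes dead code in an optimized build.
fn triple_saturating(x: u32) -> u32 {
    scale(x, 3, true)
}
```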

You can influence inlining decisions using the #[inline] attribute:

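  • #[inline]: Suggests to the compiler that inlining this function might be beneficial. It’s a hint, not a command. It also makes the function’s body available to other crates, which is what allows a non-generic function to be inlined across a crate boundary without LTO.
  • #[inline(always)]: A stronger hint, asking for the function to be inlined at every call site. LLVM honors it whenever inlining is possible, ignoring its usual size thresholds, so it is declined mainly when inlining is impossible (for example, a recursive call or a call through a function pointer). The Rust Reference nonetheless classifies every form of #[inline] as a hint, and overusing it can bloat code.
  • #[inline(never)]: Asks the compiler not to inline this function, for example to keep a rarely executed path out of a hot caller or to keep the function visible as its own frame in profiles.
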
// Suggest inlining this small function.
#[inline]
fn add_one(x: i32) -> i32 {
    x + 1
}

// Strongly request inlining.
#[inline(always)]
fn is_positive(x: i32) -> bool {
    x > 0
}

// Discourage inlining (rarely needed).
#[inline(never)]
fn complex_calculation(data: &[u8]) {
    // ... potentially large function body ...
    println!("Performing complex calculation on {} bytes.", data.len());
}

fn main() {
    let y = add_one(5);              // May be inlined in optimized builds
    let positive = is_positive(y);   // Almost certainly inlined in optimized builds
    complex_calculation(&[1, 2, 3]); // Should not be inlined
    println!("y = {}, positive = {}", y, positive);
}

Trade-offs: While inlining reduces call overhead and enables optimizations, over-inlining (especially of large functions) duplicates the function body at every call site. This increases the size of the compiled binary and can hurt instruction cache performance, sometimes enough to make the program slower overall. The compiler’s default heuristics are usually sufficient; #[inline] is most useful for very small, frequently called helpers and for performance-critical library functions that downstream crates call across the crate boundary.
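A common way to get the benefit of inlining without the bloat is to keep the hot path small and move rarely executed code out of line. In this sketch (hypothetical names), `get_checked` is a cheap candidate for #[inline], while the panic path is marked #[cold] and #[inline(never)] so that its message-formatting code is not copied into every caller.

```rust
/// Hypothetical error path, kept out of line: `#[cold]` tells the compiler
/// this is rarely executed, and `#[inline(never)]` keeps the body (including
/// the formatting work done by `panic!`) from being duplicated into callers.
#[cold]
#[inline(never)]
fn index_out_of_range(index: usize, len: usize) -> ! {
    panic!("index {index} out of range for length {len}")
}

/// Small, hot accessor: its fast path is just a comparison and a load,
/// so inlining it into callers costs very little code.
#[inline]
fn get_checked(data: &[u32], index: usize) -> u32 {
    if index < data.len() {
        data[index]
    } else {
        index_out_of_range(index, data.len())
    }
}
```

Callers of `get_checked` then receive only the comparison and the load, plus a call to the out-of-line error function.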

8.11.1 When Inlining Might Not Occur or Be Limited

While the compiler often performs inlining aggressively in optimized builds, certain technical and practical factors can prevent or limit it, even when hinted with #[inline] or #[inline(always)]:

  • Optimization Level: Inlining happens mainly in optimized builds (opt-level 1 and above; cargo build --release uses -C opt-level=3). Debug builds (-C opt-level=0) intentionally perform minimal inlining, for faster compiles and easier debugging.
  • Call Type:
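    • Indirect Calls: Calls through function pointers or trait objects (dynamic dispatch) generally cannot be inlined, because the target function isn’t known at compile time. The exception is when the optimizer can prove which function will be called (devirtualization), for example when a function pointer turns out to be a known constant after other inlining.
    • External/FFI Calls: Calls to external functions (e.g., from C libraries) cannot be inlined, because their bodies aren’t available to the Rust compiler. Cross-language LTO (-C linker-plugin-lto), where the foreign code is also compiled to LLVM IR, is the exception.
    • Recursion: A directly recursive function cannot be fully inlined, since each expansion would contain another call to itself.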
  • Compilation Boundaries:
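    • Across Crates: Inlining a function from a dependency requires its body to be available when the calling crate is compiled. That is the case for generic functions and #[inline] functions, whose MIR is stored in the crate metadata (recent rustc versions also do this automatically for some very small functions), or when Link-Time Optimization (LTO) is enabled. Without either, ordinary non-generic functions from other crates generally cannot be inlined.
    • Within Crates (CGUs): Each crate is split into Code Generation Units (CGUs) that LLVM optimizes in parallel (16 by default in non-incremental builds, 256 with incremental compilation). A non-generic, non-#[inline] function is defined in only one CGU, so calls to it from other CGUs can be inlined only by a cross-CGU pass. In optimized builds without explicit LTO, Cargo performs a crate-local “thin local LTO” for this, which is less thorough than optimizing a single CGU; setting codegen-units = 1 removes the boundary at the cost of parallel compilation. Inlining within a CGU is common.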
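  • Compiler Limits: For un-annotated and plain #[inline] functions, the optimizer weighs the estimated cost and may decline to inline large or complex bodies to avoid excessive code bloat. #[inline(always)] bypasses that cost model in LLVM, but it still cannot inline where inlining is impossible, as in the cases above.
  • Dynamic Linking (-C prefer-dynamic): Linking Rust dependencies as shared libraries does not by itself prevent inlining of their generic and #[inline] functions, because those bodies are taken from crate metadata at compile time; calls to their other functions go through the shared library.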
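The distinction between direct and indirect calls shows up in ordinary Rust signatures. In this sketch (hypothetical names), all three functions compute the same result. When `apply_static` is passed a closure or a named function, it is monomorphized for that exact callee type, so the call to `f` is direct and can be inlined. `apply_dyn` calls through a vtable and `apply_ptr` through a function pointer; those calls can be inlined only if the optimizer manages to prove the target.

```rust
/// Static dispatch: one monomorphized copy per closure/function type,
/// so the call to `f` has a known target.
fn apply_static<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

/// Dynamic dispatch through a trait object: an indirect call via a vtable.
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

/// Function pointer: also an indirect call.
fn apply_ptr(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

/// Hypothetical callee used with all three.
fn double(x: i32) -> i32 {
    x * 2
}
```

For example, `apply_static(double, 4)`, `apply_dyn(&double, 4)`, and `apply_ptr(double, 4)` all return 8; only the way the compiler can optimize the call differs.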

Finally, Link-Time Optimization (LTO) removes some of these boundaries. With lto = "thin" or lto = "fat" (equivalently lto = true) in a Cargo profile, the compiler can inline across crates and codegen units, usually at the cost of significantly longer link times.