
Ferrompi Backend

Purpose

The ferrompi backend is the primary production communication backend for Cobre. It wraps the ferrompi MPI bindings behind the Communicator and SharedMemoryProvider traits defined in Communicator Trait §1 and Communicator Trait §4, providing zero-cost abstraction over MPI collectives and shared memory windows for HPC cluster deployments. As the reference backend, it defines the canonical semantics: the local, TCP, and shared memory backends must match its observable behavior. The architecture follows the compile-time selection pattern established in Solver Abstraction §10 and is gated by the mpi feature flag as specified in Backend Registration and Selection §1.2.

1. Struct and Trait Implementation

1.1 Struct Definition

The FerrompiBackend struct holds the MPI communicator handles obtained during initialization. The world communicator is used for all inter-rank collective operations during training; the intra-node communicator is used for shared memory window creation and leader/follower determination.

#![allow(unused)]
fn main() {
/// Primary production communication backend wrapping ferrompi MPI bindings.
///
/// Delegates all `Communicator` trait methods directly to the corresponding
/// ferrompi API calls, achieving zero-cost abstraction via monomorphization
/// (see §4.1).
#[cfg(feature = "mpi")]
pub struct FerrompiBackend {
    /// World communicator for all collective operations and rank/size queries.
    world: ferrompi::Communicator,

    /// Intra-node communicator created via `split_shared()`. Groups
    /// ranks on the same physical node for shared memory window operations.
    /// `None` only if initialization is incomplete (see §2).
    shared: Option<ferrompi::Communicator>,

    // SAFETY: Rust drops struct fields in declaration order (first declared,
    // first dropped), so `mpi` must be declared after `world` and `shared`.
    // Dropping `mpi` calls `MPI_Finalize`, which must happen only after all
    // communicators and windows have been freed.
    /// MPI RAII guard. Its `Drop` calls `MPI_Finalize`.
    /// Must outlive all communicators and shared memory windows.
    mpi: ferrompi::Mpi,
}
}
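The field-order rule behind the SAFETY comment can be checked without MPI. The std-only sketch below uses a hypothetical Tracer type that records its name when dropped, standing in for the ferrompi handles. Rust drops struct fields in declaration order, regardless of the order used in the struct literal.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a ferrompi handle: logs its name when dropped.
struct Tracer {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Same field layout as FerrompiBackend, with the MPI guard declared last.
#[allow(dead_code)]
struct BackendLayout {
    world: Tracer,
    shared: Option<Tracer>,
    mpi: Tracer,
}

/// Builds a BackendLayout, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let make = |name| Tracer { name, log: Rc::clone(&log) };
    // Initialization order deliberately differs from declaration order.
    let backend = BackendLayout {
        mpi: make("mpi"),
        world: make("world"),
        shared: Some(make("shared")),
    };
    drop(backend);
    let order = log.borrow().clone();
    order
}
```

Calling drop_order() yields world, shared, mpi: the guard goes last only because it is declared last.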

1.2 Communicator Trait Implementation

The impl Communicator for FerrompiBackend delegates each trait method to the corresponding ferrompi API call. The method mapping table below summarizes the conversions.

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl Communicator for FerrompiBackend {
    fn allgatherv<T: CommData>(
        &self,
        send: &[T],
        recv: &mut [T],
        counts: &[usize],
        displs: &[usize],
    ) -> Result<(), CommError> {
        // counts/displs are &[usize] in Cobre's trait but &[i32] in ferrompi;
        // conversion is performed by a helper (omitted for clarity).
        let (i32_counts, i32_displs) = to_i32_vecs(counts, displs);
        self.world
            .allgatherv(send, recv, &i32_counts, &i32_displs)
            .map_err(|e| map_ferrompi_error(e, "allgatherv"))
    }

    fn allreduce<T: CommData>(
        &self,
        send: &[T],
        recv: &mut [T],
        op: ReduceOp,
    ) -> Result<(), CommError> {
        let mpi_op = match op {
            ReduceOp::Sum => ferrompi::ReduceOp::Sum,
            ReduceOp::Min => ferrompi::ReduceOp::Min,
            ReduceOp::Max => ferrompi::ReduceOp::Max,
        };
        self.world
            .allreduce(send, recv, mpi_op)
            .map_err(|e| map_ferrompi_error(e, "allreduce"))
    }

    fn broadcast<T: CommData>(
        &self,
        buf: &mut [T],
        root: usize,
    ) -> Result<(), CommError> {
        self.world
            .broadcast(buf, root as i32)
            .map_err(|e| map_ferrompi_error(e, "broadcast"))
    }

    fn barrier(&self) -> Result<(), CommError> {
        self.world
            .barrier()
            .map_err(|e| map_ferrompi_error(e, "barrier"))
    }

    fn rank(&self) -> usize {
        self.world.rank() as usize
    }

    fn size(&self) -> usize {
        self.world.size() as usize
    }
}
}

Method mapping summary:

| Trait method | ferrompi API call | Type conversion |
|---|---|---|
| allgatherv | self.world.allgatherv(send, recv, &i32_counts, &i32_displs) | counts/displs: &[usize] to &[i32] |
| allreduce | self.world.allreduce(send, recv, mpi_op) | ReduceOp to ferrompi::ReduceOp (zero-cost match) |
| broadcast | self.world.broadcast(buf, root as i32) | root: usize to i32 |
| barrier | self.world.barrier() | None |
| rank | self.world.rank() | i32 to usize |
| size | self.world.size() | i32 to usize |
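The to_i32_vecs helper called in allgatherv is not shown. One possible shape, sketched here as an assumption rather than Cobre's actual code, is a checked conversion that fails instead of silently truncating when a count or displacement exceeds i32::MAX (MPI's C interface takes int arrays for these arguments). Note that this signature returns a Result, whereas the allgatherv call site treats the helper as infallible.

```rust
use std::num::TryFromIntError;

/// Hypothetical checked version of the omitted `to_i32_vecs` helper:
/// converts MPI counts and displacements from usize to i32, returning an
/// error instead of wrapping when a value does not fit.
fn to_i32_vecs(
    counts: &[usize],
    displs: &[usize],
) -> Result<(Vec<i32>, Vec<i32>), TryFromIntError> {
    let c = counts
        .iter()
        .map(|&x| i32::try_from(x))
        .collect::<Result<Vec<_>, _>>()?;
    let d = displs
        .iter()
        .map(|&x| i32::try_from(x))
        .collect::<Result<Vec<_>, _>>()?;
    Ok((c, d))
}
```

With this variant, the call site would map the conversion error into a CommError (for example InvalidBufferSize) before issuing the collective.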

Error delegation: All fallible methods convert ferrompi::Error to CommError via the map_ferrompi_error helper (see §5.2). The helper extracts the MPI error code from the Error::Mpi variant and classifies errors into the most specific CommError variant. No new CommError variants are introduced – all errors map to the variants defined in Communicator Trait §1.4 and Communicator Trait §4.6. See §5 for the complete error mapping table.

Precondition validation: Buffer size preconditions (Communicator Trait §2.1 – §2.5) are enforced by the ferrompi layer, which validates arguments before issuing the MPI call. The FerrompiBackend does not duplicate these checks.

2. Initialization

The FerrompiBackend::new() constructor wraps the MPI initialization sequence defined in Hybrid Parallelism §6 Steps 1-3. It is called once during the Startup phase, before any collective operations or shared memory allocations.

2.1 Initialization Sequence

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl FerrompiBackend {
    /// Initialize the ferrompi backend (MPI startup sequence, Steps 1-3 of
    /// [Hybrid Parallelism §6](./hybrid-parallelism.md)).
    ///
    /// # Errors
    ///
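    /// Returns `BackendError::InitializationFailed` if `MPI_Init_thread` fails
    /// or returns a threading level below `MPI_THREAD_FUNNELED`, if the
    /// shared-memory communicator split fails, or if MPI was already
    /// initialized in this process (ferrompi reports this as
    /// `Error::AlreadyInitialized`, which is wrapped like any other
    /// initialization error). MPI supports only one initialization per
    /// process, so this constructor must be called exactly once.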
    pub fn new() -> Result<Self, BackendError> {
        // Step 1 — MPI initialization (Hybrid Parallelism §6 Step 1)
        // MPI_THREAD_FUNNELED: only the main thread makes MPI calls.
        // The Cobre training loop serializes all MPI collective calls on
        // the main thread; Rayon worker threads never call MPI directly.
        // ferrompi::Mpi::init_thread returns the Mpi guard whose Drop
        // calls MPI_Finalize. The guard is stored in the backend struct,
        // which must drop it only after the communicators (see §1.1 and §2.2).
        let mpi = ferrompi::Mpi::init_thread(
            ferrompi::ThreadLevel::Funneled,
        ).map_err(|e| BackendError::InitializationFailed {
            backend: "mpi".to_string(),
            source: Box::new(e),
        })?;

        let world = mpi.world();

        // Step 2 — Topology detection (Hybrid Parallelism §6 Step 2)
        let _rank = world.rank();
        let _size = world.size();

        // Step 3 — Shared memory communicator (Hybrid Parallelism §6 Step 3)
        // split_shared() is a convenience wrapper for
        // MPI_Comm_split_type(MPI_COMM_TYPE_SHARED).
        let shared = world
            .split_shared()
            .map_err(|e| BackendError::InitializationFailed {
                backend: "mpi".to_string(),
                source: Box::new(e),
            })?;

        Ok(FerrompiBackend {
            mpi,
            world,
            shared: Some(shared),
        })
    }
}
}

2.2 Shutdown and Drop

The FerrompiBackend implements Drop to release the intra-node communicator explicitly; MPI_Finalize itself is called afterwards, when ferrompi's Mpi guard (the mpi field) is dropped. All shared memory regions (SharedRegion §3) must be dropped before the backend, because MPI_Win_free must precede MPI_Finalize.

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl Drop for FerrompiBackend {
    fn drop(&mut self) {
        // Drop the intra-node communicator before MPI_Finalize.
        self.shared.take();
        // MPI_Finalize is called by ferrompi's Mpi drop.
    }
}
}

Ordering constraint: The training loop ensures the following drop order during shutdown:

  1. All SharedRegion<T> handles (calls MPI_Win_free on each window)
  2. The FerrompiBackend (calls MPI_Finalize via ferrompi’s Mpi drop)

This ordering is consistent with the lifetime contract in Communicator Trait §4.2.
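Unlike struct fields, local variables are dropped in reverse declaration order. A training loop can therefore satisfy this constraint by creating the backend before any regions in the same scope. The sketch below models this with a hypothetical Tracer type; no MPI is involved.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a resource whose Drop must run in a specific order.
struct Tracer {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Simulates a training-loop scope: backend first, regions afterwards.
/// Locals drop in reverse declaration order, so regions go before the backend.
fn shutdown_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _backend = Tracer { name: "backend", log: Rc::clone(&log) }; // would call MPI_Finalize
        let _region_a = Tracer { name: "region_a", log: Rc::clone(&log) }; // would call MPI_Win_free
        let _region_b = Tracer { name: "region_b", log: Rc::clone(&log) }; // would call MPI_Win_free
    } // end of scope: _region_b, _region_a, then _backend are dropped
    let order = log.borrow().clone();
    order
}
```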

3. SharedWindow Implementation

3.1 SharedMemoryProvider Trait Implementation

Status: Not Yet Implemented – using HeapRegion fallback. Per Communicator Trait §4.7, true MPI shared windows (MPI_Win_allocate_shared) are deferred until after profiling. The current implementation uses HeapRegion<T> (heap fallback) as the Region associated type for all backends, including FerrompiBackend. Each rank holds its own Vec<T> copy with no memory shared across ranks. The FerrompiRegion<T> / SharedWindow<T> design below is the aspirational target architecture that will be activated when production-scale profiling demonstrates memory pressure (see trigger conditions in Communicator Trait §4.7).

The target architecture has FerrompiBackend implementing SharedMemoryProvider by delegating to ferrompi’s SharedWindow<T> for true intra-node shared memory. The Region associated type would wrap ferrompi::SharedWindow<T> in a FerrompiRegion<T> newtype that implements the SharedRegion<T> trait.

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl SharedMemoryProvider for FerrompiBackend {
    type Region<T: CommData> = FerrompiRegion<T>;

    fn create_shared_region<T: CommData>(
        &self,
        count: usize,
    ) -> Result<Self::Region<T>, CommError> {
        let shared_comm = self.shared.as_ref().ok_or(
            CommError::InvalidCommunicator,
        )?;

        // The leader (local rank 0) allocates `count` elements.
        // Followers allocate 0 elements and obtain a pointer to the
        // leader's segment via MPI_Win_shared_query.
        let alloc_count = if shared_comm.rank() == 0 {
            count
        } else {
            0
        };

        let window = ferrompi::SharedWindow::allocate(shared_comm, alloc_count)
            .map_err(|e| CommError::AllocationFailed {
                requested_bytes: count * std::mem::size_of::<T>(),
                message: e.to_string(),
            })?;

        Ok(FerrompiRegion {
            window,
            count,
        })
    }

    fn split_local(&self) -> Result<Box<dyn LocalCommunicator>, CommError> {
        // Issues MPI_Comm_split_type(MPI_COMM_TYPE_SHARED) to obtain an
        // intra-node communicator, then wraps it in FerrompiLocalComm.
        // FerrompiLocalComm implements LocalCommunicator (rank, size, barrier)
        // but not the full generic Communicator trait.
        self.world
            .split_shared()
            .map(|c| Box::new(FerrompiLocalComm(c)) as Box<dyn LocalCommunicator>)
            .map_err(|e| map_ferrompi_error(e, "split_local"))
    }

    fn is_leader(&self) -> bool {
        self.shared
            .as_ref()
            .map(|c| c.rank() == 0)
            .unwrap_or(true)
    }
}
}

Leader determination: The leader is the rank with local rank 0 within the intra-node communicator, consistent with the convention in Hybrid Parallelism §6 Step 3 and the leader/follower pattern in Communicator Trait §4.3.
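The leader/follower split of the allocation in create_shared_region reduces to two small pure functions, sketched below with hypothetical names. The byte count uses checked_mul so that an oversized request is reported instead of overflowing; this is a suggested hardening, not part of the create_shared_region snippet in §3.1.

```rust
/// Elements a rank requests from the window allocation: the node leader
/// (local rank 0) allocates the whole region, followers allocate nothing
/// and map the leader's segment instead.
fn alloc_count(local_rank: usize, count: usize) -> usize {
    if local_rank == 0 {
        count
    } else {
        0
    }
}

/// Byte size of a region of `count` elements of T, or None on overflow.
fn requested_bytes<T>(count: usize) -> Option<usize> {
    count.checked_mul(std::mem::size_of::<T>())
}
```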

3.2 SharedRegion Wrapper (Aspirational – Not Yet Implemented)

The FerrompiRegion<T> newtype would wrap ferrompi::SharedWindow<T> and implement the SharedRegion<T> trait. The wrapper would encapsulate the unsafe boundary for raw pointer dereference into MPI shared windows, presenting a safe Rust interface to the training loop. In the current codebase, FerrompiBackend::Region<T> is HeapRegion<T> and create_shared_region allocates via vec![T::default(); count].

#![allow(unused)]
fn main() {
/// Shared memory region backed by an MPI window (`ferrompi::SharedWindow<T>`).
///
/// Lifecycle follows [Communicator Trait §4.2](./communicator-trait.md).
/// RAII: dropping calls `MPI_Win_free` on the underlying window.
#[cfg(feature = "mpi")]
pub struct FerrompiRegion<T: CommData> {
    /// The underlying MPI shared memory window.
    window: ferrompi::SharedWindow<T>,

    /// Logical element count (the `count` passed to `create_shared_region`;
    /// followers allocate 0 locally but reference the leader's memory).
    count: usize,
}
}
#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl<T: CommData> SharedRegion<T> for FerrompiRegion<T> {
    fn as_slice(&self) -> &[T] {
        // All ranks (leader and followers) access the leader's (rank 0)
        // memory via remote_slice(0). This works uniformly: rank 0 gets
        // a view into its own allocation, other ranks get a view into the
        // shared window mapped from rank 0's region.
        // Caller must call fence() before reading to ensure no data races.
        self.window
            .remote_slice(0)
            .expect("rank 0 window region always valid")
    }

    fn as_mut_slice(&mut self) -> &mut [T] {
        // Returns the caller's local allocation. For the leader (rank 0),
        // this is the full shared region; for followers, the local
        // allocation is size 0, so local_slice_mut() returns an empty slice.
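        // &mut self rules out concurrent access through this handle within
        // the process; other ranks on the node still share the window, so
        // cross-rank ordering relies on fence().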
        self.window.local_slice_mut()
    }

    fn fence(&self) -> Result<(), CommError> {
        // MPI_Win_fence: collective -- all ranks must call fence().
        self.window
            .fence()
            .map_err(|e| map_ferrompi_error(e, "fence"))
    }
}
}
#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
impl<T: CommData> Drop for FerrompiRegion<T> {
    fn drop(&mut self) {
        // MPI_Win_free called via ferrompi::SharedWindow::drop().
        // Must complete before MPI_Finalize (see §2.2).
    }
}
}
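For comparison, the heap fallback that serves as the current Region type can be sketched in a few lines. The trait below is a simplified local mirror of SharedRegion<T> (its fence error type is a String here, not CommError), and HeapRegion's fields are assumed. What it shows is the semantics: each rank owns a private Vec<T>, so fence() has nothing to synchronize.

```rust
/// Simplified local mirror of SharedRegion<T> (assumed shape, illustration only).
trait SharedRegion<T> {
    fn as_slice(&self) -> &[T];
    fn as_mut_slice(&mut self) -> &mut [T];
    fn fence(&self) -> Result<(), String>;
}

/// Heap fallback: each rank holds its own copy; nothing is shared across ranks.
struct HeapRegion<T> {
    data: Vec<T>,
}

impl<T: Clone + Default> HeapRegion<T> {
    /// Allocates `count` default-initialized elements, as vec![T::default(); count].
    fn new(count: usize) -> Self {
        HeapRegion { data: vec![T::default(); count] }
    }
}

impl<T> SharedRegion<T> for HeapRegion<T> {
    fn as_slice(&self) -> &[T] {
        &self.data
    }

    fn as_mut_slice(&mut self) -> &mut [T] {
        &mut self.data
    }

    fn fence(&self) -> Result<(), String> {
        // No other process can see this memory, so this is a no-op.
        Ok(())
    }
}
```

Unlike the MPI window design, every rank (leader or follower) gets a full-length mutable slice here, because nothing is shared.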

4. Performance

4.1 Zero-Cost Abstraction via Monomorphization

When the training loop is generic over C: Communicator and the binary is built with only the mpi feature enabled, the compiler resolves C to FerrompiBackend at compile time. This enables:

  1. Full inlining – Each trait method call compiles to a direct call to the corresponding ferrompi function. The match on ReduceOp in allreduce is eliminated when the variant is known at the call site (constant propagation).

  2. No vtable, no indirection – Static dispatch means there is no trait object, no vtable lookup, and no dynamic dispatch overhead. The generated code for train::<FerrompiBackend>(comm, ...) is equivalent to calling comm.world.allgatherv(...) directly, apart from the argument conversions listed in §1.2 (for allgatherv, copying counts and displacements into i32 vectors).

  3. Dead code elimination – In a single-feature build (--features mpi only), all cfg-gated code paths for other backends are removed from the binary. The CommBackend enum (Backend Registration and Selection §4.2) is never instantiated.

This achieves the same zero-cost property as the solver abstraction pattern in Solver Abstraction §10: the abstraction layer has no runtime cost in production builds.
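The difference between static and dynamic dispatch can be seen in a minimal std-only example. Comm and FixedSize are hypothetical stand-ins, not Cobre's types: the generic function is monomorphized for each concrete type, while the &dyn version resolves the call through a vtable at run time.

```rust
/// Minimal stand-in for Cobre's Communicator trait (assumed subset).
trait Comm {
    fn rank(&self) -> usize;
    fn size(&self) -> usize;
}

/// Hypothetical backend reporting a fixed communicator size; this process is rank 0.
struct FixedSize(usize);

impl Comm for FixedSize {
    fn rank(&self) -> usize {
        0
    }
    fn size(&self) -> usize {
        self.0
    }
}

/// Static dispatch: one compiled copy per concrete C, so comm.size() is a
/// direct call the optimizer can inline.
fn stages_per_rank<C: Comm>(comm: &C, stages: usize) -> usize {
    (stages + comm.size() - 1) / comm.size()
}

/// Dynamic dispatch for contrast: comm.size() goes through a vtable.
fn stages_per_rank_dyn(comm: &dyn Comm, stages: usize) -> usize {
    (stages + comm.size() - 1) / comm.size()
}
```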

4.2 Persistent Collectives as Internal Optimization

MPI 4.0 persistent collectives (Communication Patterns §4) can be used as an internal optimization within the FerrompiBackend without affecting the Communicator trait interface. The trait methods (allgatherv, allreduce, broadcast, barrier) define the external API; the backend is free to implement them using either standard or persistent MPI collectives.

Persistent collective strategy:

| Trait method | Persistent candidate? | Implementation strategy |
|---|---|---|
| allgatherv | Conditional | Pre-initialize if buffer sizes are known at construction (fixed ); fall back to standard collective if counts change between calls |
| allreduce | Yes | Pre-initialize with fixed buffer at construction for simulation-mode min/max; reuse every iteration |
| broadcast | No | Called only during initialization and LB distribution with varying buffer sizes; standard collective is sufficient |
| barrier | No | Called only at checkpoints; standard collective is sufficient |

Internal state for persistent collectives:

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
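pub struct FerrompiBackend {
    world: ferrompi::Communicator,
    shared: Option<ferrompi::Communicator>,

    /// Pre-initialized persistent allreduce request for convergence
    /// statistics (4 x f64, fixed size). `None` if the MPI implementation
    /// does not support MPI 4.0 persistent collectives.
    persistent_allreduce: Option<ferrompi::PersistentRequest>,

    /// MPI RAII guard (see §1.1). Declared last: struct fields drop in
    /// declaration order, so the request and communicators are freed
    /// before `MPI_Finalize` runs.
    mpi: ferrompi::Mpi,
}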
}

The persistent collective initialization occurs during FerrompiBackend::new() if the MPI implementation supports it, detected at runtime (for example, by checking that MPI_Get_version reports MPI 4.0 or later). If persistent collectives are unavailable, the backend falls back to standard collectives transparently; the trait interface is unchanged.

Key property: Persistent collectives are an implementation detail of the ferrompi backend. They do not appear in the Communicator trait definition, do not affect the trait’s method signatures, and do not alter the observable semantics of any collective operation. Other backends are unaffected by this optimization.
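The fallback rule for a persistent allgatherv (reuse the request only when one exists and the counts and displacements match those it was initialized with) can be written as a pure decision function. PersistentPlan and Path are hypothetical names for this sketch; no MPI request objects are involved.

```rust
/// Which code path a call takes (illustration only).
#[derive(Debug, PartialEq)]
enum Path {
    Persistent,
    Standard,
}

/// Hypothetical bookkeeping for a persistent allgatherv: the request is
/// only valid for the counts and displacements it was initialized with.
struct PersistentPlan {
    counts: Vec<usize>,
    displs: Vec<usize>,
}

/// `plan` is None when MPI 4.0 persistent collectives are unavailable.
fn choose_path(plan: Option<&PersistentPlan>, counts: &[usize], displs: &[usize]) -> Path {
    match plan {
        Some(p) if p.counts == counts && p.displs == displs => Path::Persistent,
        _ => Path::Standard,
    }
}
```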

5. Error Mapping

The ferrompi backend maps MPI error codes to CommError variants defined in Communicator Trait §1.4 and Communicator Trait §4.6. No new CommError variants are introduced by this backend.

5.1 MPI Error Code to CommError Mapping

| MPI error code | Typical cause | CommError variant |
|---|---|---|
| MPI_ERR_COMM | Invalid or finalized communicator handle | CommError::InvalidCommunicator |
| MPI_ERR_BUFFER | Invalid buffer pointer (null, misaligned) | CommError::InvalidBufferSize { operation, expected, actual } |
| MPI_ERR_COUNT | Negative count or count/displacement mismatch | CommError::InvalidBufferSize { operation, expected, actual } |
| MPI_ERR_TYPE | Unsupported or uncommitted MPI datatype | CommError::CollectiveFailed { operation, mpi_error_code, message } |
| MPI_ERR_ROOT | Root rank out of range in broadcast | CommError::InvalidRoot { root, size } |
| MPI_ERR_WIN | Invalid MPI window in shared memory operation | CommError::CollectiveFailed { operation, mpi_error_code, message } |
| MPI_ERR_NO_MEM | Shared memory allocation rejected by OS/MPI | CommError::AllocationFailed { requested_bytes, message } when raised by create_shared_region (§3.1); errors routed through map_ferrompi_error fall back to CommError::CollectiveFailed (§5.2) |
| MPI_ERR_OTHER | Unclassified MPI error (process crash, network failure) | CommError::CollectiveFailed { operation, mpi_error_code, message } |

5.2 Error Conversion Implementation

The map_ferrompi_error helper converts a ferrompi::Error to the most specific CommError variant. ferrompi::Error is a thiserror-derived enum (not a struct), so the conversion pattern-matches on the variant. Only the Error::Mpi variant carries an MPI error class and code; the other variants (AlreadyInitialized, InvalidBuffer, NotSupported, Internal) map to fixed CommError variants.

#![allow(unused)]
fn main() {
/// Convert a ferrompi::Error to the most specific CommError variant.
/// Used by all Communicator and SharedRegion trait method implementations.
#[cfg(feature = "mpi")]
fn map_ferrompi_error(e: ferrompi::Error, operation: &'static str) -> CommError {
    match e {
        ferrompi::Error::Mpi { class, code, ref message } => match class {
            ferrompi::MpiErrorClass::Comm => CommError::InvalidCommunicator,
            ferrompi::MpiErrorClass::Root => CommError::InvalidRoot {
                // ferrompi::Error does not carry root/size context;
                // use sentinel values. The message string provides details.
                root: 0,
                size: 0,
            },
            ferrompi::MpiErrorClass::Buffer | ferrompi::MpiErrorClass::Count => {
                CommError::InvalidBufferSize {
                    operation,
                    // ferrompi::Error does not carry expected/actual counts;
                    // use sentinel values. The message string provides details.
                    expected: 0,
                    actual: 0,
                }
            }
            // MPI_ERR_NO_MEM has no dedicated MpiErrorClass variant; it
            // surfaces as MpiErrorClass::Other or MpiErrorClass::Raw(_) and
            // falls through to CollectiveFailed here. Window allocation
            // errors are mapped to AllocationFailed directly in
            // create_shared_region (§3.1), not by this helper.
            _ => CommError::CollectiveFailed {
                operation,
                mpi_error_code: code,
                message: message.clone(),
            },
        },
        ferrompi::Error::InvalidBuffer => CommError::InvalidBufferSize {
            operation,
            expected: 0,
            actual: 0,
        },
        ferrompi::Error::AlreadyInitialized => CommError::InvalidCommunicator,
        _ => CommError::CollectiveFailed {
            operation,
            mpi_error_code: -1,
            message: e.to_string(),
        },
    }
}
}

Design note: ferrompi::Error does not carry structured context fields (root rank, buffer sizes, requested bytes). The CommError fields that require these values (InvalidRoot.root, InvalidBufferSize.expected, etc.) are therefore populated with sentinel values (0). The human-readable ferrompi message is preserved only in CollectiveFailed.message; InvalidRoot and InvalidBufferSize have no message field, so for those variants the detail is lost unless it is logged before conversion. This is an acceptable trade-off because these error paths are not performance-critical.

5.3 MPI_Init_thread Failure

If ferrompi::Mpi::init_thread(ThreadLevel::Funneled) fails – either because MPI_Init_thread returns an error or because the provided threading level is below MPI_THREAD_FUNNELED – the error is reported as BackendError::InitializationFailed (defined in Backend Registration and Selection §6.2), not as a CommError. This is because initialization failure is a factory-level concern that occurs before any Communicator trait method can be called.

6. Feature Gating

6.1 Conditional Compilation

The entire FerrompiBackend implementation is gated behind the mpi Cargo feature flag, consistent with the feature flag matrix in Backend Registration and Selection §1.2.

#![allow(unused)]
fn main() {
#[cfg(feature = "mpi")]
mod ferrompi_backend {
    use crate::comm::{
        CommData, CommError, Communicator, ReduceOp,
        SharedMemoryProvider, SharedRegion,
    };

    pub struct FerrompiBackend { /* ... */ }
    pub struct FerrompiRegion<T: CommData> { /* ... */ }

    impl Communicator for FerrompiBackend { /* ... */ }
    impl SharedMemoryProvider for FerrompiBackend { /* ... */ }
    impl<T: CommData> SharedRegion<T> for FerrompiRegion<T> { /* ... */ }
}

#[cfg(feature = "mpi")]
pub use ferrompi_backend::{FerrompiBackend, FerrompiRegion};
}

6.2 Cargo.toml Feature Declaration

[features]
default = []
mpi = ["dep:ferrompi"]

[dependencies]
ferrompi = { version = "0.2.2", optional = true }

When mpi is not enabled:

  • The ferrompi crate is not compiled or linked.
  • No MPI development libraries (OpenMPI, MPICH, Intel MPI) are required at build time.
  • The FerrompiBackend and FerrompiRegion types do not exist – any attempt to reference them produces a Rust compilation error.
  • The create_communicator() factory function (Backend Registration and Selection §4.1) does not include an MPI branch.

When mpi is enabled:

  • The ferrompi crate is compiled and linked against the system MPI library (libmpi.so or equivalent).
  • All #[cfg(feature = "mpi")] items are included in the binary.
  • The MPI backend is available for selection via the factory function or COBRE_COMM_BACKEND=mpi.

6.3 Build Profile Integration

The ferrompi backend is included in the following build profiles from Backend Registration and Selection §1.3:

| Build profile | Includes mpi? | Rationale |
|---|---|---|
| CLI / HPC | Yes | MPI is the production communication layer for cluster deployment |
| Python wheel | No | No MPI dependency on user machines |
| Test / CI | No | Only local backend; no external dependencies |
| Development | Yes | All backends compiled for testing |

7. ferrompi API Reference

This section documents the public API surface of the ferrompi crate (version 0.2.2, github.com/rjmalves/ferrompi) – a safe Rust wrapper around the MPI C library. ferrompi encapsulates all unsafe FFI calls internally; no public method requires the Rust unsafe keyword at the call site. The API documented here is the contract that FerrompiBackend (§1–§6) implements against.

Note. ferrompi is a thin wrapper around the MPI C functions (MPI_Init_thread, MPI_Comm_rank, MPI_Allgatherv, MPI_Win_allocate_shared, etc.). It adds Rust type safety, RAII resource management, and Result-based error handling, but does not alter MPI semantics. All MPI behavioral guarantees (ordering, collective synchronization, shared memory coherence) are inherited from the underlying MPI implementation.

7.1 Mpi Struct and Initialization

The Mpi struct is the RAII guard for the MPI runtime lifetime. Creating an Mpi instance calls MPI_Init_thread; dropping it calls MPI_Finalize. At most one Mpi instance may exist per process, and MPI cannot be initialized again after it has been finalized.

7.1.1 Mpi::init

Initializes the MPI runtime with the default threading level (MPI_THREAD_SINGLE).

#![allow(unused)]
fn main() {
impl Mpi {
    pub fn init() -> Result<Self>
}
}

Preconditions:

| Condition | Description |
|---|---|
| First and only call | Must not have been called previously in this process. MPI does not support multiple initializations. |
| No prior MPI_Init | No other code (including C libraries) may have called MPI_Init or MPI_Init_thread before this call. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| MPI runtime is initialized | MPI_Init_thread has been called. |
| Returned Mpi holds MPI lifetime | The MPI runtime remains active until the Mpi is dropped. |

Error type: ferrompi::Error::AlreadyInitialized if called more than once. ferrompi::Error::Mpi { .. } if MPI_Init_thread fails.

Unsafe: No. The internal FFI call to MPI_Init_thread is encapsulated.
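The "first and only call" precondition can be enforced with a process-wide atomic flag. The sketch below models the documented behavior (AlreadyInitialized on any repeated call) with a hypothetical MpiGuard type; it makes no MPI calls and is not ferrompi's implementation. The flag is never cleared, because MPI cannot be initialized again after MPI_Finalize.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag, set by the first successful init and never reset.
static INITIALIZED: AtomicBool = AtomicBool::new(false);

#[derive(Debug, PartialEq)]
enum InitError {
    AlreadyInitialized,
}

/// Hypothetical RAII guard modeled on ferrompi::Mpi (no real MPI calls).
struct MpiGuard;

impl MpiGuard {
    fn init() -> Result<Self, InitError> {
        // compare_exchange succeeds only for the first caller in the process.
        INITIALIZED
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| MpiGuard)
            .map_err(|_| InitError::AlreadyInitialized)
    }
}

impl Drop for MpiGuard {
    fn drop(&mut self) {
        // A real guard would call MPI_Finalize here; the flag stays set.
    }
}
```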

7.1.2 Mpi::init_thread

Initializes the MPI runtime with the requested threading support level. This is the initialization function used by Cobre (see §2.1).

#![allow(unused)]
fn main() {
impl Mpi {
    pub fn init_thread(required: ThreadLevel) -> Result<Self>
}
}

Preconditions:

| Condition | Description |
|---|---|
| First and only call | Must not have been called previously in this process. MPI does not support multiple initializations. |
| No prior MPI_Init | No other code (including C libraries) may have called MPI_Init or MPI_Init_thread before this call. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| MPI runtime is initialized | MPI_Init_thread has been called with the requested level. |
| Returned Mpi holds MPI lifetime | The MPI runtime remains active until the Mpi is dropped. |
| Provided level is satisfied | The MPI implementation supports at least the requested ThreadLevel. If the provided level is below the requested level, the function returns Err. |

Error type: ferrompi::Error::AlreadyInitialized if called more than once. ferrompi::Error::Mpi { .. } if MPI_Init_thread fails or returns a level below the requested one.

Unsafe: No. The internal FFI call to MPI_Init_thread is encapsulated.

Cobre requests ThreadLevel::Funneled (see §2.1) because only the main thread makes MPI calls; Rayon worker threads perform LP solves but never invoke MPI collectives directly.
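The "provided level is satisfied" postcondition relies on MPI's threading levels being ordered: SINGLE < FUNNELED < SERIALIZED < MULTIPLE. The sketch below mirrors that check with a hypothetical local ThreadLevel enum whose derived ordering follows declaration order; ferrompi's own ThreadLevel type may be defined differently.

```rust
/// Local mirror of MPI's threading levels, declared in ascending order so
/// that the derived Ord matches the MPI standard's ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ThreadLevel {
    Single,
    Funneled,
    Serialized,
    Multiple,
}

/// Succeeds only if the level provided by the MPI library is at least the
/// level that was requested.
fn check_provided(required: ThreadLevel, provided: ThreadLevel) -> Result<ThreadLevel, String> {
    if provided >= required {
        Ok(provided)
    } else {
        Err(format!("requested {:?}, MPI provided only {:?}", required, provided))
    }
}
```

A library that provides MULTIPLE satisfies a FUNNELED request; one that provides only SINGLE does not.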

7.1.3 Mpi::world

Returns the world communicator (MPI_COMM_WORLD).

#![allow(unused)]
fn main() {
impl Mpi {
    pub fn world(&self) -> Communicator
}
}

Preconditions: None beyond holding a valid Mpi (guaranteed by successful init or init_thread).

Postconditions:

| Condition | Description |
|---|---|
| Returned Communicator wraps MPI_COMM_WORLD | Rank and size reflect the full set of MPI processes. |
| Communicator is valid for the lifetime of Mpi | The communicator must not be used after Mpi is dropped. |

Error type: Infallible (does not return Result).

Unsafe: No.

7.1.4 Other Mpi Methods

| Method | Signature | Description |
|---|---|---|
| thread_level | fn thread_level(&self) -> ThreadLevel | Returns the threading level provided by the MPI implementation. |
| wtime | fn wtime() -> f64 | Returns wall-clock time in seconds (wraps MPI_Wtime). Static. |
| version | fn version() -> Result<String> | Returns the MPI library version string. |
| is_initialized | fn is_initialized() -> bool | Returns true if MPI has been initialized. Static. |
| is_finalized | fn is_finalized() -> bool | Returns true if MPI has been finalized. Static. |

Drop: Dropping Mpi calls MPI_Finalize. All SharedWindow instances and derived communicators must be dropped before the Mpi guard (see §2.2).

7.2 Communicator

The Communicator type is a safe handle to an MPI communicator. It implements Send + Sync, so the handle can be moved to or shared with other threads. The type system does not enforce MPI_THREAD_FUNNELED: calling a method that issues an MPI call (for example, a collective) from a Rayon worker thread would violate the threading level Cobre requests. Cobre upholds the FUNNELED discipline by convention, issuing all MPI calls from the main thread.

#![allow(unused)]
fn main() {
pub struct Communicator { /* opaque: wraps MPI_Comm handle */ }

// Thread safety: Communicator is Send + Sync.
// Under MPI_THREAD_FUNNELED, only the main thread issues MPI calls.
// Send + Sync on the handle is sound because the handle is an integer
// into a C-side table; ferrompi::Communicator carries no thread-local state.
}

The following subsections document in detail the methods used by the Cobre backend (§1–§3). Section 7.2.8 provides a summary table of the remaining methods.

7.2.1 rank

Returns the rank of the calling process within this communicator.

#![allow(unused)]
fn main() {
impl Communicator {
    pub fn rank(&self) -> i32
}
}

Preconditions: None.

Postconditions: Returns a value in [0, self.size()).

Error type: Infallible. Wraps MPI_Comm_rank, which does not fail on a valid communicator.

Unsafe: No.

7.2.2 size

Returns the number of processes in this communicator.

#![allow(unused)]
fn main() {
impl Communicator {
    pub fn size(&self) -> i32
}
}

Preconditions: None.

Postconditions: Returns a value >= 1.

Error type: Infallible. Wraps MPI_Comm_size.

Unsafe: No.

7.2.3 allgatherv

Gathers variable-length data from all ranks and distributes the concatenated result to all ranks.

#![allow(unused)]
fn main() {
impl Communicator {
    pub fn allgatherv<T: MpiDatatype>(
        &self,
        send: &[T],
        recv: &mut [T],
        recvcounts: &[i32],
        displs: &[i32],
    ) -> Result<()>
}
}

Preconditions:

| Condition | Description |
|---|---|
| recvcounts.len() == self.size() | One count per rank. |
| displs.len() == self.size() | One displacement per rank. |
| send.len() == recvcounts[self.rank()] | Send buffer length matches this rank’s declared count. |
| recv.len() >= displs[i] + recvcounts[i] for all i | Receive buffer is large enough for all incoming data. |
| All ranks call collectively | allgatherv is a collective operation; all ranks in the communicator must call it with consistent counts and displacements. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| recv contains gathered data | recv[displs[i]..displs[i]+recvcounts[i]] holds data sent by rank i. |

Error type: ferrompi::Error. Common: Error::Mpi with class Buffer, Count, or Comm.

Unsafe: No.
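A common choice for displs is the exclusive prefix sum of the per-rank counts, which packs each rank's block contiguously in recv. The hypothetical helpers below build such displacements, check the receive-buffer precondition, and simulate the postcondition on a single process; they make no MPI calls.

```rust
/// Exclusive prefix sum of per-rank counts: rank i's block starts at displs[i].
fn displacements(counts: &[usize]) -> Vec<usize> {
    let mut displs = Vec::with_capacity(counts.len());
    let mut offset = 0;
    for &c in counts {
        displs.push(offset);
        offset += c;
    }
    displs
}

/// Receive-buffer precondition: recv_len >= displs[i] + counts[i] for all i.
fn recv_len_ok(recv_len: usize, counts: &[usize], displs: &[usize]) -> bool {
    counts.len() == displs.len()
        && counts.iter().zip(displs).all(|(&c, &d)| d + c <= recv_len)
}

/// Single-process simulation of the postcondition: rank i's send buffer
/// ends up at recv[displs[i]..displs[i] + counts[i]].
fn simulate_allgatherv(sends: &[Vec<f64>]) -> Vec<f64> {
    let counts: Vec<usize> = sends.iter().map(Vec::len).collect();
    let displs = displacements(&counts);
    let mut recv = vec![0.0; counts.iter().sum::<usize>()];
    for (i, send) in sends.iter().enumerate() {
        recv[displs[i]..displs[i] + counts[i]].copy_from_slice(send);
    }
    recv
}
```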

7.2.4 allreduce

Performs an element-wise reduction across all ranks and distributes the result to all ranks.

#![allow(unused)]
fn main() {
impl Communicator {
    pub fn allreduce<T: MpiDatatype>(
        &self,
        send: &[T],
        recv: &mut [T],
        op: ReduceOp,
    ) -> Result<()>
}
}

Preconditions:

| Condition | Description |
|---|---|
| send.len() == recv.len() | Send and receive buffers have the same length. |
| All ranks call collectively | All ranks must call with the same op and the same element count. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| recv[i] is the reduction of send[i] across all ranks | The reduction operation is determined by op (Sum, Min, Max, or Prod). |

Error type: ferrompi::Error. Common: Error::Mpi with class Buffer, Count, Comm, or Type.

Unsafe: No.
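The postcondition can be simulated on a single process by reducing element-wise across the send buffers of all ranks. ReduceOp here is a hypothetical local enum covering the three operations Cobre's trait exposes (Sum, Min, Max), not ferrompi's type.

```rust
/// Local mirror of the reduction operations exposed by Cobre's trait.
#[derive(Clone, Copy)]
enum ReduceOp {
    Sum,
    Min,
    Max,
}

/// Returns what every rank would find in `recv`: recv[i] is the reduction of
/// send[i] over all ranks. Expects at least one rank; panics if element
/// counts differ (the same-count precondition).
fn simulate_allreduce(sends: &[Vec<f64>], op: ReduceOp) -> Vec<f64> {
    let n = sends[0].len();
    assert!(sends.iter().all(|s| s.len() == n), "all ranks must pass the same element count");
    (0..n)
        .map(|i| {
            let column = sends.iter().map(|s| s[i]);
            match op {
                ReduceOp::Sum => column.sum::<f64>(),
                ReduceOp::Min => column.fold(f64::INFINITY, f64::min),
                ReduceOp::Max => column.fold(f64::NEG_INFINITY, f64::max),
            }
        })
        .collect()
}
```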

7.2.5 broadcast

Broadcasts data from the root rank to all other ranks.

```rust
impl Communicator {
    pub fn broadcast<T: MpiDatatype>(
        &self,
        data: &mut [T],
        root: i32,
    ) -> Result<()>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| `0 <= root < self.size()` | Root rank is valid within this communicator. |
| All ranks call collectively | All ranks must call with the same `root` and the same buffer length. |
| On root, `data` holds the values to broadcast | On non-root ranks, the contents of `data` are overwritten. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| All ranks hold root's data in `data` | After return, `data` on every rank contains the values that `data` held on the root rank at entry. |

Error type: ferrompi::Error. Common: Error::Mpi with class Root, Buffer, Count, Comm.

Unsafe: No.

7.2.6 barrier

Synchronizes all ranks in the communicator. No rank returns until all ranks have entered the barrier.

```rust
impl Communicator {
    pub fn barrier(&self) -> Result<()>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| All ranks call collectively | Deadlock occurs if any rank does not call `barrier`. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| All ranks have reached the barrier | Global synchronization point. |

Error type: ferrompi::Error. Common: Error::Mpi with class Comm.

Unsafe: No.

7.2.7 split_shared

Creates a sub-communicator grouping ranks that share a physical node (i.e., can use shared memory). Convenience wrapper for MPI_Comm_split_type with MPI_COMM_TYPE_SHARED.

```rust
impl Communicator {
    pub fn split_shared(&self) -> Result<Communicator>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| All ranks call collectively | All ranks in the communicator must participate. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| Returned communicator groups co-located ranks | All ranks on the same physical node are in the same sub-communicator. |
| Rank 0 in the sub-communicator is the node leader | Every rank splits with the same key (0), and MPI breaks key ties by rank order in the parent communicator. Rank 0 on each node is therefore that node's lowest-ranked process in the parent. |

Error type: ferrompi::Error. Common: Error::Mpi with class Comm.

Unsafe: No.

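Note: The more general split_type method is also available and accepts a SplitType enum parameter. split_shared() is equivalent to split_type(SplitType::Shared, 0), except for the return type. split_type returns Result<Option<Communicator>>, while split_shared returns the communicator directly. A shared-type split always gives every rank a communicator, so the Option would never be None.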
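The leader rule follows from how MPI orders ranks inside each split. The single-process model below groups parent ranks by a node identifier. Iterating parent ranks in ascending order reproduces MPI's tie-breaking when every rank passes the same key. model_split_shared and the node names are hypothetical, and no MPI calls are made.

```rust
use std::collections::BTreeMap;

/// Single-process model of MPI_Comm_split_type(MPI_COMM_TYPE_SHARED, key = 0).
/// node_of[r] is a hypothetical node identifier for parent rank r.
/// Returns, for each parent rank, (node-local rank, node-local size).
fn model_split_shared(node_of: &[&str]) -> Vec<(usize, usize)> {
    // Parent ranks are pushed in ascending order, so each group's member list
    // is already in MPI's tie-break order for equal keys.
    let mut groups: BTreeMap<&str, Vec<usize>> = BTreeMap::new();
    for (parent_rank, node) in node_of.iter().enumerate() {
        groups.entry(*node).or_default().push(parent_rank);
    }
    let mut result = vec![(0, 0); node_of.len()];
    for members in groups.values() {
        for (local_rank, &parent_rank) in members.iter().enumerate() {
            result[parent_rank] = (local_rank, members.len());
        }
    }
    result
}
```

A rank acts as node leader exactly when its node-local rank is 0.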

7.2.8 allreduce_init (Persistent Collective)

Pre-initializes an allreduce operation for repeated execution with the same parameters (MPI 4.0+). Used by the persistent collective optimization in SS4.2.

```rust
impl Communicator {
    pub fn allreduce_init<T: MpiDatatype>(
        &self,
        send: &[T],
        recv: &mut [T],
        op: ReduceOp,
    ) -> Result<PersistentRequest>;
}
```

Preconditions: Same as allreduce (SS7.2.4), plus the MPI implementation must support MPI 4.0 persistent collectives.

Postconditions (on Ok): Returns a PersistentRequest that can be started repeatedly via start() and completed via wait(). Each start/wait cycle performs one allreduce with the pre-bound parameters.

Error type: ferrompi::Error. Error::NotSupported if persistent collectives are unavailable.

Unsafe: No.

7.2.9 Additional Communicator Methods (Summary)

The following methods are available on Communicator but are not directly used by the Cobre backend. They are listed for completeness.

Management:

| Method | Signature | Description |
|---|---|---|
| duplicate | `fn duplicate(&self) -> Result<Self>` | Duplicates the communicator. |
| split | `fn split(&self, color: i32, key: i32) -> Result<Option<Communicator>>` | Splits by color/key. |
| split_type | `fn split_type(&self, split_type: SplitType, key: i32) -> Result<Option<Communicator>>` | Splits by type (generalization of split_shared). |
| processor_name | `fn processor_name(&self) -> Result<String>` | Returns the MPI processor name. |
| raw_handle | `fn raw_handle(&self) -> i32` | Returns the raw MPI_Comm handle. |

Point-to-Point (blocking):

| Method | Signature | Description |
|---|---|---|
| send | `fn send<T: MpiDatatype>(&self, data: &[T], dest: i32, tag: i32) -> Result<()>` | Blocking send. |
| recv | `fn recv<T: MpiDatatype>(&self, data: &mut [T], source: i32, tag: i32) -> Result<(i32, i32, i64)>` | Blocking receive; returns (source, tag, count). |

Point-to-Point (nonblocking):

| Method | Signature | Description |
|---|---|---|
| isend | `fn isend<T: MpiDatatype>(&self, data: &[T], dest: i32, tag: i32) -> Result<Request>` | Nonblocking send; returns Request. |
| irecv | `fn irecv<T: MpiDatatype>(&self, data: &mut [T], source: i32, tag: i32) -> Result<Request>` | Nonblocking receive. |

Blocking Collectives (beyond those used by Cobre):

| Method | Description |
|---|---|
| allgather | Fixed-count gather-to-all. |
| reduce | Reduce to root rank only. |
| gather / gatherv | Gather to root (fixed / variable count). |
| scatter / scatterv | Scatter from root (fixed / variable count). |
| alltoall / alltoallv | All-to-all exchange (fixed / variable count). |
| scan / exscan | Inclusive / exclusive prefix scan. |
| reduce_scatter_block | Reduce-scatter with equal block sizes. |

Scalar Variants: allreduce_scalar, reduce_scalar, scan_scalar, exscan_scalar – convenience wrappers for single-element operations.

In-Place Variants: allreduce_inplace, reduce_inplace – use MPI_IN_PLACE to avoid separate send/recv buffers.

Nonblocking Collectives: iallreduce, iallgather, iallgatherv, ibroadcast, ibarrier, etc. – all blocking collectives have i-prefixed nonblocking variants returning Request.

Persistent Collectives (MPI 4.0+): allreduce_init, broadcast_init, allgather_init, allgatherv_init, etc. – all blocking collectives have _init-suffixed persistent variants returning PersistentRequest.

7.3 SharedWindow<T> and RMA Types

SharedWindow<T> provides access to MPI-3 shared memory windows (MPI_Win_allocate_shared). Requires the rma Cargo feature. All ranks in the communicator share a contiguous memory region; each rank specifies a local allocation count.

```rust
pub struct SharedWindow<T: MpiDatatype> {
    // Opaque: wraps the MPI_Win handle and local/remote memory pointers.
    // The type parameter T determines the element type and alignment.
}
```

7.3.1 SharedWindow::allocate

Allocates a shared memory window. Each rank specifies its local element count. For the Cobre use case, rank 0 allocates the full region and other ranks allocate 0.

```rust
impl<T: MpiDatatype> SharedWindow<T> {
    pub fn allocate(comm: &Communicator, local_count: usize) -> Result<Self>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| `comm` is a shared-memory communicator | Typically obtained from split_shared(). Using a non-shared communicator is undefined behavior in MPI. |
| All ranks call collectively | All ranks must call allocate (each with its own `local_count`). |
| `local_count * size_of::<T>()` does not overflow | Total byte size per rank fits in MPI_Aint. |

Postconditions (on Ok):

| Condition | Description |
|---|---|
| Shared memory region is allocated | Each rank owns `local_count * size_of::<T>()` bytes. Rank 0 typically owns the full region. |
| All ranks can access remote memory | Via `remote_slice(rank)` (subject to synchronization). |
| Memory is uninitialized | Caller must write initial values (via the leader's local_slice_mut) and call fence() before readers access it. |

Error type: ferrompi::Error. Common: Error::Mpi with class Win. Allocation failures have no dedicated class and surface as Other or Raw (see SS7.4.6).

Unsafe: No. The internal MPI_Win_allocate_shared FFI call and raw pointer management are encapsulated.
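With MPI_Win_allocate_shared's default contiguous allocation, each rank's segment starts where the previous rank's segment ends. That is why a leader allocating everything and followers allocating 0 gives every rank the full region through remote_slice(0). WindowLayout below is a hypothetical, offsets-only model. A plain slice stands in for the shared memory, and no MPI calls are made.

```rust
/// Offsets-only model of the contiguous layout from MPI_Win_allocate_shared.
/// Hypothetical type: it tracks where each rank's segment lives, not memory.
struct WindowLayout {
    segments: Vec<(usize, usize)>, // (element offset, element count) per rank
}

impl WindowLayout {
    fn new(local_counts: &[usize]) -> Self {
        let mut offset = 0;
        let segments = local_counts
            .iter()
            .map(|&count| {
                // Each segment begins where the previous rank's segment ends.
                let segment = (offset, count);
                offset += count;
                segment
            })
            .collect();
        WindowLayout { segments }
    }

    /// Mirrors remote_slice(rank): the part of `region` owned by `rank`,
    /// or None if the rank is out of range.
    fn remote_slice<'a, T>(&self, region: &'a [T], rank: usize) -> Option<&'a [T]> {
        let &(offset, count) = self.segments.get(rank)?;
        region.get(offset..offset + count)
    }
}
```

For the Cobre pattern, `WindowLayout::new(&[n, 0, 0, 0])` gives rank 0 the whole region and each follower an empty segment.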

7.3.2 local_slice / local_slice_mut

Access the calling rank’s own portion of the shared memory window.

```rust
impl<T: MpiDatatype> SharedWindow<T> {
    pub fn local_slice(&self) -> &[T];
    pub fn local_slice_mut(&mut self) -> &mut [T];
}
```

Preconditions:

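| Condition | Description |
|---|---|
| For local_slice_mut: no concurrent reads | Within the calling process, the `&mut self` receiver lets the borrow checker enforce exclusive access at compile time. Reads by other ranks through remote_slice are outside its reach and must be ordered with fence(). |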

Postconditions:

| Condition | Description |
|---|---|
| Returns a slice over the caller's local allocation | For the leader (which allocated the full element count): the full region. For followers (which allocated 0): an empty slice. |
| Exclusive access via `&mut self` (mutable variant) | Rust's borrow checker prevents concurrent local_slice and local_slice_mut calls. |

Error type: Infallible.

Unsafe: No. The &mut self receiver enforces exclusive access within the calling process at compile time; ordering with respect to other ranks relies on fence().

7.3.3 remote_slice

Returns a shared reference to the memory region owned by the specified rank.

```rust
impl<T: MpiDatatype> SharedWindow<T> {
    pub fn remote_slice(&self, rank: i32) -> Result<&[T]>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| `0 <= rank < self.comm_size()` | Target rank is valid within the window's communicator. |
| No rank is concurrently writing to the target's region | Caller must ensure a fence() has completed since the last write. |

Postconditions:

| Condition | Description |
|---|---|
| Returns `&[T]` pointing to `rank`'s memory | Length equals the `local_count` that `rank` passed to allocate. |

This is the primary read method for the Cobre shared memory pattern: all ranks call remote_slice(0) to access the leader’s (rank 0) memory region.

Logical safety contract: the caller must ensure that a fence() (SS7.3.4) has completed after the last write and before any remote_slice call. Violating this produces unspecified values, but not undefined behavior in the Rust sense.

Error type: ferrompi::Error if rank is out of range.

Unsafe: No (see logical safety contract above).

7.3.4 fence

Collective synchronization on the shared memory window. All ranks in the window’s communicator must call fence(). Completes all pending RMA operations and establishes a memory barrier.

```rust
impl<T: MpiDatatype> SharedWindow<T> {
    pub fn fence(&self) -> Result<()>;
}
```

Preconditions:

| Condition | Description |
|---|---|
| All ranks call collectively | Deadlock occurs if any rank does not call fence. |

Postconditions (on Ok):

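| Condition | Description |
|---|---|
| All prior writes are visible to all ranks | Memory consistency is established across the shared memory region. |
| Safe to read via remote_slice | Reads stay valid until a rank begins the next write. That write must be followed by another fence() before other ranks read it. |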

Error type: ferrompi::Error. Common: Error::Mpi with class Win.

Unsafe: No.
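The write, fence, read cycle can be illustrated with threads standing in for ranks. In this analogue a std::sync::Barrier plays fence() and an RwLock-protected Vec plays the shared window. It is a model of the protocol only and makes no MPI calls. A real fence() also completes pending RMA operations, which a thread barrier has no counterpart for. fence_cycle is a hypothetical name.

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

/// Thread-based analogue of the leader-writes / fence / everyone-reads cycle.
/// Returns what each "rank" read after the fence.
fn fence_cycle(num_ranks: usize, values: Vec<f64>) -> Vec<Vec<f64>> {
    let region = Arc::new(RwLock::new(vec![0.0; values.len()]));
    let fence = Arc::new(Barrier::new(num_ranks));
    let values = Arc::new(values);

    let handles: Vec<_> = (0..num_ranks)
        .map(|rank| {
            let region = Arc::clone(&region);
            let fence = Arc::clone(&fence);
            let values = Arc::clone(&values);
            thread::spawn(move || {
                if rank == 0 {
                    // Leader write phase (local_slice_mut in the real API).
                    // The write lock is released at the end of this statement.
                    region.write().unwrap().copy_from_slice(&values);
                }
                // Stand-in for fence(): no thread proceeds until all arrive,
                // so the leader's write is complete before anyone reads.
                fence.wait();
                // Read phase (remote_slice(0) in the real API).
                let snapshot = region.read().unwrap().clone();
                snapshot
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```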

7.3.5 lock / lock_all

Passive-target synchronization for fine-grained access control, as an alternative to fence().

```rust
impl<T: MpiDatatype> SharedWindow<T> {
    pub fn lock(&self, lock_type: LockType, rank: i32) -> Result<LockGuard<'_, T>>;
    pub fn lock_all(&self) -> Result<LockAllGuard<'_, T>>;
}
```

LockGuard and LockAllGuard are RAII guards that call MPI_Win_unlock / MPI_Win_unlock_all on drop. LockType is Exclusive or Shared. These methods are not used by the Cobre backend (which uses the simpler fence() pattern) but are available for advanced use cases.
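The RAII shape of LockGuard can be sketched with a mock in which a flag replaces the MPI lock state. MockWindow and MockLockGuard are hypothetical. The real guard calls MPI_Win_unlock in Drop. The mock also rejects a second lock outright instead of modeling MPI's passive-target waiting semantics.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a window's lock state.
struct MockWindow {
    locked: Cell<bool>,
}

/// RAII guard: the lock is held for exactly as long as the guard lives.
struct MockLockGuard<'a> {
    window: &'a MockWindow,
}

impl MockWindow {
    fn lock(&self) -> Result<MockLockGuard<'_>, &'static str> {
        // replace(true) returns the previous state; true means already held.
        if self.locked.replace(true) {
            return Err("already locked");
        }
        Ok(MockLockGuard { window: self })
    }
}

impl Drop for MockLockGuard<'_> {
    fn drop(&mut self) {
        // Analogue of MPI_Win_unlock. It runs on every exit path, including
        // early returns and `?` propagation.
        self.window.locked.set(false);
    }
}
```

Because the guard borrows the window, the borrow checker also stops the window from being dropped while a lock is held.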

7.3.6 Other SharedWindow Methods

| Method | Signature | Description |
|---|---|---|
| raw_handle | `fn raw_handle(&self) -> i32` | Returns the raw MPI_Win handle. |
| comm_size | `fn comm_size(&self) -> i32` | Returns the size of the window's communicator. |

7.3.7 Drop

SharedWindow<T> implements Drop to call MPI_Win_free on the underlying MPI window handle.

```rust
impl<T: MpiDatatype> Drop for SharedWindow<T> {
    fn drop(&mut self) {
        // Calls MPI_Win_free. Must complete before MPI_Finalize (Mpi drop).
    }
}
```

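Ordering constraint: All SharedWindow instances must be dropped before the Mpi guard is dropped. Dropping a SharedWindow after MPI_Finalize is undefined behavior in MPI. MPI_Win_free is also a collective call that synchronizes all ranks in the window's communicator, so every rank must drop its SharedWindow at a matching point or the other ranks block. The Cobre training loop enforces this ordering by dropping shared regions before the FerrompiBackend (see SS2.2).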
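Rust applies different drop orders to locals and to struct fields, which makes this constraint easy to get wrong. Locals are dropped in reverse declaration order. Struct fields are dropped in declaration order. In the sketch below, hypothetical Handle values record their drops in a log instead of calling MPI.

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which the stand-in handles are dropped.
    static DROP_LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for a value whose Drop would call into MPI.
struct Handle(&'static str);

impl Drop for Handle {
    fn drop(&mut self) {
        DROP_LOG.with(|log| log.borrow_mut().push(self.0));
    }
}

/// Stand-in struct with the MPI guard declared first.
struct GuardFirst {
    _mpi: Handle,
    _window: Handle,
}

fn take_log() -> Vec<&'static str> {
    DROP_LOG.with(|log| std::mem::take(&mut *log.borrow_mut()))
}

/// Locals drop in reverse declaration order, so a window created after the
/// guard is freed first (the order MPI requires).
fn locals_order() -> Vec<&'static str> {
    {
        let _mpi = Handle("mpi");
        let _window = Handle("window");
    }
    take_log()
}

/// Struct fields drop in declaration order, so a guard declared first is
/// dropped first (MPI_Finalize would run before MPI_Win_free).
fn fields_order() -> Vec<&'static str> {
    drop(GuardFirst { _mpi: Handle("mpi"), _window: Handle("window") });
    take_log()
}
```

For a struct that owns both an Mpi guard and objects that depend on it, the guard therefore has to be the last declared field. The alternative is to control drop order explicitly, for example with std::mem::ManuallyDrop.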

7.4 Supporting Types

7.4.1 ThreadLevel

Requested MPI threading support level, passed to Mpi::init_thread. Maps directly to the MPI constants with explicit #[repr(i32)] discriminants.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(i32)]
pub enum ThreadLevel {
    /// `MPI_THREAD_SINGLE` -- Only one thread will execute.
    Single = 0,

    /// `MPI_THREAD_FUNNELED` -- Only the main thread will make MPI calls.
    Funneled = 1,

    /// `MPI_THREAD_SERIALIZED` -- Only one thread at a time will make MPI calls.
    Serialized = 2,

    /// `MPI_THREAD_MULTIPLE` -- Any thread may make MPI calls at any time.
    Multiple = 3,
}
```

The variants are ordered by increasing capability: Single < Funneled < Serialized < Multiple. The Ord implementation reflects this ordering, so Mpi::init_thread can compare the requested level against the provided level using standard comparison operators.

Cobre requests ThreadLevel::Funneled (see SS2.1) because only the main thread makes MPI calls; Rayon worker threads perform LP solves but never invoke MPI collectives directly.
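MPI_Init_thread may provide a lower level than requested, so the requested level has to be checked against the provided one. The sketch below mirrors the enum locally so it compiles without ferrompi. level_sufficient is a hypothetical helper that shows how the derived Ord supports that check.

```rust
/// Local mirror of ThreadLevel; the derived Ord follows the discriminants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(i32)]
enum ThreadLevel {
    Single = 0,
    Funneled = 1,
    Serialized = 2,
    Multiple = 3,
}

/// The provided level satisfies a request when it is at least as capable.
fn level_sufficient(requested: ThreadLevel, provided: ThreadLevel) -> bool {
    provided >= requested
}
```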

7.4.2 ReduceOp

Reduction operation for allreduce and other reduction collectives. Maps directly to MPI predefined operations.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum ReduceOp {
    /// `MPI_SUM` -- Element-wise sum.
    Sum = 0,

    /// `MPI_MIN` -- Element-wise minimum.
    Min = 2,

    /// `MPI_MAX` -- Element-wise maximum.
    Max = 1,

    /// `MPI_PROD` -- Element-wise product.
    Prod = 3,
}
```

Cobre uses Sum and Min (mapped from cobre_comm::ReduceOp; see SS1.2).
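SS1.2 defines the actual mapping. The sketch below only shows its likely shape, on the assumption that the mapping is a plain exhaustive match. CobreReduceOp is a hypothetical stand-in limited to the two operations named here, and FerrompiReduceOp mirrors the discriminants listed above.

```rust
/// Hypothetical stand-in for the Cobre-side operation enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CobreReduceOp {
    Sum,
    Min,
}

/// Local mirror of ferrompi's ReduceOp with the discriminants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum FerrompiReduceOp {
    Sum = 0,
    Max = 1,
    Min = 2,
    Prod = 3,
}

impl From<CobreReduceOp> for FerrompiReduceOp {
    fn from(op: CobreReduceOp) -> Self {
        // Exhaustive match: a new Cobre variant without a mapping is a
        // compile error rather than a silent fallthrough.
        match op {
            CobreReduceOp::Sum => FerrompiReduceOp::Sum,
            CobreReduceOp::Min => FerrompiReduceOp::Min,
        }
    }
}
```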

7.4.3 MpiDatatype (Sealed Trait)

Sealed marker trait for types that can be transmitted via MPI. A type implementing MpiDatatype has a corresponding MPI datatype tag and a fixed in-memory representation suitable for direct byte transmission. The trait cannot be implemented by downstream crates (sealed).

```rust
/// Sealed marker trait for MPI-transmissible types.
///
/// The trait is sealed: only types with implementations in ferrompi
/// can satisfy the bound. This prevents unsound transmissions of
/// types whose layout does not match an MPI datatype.
pub trait MpiDatatype: sealed::Sealed + Copy + Send + 'static {
    /// The datatype tag identifying the corresponding MPI type.
    const TAG: DatatypeTag;
}
```

Built-in implementations:

| Rust Type | DatatypeTag Variant | MPI Datatype |
|---|---|---|
| f32 | DatatypeTag::F32 | MPI_FLOAT |
| f64 | DatatypeTag::F64 | MPI_DOUBLE |
| i32 | DatatypeTag::I32 | MPI_INT |
| i64 | DatatypeTag::I64 | MPI_LONG_LONG |
| u8 | DatatypeTag::U8 | MPI_UNSIGNED_CHAR |
| u32 | DatatypeTag::U32 | MPI_UNSIGNED |
| u64 | DatatypeTag::U64 | MPI_UNSIGNED_LONG_LONG |

The MpiDatatype bound on all generic ferrompi methods (allgatherv, allreduce, broadcast, SharedWindow::allocate) ensures that only MPI-compatible types can be transmitted. The Copy supertrait guarantees that the type has no drop glue and can be safely memcpy’d, which matches MPI’s byte-oriented transmission model.

Relationship to Cobre's CommData: Cobre's CommData trait (Communicator Trait §1.2) is a blanket Copy + Send + Sync + 'static trait. The FerrompiBackend constrains T: CommData at the trait level, but the ferrompi calls require T: MpiDatatype. Every MpiDatatype implementor satisfies CommData, since each built-in type is Copy + Send + Sync + 'static. The implication does not run the other way, however. A generic T: CommData is not automatically an MpiDatatype, and the MpiDatatype supertraits do not include Sync. Compatibility therefore rests on the concrete types Cobre transmits: the training loop only transmits f64, u8, and u32, all of which implement MpiDatatype.
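Both the sealing mechanism and the one-way relationship with CommData can be shown in a self-contained sketch. Everything below is hypothetical. Transmissible and CommLike are local stand-ins for MpiDatatype and CommData, and the extra bound on send_comm is one possible bridge, not necessarily the one Cobre uses.

```rust
mod dtype {
    /// Local mirror of a subset of DatatypeTag.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Tag {
        F64,
        U8,
        U32,
    }

    mod sealed {
        // `sealed` is private to `dtype`, so code outside `dtype` cannot name
        // Sealed and therefore cannot implement Transmissible.
        pub trait Sealed {}
        impl Sealed for f64 {}
        impl Sealed for u8 {}
        impl Sealed for u32 {}
    }

    /// Sealed marker trait in the style of MpiDatatype.
    pub trait Transmissible: sealed::Sealed + Copy + Send + 'static {
        const TAG: Tag;
    }

    impl Transmissible for f64 {
        const TAG: Tag = Tag::F64;
    }
    impl Transmissible for u8 {
        const TAG: Tag = Tag::U8;
    }
    impl Transmissible for u32 {
        const TAG: Tag = Tag::U32;
    }
}

use dtype::{Tag, Transmissible};

/// Blanket marker in the style of CommData: every qualifying type gets it.
trait CommLike: Copy + Send + Sync + 'static {}
impl<T: Copy + Send + Sync + 'static> CommLike for T {}

/// A ferrompi-style call site: the tag is a compile-time constant per T.
fn tag_of<T: Transmissible>(_data: &[T]) -> Tag {
    T::TAG
}

/// With only `T: CommLike`, this function could not call tag_of;
/// it needs the sealed bound as well.
fn send_comm<T: CommLike + Transmissible>(data: &[T]) -> Tag {
    tag_of(data)
}
```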

7.4.4 DatatypeTag

Discriminant enum identifying MPI datatypes. Used as the associated constant in MpiDatatype::TAG.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(i32)]
pub enum DatatypeTag {
    F32 = 0, F64 = 1, I32 = 2, I64 = 3, U8 = 4, U32 = 5, U64 = 6,
}
```

7.4.5 Error

Error type returned by all fallible ferrompi operations. A thiserror-derived enum with variants for different failure modes.

```rust
#[derive(thiserror::Error, Debug)]
pub enum Error {
    /// MPI has already been initialized (double init).
    #[error("MPI has already been initialized")]
    AlreadyInitialized,

    /// An MPI library-level error with classified error class.
    #[error("MPI error: {message} (class={class}, code={code})")]
    Mpi {
        class: MpiErrorClass,
        code: i32,
        message: String,
    },

    /// Invalid buffer argument (null, misaligned, or wrong size).
    #[error("Invalid buffer")]
    InvalidBuffer,

    /// The requested operation is not supported by this MPI implementation.
    #[error("Operation not supported: {0}")]
    NotSupported(String),

    /// An internal error in ferrompi (should not occur in normal use).
    #[error("Internal error: {0}")]
    Internal(String),
}

impl Error {
    /// Construct an Error from a raw MPI error code.
    pub fn from_code(code: i32) -> Self;

    /// Check a raw MPI error code; return Ok(()) for MPI_SUCCESS.
    pub fn check(code: i32) -> Result<()>;
}

/// Convenience type alias used throughout ferrompi.
pub type Result<T> = std::result::Result<T, Error>;
```

The Error::Mpi variant carries the classified MpiErrorClass, the raw integer code, and a human-readable message obtained from MPI_Error_string. ferrompi uses the from_code constructor and the check helper internally to convert raw MPI return codes.
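The check pattern turns every FFI return code into a Result, so that `?` can propagate failures. The sketch below is hypothetical and trimmed down. SketchError keeps only the raw code, whereas the real from_code also determines the error class and the MPI_Error_string message. The one fixed fact it relies on is that MPI_SUCCESS is 0 in the MPI standard.

```rust
/// MPI_SUCCESS is defined as 0 by the MPI standard.
const MPI_SUCCESS: i32 = 0;

/// Hypothetical, trimmed-down error type carrying only the raw code.
#[derive(Debug, PartialEq, Eq)]
enum SketchError {
    Mpi { code: i32 },
}

fn check(code: i32) -> Result<(), SketchError> {
    if code == MPI_SUCCESS {
        Ok(())
    } else {
        Err(SketchError::Mpi { code })
    }
}

/// Typical internal call site: each raw return code goes through check()?,
/// so callers see a Result instead of raw integers.
fn two_calls(first: i32, second: i32) -> Result<(), SketchError> {
    check(first)?;
    check(second)?;
    Ok(())
}
```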

7.4.6 MpiErrorClass

Classification of MPI error codes. Maps MPI error classes to Rust enum variants for pattern matching in error conversion logic (see SS5.2). Contains 24+ variants covering the standard MPI error classes plus a Raw(i32) fallback for unrecognized classes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum MpiErrorClass {
    Success,    // MPI_SUCCESS
    Buffer,     // MPI_ERR_BUFFER
    Count,      // MPI_ERR_COUNT
    Type,       // MPI_ERR_TYPE
    Tag,        // MPI_ERR_TAG
    Comm,       // MPI_ERR_COMM
    Rank,       // MPI_ERR_RANK
    Request,    // MPI_ERR_REQUEST
    Root,       // MPI_ERR_ROOT
    Group,      // MPI_ERR_GROUP
    Op,         // MPI_ERR_OP
    Topology,   // MPI_ERR_TOPOLOGY
    Dims,       // MPI_ERR_DIMS
    Arg,        // MPI_ERR_ARG
    Unknown,    // MPI_ERR_UNKNOWN
    Truncate,   // MPI_ERR_TRUNCATE
    Other,      // MPI_ERR_OTHER
    Intern,     // MPI_ERR_INTERN
    InStatus,   // MPI_ERR_IN_STATUS
    Pending,    // MPI_ERR_PENDING
    Win,        // MPI_ERR_WIN
    Info,       // MPI_ERR_INFO
    File,       // MPI_ERR_FILE
    Raw(i32),   // Any unrecognized error class
}
```

Note: The real MpiErrorClass does not have a dedicated NoMem variant (unlike the speculative API). Memory allocation failures from MPI_ERR_NO_MEM surface as MpiErrorClass::Other or MpiErrorClass::Raw(MPI_ERR_NO_MEM). The error conversion in SS5.2 handles this by falling through to CommError::CollectiveFailed, with the human-readable message providing the diagnostic detail.

7.4.7 Other Supporting Types

| Type | Description |
|---|---|
| SplitType | Enum with a `Shared = 0` variant for split_type. split_shared() uses this internally. |
| Request | Handle for nonblocking MPI operations. Methods: wait(), test(), cancel(). |
| PersistentRequest | Handle for MPI 4.0+ persistent operations. Methods: start(), wait(), test(). |
| Status | Message metadata from receive operations: source rank, tag, element count. |
| Info | RAII wrapper for MPI_Info objects; Drop calls MPI_Info_free. |
| LockType | Enum: Exclusive, Shared. Used with SharedWindow::lock(). |
| `LockGuard<'a, T>` | RAII guard from SharedWindow::lock(). Drop calls MPI_Win_unlock. |
| `LockAllGuard<'a, T>` | RAII guard from SharedWindow::lock_all(). Drop calls MPI_Win_unlock_all. |

7.5 Unsafe Boundary Summary

No public ferrompi method or function uses the Rust unsafe keyword. All unsafe code is internal to the ferrompi crate, concentrated in:

  1. FFI calls – Every MPI C function call (MPI_Init_thread, MPI_Comm_rank, MPI_Allgatherv, MPI_Win_allocate_shared, MPI_Win_free, etc.) is wrapped in an unsafe block within the ferrompi implementation.
  2. Send + Sync impls – Communicator implements Send + Sync via unsafe impl. The justification is that the handle wraps an integer index into a C-side table with no thread-local state. Under MPI_THREAD_FUNNELED, only the main thread issues MPI calls; Send + Sync permits safe sharing of the handle without implying concurrent MPI invocation.
  3. Raw pointer dereference – SharedWindow::local_slice, SharedWindow::local_slice_mut, and SharedWindow::remote_slice dereference the raw pointer obtained from MPI_Win_allocate_shared. MPI guarantees the pointer is valid for the lifetime of the window.

The public API exposes safe Rust types and borrows. Callers are not required to write unsafe code to use ferrompi. The SharedWindow::remote_slice method has a logical safety contract (SS7.3.3) but not a Rust unsafe contract – violating the contract produces unspecified values, not undefined behavior.
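The raw-pointer part of this boundary follows a common Rust pattern: keep the pointer private, and expose it only as slices whose lifetimes are tied to `&self` or `&mut self`. RawRegion below is a hypothetical sketch of that pattern. A leaked Box<[T]> stands in for the MPI_Win_allocate_shared allocation, and Drop plays the role of MPI_Win_free. It says nothing about ferrompi's actual internals.

```rust
use std::marker::PhantomData;

/// Hypothetical safe wrapper: a raw pointer and length that are only ever
/// handed out as borrowed slices.
struct RawRegion<T> {
    ptr: *mut T,
    len: usize,
    _owns: PhantomData<T>,
}

impl<T: Copy + Default> RawRegion<T> {
    fn allocate(len: usize) -> Self {
        // Stand-in for the MPI allocation: leak a boxed slice into a raw pointer.
        let boxed: Box<[T]> = vec![T::default(); len].into_boxed_slice();
        let ptr = Box::into_raw(boxed) as *mut T;
        RawRegion { ptr, len, _owns: PhantomData }
    }
}

impl<T> RawRegion<T> {
    fn local_slice(&self) -> &[T] {
        // SAFETY: ptr/len describe a live allocation owned by self, and the
        // returned borrow cannot outlive self.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }

    fn local_slice_mut(&mut self) -> &mut [T] {
        // SAFETY: as above; &mut self guarantees no other borrow of this
        // region exists in this process.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl<T> Drop for RawRegion<T> {
    fn drop(&mut self) {
        // SAFETY: rebuilds the Box created in allocate exactly once
        // (the analogue of MPI_Win_free).
        unsafe {
            drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(self.ptr, self.len)));
        }
    }
}
```

Callers of RawRegion never write unsafe. Every unsafe block sits behind a method whose signature carries the borrowing rules.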

Cross-References

  • Communicator Trait §1 – Communicator trait definition, CommData, ReduceOp, CommError type definitions implemented by this backend
  • Communicator Trait §2 – Method contracts (preconditions, postconditions, determinism guarantees) that this backend preserves by delegation to ferrompi
  • Communicator Trait §3 – Generic parameterization pattern (train<C: Communicator>) enabling zero-cost monomorphization of this backend
  • Communicator Trait §4 – SharedMemoryProvider trait, SharedRegion<T> lifecycle phases, leader/follower pattern, drop behavior table
  • Communicator Trait §4.6 – CommError::AllocationFailed variant used for shared memory allocation failure mapping
  • Communication Patterns §1.1 – ferrompi API signatures (comm.allgatherv, comm.allreduce, comm.broadcast, comm.barrier) wrapped by this backend
  • Communication Patterns §4 – Persistent collectives (allreduce_init, allgatherv_init) used as internal optimization (SS4.2)
  • Communication Patterns §5 – SharedWindow<T> capabilities (window creation, intra-node grouping, read access, write synchronization) wrapped by FerrompiRegion<T>
  • Hybrid Parallelism §1.2 – ferrompi capabilities table: Communicator is Send + Sync, SharedWindow<T>, collectives API, threading level
  • Hybrid Parallelism §6 – MPI initialization sequence (Steps 1-3) implemented by FerrompiBackend::new()
  • Backend Registration and Selection §1.2 – Feature flag matrix; mpi feature gates the ferrompi crate dependency
  • Backend Registration and Selection §4 – Factory pattern returning concrete FerrompiBackend type in single-feature builds
  • Solver Abstraction §10 – Compile-time selection pattern via generic parameters and Cargo feature flags; the architectural precedent for this backend’s zero-cost design