Skip to content

Reactive Network of Operators In Rust. Framework for Parallel and distributed computation inspired from the DataFlow model

License

Notifications You must be signed in to change notification settings

Lucas-LSB04/renoir

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
Lucas_Bianchessi
Sep 16, 2024
d77c65d · Sep 16, 2024
Mar 19, 2024
Mar 18, 2024
Jun 24, 2024
Jul 25, 2024
Jul 2, 2024
May 27, 2024
Mar 26, 2023
Jan 20, 2023
Jan 20, 2023
Jun 24, 2024
Jul 2, 2024
Sep 16, 2024
Mar 18, 2024
May 4, 2022

Repository files navigation

Renoir

Preprint

REactive Network of Operators In Rust

API Docs

Renoir (short: Noir) [/ʁənwaʁ/, /nwaʁ/] is a distributed data processing platform based on the dataflow paradigm that provides an ergonomic programming interface, similar to that of Apache Flink, but has much better performance characteristics.

Renoir converts each job into a dataflow graph of operators and groups them in blocks. Blocks contain a sequence of operors which process the data sequentially without repartitioning it. They are the deployment unit used by the system and can be distributed and executed on multiple systems.

The common layout of a Renoir program starts with the creation of a StreamContext, then one or more Sources are initialised creating a Stream. The graph of operators is composed using the methods of the Stream object, which follow a similar approach to Rust's Iterator trait allowing ergonomically define a processing workflow through method chaining.

Examples

Wordcount

use renoir::prelude::*;

fn main() {
    // Convenience method to parse deployment config from CLI arguments
    let (config, args) = RuntimeConfig::from_args();
    config.spawn_remote_workers();
    let env = StreamContext::new(config);

    let result = env
        // Open and read file line by line in parallel
        .stream_file(&args[1])
        // Split into words
        .flat_map(|line| tokenize(&line))
        // Partition
        .group_by(|word| word.clone())
        // Count occurrences
        .fold(0, |count, _word| *count += 1)
        // Collect result
        .collect_vec();
        
    env.execute_blocking(); // Start execution (blocking)
    if let Some(result) = result.get() {
        // Print word counts
        result.into_iter().for_each(|(word, count)| println!("{word}: {count}"));
    }
}

fn tokenize(s: &str) -> Vec<String> {
    // Simple tokenisation strategy
    s.split_whitespace().map(str::to_lowercase).collect()
}

// Execute on 6 local hosts `cargo run -- -l 6 input.txt`

Wordcount associative (faster)

use renoir::prelude::*;

fn main() {
    // Convenience method to parse deployment config from CLI arguments
    let (config, args) = RuntimeConfig::from_args();
    let env = StreamContext::new(config);

    let result = env
        .stream_file(&args[1])
        // Adaptive batching(default) has predictable latency
        // Fixed size batching often leads to shorter execution times
        // If data is immediately available and latency is not critical
        .batch_mode(BatchMode::fixed(1024))
        .flat_map(move |line| tokenize(&line))
        .map(|word| (word, 1))
        // Associative operators split the operation in a local and a
        // global step for faster execution
        .group_by_reduce(|w| w.clone(), |(_w1, c1), (_w2, c2)| *c1 += c2)
        .unkey()
        .collect_vec();

    env.execute_blocking(); // Start execution (blocking)
    if let Some(result) = result.get() {
        // Print word counts
        result.into_iter().for_each(|(_, (word, count))| println!("{word}: {count}"));
    }
}

fn tokenize(s: &str) -> Vec<String> {
    s.split_whitespace().map(str::to_lowercase).collect()
}

// Execute on multiple hosts `cargo run -- -r config.toml input.txt`

Remote deployment

# config.toml
[[host]]
address = "host1.lan"
base_port = 9500
num_cores = 16

[[host]]
address = "host2.lan"
base_port = 9500
num_cores = 24
ssh = { username = "renoir", key_file = "/home/renoir/.ssh/id_ed25519" }

Refer to the examples directory for an extended set of working examples

About

Reactive Network of Operators In Rust. Framework for Parallel and distributed computation inspired from the DataFlow model

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 99.3%
  • Jupyter Notebook 0.7%