An overview of the Elixir `Experimental.Flow` module, which lets developers express processing steps for collections, like `Stream`, but with the power of parallel execution. This repository lets you compare data processing using multiple approaches (Eager, Lazy, and Concurrent) on differently sized datasets.
This is a supplementary repository for my Kyiv Elixir Meetup 3.1 talk on 2016/11/03.
You can find the slides on Slideshare or in the `Experimental.Flow.pdf` file in the root folder of this repository.
The video of the talk is available on YouTube. Sorry, but the talk is in Russian!
- Clone the repository
- Run `mix deps.get`
- Unpack the 7zip archives with the test files in the `files` dir
- Run your code from `try_me.exs` with the `mix run try_me.exs` command
The `lib` dir contains different implementations of the same task: parsing a dataset to count how often each word occurs in the text. Every solution has two implementations of the reducer step, using either a `Map` or an ETS table as the accumulator; a sketch of both variants follows.
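As a rough illustration of the two reducer variants, here is a minimal sketch (the module and function names are assumptions for illustration, not taken from the repository):

```elixir
defmodule ReducerSketch do
  # Map accumulator: purely functional, a new map is returned on every update.
  def count_into_map(words) do
    Enum.reduce(words, %{}, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
  end

  # ETS accumulator: counters are bumped in place in a shared mutable table.
  def count_into_ets(words) do
    table = :ets.new(:word_counts, [:set, :public])

    Enum.each(words, fn word ->
      # Increment the counter in tuple position 2, inserting {word, 0} if absent.
      :ets.update_counter(table, word, {2, 1}, {word, 0})
    end)

    table
  end
end
```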
- `eager.ex` — the most straightforward approach: the whole file is loaded into memory and all data is processed at once (this will fail on large datasets); see the eager sketch after this list
- `lazy.ex` — uses `Stream` to read the file content lazily and process the data line by line; see the lazy sketch below
- `flow.ex` — the file is read lazily line by line, and the lines are processed concurrently in a set of processes using `Flow`; see the concurrent sketch below
- `flow_window_trigger.ex` — contains sample code showing the Windows and Triggers concepts for a `Flow`; see the windowed sketch below
- `timer.ex`, `words.ex`, and `ets.ex` — contain some helper functions
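For orientation, an eager word count might look like this minimal sketch (the file path is just an example; this is not the repository's actual code):

```elixir
# Eager: read the entire file into memory, then process everything at once.
"files/small.txt"
|> File.read!()
|> String.split(~r/\W+/u, trim: true)
|> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
```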
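The lazy variant swaps the eager read for a `Stream`, so only one line of the file is held in memory at a time (again a sketch, not the repo code):

```elixir
# Lazy: the file is read line by line on demand.
"files/small.txt"
|> File.stream!()
|> Stream.flat_map(&String.split(&1, ~r/\W+/u, trim: true))
|> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
```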
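The concurrent variant keeps the lazy read but fans the lines out to multiple stages with `Flow`. A sketch against the late-2016 `Experimental.Flow` API (newer releases ship a standalone `Flow` module):

```elixir
alias Experimental.Flow

# Concurrent: lines are distributed across a set of stages (processes).
"files/small.txt"
|> File.stream!()
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1, ~r/\W+/u, trim: true))
# Hash-partition events so every occurrence of a word lands in the same
# stage, making the per-partition counts exact.
|> Flow.partition()
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()
```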
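Finally, a sketch of the windows-and-triggers idea: a global window with a trigger firing every 1000 events, so intermediate counts are emitted before the input is exhausted (API as of `Experimental.Flow`; trigger options changed in later Flow releases):

```elixir
alias Experimental.Flow

# Fire a trigger every 1000 events per stage, keeping the accumulated
# state between triggers; each trigger emits the current {word, count}
# pairs downstream.
window = Flow.Window.global() |> Flow.Window.trigger_every(1000, :keep)

"files/small.txt"
|> File.stream!()
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1, ~r/\W+/u, trim: true))
|> Flow.partition(window: window)
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.take(10)
```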
Set `path_to_file` and `path_to_dir` to point to the text files you want to use as data sources for testing your functions. You can comment/uncomment parts of the code to run only the particular functions you want to compare; a hypothetical layout is sketched below.
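A hypothetical shape of `try_me.exs` (the paths and module names here are illustrative assumptions, not the script's actual contents):

```elixir
# Point these at the datasets you want to benchmark.
path_to_file = "files/small.txt"
path_to_dir = "files/parts_medium"

# Comment/uncomment the implementations you want to compare.
EagerWordCount.count_words(path_to_file)
LazyWordCount.count_words(path_to_file)
# FlowWordCount.count_words(path_to_file)
# FlowWordCount.count_words_from_dir(path_to_dir)
```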
This directory contains text files that can be used as input for testing your functions:

- `small.txt` — ~5.33 MB TXT ebook from Project Gutenberg
- `medium.txt` — ~32.6 MB concatenation of 50 Project Gutenberg ebooks
- `large.txt` — ~196 MB, multiple concatenations of `medium.txt`
- `parts_medium` and `parts_large` — just `medium.txt` and `large.txt` split into 3 parts