A Python implementation of the MapReduce concept, using books sourced from Project Gutenberg (https://www.gutenberg.org/).
This application currently:
- Reads input files or directories.
- Parses the given file(s).
- Filters out repeated spaces.
- Removes all punctuation.
- Computes the average number of characters and words per line.
- Computes the median number of characters and words per line.
- Generates scatter plots of these statistics.
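
The two text-cleaning steps in the list above could look roughly like the sketch below. The name `clean_line` is hypothetical and not taken from this project, and the sketch assumes that "filtering out multiple spaces" means collapsing each run of spaces to a single space. It also only strips ASCII punctuation (`string.punctuation`), so typographic quotes found in some Gutenberg texts would pass through unchanged.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)
# Matches two or more consecutive space characters.
_MULTI_SPACE = re.compile(r" {2,}")


def clean_line(line: str) -> str:
    """Hypothetical helper: strip punctuation, then collapse repeated spaces."""
    no_punct = line.translate(_PUNCT_TABLE)
    return _MULTI_SPACE.sub(" ", no_punct).strip()
```

Punctuation is removed first because deleting a character between two spaces can itself create a new run of spaces.
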
Using the data sets in the ./data/ directory, a statistical analysis calculates the average and median character counts per line (c/l) and the average and median word counts per line (w/l).
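
As a minimal sketch of how c/l and w/l might be computed, the function below uses only the standard `statistics` module. The function name `line_stats` and the dictionary keys are illustrative assumptions, not this project's API. It also assumes the lines have already been cleaned and have no trailing newline, and that words are separated by whitespace.

```python
from statistics import mean, median
from typing import Dict, List


def line_stats(lines: List[str]) -> Dict[str, float]:
    """Hypothetical helper: average and median characters and words per line."""
    if not lines:
        # mean() and median() raise StatisticsError on empty input,
        # so fail early with a clearer message.
        raise ValueError("line_stats() needs at least one line")
    char_counts = [len(line) for line in lines]
    word_counts = [len(line.split()) for line in lines]
    return {
        "avg_c/l": mean(char_counts),
        "median_c/l": median(char_counts),
        "avg_w/l": mean(word_counts),
        "median_w/l": median(word_counts),
    }
```

Whether blank lines should count toward the statistics is a design choice. This sketch counts them as lines with zero characters and zero words.
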
Features to be implemented:
- Mapping standard input.
- Filtering standard input.
- Reducing the data.
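
These planned features are not implemented yet, so the following is only one possible shape for them, shown here as a word count. All three function names and the minimum-length filter rule are assumptions made for illustration. Each stage accepts any iterable, so `sys.stdin` could be passed in directly as the source of lines.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, int]


def map_words(lines: Iterable[str]) -> Iterator[Pair]:
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def filter_pairs(pairs: Iterable[Pair], min_length: int = 1) -> Iterator[Pair]:
    """Filter step (assumed rule): drop words shorter than min_length."""
    for word, count in pairs:
        if len(word) >= min_length:
            yield word, count


def reduce_counts(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each distinct word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

A full pass over standard input would then be `reduce_counts(filter_pairs(map_words(sys.stdin)))` after `import sys`. Because the map and filter steps are generators, lines are processed one at a time rather than loaded into memory all at once.
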