Python implementation of the WordCount MapReduce concept using books sourced from https://www.gutenberg.org/

allenerocha/pythonmapreduce

pythonmapreduce

Badges: Travis CI build, code style: black, Codecov coverage, license: AGPLv3.

Python implementation of the MapReduce concept using books sourced from https://www.gutenberg.org/

Features

This application currently:

  • Reads input files or directories.
  • Parses the given file(s).
  • Collapses runs of multiple spaces.
  • Removes all punctuation.
  • Computes the average characters and words per line.
  • Computes the median characters and words per line.
  • Generates scatter plots of these statistics.
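
The space and punctuation clean-up steps listed above could be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `clean_line` is hypothetical, and it assumes "punctuation" means the ASCII set in `string.punctuation`.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the project treats string.punctuation as "all punctuation".
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_line(line: str) -> str:
    """Remove punctuation, then collapse whitespace runs to a single space."""
    no_punct = line.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", no_punct).strip()
```

Punctuation is removed before whitespace is collapsed so that a token made only of punctuation (for example a standalone dash) does not leave a double space behind.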

Sample outputs:

Using the data sets in the ./data/ directory, a statistical analysis calculates the average and median character counts per line (c/l) and the average and median word counts per line (w/l).

Scatter plots: average c/l, median c/l, average w/l, median w/l.
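
As a rough sketch of how the four per-line statistics might be computed, the snippet below uses Python's standard `statistics` module. The function name `line_stats`, the dictionary keys, and the choice to skip blank lines are assumptions made for illustration and may differ from the project's behavior.

```python
from statistics import mean, median
from typing import Dict, Iterable


def line_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Return average and median characters/words per line."""
    # Assumption: blank lines are ignored rather than counted as zero.
    kept = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not kept:
        raise ValueError("no non-blank lines to analyze")
    chars = [len(ln) for ln in kept]          # characters per line (c/l)
    words = [len(ln.split()) for ln in kept]  # words per line (w/l)
    return {
        "avg_cl": mean(chars),
        "median_cl": median(chars),
        "avg_wl": mean(words),
        "median_wl": median(words),
    }
```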

TODO

Features to be implemented:

  • Mapping standard input.
  • Filtering standard input.
  • Reducing the data.
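
One possible shape for these planned steps is the classic WordCount pipeline sketched below. It is only an illustration under stated assumptions: the names `map_words`, `filter_pairs`, and `reduce_counts` are hypothetical, and the filter step is assumed to strip surrounding punctuation and drop empty tokens.

```python
import string
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, int]


def map_words(lines: Iterable[str]) -> Iterator[Pair]:
    """Map step: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield token.lower(), 1


def filter_pairs(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Filter step: strip surrounding punctuation and drop empty words."""
    for word, count in pairs:
        word = word.strip(string.punctuation)
        if word:
            yield word, count


def reduce_counts(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each distinct word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

To run it over standard input, the three stages can be chained as `reduce_counts(filter_pairs(map_words(sys.stdin)))` after importing `sys`.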
