Spark-lean

Spark-lean is an interactive, PySpark-based data cleaning library.

Features

  • Data versioning
  • Missing value detection
  • Text cleaning
  • Featurization
  • String matching
  • Anomaly detection

Installation

pip install spark-lean
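
To confirm that the install succeeded, one option is to query the package metadata from Python. The snippet below is a minimal sketch using only the standard library (Python 3.8+). It looks up the distribution name `spark-lean` used in the pip command above and deliberately does not import the library, since this README does not state the import name. The helper `installed_version` is a hypothetical name introduced for illustration and is not part of Spark-lean.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # pip has no record of this distribution in the current environment.
        return None


if __name__ == "__main__":
    # "spark-lean" is the distribution name from the pip command above.
    found = installed_version("spark-lean")
    print(f"spark-lean: {found or 'not installed'}")
```

If this reports "not installed", the package was likely installed into a different Python environment than the one running the script.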