Releases: capitalone/DataProfiler
v0.4.0
New Features
- Reduce profiling memory usage by ~50%
- Reduce profiling runtime by >75%
- Improve delimiter and header detection in delimited (CSV) data
- Add progress notifications for profiling
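As a rough illustration of what delimiter and header detection involves, here is a minimal sketch using Python's standard-library `csv.Sniffer`. This is a stand-in, not DataProfiler's own detection logic; the helper name `sniff_csv` and the sample text are made up for the example.

```python
import csv


def sniff_csv(sample: str):
    """Guess the delimiter and whether the first row looks like a header.

    Uses the stdlib csv.Sniffer heuristics, not DataProfiler's detector.
    """
    sniffer = csv.Sniffer()
    # Restrict the candidates to a few common delimiters.
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    return dialect.delimiter, sniffer.has_header(sample)


# Hypothetical sample: semicolon-delimited text with a header row.
sample = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\nCarol;41;Madrid\n"
delimiter, has_header = sniff_csv(sample)
print(delimiter, has_header)
```

Heuristics like these can misfire on very short or all-string samples, which is the kind of case the detection improvements in these releases target.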
Fixes
- Add warnings for sampling
- Select proper options when merging profiles
- Fix repeated TensorFlow warnings
- Threshold input for large CSV files by bytes or lines (whichever is smaller)
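One way to read the byte-or-line threshold above is "stop reading as soon as either budget is exhausted". The sketch below shows that interpretation with the standard library only; the function name `read_sample` and the default limits are assumptions for illustration, not DataProfiler's actual values or code.

```python
import io


def read_sample(stream, max_bytes=1024, max_lines=10):
    """Collect lines until the byte budget or the line budget runs out,
    whichever happens first."""
    lines = []
    used = 0
    for raw in stream:
        if len(lines) >= max_lines or used + len(raw) > max_bytes:
            break
        lines.append(raw)
        used += len(raw)
    return lines


# Hypothetical input: 100 lines of 6 bytes each.
data = io.BytesIO(b"a,b,c\n" * 100)
by_lines = read_sample(data, max_bytes=10_000, max_lines=5)  # line limit hit first
data.seek(0)
by_bytes = read_sample(data, max_bytes=20, max_lines=50)     # byte limit hit first
print(len(by_lines), len(by_bytes))
```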
v0.3.5
- Enhancement: 50-90% reduced profiling time
- Improved methods for unique-row and null-in-row prediction
- Enhancement: Users can now select header row for delimited files
- Bug Fix: Added header detection on delimited files with only strings
v0.3.4
- Significantly improved header detection on structured datasets
- Updated model
  - New entities: DATE, TIME, US_STATE, DRIVERS_LICENSE
  - Removed entities: INTEGER_BIG
- New [easier] way to extend labels to the model
- ML requirements installed separately via pip install dataprofiler[ml] (required for the labeler)
- Profiler & Labeler only load TensorFlow when necessary
- Minor bug fixes & improved testing
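Loading a heavy dependency such as TensorFlow only when necessary is commonly done with a deferred import. The following is a generic sketch of that pattern, not the library's implementation: the class `LazyLabeler` is hypothetical, its `predict` returns placeholder labels, and the demo uses the stdlib module `colorsys` as a stand-in backend so it runs without TensorFlow installed.

```python
import importlib


class LazyLabeler:
    """Hypothetical labeler that imports its backend on first use."""

    def __init__(self, backend="tensorflow"):
        self._backend_name = backend
        self._backend = None  # nothing is imported at construction time

    def _load(self):
        # Import the backend the first time it is actually needed.
        if self._backend is None:
            self._backend = importlib.import_module(self._backend_name)
        return self._backend

    def predict(self, values):
        self._load()
        # Placeholder result; a real labeler would run a model here.
        return ["UNKNOWN"] * len(values)


# Stand-in backend for the demo (assumption: any importable module works).
labeler = LazyLabeler(backend="colorsys")
loaded_before = labeler._backend is not None
labels = labeler.predict(["2021-01-01", "TX"])
loaded_after = labeler._backend is not None
print(loaded_before, loaded_after, labels)
```

With this layout, code paths that never call the labeler avoid both the import time and the memory cost of the backend.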
v0.3.2
- TensorFlow only runs when a labeler executes
- Improved CSV detection
- 2-8x memory reduction in profiling
- Various bug fixes
v0.3.1
- Dramatically reduced memory requirements for the data labeler
- Renamed the module: data_profiler -> dataprofiler
- Improved delimiter (CSV) file detection
v0.3.0
Initial Data Profiler release.
Load a file. Extract profile. Save output.
See README.md for full information regarding release.
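The load, profile, save flow might look like the sketch below. It follows the `Data` / `Profiler` / `report` usage shown in the project's README for later versions, so treat it as an assumption for this release: the 0.3.0 API may have differed, and before v0.3.1 the module was named `data_profiler`. The function name `profile_file` and both paths are hypothetical.

```python
import json


def profile_file(input_path, output_path):
    """Load a file, extract its profile, and save the report as JSON."""
    # Imported here so the sketch can be defined without the package installed.
    from dataprofiler import Data, Profiler

    data = Data(input_path)        # load the file (format is auto-detected)
    profile = Profiler(data)       # extract the profile
    report = profile.report(report_options={"output_format": "pretty"})

    # default=str is a precaution for any value json cannot serialize as-is.
    with open(output_path, "w") as fh:
        json.dump(report, fh, indent=4, default=str)
    return report
```

A call such as `profile_file("your_file.csv", "profile.json")` would then write the report next to the input, assuming the package and its ML extras are installed.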