Releases: capitalone/DataProfiler

v0.4.0

25 Mar 03:04
f76ed25

New Features

  • Reduce profiling memory usage by ~50%
  • Reduce profiling runtime by >75%
  • Improve delimiter and header detection in delimited (CSV) data
  • Add progress notifications for profiling
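
To give a feel for what delimiter and header detection involves, here is a minimal sketch that uses `csv.Sniffer` from the Python standard library. It is a stand-in technique, not DataProfiler's own detector, and the helper name `detect_delimiter` and the sample data are assumptions made up for illustration.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a delimited-text sample.

    Hypothetical helper: it relies on the standard library's csv.Sniffer,
    not on DataProfiler's detection logic.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Made-up sample: a header row followed by two data rows.
sample = "name;age;city\nAda;36;London\nAlan;41;Wilmslow\n"

detected = detect_delimiter(sample)

# has_header() is a heuristic that compares the first row with the rows below it.
looks_like_header = csv.Sniffer().has_header(sample)

print(f"delimiter={detected!r}, header={looks_like_header}")
```

Both calls are heuristics, so they can misfire on small or irregular samples; a dedicated detector is likely to be more robust.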

Fixes

  • Add warnings for sampling
  • Select the proper options when merging profiles
  • Fix repeated TensorFlow warnings
  • Threshold input for large CSV files by bytes or lines (whichever is smaller)
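
The byte-or-line threshold in the last fix can be pictured with a short sketch. It assumes the rule means "stop reading at whichever limit is reached first"; the function name `read_capped` and both parameters are hypothetical and do not come from DataProfiler.

```python
import io
from typing import BinaryIO, List


def read_capped(stream: BinaryIO, max_bytes: int, max_lines: int) -> List[bytes]:
    """Return leading lines, stopping at whichever cap is reached first.

    Hypothetical helper, not DataProfiler's implementation.
    """
    lines: List[bytes] = []
    used = 0
    for line in stream:
        # Stop before exceeding either the line cap or the byte cap.
        if len(lines) >= max_lines or used + len(line) > max_bytes:
            break
        lines.append(line)
        used += len(line)
    return lines


data = io.BytesIO(b"a,b\n1,2\n3,4\n5,6\n")

# The line cap is the tighter limit here: only two lines are kept.
by_lines = read_capped(data, max_bytes=1024, max_lines=2)

# The byte cap is the tighter limit here: each line is 4 bytes, so one line fits in 5.
data.seek(0)
by_bytes = read_capped(data, max_bytes=5, max_lines=100)
```

Keeping whole lines, rather than cutting mid-line at the byte limit, avoids handing a truncated row to the parser.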

v0.3.5

16 Mar 21:06
f63cad6
  • Enhancement: 50-90% reduced profiling time
    • Improved methods for unique row and null-in-row prediction(s)
  • Enhancement: Users can now select header row for delimited files
  • Bug Fix: Added header detection on delimited files with only strings

v0.3.4

12 Mar 19:28
5e5f64e
  • Significantly improved header detection on structured datasets
  • Updated model
    • New entities: DATE, TIME, US_STATE, DRIVERS_LICENSE
    • Removed entities: INTEGER_BIG
  • New, easier way to extend the model's labels
  • ML requirements are now installed separately via pip install dataprofiler[ml] (required for the labeler)
  • Profiler & Labeler only load TensorFlow when necessary
  • Minor bug fixes & improved testing
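
Loading a heavy dependency only when it is needed is commonly done with a lazy import. The sketch below shows one general pattern, assuming nothing about how DataProfiler does it internally; the `LazyModule` class is hypothetical, and the small standard-library module `colorsys` stands in for a heavy package such as TensorFlow.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Defer importing a module until one of its attributes is first used.

    Hypothetical wrapper for illustration; not DataProfiler's code.
    """

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str) -> Any:
        # Called only for attributes not found on the wrapper itself,
        # i.e. the ones that belong to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# colorsys is a stand-in for a heavy dependency.
heavy = LazyModule("colorsys")
loaded_before = heavy._module is not None  # nothing imported yet

rgb = heavy.hls_to_rgb(0.0, 0.5, 0.0)      # first use triggers the import
loaded_after = heavy._module is not None
```

With this pattern, code paths that never touch the wrapped module never pay its import cost.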

v0.3.2

04 Mar 05:09
7c05449
  • TensorFlow only runs when a labeler executes
  • Improved CSV detection
  • 2-8x memory reduction in profiling
  • Various bug fixes

v0.3.1

23 Feb 20:49
93a9b6e
  • Dramatically reduced memory requirements for the data labeler
  • Renamed the module: data_profiler -> dataprofiler
  • Improved delimited (CSV) file detection

v0.3.0

11 Feb 20:01
07e8b3b

Initial Data Profiler release.
Load a file. Extract profile. Save output.
See README.md for full information regarding release.