v0.1.21

@nsthorat nsthorat released this 23 Nov 03:04
· 259 commits to main since this release

Features

  • Signal computations are now cached. If a signal computation fails partway through, it resumes from the cached results instead of starting over.
  • Source loading is much faster: up to 40x for some sources (e.g. HuggingFace).
  • Map dtype is now supported for parquet sources.
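
To illustrate the resume-on-failure behavior described for cached signal computations, here is a minimal sketch of a JSONL intermediate cache. It is not the library's implementation: the function name `cached_compute`, the record layout (`index`, `output`), and the cache path are hypothetical, and the sketch assumes that inputs arrive in the same order on every run and that every cached line was written completely.

```python
import json
import os
from typing import Callable, Iterable, Iterator


def cached_compute(
    items: Iterable[str],
    compute: Callable[[str], object],
    cache_path: str,
) -> Iterator[object]:
    """Yield compute(item) per item, reusing outputs stored in a JSONL file.

    Hypothetical helper: one JSON record is appended per finished item, so a
    run that fails partway can pick up where the previous run stopped.
    """
    # Load outputs that an earlier (possibly interrupted) run already wrote.
    cached = []
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    cached.append(json.loads(line)["output"])

    # Append mode keeps earlier records and adds new ones after them.
    with open(cache_path, "a", encoding="utf-8") as f:
        for i, item in enumerate(items):
            if i < len(cached):
                # Already computed in a previous run: serve it from the cache.
                yield cached[i]
                continue
            output = compute(item)
            f.write(json.dumps({"index": i, "output": output}) + "\n")
            f.flush()  # Persist each record before moving on.
            yield output
```

Calling `list(cached_compute(texts, signal_fn, "signal_cache.jsonl"))` a second time after a failure recomputes only the items that have no record in the cache file. Appending one line per item is what makes JSONL a convenient format here: a partial run leaves a valid prefix of results on disk.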

Details

  • Add jsonl intermediate caching to signals. Introduce a central spot for this cache abstraction. by @nsthorat in #858
  • Rename fast_process to load_to_parquet by @brilee in #862
  • Implement fast_process for parquet sources by @brilee in #860
  • Implement CSV direct to parquet by @brilee in #863
  • Implement fast json source by @brilee in #865
  • Add map<key, value> dtype. No support in the UI yet. by @dsmilkov in #870
  • Implement fast processing for huggingface datasets by @brilee in #869

Bug Fixes & Other Changes

  • Add development docs on profiling by @brilee in #861
  • Add docs for settings and compare mode by @dsmilkov in #859
  • Add a nest_under field to dataset.map(). by @nsthorat in #866
  • Avoid computing stats for every single field on page load by @dsmilkov in #873
  • Fix a sample_size yaml bug by @dsmilkov in #874
  • UI fixes for expanding long rows. by @nsthorat in #875
  • Fix small bug with compute signal / concepts and filtering by valid dtypes. by @nsthorat in #877
  • Add support for map field in the schema and UI by @dsmilkov in #878
  • Fix a bug with previewing and comparing on repeated values. by @nsthorat in #879
  • Allow custom signals to work with dask processes. by @nsthorat in #880

Full Changelog: v0.1.20...v0.1.21