v0.1.21
Features
- Signal computations are now cached. If a signal computation fails halfway through, it resumes from the cached results instead of starting over.
- Source loading is up to 40x faster for some sources (e.g. HuggingFace).
- Map dtype is now supported for parquet sources.
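
The resumable signal cache can be pictured with a minimal sketch. This is not Lilac's implementation: `compute_with_cache`, its parameters, and the one-JSON-row-per-input layout are hypothetical names and assumptions chosen for illustration. Each result is appended to a JSONL file as soon as it is computed, so a rerun after a failure only recomputes the inputs that have no cached row yet.

```python
import json
import os
from typing import Any, Callable, List, Sequence


def compute_with_cache(
    inputs: Sequence[str],
    compute: Callable[[str], Any],
    cache_path: str,
) -> List[Any]:
    """Hypothetical helper: run compute() per input, resuming from a JSONL cache."""
    results: List[Any] = []

    # Load rows written by a previous (possibly interrupted) run.
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            for line in f:
                try:
                    results.append(json.loads(line))
                except json.JSONDecodeError:
                    # A partially written final line; recompute from here.
                    break

    # Rewrite the valid rows, then continue with the inputs not yet cached.
    with open(cache_path, 'w', encoding='utf-8') as f:
        for row in results:
            f.write(json.dumps(row) + '\n')
        for item in inputs[len(results):]:
            row = compute(item)
            f.write(json.dumps(row) + '\n')
            f.flush()  # Persist each row so a later failure keeps it.
            results.append(row)

    return results
```

The sketch assumes the inputs arrive in the same order on every run, since rows are matched to inputs by position. A cache keyed by row id would be more robust if that assumption does not hold.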
Details
- Add jsonl intermediate caching to signals. Introduce a central spot for this cache abstraction. by @nsthorat in #858
- Rename fast_process to load_to_parquet by @brilee in #862
- Implement fast_process for parquet sources by @brilee in #860
- Implement CSV direct to parquet by @brilee in #863
- Implement fast json source by @brilee in #865
- Add map<key, value> dtype. No support in the UI yet. by @dsmilkov in #870
- Implement fast processing for huggingface datasets by @brilee in #869
Bug Fixes & Other Changes
- Add development docs on profiling by @brilee in #861
- Add docs for settings and compare mode by @dsmilkov in #859
- Add a nest_under field to dataset.map(). by @nsthorat in #866
- Avoid computing stats for every single field on page load by @dsmilkov in #873
- Fix a sample_size yaml bug by @dsmilkov in #874
- UI fixes for expanding long rows. by @nsthorat in #875
- Fix small bug with compute signal / concepts and filtering by valid dtypes. by @nsthorat in #877
- Add support for map field in the schema and UI by @dsmilkov in #878
- Fix a bug with previewing and comparing on repeated values. by @nsthorat in #879
- Allow custom signals to work with dask processes. by @nsthorat in #880
Full Changelog: v0.1.20...v0.1.21