v4.2.0
API:
- Add `tfds build` to the CLI. See documentation.
- DownloadManager now returns pathlib-like objects
- Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`
- New `tfds.features.Dataset` to represent nested datasets
- Add `tfds.ReadConfig(add_tfds_id=True)` to add a unique id to the example `ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`)
- Add `num_parallel_calls` option to `tfds.ReadConfig` to override the default `AUTOTUNE` option
- `tfds.ImageFolder` now supports `tfds.decode.SkipDecoder`
- Add multichannel audio support to `tfds.features.Audio`
- Better `tfds.as_dataframe` visualization (ffmpeg video if installed, bounding boxes, ...)
- Add `try_gcs` to `tfds.builder(..., try_gcs=True)`
- Simpler `BuilderConfig` definition: class `VERSION` and `RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is now optional.
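
The `tfds_id` example above suggests the id encodes the shard file name and the position of the example within that shard. The following is a minimal sketch of how such an id could be split into its parts. The helper `parse_tfds_id`, the `ParsedTfdsId` type and the regular expression are hypothetical and inferred only from the single example id shown in these notes; they are not part of the TFDS API, and real ids may follow a different layout.

```python
import re
from typing import NamedTuple, Union

# Assumed layout, inferred from b'train.tfrecord-00012-of-01024__123':
#   <prefix>.tfrecord-<shard>-of-<num_shards>__<index>
_TFDS_ID_RE = re.compile(
    r"^(?P<prefix>.+)\.tfrecord-(?P<shard>\d+)-of-(?P<num_shards>\d+)__(?P<index>\d+)$"
)


class ParsedTfdsId(NamedTuple):
    """Hypothetical container for the parts of a `tfds_id`."""
    prefix: str      # text before `.tfrecord` (`train` in the example id)
    shard: int       # shard number taken from the file name
    num_shards: int  # total number of shards taken from the file name
    index: int       # number after the double underscore


def parse_tfds_id(tfds_id: Union[bytes, str]) -> ParsedTfdsId:
    """Splits an id such as b'train.tfrecord-00012-of-01024__123' into parts."""
    text = tfds_id.decode("utf-8") if isinstance(tfds_id, bytes) else tfds_id
    match = _TFDS_ID_RE.match(text)
    if match is None:
        raise ValueError(f"Unrecognized tfds_id layout: {text!r}")
    return ParsedTfdsId(
        prefix=match["prefix"],
        shard=int(match["shard"]),
        num_shards=int(match["num_shards"]),
        index=int(match["index"]),
    )


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed)
```

In practice the id is mostly useful as an opaque, stable key (for example to exclude or look up specific examples), so parsing it is only needed when the shard or position matters.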
Breaking compatibility changes:
- Removed configs for all text datasets. Only the plain text version is kept. For example: `multi_nli/plain_text` -> `multi_nli`.
- To guarantee better determinism, new validations are performed on the keys when creating a dataset (to avoid filenames as keys, which are non-deterministic, and to restrict keys to `str`, `bytes` and `int`). New errors likely indicate an issue in the dataset implementation.
- `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a `dict`)
- `tfds.units` is not visible anymore from the public API
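
To illustrate the key restriction described above, here is a minimal sketch of a type check that accepts only `str`, `bytes` and `int` keys. The function `validate_key` is hypothetical and is not the actual TFDS implementation; in particular, it only checks the key type and does not try to detect whether a string key looks like a filename.

```python
from typing import Any, Union

# Key types allowed according to the release notes.
Key = Union[str, bytes, int]


def validate_key(key: Any) -> Key:
    """Returns the key unchanged if its type is allowed, else raises TypeError.

    Hypothetical helper: path objects, floats, tuples, etc. are rejected
    because they are not instances of str, bytes or int.
    """
    if not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Invalid key type {type(key).__name__}: "
            "keys must be str, bytes or int."
        )
    return key


# Typical valid keys yielded alongside examples.
for key in ("example_0", b"example_1", 2):
    validate_key(key)
```

A dataset whose generator yields, for instance, a path object as key would fail this check; converting it to a stable identifier (such as an integer index) is the kind of fix the new errors point toward.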
Bug fixes:
- Support 0-len sequence with images of dynamic shape (Fix #2616)
- Progress bar correctly updated when copying files.
- Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums updated, ...)
- Better debugging and error messages (e.g. human-readable size, ...)
- Allow `max_examples_per_splits=0` in `tfds build --max_examples_per_splits=0` to test `_split_generators` only (without `_generate_examples`).
And of course, many new datasets and datasets updates.
Thank you to the community for their many valuable contributions and for supporting us in this project!!!