v4.0.0

@Conchylicultor Conchylicultor released this 06 Oct 19:15

API changes, new features:

  • Dataset-as-folder: a dataset can now be a self-contained module in a folder with checksums, dummy data,... This simplifies implementing datasets outside the TFDS repository.
  • tfds.load can now load a dataset without using the generation class. So tfds.load('my_dataset:1.0.0') can work even if MyDataset.VERSION == '2.0.0' (See #2493).
  • Add a new TFDS CLI (see https://www.tensorflow.org/datasets/cli for details)
  • tfds.testing.mock_data does not require metadata files anymore!
  • Add tfds.as_dataframe(ds, ds_info) with custom visualisation (example)
  • Add tfds.even_splits to generate subsplits (e.g. tfds.even_splits('train', n=3) == ['train[0%:33%]', 'train[33%:67%]', ...])
  • Add new DatasetBuilder.RELEASE_NOTES property
  • tfds.features.Image now supports PNG with 4-channels
  • tfds.ImageFolder now supports custom shape, dtype
  • Downloaded URLs are available through MyDataset.url_infos
  • Add skip_prefetch option to tfds.ReadConfig
  • as_supervised=True support for tfds.show_examples, tfds.as_dataframe
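
The sub-split strings in the tfds.even_splits example above can be illustrated with a small pure-Python sketch. This is not the TFDS implementation: `even_split_specs` is a hypothetical helper, and the rounding rule is an assumption chosen only because it reproduces the two boundaries quoted in the release note.

```python
from typing import List


def even_split_specs(split: str, n: int) -> List[str]:
    """Build n percent-based sub-split strings covering `split`.

    Hypothetical helper for illustration; the boundary rounding is an
    assumption, not necessarily what tfds.even_splits does internally.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Percent boundaries 0..100, rounded to the nearest integer.
    bounds = [round(i * 100 / n) for i in range(n + 1)]
    # Pair consecutive boundaries into 'split[lo%:hi%]' strings.
    return [f"{split}[{lo}%:{hi}%]" for lo, hi in zip(bounds, bounds[1:])]


specs = even_split_specs("train", n=3)
print(specs)
```

Each resulting string uses the same slicing syntax that tfds.load accepts for its split argument, so the sub-splits can be loaded independently, for example one per worker.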

Breaking compatibility changes:

  • tfds.as_numpy() now returns an iterable which can be iterated multiple times, rather than a one-shot iterator. To migrate: next(ds) -> next(iter(ds))
  • Rename tfds.features.text.Xyz -> tfds.deprecated.text.Xyz
  • Remove DatasetBuilder.IN_DEVELOPMENT property
  • Remove tfds.core.disallow_positional_args (use the Python 3 keyword-only * marker instead)
  • tfds.features can now be saved/loaded. You may have to override FeatureConnector.from_json_content and FeatureConnector.to_json_content to support this feature.
  • Stop testing against TF 1.15. Requires Python 3.6.8+.
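
The tfds.as_numpy() migration above comes down to the difference between an iterable and an iterator. The sketch below uses a hypothetical stand-in class, `ReiterableDataset`, rather than TFDS itself, and assumes only what the note states: the returned object can be iterated several times but is no longer an iterator.

```python
class ReiterableDataset:
    """Hypothetical stand-in for the object returned by tfds.as_numpy()."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        # A fresh iterator on every call, so the data can be traversed again.
        return iter(self._examples)


ds = ReiterableDataset([{"label": 0}, {"label": 1}])

# Old pattern: next(ds) fails because an iterable has no __next__.
try:
    next(ds)
    next_worked = True
except TypeError:
    next_worked = False

# New pattern: create an iterator explicitly.
first = next(iter(ds))

# The same object can be looped over more than once.
first_pass = [ex["label"] for ex in ds]
second_pass = [ex["label"] for ex in ds]
```

Note that each call to iter(ds) starts from the beginning, so code that pulls several examples one at a time should keep a single iterator (it = iter(ds)) and call next(it) repeatedly.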

Other bug fixes:

  • Better archive extension detection for dl_manager.download_and_extract
  • Fix tfds.__version__ in TFDS nightly to be PEP440 compliant
  • Fix crash when GCS not available
  • Add a script to detect dead URLs
  • Improved open-source workflow, contributor guide, documentation
  • Many other internal cleanups, bugs, dead code removal, py2->py3 cleanup, pytype annotations,...

And of course, new datasets and dataset updates.

A gigantic thanks to our community, which has helped us debug issues and implement many features, especially vijayphoenix@ for being a major contributor.