
Fixed schema related issues for census example. #267

Closed


pritamdodeja

Summary of Changes

Some of the schema-related imports and calls appear to point to outdated locations, which caused census_example_v2.py to break. This pull request fixes those schema-related issues so that census_example_v2.py now runs with:

python census_example_v2.py --input_data_dir data --working_dir working_area

Details

dataset_metadata has been replaced by schema_utils in the imports, and
RecordBatchToExamplesEncoder is now imported from tfx_bsl.coders.example_coder.
schema_from_feature_spec is now called from schema_utils, and
RecordBatchToExamplesEncoder is now passed the schema, which was
previously missing.
The _RAW_DATA_METADATA schema creation is also fixed by pointing it to
schema_utils.from_feature_spec. The net result is that census_example_v2.py
now works as expected.

@zoyahav
Member

zoyahav commented May 4, 2022

Were you testing it previously against tfx_bsl's master branch or just a specific release of it? (similarly for the tensorflow_transform version)
Note that these examples are tested continuously against the latest master branch versions.

Using dataset_metadata.DatasetMetadata.from_feature_spec is actually correct; it was introduced after the latest release of tensorflow_transform, see https://github.com/tensorflow/transform/blob/master/RELEASE.md#major-features-and-improvements.
The same applies to tfx_bsl.public.tfxio.RecordBatchToExamplesEncoder - https://github.com/tensorflow/tfx-bsl/blob/93743faf28953a8b32780d45eb3014f525f5cb5d/tfx_bsl/public/tfxio/__init__.py#L22.
Its schema argument is also optional. Did you see a need to pass a schema to RecordBatchToExamplesEncoder?
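
For anyone hitting the same mismatch, here is a minimal sketch of a way to check whether an installed environment already exposes the two names mentioned above. The helper `resolve` and the `CHECKS` list are hypothetical names made up for this illustration; the module path `tensorflow_transform.tf_metadata.dataset_metadata` is assumed to be where `dataset_metadata` lives. The sketch only reports what is importable and does not change anything.

```python
import importlib


def resolve(module_name, attr_path):
    """Return the object at module_name.attr_path, or None if unavailable.

    Hypothetical helper for illustration: it returns None both when the
    module cannot be imported and when any attribute on the path is missing.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return None
    for name in attr_path.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


# Names taken from the comment above; the first module path is an assumption.
CHECKS = [
    ("tensorflow_transform.tf_metadata.dataset_metadata",
     "DatasetMetadata.from_feature_spec"),
    ("tfx_bsl.public.tfxio", "RecordBatchToExamplesEncoder"),
]

if __name__ == "__main__":
    for module_name, attr_path in CHECKS:
        status = "available" if resolve(module_name, attr_path) else "missing"
        print(f"{module_name}.{attr_path}: {status}")
```

If either name is reported as missing, the installed release probably predates the master branch that the examples are tested against.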

@pritamdodeja
Author

Thanks for your feedback. I was working through a lab for this course, and some of tft's functionality appeared broken in qwiklabs, so I came to this repo to understand better. I'm going to rebuild my environment so I can get these examples to run without issue, and go back to that class to see if I can figure out why things weren't working as expected. Thank you!

@pritamdodeja
Author

Everything works as expected after re-installing tensorflow-transform, apache-beam[gcp], pyarrow, tensorflow-metadata, and tfx-bsl to match the exact versions listed in the support matrix. Thank you!
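
A small sketch of how the installed versions of those packages could be listed for comparison against the support matrix. The function name `installed_versions` and the `PACKAGES` list are hypothetical, introduced only for this illustration. It uses only the standard library (`importlib.metadata`, Python 3.8+) and does not know the matrix itself, so the comparison remains manual.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the comment above (apache-beam[gcp] is the
# apache-beam distribution installed with the "gcp" extra).
PACKAGES = [
    "tensorflow-transform",
    "apache-beam",
    "pyarrow",
    "tensorflow-metadata",
    "tfx-bsl",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```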
