Update DataCatalog private methods with DataCatalog 2.0 API public methods #2274

Open
wants to merge 5 commits into
base: main
from

Conversation

SajidAlamQB
Contributor

Description

Related to: #2273

This PR replaces references to the private _get_dataset() with Kedro’s public catalog.get() method.

To maintain backward compatibility with older Kedro versions that do not include KedroDataCatalog, it introduces a feature check that falls back to _get_dataset() only when necessary.

Development notes

  • Added an IS_DATACATALOG_2 flag to detect whether KedroDataCatalog is available.
  • Where possible, replaced _get_dataset() calls with catalog.get().
  • Fallback to _get_dataset() is used only for older Kedro versions.
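
A minimal sketch of the pattern these notes describe, not the PR's actual code. It assumes KedroDataCatalog is importable from kedro.io on newer Kedro versions; get_dataset is a hypothetical helper name introduced only for illustration.

```python
# Feature flag: True only when the installed Kedro exposes KedroDataCatalog.
# Assumption: the class is importable from kedro.io on newer versions.
try:
    from kedro.io import KedroDataCatalog

    IS_DATACATALOG_2 = True
except ImportError:
    # Older Kedro (or no Kedro installed): only the legacy catalog exists.
    KedroDataCatalog = None
    IS_DATACATALOG_2 = False


def get_dataset(catalog, dataset_name):
    """Hypothetical helper: prefer the public get(), else use _get_dataset()."""
    if IS_DATACATALOG_2 and isinstance(catalog, KedroDataCatalog):
        return catalog.get(dataset_name)
    # Fallback for older catalogs that only offer the private method.
    return catalog._get_dataset(dataset_name)
```

Routing every lookup through one helper like this would keep the version check in a single place instead of repeating it at each call site.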

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Sajid Alam <[email protected]>
@@ -6,6 +6,13 @@

from kedro.io import DataCatalog
Contributor

Does it make sense to import this only as a fallback, when there's an ImportError because we are on an older Kedro version?

Contributor Author

I think we still need DataCatalog as the “baseline” import so that our code stays backward compatible with older Kedro versions, which don’t have KedroDataCatalog.

Contributor

Should line 7 be inside the except ImportError at line 13? That is how we normally handled backward compatibility before. Wouldn't this raise an error if DataCatalog is no longer available in the future?
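
One way to express the fallback-style import discussed in this thread, as a rough sketch only. It uses importlib rather than nested try/except so that it also runs where Kedro is not installed; import_first_available is a hypothetical helper, and the candidate class names are taken from this conversation.

```python
import importlib


def import_first_available(module_name, candidates):
    """Return (name, attribute) for the first candidate the module exposes.

    Returns (None, None) when the module is missing or has none of them.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, None
    for name in candidates:
        if hasattr(module, name):
            return name, getattr(module, name)
    return None, None


# Prefer the newer catalog class; fall back to the legacy one if it is absent.
catalog_name, catalog_cls = import_first_available(
    "kedro.io", ["KedroDataCatalog", "DataCatalog"]
)
IS_DATACATALOG_2 = catalog_name == "KedroDataCatalog"
```

With this shape, neither class is imported unconditionally, so removing either one from a future Kedro release would not break the module at import time.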

Contributor

@rashidakanchwala rashidakanchwala left a comment

thanks.

Signed-off-by: Sajid Alam <[email protected]>
    dataset = self._catalog.get(dataset_name)
else:
    dataset = self._catalog._get_dataset(dataset_name)
metadata = getattr(dataset, "metadata", None)
Contributor

How are we handling metadata in DataCatalog 2.0?

@@ -91,7 +98,10 @@ def resolve_dataset_factory_patterns(

for dataset_name in datasets:
    try:
        catalog._get_dataset(dataset_name, suggest=False)
        if IS_DATACATALOG_2 and isinstance(catalog, KedroDataCatalog):
Contributor

Not related to this PR, but I remember Elena mentioning that resolve_dataset_factory_patterns will be handled on the Kedro side. Do we still need this function here? Could we call it only when IS_DATACATALOG_2 is false?

@ravi-kumar-pilla
Contributor

Hi @SajidAlamQB, nice work! I have left a few comments. Maybe you could also update the release notes to track this work?
Thank you.

Development

Successfully merging this pull request may close these issues.

Replace DataCatalog private methods with DataCatalog 2.0 API public methods.
3 participants