Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Configuration Loader to Handle Paths Containing dots #3004

Closed
Tracked by #3123
erwinpaillacan opened this issue Sep 3, 2023 · 9 comments · Fixed by #3145
Closed
Tracked by #3123

Fix Configuration Loader to Handle Paths Containing dots #3004

erwinpaillacan opened this issue Sep 3, 2023 · 9 comments · Fixed by #3145
Assignees
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@erwinpaillacan
Copy link

erwinpaillacan commented Sep 3, 2023

Description

When working within Databricks workflows, it's possible to create a job that utilizes a notebook hosted inside a repository. Each time the job is executed, the code is automatically cloned to a path that depends on the commit. In my case, I have hosted a notebook that runs the Kedro project within the .azuredevops directory.

To locate the current project root, we can employ a regular expression (regex) pattern:

pattern = r"^/Workspace/Repos/.internal/(?!./.azuredevops)."

With this regex, we can effectively identify the project root. For example, one possible path is:

/Workspace/Repos/.internal/02476ba86f_commits/20adcb96c8bec99ac2fad8b78b25158e7d968fa4/

However, there's an issue that arises when a period (dot) appears within the path. This problem is not limited to Databricks; it can also occur locally. It's important to note that the OmegaConfigLoader did not encounter any issues with this path until Kedro version 0.18.12.

Context

Just running a normal kedro project. Which was working untill 0.18.12

Steps to Reproduce

  1. modify you project directory adding some ., for example if you a have a kedro project in Users/yourname/kedro_project you can create a new folder with some point Users/yourname/.somefolder/kedro_project
  2. [Second Step] Enable OmageConfigLoader
  3. [And so on...] Kedro run

Expected Result

kedro should load the session

Actual Result

The config loader is not be able to find any config file

`No files of YAML or JSON format found in /Workspace/Repos/.internal/02476ba86f_commits/20adcb96c8bec99ac2fad8b78b25158e7d968fa4/conf/base or /Workspace/Repos/.internal/02476ba86f_commits/20adcb96c8bec99ac2fad8b78b25158e7d968fa4/conf/databricks_dev matching the glob pattern(s): ['catalog*', 'catalog*/**', '**/catalog*']`

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.13
  • Python version used (python -V): 3.10.12
  • Operating system and version: mac
@erwinpaillacan erwinpaillacan reopened this Sep 3, 2023
@ankatiyar ankatiyar added the Issue: Bug Report 🐞 Bug that needs to be fixed label Sep 3, 2023
@noklam
Copy link
Contributor

noklam commented Sep 3, 2023

@erwinpaillacan Potential conflict requirements with #2977

Does it work with other ConfigLoader? And what will be the expected behavior?

We can hard coded for. ipynb_checkpoints but doesn't seem ideal to me.

@erwinpaillacan
Copy link
Author

erwinpaillacan commented Sep 3, 2023

I think we need to ignore internal folders but after project root, not before.
to allow to have project root with dots, which is useful in databricks workflows.

The expected behaviour would be to able to run the kedro project.

If I have my project_root in ~/Projects/Workspace/Repos/.internal/02476ba86f_commits/e3754f960754e93d5dadcd17a74901cf7fb65fda/ I would expect to ignore .folders inside this directory, but dont ignore .internal since is part of the project root

Just tested TemplatedConfigLoader and it works fine.

@astrojuanlu astrojuanlu added the Community Issue/PR opened by the open-source community label Sep 3, 2023
@noklam
Copy link
Contributor

noklam commented Sep 3, 2023

@erwinpaillacan I agree that but we need something more generic. Conf is not necessary inside a project. While it may not be common, one may actually have their conf folder inside a hidden folder but outside project root.

I have some idea to fix this but will need further investigation.

@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Sep 6, 2023
@astrojuanlu
Copy link
Member

Potential root cause (not checked):

if not self._is_hidden(each):
paths.append(Path(each))

def _is_hidden(self, path: str):
"""Check if path contains any hidden directory or is a hidden file"""
path = Path(path).resolve().as_posix()
parts = path.split(self._fs.sep) # filesystem specific separator
HIDDEN = "."
# Check if any component (folder or file) starts with a dot (.)
return any(part.startswith(HIDDEN) for part in parts)

@merelcht
Copy link
Member

When addressing this we'd ideally find a solution for this problem as well as #2977. That fix seems to be the cause of this.

@merelcht merelcht moved this to To Do in Kedro Framework Sep 18, 2023
@ankatiyar ankatiyar moved this from To Do to In Progress in Kedro Framework Oct 5, 2023
@ankatiyar ankatiyar moved this from In Progress to To Do in Kedro Framework Oct 6, 2023
@IngerMathilde
Copy link
Contributor

IngerMathilde commented Oct 8, 2023

I fixed this locally as a quick bugfix like this.

    def _is_hidden(self, path: str) -> bool:
        """Check if path contains any hidden directory or is a hidden file.

        Have to overwrite to fix a bug.
        """
        file_path = Path(path).resolve()
        conf_path = Path(self.conf_source).resolve()
        if file_path.is_relative_to(conf_path):
            relative_path = file_path.relative_to(conf_path).as_posix()
        else:
            # In the rare case the env points to a path.
            # We use if for unit testing
            env_path = Path(self.env).resolve()
            relative_path = file_path.relative_to(env_path).as_posix()
        # filesystem specific separator
        parts = relative_path.split(self._fs.sep)
        hidden = "."
        # Check if any component (folder or file) starts with a dot (.)
        return any(part.startswith(hidden) for part in parts)

Maybe a usefull starting point :). Believe it also still works for the original problem. Could make a PR if you would like.

@ankatiyar ankatiyar moved this from To Do to In Progress in Kedro Framework Oct 9, 2023
@ankatiyar
Copy link
Contributor

@IngerMathilde Thank you for investigating this and sharing the solution! ⭐
Since this ticket is a part of the current sprint and a blocker for the next release, I'd like to take care of this PR!
If you do have the capacity, please check out the Hacktober tickets! We'd love to see contributions from you! :)

@ankatiyar
Copy link
Contributor

@IngerMathilde I've opened a PR for this #3145, would you mind sharing your email id so we can add you as a co-author of that PR? :)

@IngerMathilde
Copy link
Contributor

Hi!
Thanks a lot! You can put it on: [email protected]

@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants