Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: python imports #438

Merged
merged 9 commits into from
Jun 2, 2024
Merged

Conversation

nohehf
Copy link
Contributor

@nohehf nohehf commented May 28, 2024

Hello @hendrikvanantwerpen, this fixes #430.
This issue was that a symbol was poped for each part of the file path, but it was pushed (on import) only for each part of the import. (eg. it pushed Users, nohehf, module, foo and popped module, foo).
Languages like python needs knowledge about the "root path", as modules are resolved relative to the "input script" as stated here: https://docs.python.org/3/tutorial/modules.html#the-module-search-path

Here is my fix proposal, and I believe it could be useful for several other languages:

  • introduces a ROOT_PATH global variable, that allows to compute the relative file path. This is particularly useful for interpreted languages like python that creates the modules paths relative to a base path.
  • adds a test that reproduces the error as described in Python module resolution not working via tree-sitter-stack-graphs-python cli #430, it did fail before the changes and now passes
  • FILE_PATH is now passed explicitly, rather than extracted from stack_graph[self.file] as it's stated in
    /// Adds a file to the stack graph. There can only ever be one file with a particular name in
    /// the graph. If a file with the requested name already exists, we return `Err`; if it
    /// doesn't already exist, we return `Ok`. In both cases, the value of the result is the
    that it is not necessarily the file path.

Let me know what you think.

@nohehf nohehf requested a review from a team as a code owner May 28, 2024 09:09
@nohehf nohehf changed the title feat: add root path and pass file path explicitely fix: python imports May 28, 2024
@hendrikvanantwerpen
Copy link
Collaborator

I think in general this is the right idea. I'll look into the details later today.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening a PR! The idea is correct, but I think it's better to move setting FILE_PATH and ROOT_PATH up instead of pushing them down. I've left detailed comments on what I have in mind. I think this might make the change a little less invasive as well. Let me know if you have questions!

Comment on lines 439 to 440
source_path,
source_root,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of pushing source path and root path all the way down, I think it's better to set FILE_PATH_VAR and ROOT_PATH_VAR in the globals object here. I expect that at some point this will need to be more flexible still, to allow language specific logic for computing source roots and relative paths (but that's a long term thing). But moving the place where we set those variables is in line with that idea. I think it will also make the change less invasive.

Comment on lines 561 to 562
source_path: &Path,
source_root: &Path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters won't be necessary anymore, and other places where they are propagated.

Comment on lines 662 to 674
if globals.get(&FILE_PATH_VAR.into()).is_none() {
let file_name = self.stack_graph[self.file].to_string();
let file_name = self.source_path.to_str().unwrap().to_string();
globals
.add(FILE_PATH_VAR.into(), file_name.into())
.expect("Failed to set FILE_PATH");
}

if globals.get(&ROOT_PATH_VAR.into()).is_none() {
let root_path = self.source_root.to_str().unwrap().to_string();
globals
.add(ROOT_PATH_VAR.into(), root_path.into())
.expect("Failed to set ROOT_PATH");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole block would be removed here, and move into the indexer. The conditionals won't be needed there anymore.

Comment on lines 336 to 337
let mut globals = Variables::new();
Self::load_globals_from_config_str(&config, &mut globals)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add to globals here as well, but I think it is nicer to just require those variables to be set in builtins.cfg. See for example:

https://github.com/github/stack-graphs/blob/558d1da3c27dfe50a45cc9acbee66bc5d898d314/languages/tree-sitter-stack-graphs-typescript/src/builtins.cfg.

@@ -13,6 +13,7 @@
;; ^^^^^^^^^^^^^^^^

global FILE_PATH
global ROOT_PATH
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set a default value here, it makes life a bit easier because you don't have to set ROOT_PATH unless you need to.

Suggested change
global ROOT_PATH
global ROOT_PATH = ""

@@ -428,10 +428,19 @@ impl<'a> Indexer<'a> {
cancellation_flag: &dyn CancellationFlag,
) -> std::result::Result<(), BuildErrorWithSource<'b>> {
let relative_source_path = source_path.strip_prefix(source_root).unwrap();
// here the file should also have stripped the source_root from its path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to use the relative path for the file. This would very easily lead to conflicts, for example if you index multiple source directories with the CLI. But if we set the global variables explicitly, it won't be an issue that the scope graph records the full path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep 100% agreed, this is an old comment I forgot to remove

@nohehf
Copy link
Contributor Author

nohehf commented May 29, 2024

Thanks for the review, will look into it and make those changes by the end of the weekend I believe. I agree moving it in the globals is overall a good approach for the reasons you mentioned, however don't you think that this could lead to some runtime errors as it will not be checked anymore ? I think at least file path (but probably root path too) are standard and should be required somewhere (/ checked at compile time) ?

@nohehf
Copy link
Contributor Author

nohehf commented May 30, 2024

I implemented the changes, indeed it's a far better cascade. I also thought that we might want to create variables directly from the file path, as it is always needed Variables::from_file_path(&Path) or something like this (it would reduce a bit of the code duplication but not sure it's needed). Also, this approach makes it that FILE_PATH could be missing, and that will crash at runtime (but tests should ensure this I believe)

@hendrikvanantwerpen
Copy link
Collaborator

don't you think that this could lead to some runtime errors as it will not be checked anymore

It is true that callers can now forget this, which would lead to a runtime error. I think the impact of this problem will be low, since any attempt to run the TSG without the right variables set will fail. (It's not silently set to a default unless one is explicitly set in the TSG.) So any basic test triggering this code path should immediately reveal the problem.

The idea we have for long term is that callers wouldn't have to set these variables explicitly. Instead, languages would have some kind of source folder analyzer as part of their configuration. That would determine which files in a directory can be analyzed as part of the languages and which variables are set for them.

That being said, with these changes we are in an awkward middle. We might keep the FILE_PATH fallback in the library, so that when it's not set already, we set it to the file path associated with the file.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments. I think this is going in the right direction! I left a few more remarks, but these are relatively small. The overall structure looks good now.

Comment on lines 651 to 655

// FILE_PATH is mandatory
globals
.get(&FILE_PATH_VAR.into())
.expect("FILE_PATH not set");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is the right approach. While FILE_PATH is required in all TSG's we actually use, there's nothing inherently here that requires it.

I'm thinking we might just keep the old code here. It will act as a fallback and we don't unnecessarily introduce a breaking change here.

Comment on lines 80 to 86
let builtins_path_var = Path::new("<builtins>");
builtins_globals
.add(
FILE_PATH_VAR.into(),
builtins_path_var.to_str().unwrap().into(),
)
.expect("failed to add file path variable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're not using this anywhere else, I'd just inline `builtins_path_var here.

Suggested change
let builtins_path_var = Path::new("<builtins>");
builtins_globals
.add(
FILE_PATH_VAR.into(),
builtins_path_var.to_str().unwrap().into(),
)
.expect("failed to add file path variable");
builtins_globals
.add(
FILE_PATH_VAR.into(),
"<builtins>".into(),
)
.expect("failed to add file path variable");

Comment on lines 338 to 345
let file_name = path.to_string_lossy();
let file: stack_graphs::arena::Handle<stack_graphs::graph::File> =
graph.add_file(&file_name).unwrap();
let mut globals = Variables::new();

globals
.add(FILE_PATH_VAR.into(), path.to_str().unwrap().into())
.expect("failed to add file path variable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builtins are a bit special and the actual path is not relevant. They are usually definitions that are implicitly defined, like a runtime library or something.

I'm not sure why we are using path here in add_file. I think we should be using "<builtins>" here as well, both in add_file and as the value in globals. As a little code cleanup we could define a top-level constant BUILTINS_FILENAME and replace the literals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if having chevrons could cause issues (as it's not a standard path character I believe). But it was working as is so let's keep it

Comment on lines 18 to 19
static FILE_PATH_VAR: &'static str = "FILE_PATH";

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the constant in the crate public instead of redefining it here.

Comment on lines 438 to 441
static ROOT_NODE_VAR: &'static str = "ROOT_NODE";
static JUMP_TO_SCOPE_NODE_VAR: &'static str = "JUMP_TO_SCOPE_NODE";
static FILE_PATH_VAR: &'static str = "FILE_PATH";
static ROOT_PATH_VAR: &'static str = "ROOT_PATH";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static ROOT_NODE_VAR: &'static str = "ROOT_NODE";
static JUMP_TO_SCOPE_NODE_VAR: &'static str = "JUMP_TO_SCOPE_NODE";
static FILE_PATH_VAR: &'static str = "FILE_PATH";
static ROOT_PATH_VAR: &'static str = "ROOT_PATH";
/// Name of the variable used to pass the root node.
pub static ROOT_NODE_VAR: &'static str = "ROOT_NODE";
/// Name of the variable used to pass the jump-to-scope node.
pub static JUMP_TO_SCOPE_NODE_VAR: &'static str = "JUMP_TO_SCOPE_NODE";
/// Name of the variable used to pass the file path.
/// If a root path is given, it should be a descendant the root path.
pub static FILE_PATH_VAR: &'static str = "FILE_PATH";
/// Name of the variable used to pass the root path.
/// If given, should be an ancestor of the file path.
pub static ROOT_PATH_VAR: &'static str = "ROOT_PATH";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made those consts

@nohehf
Copy link
Contributor Author

nohehf commented Jun 1, 2024

Thanks for the review, I added the requested changes

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Two places where I would change setting the file path variable to a fallback, so config can override it.

Some small style nits, but this looks good!

Comment on lines 337 to 338
let file: stack_graphs::arena::Handle<stack_graphs::graph::File> =
graph.add_file(&file_name).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the type annotation is necessary here, that should be possible to infer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes my bad, probably miss-clicked on the annotation in my IDE

Comment on lines 341 to 343
globals
.add(FILE_PATH_VAR.into(), BUILTINS_FILENAME.into())
.expect("failed to add file path variable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move after the load globals below, and make conditional, so the config can override the value.

Copy link
Contributor Author

@nohehf nohehf Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting it before was intended, load_globals_from_config_str will override the default value if any, no need for a conditional I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, overriding with add will return an error if it already exists. I believe just overriding could be nice but will do

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yes, if the load functions would add or set, that would also work.

Comment on lines 82 to 84
builtins_globals
.add(FILE_PATH_VAR.into(), BUILTINS_FILENAME.into())
.expect("failed to add file path variable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: move below the load globals and make optional so the config can override it.

@@ -4,11 +4,11 @@
// Licensed under either of Apache License, Version 2.0, or MIT license, at your option.
// Please see the LICENSE-APACHE or LICENSE-MIT files in this distribution for license details.
// ------------------------------------------------------------------------------------------------

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this.

Copy link
Collaborator

@hendrikvanantwerpen hendrikvanantwerpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, except the conditions are inverted. I'll commit those suggestions myself and start a CI run to see how we are doing.

tree-sitter-stack-graphs/src/loader.rs Outdated Show resolved Hide resolved
tree-sitter-stack-graphs/src/loader.rs Outdated Show resolved Hide resolved
Comment on lines 103 to 108
globals
.add(
FILE_PATH_VAR.into(),
fragments.path.to_str().unwrap().into(),
)
.expect("failed to add file path variable");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI failure is probably because this is still before loading the globals from the test, which can set file path too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding a add_if_not_set (not sure about the naming) convenience function to Variables would be a nice addition. Let me fix this

@nohehf
Copy link
Contributor Author

nohehf commented Jun 2, 2024

Okay, except the conditions are inverted. I'll commit those suggestions myself and start a CI run to see how we are doing.

Oh sorry my bad on this one, did this late and did not pay attention, sorry

@hendrikvanantwerpen hendrikvanantwerpen merged commit 3c4d1a6 into github:main Jun 2, 2024
8 checks passed
@hendrikvanantwerpen
Copy link
Collaborator

🎉 Thanks for the contribution @nohehf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python module resolution not working via tree-sitter-stack-graphs-python cli
2 participants