-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support globbing in imports #2376
base: master
Are you sure you want to change the base?
Conversation
55a246a
to
94fd369
Compare
cd7c5a5
to
dedf056
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. The main one is that we have to handle non-unicode paths, so we should look at a different glob
crate.
src/compiler.rs
Outdated
@@ -75,14 +75,39 @@ impl Compiler { | |||
.join(Self::expand_tilde(&relative.cooked)?) | |||
.lexiclean(); | |||
|
|||
if import.is_file() { | |||
let import_base_str = import.to_string_lossy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just
is expected to work with non-unicode paths, so we can't do a lossy conversation here. I'd suggest checking out the globset
crate, which I believe handles non-unicode paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globset
unfortunately doesn't have a method to iterate over a file system entry and select only paths that match the glob, like the iterator that glob_with
returns with the glob
crate. Looking at how glob_with
is implemented in glob
, it seems like this code is a bit tricky to write and I'd rather avoid it (or if I did write that code, I think it might work out to just implementing path-globbing functionality from scratch, which is also undesirable).
There's also this crate: https://crates.io/crates/wax , which is utf-8 only but claims to reasonably handle sane non-utf-8 text internally, and also seems to have the file system iterator I want. It also hasn't been updated in a bit and claims to be experimental, so I'm kind of iffy about this crate.
Maybe I'm missing something obvious that would make this easier to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's lame. A concern about wax
, any syntax that wax
supports becomes part of the public-facing interface of just
, and so can't be added and removed without a backwards-incompatible change. wax
supports some pretty exotic syntax, like repetitions <a*/:0,>
and alternatives {a?c,x?z,foo}
, which I doubt would be incredibly useful for use in import paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it looks like even globset expects to be given the glob as a string: https://docs.rs/globset/latest/globset/struct.GlobBuilder.html#method.new . Also see this 2019 comment from the maintainer of globset
on glob
: rust-lang/glob#78 (comment) and this globset
issue: BurntSushi/ripgrep#1250 .
I'm starting to think that there's no principled way to support globbing on non-utf8 filesystems without basically submitting a patch to some upstream glob project to do it, but maybe few enough people are using these tools with non-utf8 globs that no one has previously run into it?
The to_string_lossy
function turns any non-utf8 sequence into the unicode replacement character �. I wonder if there is some other function that could be called to try to convert a PathBuf
into a meaningful Rust utf-8 string (maybe doing something smart with guessing the original encoding). The quick solution would be to move this conversion into the has_glob
block, which would mean that people using glob imports and file paths with non-utf8 names would run into issues but at least no existing just behavior would break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit lame, but we could just throw an error if you try to glob with a path which can't be converted to unicode. Non-unicode paths are very rare and basically just crazy, so not supporting them is probably fine, especially if they're only unsupported if you use a pretty niche feature.
I'm tempted to land this feature as unstable initially, both because of the unicode path issue, but also because there's some choice about which glob syntaxes we support, and it might be good to make compare them all and make a principled choice about what to support.
for import in import_paths { | ||
let Ok(import) = import else { continue }; | ||
|
||
if import.is_file() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a glob matching a directory should be an error. Just because we can backwards-compatibly make it not an error, but not backwards-compatibly turn it into an error.
bce3e52
to
ee26676
Compare
Alright, I gated glob imports behind unstable, went back to using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some initial review, see comments.
src/analyzer.rs
Outdated
if let Some(absolute) = absolute { | ||
stack.push(asts.get(absolute).unwrap()); | ||
Item::Import { absolute_paths, .. } => { | ||
for p in absolute_paths { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for p in absolute_paths { | |
for path in absolute_paths { |
src/unstable_feature.rs
Outdated
@@ -3,6 +3,7 @@ use super::*; | |||
#[derive(Copy, Clone, Debug, PartialEq, Ord, Eq, PartialOrd)] | |||
pub(crate) enum UnstableFeature { | |||
FormatSubcommand, | |||
GlobImport, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobImport, | |
ImportPathGlob, |
src/unstable_feature.rs
Outdated
@@ -11,6 +12,7 @@ impl Display for UnstableFeature { | |||
fn fmt(&self, f: &mut Formatter) -> fmt::Result { | |||
match self { | |||
Self::FormatSubcommand => write!(f, "The `--fmt` command is currently unstable."), | |||
Self::GlobImport => write!(f, "Glob imports are currently unstable"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self::GlobImport => write!(f, "Glob imports are currently unstable"), | |
Self::ImportPathGlob => write!(f, "Globs in import paths are currently unstable."), |
src/parser.rs
Outdated
@@ -365,7 +365,7 @@ impl<'run, 'src> Parser<'run, 'src> { | |||
let optional = self.accepted(QuestionMark)?; | |||
let (path, relative) = self.parse_string_literal_token()?; | |||
items.push(Item::Import { | |||
absolute: None, | |||
absolute_paths: vec![], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolute_paths: vec![], | |
absolute_paths: Vec::new(), |
(No real reason, I just like avoiding macros.)
@@ -7,7 +7,7 @@ pub(crate) enum Item<'src> { | |||
Assignment(Assignment<'src>), | |||
Comment(&'src str), | |||
Import { | |||
absolute: Option<PathBuf>, | |||
absolute_paths: Vec<PathBuf>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually just keep this called absolute
. It's a vec of paths, so we know that they're paths and there's potentially more than one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument for changing it is that "absolute" is a pretty general word out of context, so it's hard to guess what it might mean when you see it as a random property access somewhere in code. I don't feel super strongly about it however.
@@ -75,14 +86,65 @@ impl Compiler { | |||
.join(Self::expand_tilde(&relative.cooked)?) | |||
.lexiclean(); | |||
|
|||
if import.is_file() { | |||
let has_glob = import.as_os_str().as_encoded_bytes().contains(&b'*'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky because glob supports more meta-characters than *
. It supports ?
, *
, [
, !
, and ]
. I suppose we should check for all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assume that none of those characters will show up in a legitimate justfile? I think that's safe to assume for *
but maybe not for characters like [
and ]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there was a slightly different version of the import
keyword that signaled that shell globs might be used, or an argument to it? import glob "my/path/*.just"
or something like that?
src/compiler.rs
Outdated
})?; | ||
let glob_options = glob::MatchOptions { | ||
case_sensitive: true, | ||
require_literal_separator: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be true
. If false
, then /
can be matched by *
, which is the bash
default and zsh
default. **
can be used to match an arbitrary directory path, including *
.
src/compiler.rs
Outdated
let import_base_str = | ||
import | ||
.as_os_str() | ||
.to_str() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do to_str
on the path without first converting to an OsStr
.
src/compiler.rs
Outdated
} | ||
})?; | ||
for import in import_paths { | ||
let Ok(import) = import else { continue }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This swallows I/O errors. If we encounter an I/O error we should fail.
}); | ||
} | ||
} | ||
} else if import.is_file() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this circular import test duplicated with the one above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, but I think that trying to de-duplicate those lines of code would add more complexity than it's worth, since the logic here is just constructing the CircularImport
error if necessary and then manipulating some local variables.
I keep coming back to this. I have these worries:
I wonder if maybe it would be better if there was some syntax to enable the glob at the import, and it was that syntax which was experimental, instead of a setting which turns it on for all imports Something like: glob import '*/mod.just' |
looking forward to see something like this merged. |
@neunenak What are you thinking about this? I think we should do this, and I think the combination of both making it unstable, and opting into explicitly for each import (syntax tbd, but |
Implements shell-style globs in import statements using the Rust
glob
crate. Addresses #1641