Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Loop files/directories (glob) #463

Open
Mte90 opened this issue Sep 5, 2024 · 19 comments · May be fixed by #511
Open

[Feature] Loop files/directories (glob) #463

Mte90 opened this issue Sep 5, 2024 · 19 comments · May be fixed by #511
Assignees
Labels
enhancement New feature or request stdlib syntax

Comments

@Mte90
Copy link
Member

Mte90 commented Sep 5, 2024

Right now to loop files in Amber:

let std = unsafe $/usr/bin/ls "/your-folder/"$
let stdlib = split(std, "\n")

loop v in stdlib {
    if (contains(v, ".ab") and file_exist("/your-folder/{v}")) {
        echo v
        echo "\n"
    }
}

Requires stdlib for a single function.
Maybe we can do a dedicated builtin or a new stdlib function?

Something like:

loop v in glob("/your-folder/*.ab") {
    if (file_exist("/your-folder/{v}")) {
        echo v
        echo "\n"
    }
}

In this way we can have also wildcard. Maybe is just a wrapper around ls like I already did in Amber?
file_exist it is needed as return also . and .. so maybe we can clean up also that so the developer can do just:

loop v in glob("/your-folder/*.ab") {
    echo v
    echo "\n"
}
@Mte90 Mte90 added enhancement New feature or request stdlib syntax labels Sep 5, 2024
@Mte90
Copy link
Member Author

Mte90 commented Sep 5, 2024

Nevermind ls -A don't return the dots so for sure we can do a wrapper easily to ls.

@b1ek
Copy link
Member

b1ek commented Sep 7, 2024

i think that this glob can be implemented as an stdlib thingy which returns [Text] which can be looped like you shown

@hdwalters
Copy link
Contributor

Also, wrapping ls is an incredibly bad idea; very tricky to get right if filenames have spaces in them.

@hdwalters hdwalters self-assigned this Sep 14, 2024
@hdwalters
Copy link
Contributor

hdwalters commented Sep 14, 2024

Also, wrapping ls is an incredibly bad idea; very tricky to get right if filenames have spaces in them.

Okay, I see that the sample Amber script in Mte90's original post relies on ls detecting that stdout is a pipe not a terminal, and writing each filename on its own line (thus avoiding the whitespace splitting problem). It then splits the output on "\n". I created some files to test this:

$ pwd
/home/hwalters/git/amber
$ touch 'FIRST FILE WITH SPACES'
$ touch 'SECOND FILE WITH SPACES'

You can list files with spaces by escaping the whitespace in the glob:

$ ls ~/git/amber/*WITH\ SPACES* | cat
/home/hwalters/git/amber/FIRST FILE WITH SPACES
/home/hwalters/git/amber/SECOND FILE WITH SPACES

However, this does not work when you pass the glob via a local variable, as we would in the transpiled Bash:

$ path=~/git/amber/*WITH\ SPACES*
$ ls $path | cat
ls: cannot access '/home/hwalters/git/amber/*WITH': No such file or directory
ls: cannot access 'SPACES*': No such file or directory

The only advice I've been able to find online is to "quote your variables", but this doesn't work with globs either:

$ ls "$path" | cat
ls: cannot access '/home/hwalters/git/amber/*WITH SPACES*': No such file or directory

I also tried using eval:

$ eval "ls $path" | cat
ls: cannot access '/home/hwalters/git/amber/*WITH': No such file or directory
ls: cannot access 'SPACES*': No such file or directory

By the way, there's another reason to use plain globs rather than wrapping ls, which is that it's more efficient if we don't have to call a subprocess:

$ for file in ~/git/amber/*.md; do test -e "$file" && echo "$file"; done
/home/hwalters/git/amber/CONTRIBUTING.md
/home/hwalters/git/amber/LICENSE.md
/home/hwalters/git/amber/NIX.md
/home/hwalters/git/amber/README.md

However, this does not fix the problem described above:

$ for file in $path; do echo "$file"; done
/home/hwalters/git/amber/*WITH
SPACES*
$ for file in "$path"; do echo "$file"; done
/home/hwalters/git/amber/*WITH SPACES*

Anyone have any suggestions?

@Ph0enixKM
Copy link
Member

Ph0enixKM commented Sep 14, 2024

I found this snippet working

path='~/git/amber/*WITH\ SPACES*'
eval "ls $path"

I believe that this works because single quotes preserve the slash... But yeah... we wanted to store glob directly in a variable, or is this good enough?

@hdwalters
Copy link
Contributor

Ah, the one permutation I didn't try. Thanks! Will try to call that from an Amber script.

@b1ek
Copy link
Member

b1ek commented Sep 15, 2024

➜  ~ $ d=$(mktemp -d)
➜  ~ $ cd $d
➜  tmp.dfgbyxAhY9 $ touch a
➜  tmp.dfgbyxAhY9 $ touch ab
➜  tmp.dfgbyxAhY9 $ touch cb 
➜  tmp.dfgbyxAhY9 $ find . -name '*b'
./ab
./cb
➜  tmp.dfgbyxAhY9 $ find . -name 'a*'
./a
./ab

@hdwalters
Copy link
Contributor

This works:

#!/usr/bin/env amber
import * from "std/fs"
import * from "std/text"

fun glob(path: Text): [Text] {
    let files = unsafe $eval "for file in {path}; do test -e \\\"\\\$file\\\" && echo \\\"\\\$file\\\"; done"$
    return split(files, "\n")
}

echo "[Markdown]"
loop file in glob("~/git/amber/*.md") {
    echo file
}

echo "[Whitespace]"
loop file in glob("~/git/amber/*WITH\ SPACES*") {
    echo file
}

echo "[Missing]"
loop file in glob("~/git/amber/*MISSING*") {
    echo file
}
$ ./amber.ab 
[Markdown]
/home/hwalters/git/amber/CONTRIBUTING.md
/home/hwalters/git/amber/LICENSE.md
/home/hwalters/git/amber/NIX.md
/home/hwalters/git/amber/README.md
[Whitespace]
/home/hwalters/git/amber/FIRST FILE WITH SPACES
/home/hwalters/git/amber/SECOND FILE WITH SPACES
[Missing]

@hdwalters
Copy link
Contributor

@b1ek you suggested using find. I think we should generally try to take the minimalist approach and use Bash builtins where possible, rather than calling external processes (also using find is like using a sledgehammer to crack a nut).

The one thing find gives you that builtin glob doesn't is finding files in subdirectories, but perhaps that's a task for a different standard library function?

@Ph0enixKM
Copy link
Member

I was thinking if we should introduce glob as a builtin or not. I think that we should first introduce it as a function and optionally deprecate it later rather than introduce a new language feature that we have to also maintain it.

So the solution to this issue is to introduce the glob function that @hdwalters proposed here

@Ph0enixKM Ph0enixKM changed the title [Feature]Loop files/directories (glob) [Feature] Loop files/directories (glob) Sep 16, 2024
@Mte90
Copy link
Member Author

Mte90 commented Sep 17, 2024

I don't like the actual proposal.

Using eval is something that is discouraged everywhere and we are working on a transpiler.
This opens to a lot of flaws and issues that we can't handle right now.

I want to remember that the project is in alpha stage and the stdlib can be changed any time, so use find that is in any Unix install like sed it shouldn't be a problem.

https://www.gnu.org/software/findutils/

If we start to reimplement everything in pure bash in alpha stage we are becaming very complicate, without the real needs.

@Ph0enixKM
Copy link
Member

@Mte90 ref in Amber is handled using eval. I don't see any problem with this since this is compiled to Bash and not being touched or exposed to the user. What is the biggest vulnerability when it comes to eval in this case?

@Mte90
Copy link
Member Author

Mte90 commented Sep 18, 2024

There is a variable passed to the string that is the loop with an eval.

I don't think that is the case to expose the project to an eval for something that we can handle with find that is any Unix installation.

@b1ek
Copy link
Member

b1ek commented Sep 18, 2024

eval is not even needed here, it works fine when calling the for loop directly

@KrosFire
Copy link
Member

I think that it needs to be a builtin so that we don't make double loop. We need to detect via syntax that a programmer wants to iterate over files and generate a native loop over given glob. Something like this:

loop file in "*.txt" {
  echo file
}

Will transpile to something like this:

for file in "*.txt"; do
echo file;
done;

We can think of some syntax over the glob, but I think that string in the iterator position can be good enough signal to tell that we mean path with wildcards

@Ph0enixKM
Copy link
Member

@KrosFire then the string should behave like glob in for loop. Like:

let value = "hello"
loop item in value {
  echo item
}
// hello

Then if

let value = "path/to/*"

We'd get with the same loop:

// path/to/file1
// path/to/file2
// ...

@hdwalters
Copy link
Contributor

hdwalters commented Sep 18, 2024

Ok, on consideration I agree that using eval is a bad idea. I wrote this Amber script showing that injection attacks are possible:

#!/usr/bin/env amber
import * from "std/fs"

main {
    let path = "missing; do touch inject.xxx; done; for file in missing"
    let files = glob(path)?
}
$ ls -l inject.*
-rwxrwxr-x 1 hwalters hwalters 155 Sep 18 20:09 inject.ab
$ ./inject.ab
$ ls -l inject.*
-rwxrwxr-x 1 hwalters hwalters 155 Sep 18 20:09 inject.ab
-rw-rw-r-- 1 hwalters hwalters   0 Sep 18 20:13 inject.xxx

Back to the drawing board then. I'll see what I can do about @KrosFire's suggestion above.

@hdwalters
Copy link
Contributor

hdwalters commented Sep 18, 2024

@b1ek wrote:

eval is not even needed here, it works fine when calling the for loop directly

For the record, the reason I had to use eval was that Bash doesn't seem to expand globs passed via local variables; see discussion in #463 (comment).

@hdwalters
Copy link
Contributor

However, I would suggest not just using @KrosFire's syntax directly. I think it would be better to do:

loop file in glob("*.txt") {
    echo file
}

Because at some point I would like to implement line by line reading with similar syntax like this:

loop line in cat("*.txt") {
    echo line
}

@Ph0enixKM Ph0enixKM added this to the Amber 0.4.0-alpha milestone Oct 4, 2024
@hdwalters hdwalters linked a pull request Oct 11, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment