Avoid destination directory creation when using glob double asterisk(**) #1321

Closed
@john-jam

I would like to avoid creating the last folder of the source path at the destination when a glob double asterisk (**) is used to select files with a specific extension. When such a pattern is passed to the copy method of LocalFileSystem (this also happens with get and put, and with other remote filesystem implementations), a destination folder named after the last folder of the source path is created in several unwanted situations.

Example

Code:

from fsspec.implementations.local import LocalFileSystem

fs = LocalFileSystem(auto_mkdir=True)

# List source files
print(f"Source files: {fs.glob('/tmp/data/**.txt')}")

# Copy to a location with trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out/")
print(f"Test1: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)

# Copy to a location without trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
print(f"Test2: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)

# Copy twice to a location without trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
print(f"Test3: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)

Output:

Source files: ['/tmp/data/a/b/f3.txt', '/tmp/data/a/f2.txt', '/tmp/data/f1.txt']
Test1: ['/tmp/out/data/a/b/f3.txt', '/tmp/out/data/a/f2.txt', '/tmp/out/data/f1.txt']
Test2: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/f1.txt']
Test3: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/data/a/b/f3.txt', '/tmp/out/data/a/f2.txt', '/tmp/out/data/f1.txt', '/tmp/out/f1.txt']

When a trailing slash is added to the destination (Test1), the folder data is created. When the trailing slash is omitted and we copy only once (Test2), the folder data is not created (the expected result). But when we run the copy operation twice (Test3), the folder data is created again, alongside the files from the first copy.

Expected: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/f1.txt'] even when copy is called twice.
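
Until the behaviour is consistent, one possible workaround is to resolve the glob yourself and compute each destination path explicitly, so the result does not depend on a trailing slash or on whether the destination already exists. The sketch below is not part of fsspec: map_destinations and copy_by_pattern are hypothetical helper names, and it assumes that fs.glob returns paths in the same form as src_root (as LocalFileSystem does for absolute POSIX paths). It only relies on the glob, makedirs and cp_file methods of the filesystem object.

```python
import posixpath


def map_destinations(sources, src_root, dst_root):
    """Map each source path to a destination path under dst_root.

    Hypothetical helper: the path relative to src_root is preserved, and
    the last folder of src_root is never re-created under dst_root.
    """
    src_root = src_root.rstrip("/") or "/"
    dst_root = dst_root.rstrip("/") or "/"
    mapping = {}
    for src in sources:
        rel = posixpath.relpath(src, src_root)
        mapping[src] = posixpath.join(dst_root, rel)
    return mapping


def copy_by_pattern(fs, src_root, pattern, dst_root):
    """Copy every file matching pattern under src_root into dst_root.

    Hypothetical helper: fs is any fsspec-style filesystem object exposing
    glob, makedirs and cp_file. Returns the sorted destination paths.
    """
    sources = fs.glob(posixpath.join(src_root, pattern))
    mapping = map_destinations(sources, src_root, dst_root)
    for src, dst in mapping.items():
        # Create the parent folder explicitly instead of relying on auto_mkdir.
        fs.makedirs(posixpath.dirname(dst), exist_ok=True)
        fs.cp_file(src, dst)
    return sorted(mapping.values())


# Usage with the layout from the example above (requires fsspec):
#   from fsspec.implementations.local import LocalFileSystem
#   fs = LocalFileSystem()
#   copy_by_pattern(fs, "/tmp/data", "**.txt", "/tmp/out")
#   copy_by_pattern(fs, "/tmp/data", "**.txt", "/tmp/out/")  # same targets
```

Because every destination is derived from the path relative to src_root, calling the helper a second time overwrites the same files rather than nesting a data folder inside the destination.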
