Description
I would like to avoid creating the last folder of the source path at the destination when a glob pattern with a double asterisk (**) is used to select files with a specific extension. When such a pattern is passed to the copy method (this also happens with get and put) on LocalFileSystem (and on other remote filesystem implementations as well), a destination folder named after the last folder of the source path is created in several unwanted situations.
Example
Code:
from fsspec.implementations.local import LocalFileSystem
fs = LocalFileSystem(auto_mkdir=True)
# List source files
print(f"Source files: {fs.glob('/tmp/data/**.txt')}")
# Copy to a location with trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out/")
print(f"Test1: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)
# Copy to a location without trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
print(f"Test2: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)
# Copy twice to a location without trailing slash
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
fs.copy(path1="/tmp/data/**.txt", path2="/tmp/out")
print(f"Test3: {fs.glob('/tmp/out/**.txt')}")
fs.rm(path="/tmp/out", recursive=True)
Output:
Source files: ['/tmp/data/a/b/f3.txt', '/tmp/data/a/f2.txt', '/tmp/data/f1.txt']
Test1: ['/tmp/out/data/a/b/f3.txt', '/tmp/out/data/a/f2.txt', '/tmp/out/data/f1.txt']
Test2: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/f1.txt']
Test3: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/data/a/b/f3.txt', '/tmp/out/data/a/f2.txt', '/tmp/out/data/f1.txt', '/tmp/out/f1.txt']
When a trailing slash is added to the destination (Test1), the folder data is created. When the trailing slash is omitted and we copy only once (Test2), the folder data is not created (the expected result). But when we run the copy operation twice (Test3), the folder data is created once again.
Expected: ['/tmp/out/a/b/f3.txt', '/tmp/out/a/f2.txt', '/tmp/out/f1.txt'] even when we call copy twice.
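Possible workaround (a sketch, not fsspec's own behaviour): until this is resolved, the glob could be expanded manually and each file copied to an explicitly computed destination, so the result no longer depends on the trailing slash or on whether /tmp/out already exists. The helper names plan_copies and copy_matching are hypothetical and not part of fsspec; the sketch only assumes the filesystem object provides glob, isfile, makedirs and cp_file.

```python
import posixpath


def plan_copies(sources, src_root, dst_root):
    """Pair each source path with a destination that mirrors its position
    relative to src_root, without re-creating src_root's last folder."""
    pairs = []
    for src in sources:
        # e.g. "/tmp/data/a/b/f3.txt" relative to "/tmp/data" is "a/b/f3.txt"
        rel = posixpath.relpath(src, src_root)
        pairs.append((src, posixpath.join(dst_root, rel)))
    return pairs


def copy_matching(fs, src_root, pattern, dst_root):
    """Hypothetical helper: copy the files matched by pattern one by one.

    fs is any fsspec-style filesystem object passed in by the caller.
    """
    matches = fs.glob(posixpath.join(src_root, pattern))
    sources = [p for p in matches if fs.isfile(p)]  # skip directories
    for src, dst in plan_copies(sources, src_root, dst_root):
        # Create the parent folder explicitly instead of relying on auto_mkdir
        fs.makedirs(posixpath.dirname(dst), exist_ok=True)
        fs.cp_file(src, dst)
```

With the layout from the example, calling copy_matching(fs, "/tmp/data", "**.txt", "/tmp/out") once or twice should yield the list shown under Expected, because every destination path is computed explicitly rather than inferred from the state of the destination folder.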