Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractSftpSubsystemHelper getAttribute unable to read Path attribute from fileList #643

Open
dragonknight88 opened this issue Dec 11, 2024 · 8 comments
Milestone

Comments

@dragonknight88
Copy link

Version

2.13.1

Bug description

When performing an ls operation Path iterator is populating the paths with attributes associated, yet the AbstractSftpSubsystemHelper getAttributes calls the Provider readAttributes to resolve the path attributes.

Actual behavior

Provider readAttributes gets called for each Path object that needs attributes read.

Expected behavior

DirectoryHandle fileList path attributes should be referred if path exists on the fileList.

Relevant log output

No response

Other information

No response

@dragonknight88
Copy link
Author

I am having an issue on the provider where Iterator on directory handle is executed but by the time readAttribute is called subsequently the underlying file on storage is being deleted and whole ls operation fails.

@tomaswolf
Copy link
Member

Not sure I understand. Did you see the explanations given in the documentation?

Please show some code, and also explain your setup. I'm a bit confused by the mention of AbstractSftpSubsystemHelper, which is a server-side abstraction.

Do you have a client and a single server, where the files reside?

Or do you chain servers (client connects to first server, which has an SftpFileSystem which would go to a second server where the files reside)? If so, on which server do you observe this?

When you run with debug logging, do you see STAT or LSTAT requests for the files being made?

@dragonknight88
Copy link
Author

I have mina sftp server side implementation on server with S3 provider similar to https://github.com/carlspring/s3fs-nio/blob/master/src/main/java/org/carlspring/cloud/storage/s3fs/S3FileSystemProvider.java; files reside on S3.

Difference is S3 provider I got isnt caching the file attributes on S3Path (implementation of Path). I do however populate path attributes on S3Path Iterator which gets populated on DirectoryHandle().

No server chaining. sftp client connects to Server (mina sftp server with S3 provider and SftpSubsystemFactory) performs an ls.

Problem Im trying to figure out is if theres a way to get these attributes passed from Iterator on to readAttributes() call that happens on ls operation.

Of course, easier alternative is to implement caching of attr on S3Path but I somehow feel theres better way to work around it or I am simply missing something trivial.

@tomaswolf
Copy link
Member

So your server is using a custom remote filesystem to serve SFTP files from. The S3 provider you linked does have an attribute cache in the latest version (2.0.0).

What might perhaps work is that you wrap the S3 filesystem by your own which returns SftpPath instances wrapping S3Path. The AbstractSftpSubsystemHelper then should try to use the attributes cached on the path at

if (f instanceof SftpPath) {
SftpClient.Attributes attributes = ((SftpPath) f).getAttributes();
if (attributes != null) {
entries.put(shortName, f);
writeDirEntry(session, id, buffer, nb, f, shortName, attributes);
nb++;
continue;
}
}

If it helps we could factor out this getAttributes() method into a separate interface, say WithFileAttributes.

Apart from that I don't see this as a bug or problem in Apache MINA sshd; it's a basic problem of any remote file system.

@dragonknight88
Copy link
Author

yes, 2.0.0 does have the cache but it comes with the unknowns. Hence trying to open it up for discussion where we could work with mina and make it happen :) Not a bug on Mina certainly.

Second approach works for me. Could we cut a test release with the changes suggested for second approach? Tysm @tomaswolf

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Dec 14, 2024
Remote file systems may have a need to cache file attributes on Path
instances. A typical use case is iterating over all files in a
directory: the directory listing returns paths, but the underlying
remote listing operation may also return file attributes for each entry.
This is the case in Sftp, but may also occur for other file systems, for
instance a file system wrapping Amazon S3.

It would be unfortunate and inefficient if iterating through the paths
returned and doing something with the attributes would have to re-fetch
the attributes again if they were already available.

By implementing WithFileAttributes of its Paths, a file system can
associate file attributes with a path instance, and client code can
access them. If a file system also makes its paths implement the second
interface WithFileAttributeCache, then the SftpSubsystem uses it
internally to avoid making repeated remote calls to get file attributes.
@tomaswolf
Copy link
Member

We're not going to cut a release for experimental stuff. PR #645 shows what I had in mind. Your own file system could have its paths implement the two new interfaces, and our SftpSubsystem would then use its internal optimizations for attributes that it already uses for SftpPath.

Give it a try; and if it does help your use case, we can merge this, and it'll be available in the snapshot release from the Apache snapshots repository. And if it needs more to be effective for your S3 case, we can amend the PR before merging.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Dec 14, 2024
Remote file systems may have a need to cache file attributes on Path
instances. A typical use case is iterating over all files in a
directory: the directory listing returns paths, but the underlying
remote listing operation may also return file attributes for each entry.
This is the case in Sftp, but may also occur for other file systems, for
instance a file system wrapping Amazon S3.

It would be unfortunate and inefficient if iterating through the paths
returned and doing something with the attributes would have to re-fetch
the attributes again if they were already available.

By implementing WithFileAttributes of its Paths, a file system can
associate file attributes with a path instance, and client code can
access them. If a file system also makes its paths implement the second
interface WithFileAttributeCache, then the SftpSubsystem uses it
internally to avoid making repeated remote calls to get file attributes.
@dragonknight88
Copy link
Author

Yup, this works. I would be able to confirm once I could test this out on my end. Could we merge this to get the snapshot release for testing?

Will mina take care of the caching of attribute if SftpPath instance thats returned on iterator? Or I would have to implement the WithAttributeCache separately on my end?

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Dec 17, 2024
Remote file systems may have a need to cache file attributes on Path
instances. A typical use case is iterating over all files in a
directory: the directory listing returns paths, but the underlying
remote listing operation may also return file attributes for each entry.
This is the case in Sftp, but may also occur for other file systems, for
instance a file system wrapping Amazon S3.

It would be unfortunate and inefficient if iterating through the paths
returned and doing something with the attributes would have to re-fetch
the attributes again if they were already available.

By implementing WithFileAttributes of its Paths, a file system can
associate file attributes with a path instance, and client code can
access them. If a file system also makes its paths implement the second
interface WithFileAttributeCache, then the SftpSubsystem uses it
internally to avoid making repeated remote calls to get file attributes.
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Dec 18, 2024
Remote file systems may have a need to cache file attributes on Path
instances. A typical use case is iterating over all files in a
directory: the directory listing returns paths, but the underlying
remote listing operation may also return file attributes for each entry.
This is the case in Sftp, but may also occur for other file systems, for
instance a file system wrapping Amazon S3.

It would be unfortunate and inefficient if iterating through the paths
returned and doing something with the attributes would have to re-fetch
the attributes again if they were already available.

By implementing WithFileAttributes of its Paths, a file system can
associate file attributes with a path instance, and client code can
access them. If a file system also makes its paths implement the second
interface WithFileAttributeCache, then the SftpSubsystem uses it
internally to avoid making repeated remote calls to get file attributes.
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Dec 18, 2024
Remote file systems may have a need to cache file attributes on Path
instances. A typical use case is iterating over all files in a
directory: the directory listing returns paths, but the underlying
remote listing operation may also return file attributes for each entry.
This is the case in Sftp, but may also occur for other file systems, for
instance a file system wrapping Amazon S3.

It would be unfortunate and inefficient if iterating through the paths
returned and doing something with the attributes would have to re-fetch
the attributes again if they were already available.

By implementing WithFileAttributes of its Paths, a file system can
associate file attributes with a path instance, and client code can
access them. If a file system also makes its paths implement the second
interface WithFileAttributeCache, then the SftpSubsystem uses it
internally to avoid making repeated remote calls to get file attributes.
@tomaswolf
Copy link
Member

Done; it's available now in the snapshot.

If your file system wrapping the S3 paths uses SftpPathImpl, then you'll get the very limited attribute caching that we do for Sftp. Basically we only make sure we have the attributes on the paths in a directory listing, and using the withAttributeCache()methods we sometimes cache attributes for a short scope to avoid repeated remote calls.

If you use your own Path implementation that also implements WithFileAttributeCache, then you have to implement your own caching.

@tomaswolf tomaswolf added this to the 2.14.1 milestone Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants