Inconsistent file behavior with mountpoint-s3 #1038
Thanks for opening an issue, @akhilesh-delphix. I note that you are not using any kind of caching, so you should not be impacted by any relaxed consistency guarantees from that.
If the two applications are performing these operations concurrently, one thread may see the file while a delete is still in progress. To avoid seeing this behavior, there needs to be a synchronization point between the two threads so that the second thread will always see the new state in S3.
There are some differing behaviors based on how the file was opened, documented in our Semantics Documentation. If the application has written some data and has not duplicated the file descriptor, closing the file should be synchronous and block the system call. I suspect you are in this situation, and so you should not be experiencing the non-blocking behavior. I'm assuming you're using Java's FileWriter class, which I presume will also block on its underlying close. Apologies for the rather complex answers; please do ask if any clarifications are required.
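For illustration only (not code from the issue), here is a minimal Java sketch of the write-then-close path being discussed; the mount path is hypothetical:

```java
import java.io.FileWriter;
import java.io.IOException;

public class CloseBehaviorSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical path under a Mountpoint-S3 mount; not taken from the issue.
        try (FileWriter writer = new FileWriter("/mnt/s3/dir1/data.txt")) {
            writer.write("hello");
        }
        // try-with-resources calls close() here; per the semantics described
        // above, close() should block until Mountpoint finishes the S3 upload,
        // provided the file descriptor was not duplicated.
    }
}
```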
hi @dannycjones, I think let's address/discuss one issue at a time. Issue one is solved by adding a 1-second gap between the two threads; please note that the second thread only executes once the first one completes, yet I still had to add the 1-second delay. Here is the scenario: thread 1 runs to delete the files in directory '1' recursively and completes successfully. Here is the stack (apologies for sharing a Java stack here).
Please note that our code base works fine when we use NFS to store/manage files. Would it be possible to connect over Slack, Teams, or any other channel on this topic (in case you think the discussion is too specific)?
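To make the scenario concrete, here is a hedged sketch of what thread 1's recursive delete might look like; the application's actual deletion code was not shared in the issue, and the path is hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecursiveDelete {
    // Delete a directory tree depth-first, as thread 1 is described as doing.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()) // children before parents
                .forEach(p -> {
                    try {
                        Files.delete(p); // blocking call per file/directory
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        }
    }
}
```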
hi @dannycjones, here are the relevant logs:
Mountpoint logs
Driver log
My application log (indicating that it's not able to open a file right after deleting a file with the same name)
Hey @akhilesh-delphix, regarding the failure: later, on thread 2, your application was trying to write a new file into a non-existing directory, so it failed. It still doesn't explain why putting a delay between the two threads makes things work, though. Probably there are some other threads that create the deleted directories during that time. We will need debug logs to confirm our assumption and for further investigation. To enable debug logs, you can pass CLI arguments from the CSI Driver to Mountpoint via the volume's mount options.
I actually don't think this is related to the CSI Driver specifically, so just running your application against Mountpoint directly should work too.
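Here is a small Java sketch (paths hypothetical, not from the issue) of the failure mode described above, where the parent directory has already been deleted:

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class MissingParentRepro {
    public static void main(String[] args) throws IOException {
        // If thread 1 removed /mnt/s3/dir1, opening a file beneath it fails;
        // Java surfaces the ENOENT as a FileNotFoundException.
        try (FileOutputStream out = new FileOutputStream("/mnt/s3/dir1/output.txt")) {
            out.write(1);
        } catch (FileNotFoundException e) {
            System.err.println("open failed: " + e.getMessage());
        }
    }
}
```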
hi @monthonk, however thread 2 first creates the file with the same name/path and then tries to open it. This is done by Java's FileOutputStream API (I had attached the stack for it earlier); our application code simply passes the path/name. Here is the relevant part of the stack:
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:123)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:235)
The following code successfully validates the file path and the validity of the file, and then, in the second-to-last line, opens it.
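The application's actual code is not reproduced in this thread; the following is a hedged reconstruction of the validate-then-open pattern described, with hypothetical names:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class ValidateThenOpen {
    // Validate the path first, then open the file for writing as the final step.
    static FileOutputStream openForWrite(String path) throws IOException {
        File file = new File(path);
        File parent = file.getParentFile();
        if (parent == null || !parent.isDirectory()) {
            throw new IOException("parent directory missing for: " + path);
        }
        // FileOutputStream's constructor creates the file and opens it,
        // matching the <init> frames in the stack above.
        return new FileOutputStream(file);
    }
}
```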
The check on
hi @monthonk, I have stripped the log file to include logs for the duration when the 1st thread executed the delete operation and the 2nd thread executed the create-and-open file operation. Here I am posting the exception received by the 2nd thread (the exception message includes the file name which it's trying to create, along with the time; it should help in mapping it to the log).
Hey, thanks for providing the logs. I have reviewed them, and here is what I got from them:
I still don't understand why adding a delay would make it work. Is there any logic in thread 2 that creates the directory if it does not exist? It would also be helpful to get debug logs from this case too.
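For reference, the kind of create-if-absent logic being asked about might look like this hedged sketch (the path is hypothetical):

```java
import java.io.File;

public class EnsureDirectory {
    public static void main(String[] args) {
        File dir = new File("/mnt/s3/dir1");
        // If thread 2 (or any other thread) runs logic like this, it would
        // recreate the directory tree that thread 1 just deleted.
        if (!dir.exists()) {
            dir.mkdirs();
        }
    }
}
```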
Hi @monthonk,
Please note that we have not defined any cache in the volume mount options.
Is it possible to connect over Slack or any other communication channel, so that we can discuss and identify the root cause?
hi @monthonk, I was thinking about point 4 that you had mentioned in your response. Could that result in the following case:
This is just my hypothesis.
Hey @akhilesh-delphix, I think the missing information is that we don't know how the Java API interacts with the file system. Like Danny said, a tool like strace could help in this case. The other communication channel we have is AWS Support; please feel free to open a support case if you prefer a private channel.
From point 4, I meant that the parent directory of the file could appear to exist when it is already gone. I think this behavior is surprising for users and we may want to improve it; I opened #1055 for tracking. Unfortunately, I don't have a good workaround for your use case right now. From what I understand, thread 1 is supposed to delete all the files and directories recursively. Would it be possible to have thread 2 wait until you can confirm that the directory has been deleted before creating a new file?
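A hedged sketch of the suggested wait: poll until the directory no longer appears in the mount before creating the new file. The path, timeout, and polling interval are illustrative, not from the issue:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class AwaitDeletion {
    // Returns true once the directory is no longer visible, or false on timeout.
    static boolean awaitDeleted(Path dir, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!Files.exists(dir)) {
                return true;
            }
            Thread.sleep(100); // back off briefly between checks
        }
        return false;
    }
}
```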
Thread 2 starts to run after thread 1 completes deletion of the files and directories recursively, and deletion of files is a blocking call, so thread 2 only starts after the files and directories are deleted. If adding a wait is the workaround, I will add a wait of 1 second (which worked for me earlier). It is, however, a little challenging, as I need to add this at multiple places in the code. Meanwhile, we have created a support ticket; I can share the ticket number here if you want. Let's discuss issue 2 and the workaround for issue 1 as part of the support ticket.
Thanks @akhilesh-delphix, please share the ticket number so we can follow up there. I'm closing this issue now; we can create new ones for tracking specific issues if any are found.
hi @monthonk: support ticket number: 172865712600531
/kind bug
What happened?
We have started using https://github.com/awslabs/mountpoint-s3-csi-driver in an EKS cluster to mount an S3 bucket as a volume, and we are facing some issues while handling files.
What you expected to happen?
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?:
Environment
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15",
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15",