Fix file write concurrency #578
    return written_characters


def append_to_file(filepath=None, data=None, mode='a', **kwargs):
Since this function is strictly for appending, it might be good to take out the mode kwarg and just pass mode='a' to util.write_to_file.
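One way to read this suggestion is sketched below. It is a minimal illustration, not the code that was merged: write_to_file is copied from the diff in this PR, while the body of append_to_file and the demo helper are assumptions made for the example.

```python
import fcntl
import os
import tempfile


def write_to_file(filepath=None, data=None, mode='w', **kwargs) -> int:
    # Copied from the diff under review: lock, write, unlock.
    assert mode in ('w', 'a')
    with open(filepath, mode, **kwargs) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        written_characters = f.write(data)
        fcntl.flock(f, fcntl.LOCK_UN)
    return written_characters


def append_to_file(filepath=None, data=None, **kwargs) -> int:
    # Assumed shape of the suggestion: no mode parameter, so callers
    # cannot accidentally turn an append into a truncating write.
    return write_to_file(filepath=filepath, data=data, mode='a', **kwargs)


def demo() -> str:
    # Hypothetical helper: write once, append once, read the result back.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_to_file(path, 'one\n')
        append_to_file(path, 'two\n')
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)
```

With this shape, passing mode to append_to_file raises a TypeError instead of silently changing the behavior.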
def write_to_file(filepath=None, data=None, mode='w', **kwargs) -> int:
    '''
    Concurrency safe function for writing data to the file.
    Param mode: file open mode ('a' for appending, 'w' for writing)
    '''
    assert mode == 'w' or mode == 'a'

    with open(filepath, mode, **kwargs) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        written_characters = f.write(data)
        fcntl.flock(f, fcntl.LOCK_UN)
    return written_characters
This question comes from my lack of knowledge:
If another process tries to access the file while it is locked, does that lead to an IOError? Or does it automatically wait for the lock to be released?
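For reference, fcntl.flock with LOCK_EX alone blocks until the lock becomes available. Adding LOCK_NB makes the call raise an OSError instead of waiting. The lock is also advisory, so a process that never calls flock can still open and write the file. The sketch below shows the non-blocking case using two independent handles on the same file. The names try_lock and demo are illustrative assumptions, not part of this PR.

```python
import fcntl
import os
import tempfile


def try_lock(f) -> bool:
    """Return True if an exclusive lock was acquired without waiting."""
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # With LOCK_NB, a held lock is reported as an error, not a wait.
        return False


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two separate open() calls give two independent lock owners,
        # which behaves like two processes as far as flock is concerned.
        with open(path, 'a') as first, open(path, 'a') as second:
            fcntl.flock(first, fcntl.LOCK_EX)
            blocked = not try_lock(second)   # first still holds the lock
            fcntl.flock(first, fcntl.LOCK_UN)
            acquired = try_lock(second)      # lock is free again
        return blocked, acquired
    finally:
        os.remove(path)
```

Without LOCK_NB, as in the diff above, the second caller would simply wait at the flock call until the first one unlocks.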
    with open(filepath, mode, **kwargs) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        written_characters = f.write(data)
        fcntl.flock(f, fcntl.LOCK_UN)
Maybe worth explicitly closing the file.
@JulianPinzaru awesome work! I left a couple comments. Should be good to go after addressing those! Let me know if you have any questions!
I did some tests with this approach. Although it ensures that one process finishes writing before the other process starts, the output file could still contain data from both processes if the first process writes more data than the second process. Here's an example script to show how this can happen: foo.py:
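As a rough stand-in for that script, the sketch below replays the same sequence inside a single process, using two file handles in place of two processes so the outcome is deterministic. The function name demo, the temporary file, and the data sizes are assumptions made for illustration. The key point is that mode 'w' truncates at open(), before any lock is taken, and each handle then writes from offset 0.

```python
import fcntl
import os
import tempfile


def demo() -> str:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        first = open(path, 'w')    # truncates the file
        second = open(path, 'w')   # truncates again, before any lock is held

        # The first writer holds the lock and writes the longer payload.
        fcntl.flock(first, fcntl.LOCK_EX)
        first.write('a' * 10)
        first.flush()
        fcntl.flock(first, fcntl.LOCK_UN)
        first.close()

        # The second writer starts at offset 0 and only overwrites
        # the first three characters; the rest of the a's remain.
        fcntl.flock(second, fcntl.LOCK_EX)
        second.write('b' * 3)
        second.flush()
        fcntl.flock(second, fcntl.LOCK_UN)
        second.close()

        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)
```

Here demo() returns 'bbbaaaaaaa': the writes never overlap in time, yet the file still mixes data from both writers.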
Test two processes writing at once; the first process that opens the file writes more data than the second:
The output file contains a bunch of b's followed by a bunch of [...]
Instead of using [...]
This approach also ensures that read_from_file will always see a consistent version of the file without needing to acquire a lock, even if it is being written to concurrently by another process.
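The details of the proposed alternative are not shown above. One common technique with the property described (readers always see a complete old or new version, without taking a lock) is to write to a temporary file in the same directory and then swap it into place with os.replace. The sketch below assumes that technique; it is not necessarily what the commenter proposed. The names write_atomically and demo are hypothetical.

```python
import os
import tempfile


def write_atomically(filepath, data) -> int:
    """Replace filepath with data via a temp file and os.replace."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # The temp file must be on the same filesystem for the swap to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as tmp:
            written = tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to disk before the swap
        os.replace(tmp_path, filepath)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written


def demo() -> str:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_atomically(path, 'a' * 10)
        write_atomically(path, 'b' * 3)   # shorter write fully replaces the old one
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)
```

This only covers the overwrite case (mode 'w'). Appending would still need a lock or some other form of coordination, because the last writer to call os.replace wins.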
Please review feedback from this thread.
Fixes #565.
Proposed changes
Using the native Python module fcntl to put locks on writing data to a file. This way we can prevent multiple processes from writing to the same file simultaneously and erasing each other's data.