Fix file write concurrency #578

JulianPinzaru · 2020-02-23T19:33:28Z

Fixes #565.

Proposed changes

Uses Python's native fcntl module to lock the file while data is written to it. This prevents multiple processes from writing to the same file simultaneously and erasing each other's data.


bmblumenfeld · 2020-02-27T03:27:15Z

backend/models/util.py

+    return written_characters
+
+
+def append_to_file(filepath=None, data=None, mode='a', **kwargs):


Since this function only ever appends, it might be good to drop the mode kwarg and just pass mode='a' to util.write_to_file.
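A minimal sketch of the suggested refactor, assuming write_to_file keeps the signature shown in the diff. append_to_file no longer exposes mode and simply forwards mode='a'. One deliberate difference from the diff: this version flushes before releasing the lock, because otherwise buffered data may only reach the file when f is closed, after LOCK_UN has already been released. fcntl is Unix-only.

```python
import fcntl


def write_to_file(filepath=None, data=None, mode='w', **kwargs) -> int:
    '''Write data to filepath while holding an exclusive flock (Unix only).'''
    assert mode in ('w', 'a')
    with open(filepath, mode, **kwargs) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        written_characters = f.write(data)
        f.flush()  # push buffered data to the OS while the lock is still held
        fcntl.flock(f, fcntl.LOCK_UN)
    return written_characters


def append_to_file(filepath=None, data=None, **kwargs) -> int:
    # Appending is the only behavior, so mode is fixed rather than a parameter.
    return write_to_file(filepath, data, mode='a', **kwargs)
```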

bmblumenfeld · 2020-02-27T04:40:00Z

backend/models/util.py

+def write_to_file(filepath=None, data=None, mode='w', **kwargs) -> int:
+    '''
+    Concurrency safe function for writing data to the file.
+    Param mode: file open mode ('a' for appending, 'w' for writing)
+    '''
+    assert mode == 'w' or mode == 'a'
+
+    with open(filepath, mode, **kwargs) as f:
+        fcntl.flock(f, fcntl.LOCK_EX)
+        written_characters = f.write(data)
+        fcntl.flock(f, fcntl.LOCK_UN)
+    return written_characters


This question comes from my lack of knowledge:
If another process tries to access the file while it is locked, does that raise an IOError? Or does it automatically wait for the lock to be released?
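On the question above, as far as I know: flock() with LOCK_EX blocks until the lock is released rather than raising. Passing LOCK_NB makes it fail immediately instead, which Python surfaces as BlockingIOError (a subclass of OSError, which IOError is an alias of in Python 3). The lock is also advisory: it only affects processes that call flock() themselves, so a plain open() or write() elsewhere is not blocked. A small sketch, where try_lock_nonblocking is a hypothetical helper name:

```python
import fcntl


def try_lock_nonblocking(path: str) -> bool:
    '''Return True if an exclusive flock on path can be taken right now.

    Without LOCK_NB, flock() would simply wait for the lock to be free.
    With LOCK_NB it raises BlockingIOError instead of waiting.
    '''
    with open(path, 'a') as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(f, fcntl.LOCK_UN)
        return True
```

Each open() creates a separate open file description, so even within one process a second flock() through a new open() conflicts with a lock held through the first one.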

bmblumenfeld · 2020-02-27T04:41:02Z

backend/models/util.py

+    with open(filepath, mode, **kwargs) as f:
+        fcntl.flock(f, fcntl.LOCK_EX)
+        written_characters = f.write(data)
+        fcntl.flock(f, fcntl.LOCK_UN)


Maybe worth explicitly closing the file.
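For reference, a sketch suggesting an explicit close may not be needed: leaving a with block already calls f.close(), and closing the only descriptor for the file also drops any flock() lock held through it. write_and_return_file is a hypothetical helper used only to inspect the file object after the block.

```python
import fcntl


def write_and_return_file(path: str, data: str):
    '''Write data under an exclusive flock and return the (now closed) file object.'''
    with open(path, 'w') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(data)
        # No explicit LOCK_UN or close(): exiting the block closes f,
        # which flushes the buffer and releases the flock.
    return f
```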

bmblumenfeld · 2020-02-27T04:44:42Z

@JulianPinzaru awesome work! I left a couple comments. Should be good to go after addressing those! Let me know if you have any questions!

youngj · 2020-02-27T06:03:18Z

I did some tests with this approach. Although it ensures that one process finishes writing before the other process starts, the output file could still contain data from both processes if the first process writes more data than the second process.

Here's an example script to show how this can happen:

foo.py:

import argparse
import time
import fcntl

def write_to_file(filepath=None, data=None, mode='w', **kwargs) -> int:
    with open(filepath, mode, **kwargs) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        for i in range(1, 1000):
            f.write(data)
        time.sleep(3)
        for i in range(1, 1000):
            f.write(data)
        fcntl.flock(f, fcntl.LOCK_UN)

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', required=True, help='foo')
    args = parser.parse_args()

    write_to_file('foo.txt', args.foo + '\n')

Test two processes writing at once; the first process that opens the file writes more data than the second:
(python foo.py --foo=aaaaaaaaaaaaaaaaaaaaaa &); sleep 1; python foo.py --foo=b

The output file contains a bunch of b's followed by a bunch of \0 characters, followed by a bunch of a's.
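One reading of this result: open(filepath, 'w') truncates the file before flock() is ever called, and flock is advisory, so the second process empties the file while the first still holds the lock. The first process then keeps writing at its old offset (leaving a hole that reads back as \0 bytes), and once it unlocks, the second process writes its b's from offset 0. A sketch of a hypothetical variant, write_to_file_locked, that only truncates after the lock is acquired (Unix only):

```python
import fcntl


def write_to_file_locked(filepath: str, data: str) -> int:
    '''Replace the contents of filepath, truncating only once the lock is held.'''
    # 'a' creates the file if needed but, unlike 'w', does not truncate on open.
    with open(filepath, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.truncate(0)  # safe now: no other cooperating writer holds the lock
        written = f.write(data)  # append mode writes at the new end, i.e. offset 0
        f.flush()  # push buffered data to the OS before releasing the lock
        fcntl.flock(f, fcntl.LOCK_UN)
    return written
```

This stops cooperating writers from interleaving, but a reader that does not take the lock can still observe an empty or half-written file between truncate() and the final write.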

Instead of using flock, I think the best approach is to write a temporary file with a unique name in the same directory, then rename it to the desired filename when it is complete. With this approach, flock isn't necessary because each writer has a unique filename to write to, and the filesystem will ensure that the rename is atomic.

This approach also ensures that read_from_file will always see a consistent version of the file without needing to acquire a lock even if it is being written to concurrently by another process.
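A minimal sketch of the temp-file-and-rename approach described above. write_to_file_atomic is a hypothetical name, and read_from_file here is only an illustration, not the PR's actual function. It assumes the temp file is created in the same directory as the target so both are on the same filesystem, which is what makes os.replace an atomic rename on POSIX.

```python
import os
import tempfile


def write_to_file_atomic(filepath: str, data: str) -> int:
    '''Replace filepath's contents by writing a unique temp file and renaming it into place.'''
    directory = os.path.dirname(os.path.abspath(filepath))
    # mkstemp picks a unique name, so concurrent writers never share a temp file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as f:
            written = f.write(data)
            f.flush()
            os.fsync(f.fileno())  # optional: make the bytes durable before renaming
        # Readers see either the old file or the complete new one, never a mix.
        os.replace(tmp_path, filepath)
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return written


def read_from_file(filepath: str) -> str:
    # No lock needed: the rename swaps in whole files atomically.
    with open(filepath) as f:
        return f.read()
```

With concurrent writers the last rename wins, so nothing is interleaved. Two caveats with this sketch: mkstemp creates the file with mode 0o600, which the renamed file keeps, and append-style writes would need a read-modify-write that this does not cover.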

hathix

Please review feedback from this thread.

JulianPinzaru added 2 commits February 22, 2020 20:55

Added locks for writing to files, created read, write and append util funcs (fixes #565)

ee6feaf

Merge branch 'master' into fix-file_write_concurrency

1dc8e1f

JulianPinzaru requested review from jtanquil and youngj as code owners February 23, 2020 19:33

hathix requested a review from bmblumenfeld February 27, 2020 02:49

bmblumenfeld reviewed Feb 27, 2020


hathix self-requested a review June 5, 2020 12:37

hathix requested changes Jun 5, 2020
