Syncs two folders lazily.
- Sync a remote folder that is mounted locally but slow to access because it is limited by the network connection (e.g. nfs, sshfs, davfs), and a local folder that is fast to access because it is on a local file system.
- Sync lazily, i.e. only download data that is requested.
- enum34
- jsonpickle
python lazysync.py -r /remote/ -l /local/
python ~/Code/lazysync/lazysync.py -h
usage: lazysync.py [-h] -r RMT -l LCL [-L {y,n}]

Syncs lazily a remote folder and a local folder

optional arguments:
  -h, --help            show this help message and exit
  -r RM, --remote RM    Path where the remote data is located
  -l LC, --local LC     Path where the local data is located
  -L {y,n}, --lazy {y,n}
                        Sync lazily (on access) or not (always download)
- Syncing works for folders, files and symlinks.
- Files are not directly deleted, but versioned and kept until manually deleted.
- Sync is automatically paused if paths are not yet mounted on start, or are unmounted during its run.
- Problems:
  - Relative symlinks local->remote are not updated. (LazySync creates symlinks with absolute paths.)
  - A file access can be missed and the file won't be downloaded if the access/open time is too short to be detected.
- To Do:
  - Better logging levels and user-adjustable logging.
  - Make sleep time user-adjustable.
  - Syncing (user-created) symlinks.
  - Dry-run mode.
  - RSync-based copy and rate-limiting of the copy speed.
  - Size limit for downloaded files.
  - Daemonization, definition of an API for controlling the daemon, implementation of a client.
  - Tests.
- Syncing lazily works by creating symlinks in the local folder that point to the corresponding file in the remote folder. This avoids downloading files whose content is not needed (yet).
- If a file is read, it is downloaded: the symlink is replaced with a copy of the file contents. That a file is read is determined by checking the Linux kernel's list of open files.
  - This assumes that the remote file is cached locally once it is downloaded and not downloaded again. This is true for davfs2.
  - A file that is opened and read can be missed if the open time is too short to be detected. If the file access is missed, the file won't be downloaded.
- The local copy of the file is kept until a change to the remote file occurs, at which point the local copy is replaced by a symlink.
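
The symlink-then-copy cycle described above could be sketched roughly as follows. This is a minimal illustration and not LazySync's actual code: the function names `make_placeholder`, `materialize` and `dematerialize` are hypothetical, and copying to a temporary file before renaming is an assumption about how one might avoid exposing a half-written file.

```python
import os
import shutil
import tempfile


def make_placeholder(remote_path, local_path):
    """Create a symlink in the local folder pointing at the remote file."""
    # Absolute target, in line with the note that symlinks use absolute paths.
    os.symlink(os.path.abspath(remote_path), local_path)


def materialize(local_path):
    """Replace a placeholder symlink with a real copy of its target."""
    if not os.path.islink(local_path):
        return False  # already a real file, nothing to download
    target = os.readlink(local_path)
    # Copy into a temporary file in the same directory, then rename it over
    # the symlink, so local_path never refers to a partially copied file.
    directory = os.path.dirname(os.path.abspath(local_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    shutil.copy2(target, tmp_path)
    os.rename(tmp_path, local_path)
    return True


def dematerialize(remote_path, local_path):
    """Swap the local copy back to a symlink, e.g. after a remote change."""
    os.remove(local_path)
    make_placeholder(remote_path, local_path)
```

In this sketch, `materialize` would be called when an open of the file is detected, and `dematerialize` when a scan notices that the remote file changed.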
- Inotify does not emit create/modify/delete events for remote filesystems (only access events are emitted), which makes the inotify approach unusable; this leaves scanning the filesystem as the only option.
- Based on the differences of a filesystem scan compared to the tracking information stored from the last scan, the following actions are implemented:
  - Change to files (dirs)

    | location\event | create | mtime | atime | delete |
    |---|---|---|---|---|
    | remote | symlink (mkdir) | lazy: symlink / non-lazy: download (ignored) | ignored (ignored) | remove (rmdir) |
    | local | upload (mkdir) | upload (ignored) | ignored (ignored) | remove (rmdir) |

  - Creation and deletion of files and dirs is assumed based on the previous tracking information
    - If path existed before (last scan): assume path was deleted
    - If path did not exist before: assume new file
  - In lazy mode, a symlink is replaced with a local copy (download), if the file was accessed (detected as open with ofnotify).
- User symlinks are synced.
  - A user symlink is any symlink that is not a symlink local->remote.
  - If a symlink's target is outside remote or local, it will appear as dead.
- Box/webdav
  - When syncing files from local to remote, the remote mtime will be what it was synced to based on the local file until the webdav is unmounted; on unmount and remount, the remote mtime will be the upload time, which will be newer than the local mtime, so LazySync will think remote was updated and update local (ln/cp).
  - When syncing files while LazySync is running, davfs2 will return two different atimes on upload (either when copying to the remote folder, or even when uploading through Box's web interface), so that LazySync will conclude that the remote file was accessed (through the symlink) and download it to local.
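
The scan-and-compare step that this list describes (compare a fresh filesystem scan against the tracking information from the last scan) might look roughly like the sketch below. It is an illustration under assumptions: the snapshot layout (a dict from path to `(mtime, atime)`) and the function names `scan` and `diff` are hypothetical, and the real tracking data may hold more than two timestamps.

```python
import os


def scan(root):
    """Map each path under root to its (mtime, atime)."""
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are inspected, not their targets.
                st = os.lstat(path)
            except OSError:
                continue  # path vanished between listing and stat
            snapshot[path] = (st.st_mtime, st.st_atime)
    return snapshot


def diff(previous, current):
    """Return sorted (event, path) pairs between two snapshots."""
    events = []
    for path in current:
        if path not in previous:
            # Did not exist in the last scan: assume a new file or dir.
            events.append(("create", path))
            continue
        old_mtime, old_atime = previous[path]
        new_mtime, new_atime = current[path]
        if new_mtime != old_mtime:
            events.append(("mtime", path))
        elif new_atime != old_atime:
            events.append(("atime", path))
    for path in previous:
        if path not in current:
            # Existed in the last scan but is gone now: assume deletion.
            events.append(("delete", path))
    return sorted(events)
```

Each event would then be mapped to an action by location, as in the table above: for example, a `create` on the remote side leads to a symlink in lazy mode, while a `create` on the local side leads to an upload.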
- Deleted files are not directly deleted, but kept in {remote,local}/.lazysync/<backup_hash>. <backup_hash> is a hash based on the original filename and the deletion date and time. Information on how each <backup_hash> relates back to the original filename is stored in {remote,local}/.lazysync/data.
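
As an illustration of the backup scheme, the sketch below moves a file into `.lazysync/<backup_hash>` and records the mapping in `.lazysync/data`. The details are assumptions rather than LazySync's actual implementation: SHA-1 as the hash function, the exact string that is hashed, plain JSON as the format of the `data` file, and the function names `backup_hash` and `move_to_backup` are all hypothetical.

```python
import hashlib
import json
import os
import shutil
from datetime import datetime


def backup_hash(original_path, deleted_at):
    """Hash of the original filename plus the deletion date and time."""
    key = "%s|%s" % (original_path, deleted_at.isoformat())
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def move_to_backup(root, relative_path, deleted_at=None):
    """Move root/relative_path to root/.lazysync/<backup_hash> and record it."""
    deleted_at = deleted_at or datetime.now()
    store = os.path.join(root, ".lazysync")
    if not os.path.isdir(store):
        os.makedirs(store)
    name = backup_hash(relative_path, deleted_at)
    shutil.move(os.path.join(root, relative_path), os.path.join(store, name))

    # Assumed index format: one JSON object mapping hash -> original info.
    index_path = os.path.join(store, "data")
    index = {}
    if os.path.exists(index_path):
        with open(index_path) as handle:
            index = json.load(handle)
    index[name] = {"path": relative_path, "deleted": deleted_at.isoformat()}
    with open(index_path, "w") as handle:
        json.dump(index, handle, indent=2)
    return name
```

Including the deletion time in the hashed key means that deleting the same filename twice yields two distinct backups instead of overwriting the earlier one.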
- Scan the list of open files for all processes in regular intervals to detect newly opened and closed files on the local system only without any influences by other accesses to the remote files.
- This is achieved by scanning /proc//fd/ by using psutil. By comparing with the previous scan, files that have been opened or closed are detected and events are created accordingly.
- The time interval should correlate with the file size for a given filesystem and network connection. If reading a file takes longer than the time interval, the file is detected and open and close events are created. A file that is read faster is only detected if a scan for open files happens between its opening and closing.
- On the first scan, all already open files will be treated like they were just opened, even if they have been open for a long time.
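
The polling idea can be sketched as below. Note that this is not ofnotify's code: to stay dependency-free the sketch reads the per-process `fd` directories under `/proc` directly with the standard library instead of going through psutil, and the function names and the event type strings `"open"` and `"close"` are hypothetical.

```python
import os


def open_files_under(roots):
    """Return the set of paths under any of roots that some process has open."""
    # Normalise each root to an absolute path with a trailing separator.
    roots = tuple(os.path.join(os.path.abspath(r), "") for r in roots)
    found = set()
    if not os.path.isdir("/proc"):
        return found  # no Linux-style procfs available
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while scanning
            if target.startswith(roots):
                found.add(target)
    return found


def diff_open_files(previous, current):
    """Turn two consecutive scans into (type, path) events."""
    opened = [("open", path) for path in sorted(current - previous)]
    closed = [("close", path) for path in sorted(previous - current)]
    return opened + closed
```

A polling loop would call `open_files_under` at the chosen interval and pass the previous and current results to `diff_open_files`. Starting with an empty previous set reproduces the first-scan behaviour noted above: every file that is already open is reported as newly opened.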
#!/usr/bin/env python
from __future__ import print_function

import ofnotify


class my_event_processor(ofnotify.event_processor):
    def process_event(self, event):
        # Called for each detected event on a watched path.
        print("process_event: path='%s' type=%s" % (event.path, event.type))


if __name__ == "__main__":
    # Watch both paths and run the notifier loop in the main thread.
    n = ofnotify.notifier(my_event_processor(), ['/path/1/', '/path/2/'])
    n.loop()
#!/usr/bin/env python
from __future__ import print_function

import time

import ofnotify


class my_event_processor(ofnotify.event_processor):
    def process_event(self, event):
        print("process_event: path='%s' type=%s" % (event.path, event.type))


if __name__ == "__main__":
    # The threaded notifier is started and stopped explicitly;
    # the main thread only sleeps until it is interrupted.
    n = ofnotify.threaded_notifier(my_event_processor(), ['/path/1/', '/path/2/'])
    n.start()
    try:
        while True:
            time.sleep(2)
    except KeyboardInterrupt:
        pass  # Ctrl-C ends the loop normally
    finally:
        n.stop()  # stop the notifier on Ctrl-C and on any other error