Compression for sucatalog files #50
Conversation
…speed. Requires the Python gzip module and "with" support (available since Python 2.7, i.e. since OS X 10.7).
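The with-based code itself isn't reproduced in this thread. A minimal sketch of what it might look like, given the gzip and "with" requirements named above (the function name comes from the discussion below; the details are assumptions, not the committed code):

```python
import gzip
import os

def createCompressedFileCopy(local_file_path, copy_only_if_missing=False):
    '''creates a gzipped copy of the given file at the same location
    with .gz suffix'''
    local_gz_file_path = local_file_path + '.gz'
    if not (copy_only_if_missing and os.path.exists(local_gz_file_path)):
        # "with" closes both files even on error -- this is the Python 2.7
        # dependency (gzip.GzipFile only gained context manager support
        # in 2.7).
        with open(local_file_path, 'rb') as f_in:
            with gzip.open(local_gz_file_path, 'wb') as f_out:
                f_out.writelines(f_in)
```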
This sounds really interesting. Unfortunately, my production Reposado server is still on Python 2.6 and won't be upgraded for a while, so I won't be able to use your code as-is in production. I might be able to set up a test box. In the meantime, I have a few questions:
If the only 2.7 dependency is the use of the `with` statement, the function could be rewritten without it:

```python
def createCompressedFileCopy(local_file_path, copy_only_if_missing=False):
    '''creates a gzipped copy of the given file at the same location with .gz suffix'''
    local_gz_file_path = local_file_path + '.gz'
    if not (copy_only_if_missing and os.path.exists(local_gz_file_path)):
        f_in = open(local_file_path, 'rb')
        f_out = gzip.open(local_gz_file_path, 'wb')
        f_out.writelines(f_in)
        f_out.close()
        f_in.close()
```

Less pretty, but functionally equivalent. Both versions of the function need more error handling, though -- what happens if, for whatever reason, createCompressedFileCopy fails? Even if any exception is caught, what happens with clients that are requesting the gzipped files?
I have to admit that I do not yet use the code in production. I had this idea in the back of my head for some time and just wanted to get it out into the open; the Python 2.7 requirement was not intended. Thus I can't give any meaningful numbers about traffic reduction or speedup. However, a quick check of the downloaded files on my development NetSUS shows the zipped files add 1.6M to the already existing 19.5M, or less than 10% of the existing size (which in reverse could mean that clients download 90% less metadata).

As for error checking, you are certainly right. Should createCompressedFileCopy raise a ReplicationError, or should that be separated into the calling functions? I also just noticed that deleteBranchCatalogs will also have to be updated to remove the compressed catalog file. Purging a product will already take care of removing the compressed dist file by deleting the whole tree.
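One possible shape for that error handling, assuming ReplicationError (Reposado's existing exception, mentioned above) is the right thing to raise -- a sketch only, not a decision on where the handling ultimately belongs:

```python
import gzip
import os

class ReplicationError(Exception):
    '''Stand-in for Reposado's existing ReplicationError.'''
    pass

def createCompressedFileCopy(local_file_path, copy_only_if_missing=False):
    '''creates a gzipped copy of the given file at the same location
    with .gz suffix; raises ReplicationError if compression fails'''
    local_gz_file_path = local_file_path + '.gz'
    if copy_only_if_missing and os.path.exists(local_gz_file_path):
        return
    try:
        f_in = open(local_file_path, 'rb')
        try:
            f_out = gzip.open(local_gz_file_path, 'wb')
            try:
                f_out.writelines(f_in)
            finally:
                f_out.close()
        finally:
            f_in.close()
    except (IOError, OSError) as err:
        # Don't leave a truncated .gz behind for clients to download;
        # remove it and surface the failure to the caller.
        if os.path.exists(local_gz_file_path):
            os.remove(local_gz_file_path)
        raise ReplicationError(
            'Could not create %s: %s' % (local_gz_file_path, err))
```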
@gregneagle Thoughts on merging this in if the conflicts are fixed? I'll try to give it a test run early next week.
The softwareupdate framework can handle gzip-compressed sucatalogs, and indicates this by sending an "Accept-Encoding: gzip, deflate" header to the server. This reduces network load and speeds up the process of checking for new software.
My modifications create a compressed copy of every catalog, as well as of all .dist files, which do compress very well. The compressed versions are saved in the same file system location as the normal ones. Creating static gzipped files needs a little more disk space, but this should be preferred over on-the-fly encoding and its associated CPU cost, because a lot of clients will request the same file.
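As a rough illustration of that idea (directory layout and file extensions assumed here, not taken verbatim from the patch), generating the static copies for a whole repo could reuse a createCompressedFileCopy like the one above:

```python
import os

def compressRepoMetadata(html_root):
    '''walks the Reposado html directory and creates .gz copies of
    every catalog and .dist file next to the originals'''
    for dirpath, dirnames, filenames in os.walk(html_root):
        for name in filenames:
            if name.endswith(('.sucatalog', '.dist')):
                createCompressedFileCopy(os.path.join(dirpath, name),
                                         copy_only_if_missing=True)
```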
When using Apache, you can now adapt the rewrite rules to optionally serve the gzipped version of a file if the client is capable of receiving it. To allow proper processing, the server also needs to send the Content-Encoding: gzip and Content-Type: text/plain headers.
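For example, rules along these lines could do it (a sketch with assumed paths and patterns, not a tested configuration; it requires mod_rewrite and mod_headers):

```apache
RewriteEngine On
# Serve the pre-built .gz copy only if the client advertises gzip
# support and the compressed file actually exists.
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*\.(sucatalog|dist))$ $1.gz [L]

# Label the payload so softwareupdate decompresses it correctly.
<FilesMatch "\.(sucatalog|dist)\.gz$">
    Header set Content-Encoding gzip
    ForceType text/plain
</FilesMatch>
```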