Compression for sucatalog files #50

Open: cvgs wants to merge 6 commits into main
Conversation

@cvgs commented Aug 3, 2016 (Contributor)

The softwareupdate framework can handle gzip-compressed sucatalogs, and indicates this by sending an "Accept-Encoding: gzip, deflate" header to the server. This reduces network load and speeds up the process of checking for new software.

My modifications create a compressed copy of every catalog, as well as of all .dist files, which do compress very well. The compressed versions are saved in the same file system location as the normal ones. Creating static gzipped files needs a little more disk space, but this should be preferred over on-the-fly encoding and its associated CPU cost, because a lot of clients will request the same file.

When using Apache, you can then adapt the rewrite rules to serve the gzipped version of a file whenever the client is capable of receiving it. To allow proper processing, the server also needs to send Content-Encoding: gzip and Content-Type: text/plain headers for the compressed files.
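For illustration, the Apache side could look something like this (a sketch only, not part of this PR's diff; the rewrite rules would have to be merged into the ones Reposado's setup instructions already use, and mod_headers must be enabled):

# serve the pre-compressed copy when the client accepts gzip
# and a .gz version of the requested file actually exists
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]

# send the headers mentioned above for the compressed copies
<FilesMatch "\.(sucatalog|dist)\.gz$">
    ForceType text/plain
    Header set Content-Encoding gzip
</FilesMatch>

A nice side effect of the -f check: if a .gz copy is ever missing, clients silently fall back to the uncompressed file.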

@gregneagle (Contributor)

This sounds really interesting. Unfortunately, my production Reposado server is still on Python 2.6 and won't be upgraded for a while, so I won't be able to use your code as-is in production. I might be able to set up a test box.

In the meantime, I have a few questions:

  1. Could the code be reworked to also run on Python 2.6?
  2. How much additional disk space is used by the extra gzipped files?
  3. What is the measured speedup on the clients when gzipped files are available?

@gregneagle (Contributor)

If the only 2.7 dependency is the use of the with statement...

import gzip
import os

def createCompressedFileCopy(local_file_path, copy_only_if_missing=False):
    '''creates a gzipped copy of the given file at the same location with .gz suffix'''
    local_gz_file_path = local_file_path + '.gz'
    if not (copy_only_if_missing and os.path.exists(local_gz_file_path)):
        f_in = open(local_file_path, 'rb')
        f_out = gzip.open(local_gz_file_path, 'wb')
        f_out.writelines(f_in)
        f_out.close()
        f_in.close()

Less pretty, but functionally equivalent. Both versions of the function need more error handling, though -- what happens if createCompressedFileCopy fails for whatever reason? Even if you catch the exception, what happens with clients that request the gzipped files?
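One way to at least avoid serving half-written files -- an untested sketch, still avoiding the with statement for 2.6 -- is to write the compressed copy under a temporary name and only rename it into place on success:

import gzip
import os

def createCompressedFileCopy(local_file_path, copy_only_if_missing=False):
    '''creates a gzipped copy of the given file, writing to a temporary
    file first so a failure never leaves a partial .gz behind'''
    local_gz_file_path = local_file_path + '.gz'
    if copy_only_if_missing and os.path.exists(local_gz_file_path):
        return
    temp_path = local_gz_file_path + '.tmp'
    f_in = open(local_file_path, 'rb')
    try:
        f_out = gzip.open(temp_path, 'wb')
        try:
            f_out.writelines(f_in)
        finally:
            f_out.close()
    finally:
        f_in.close()
    # rename within the same directory is atomic, so clients only
    # ever see a complete compressed file
    os.rename(temp_path, local_gz_file_path)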

@cvgs commented Aug 4, 2016 (Contributor, Author)

I have to admit that I do not yet use the code in production. I had this idea in the back of my head for some time and just wanted to get it out into the open; the Python 2.7 requirement was not intended.

Thus I can't give any meaningful numbers about traffic reduction or speedup yet. However, a quick check of the downloaded files on my development NetSUS shows the gzipped files add 1.6M to the already existing 19.5M, or less than 10% of the existing size (which, in reverse, could mean that clients download roughly 90% less metadata):

# find ./content/downloads -name '*.dist' -print0 | xargs -0 du -sch | tail -1
1.5M    total
# find ./content/catalogs/ -name 'index*.sucatalog' -print0 | xargs -0 du -sch | tail -1
18M     total

# find ./content/downloads -name '*.dist.gz' -print0 | xargs -0 du -sch | tail -1
528K    total
# find ./content/catalogs/ -name 'index*.sucatalog.gz' -print0 | xargs -0 du -sch | tail -1
1.1M    total

As for error checking, you are certainly right. Should createCompressedFileCopy raise a ReplicationError itself, or should that be separated out into the calling functions?
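For illustration, handling it at the call site could look like this (a sketch only; catalog_path stands in for whatever the caller actually passes, and ReplicationError is the exception class repo_sync already uses):

try:
    createCompressedFileCopy(catalog_path)
except (IOError, OSError) as err:
    # alternatively: log and continue, so clients just fall back to
    # the uncompressed catalog via the rewrite rules' -f check
    raise ReplicationError('Could not create gzipped copy of %s: %s'
                           % (catalog_path, err))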

I also just noticed that deleteBranchCatalogs will have to be updated to remove the compressed catalog file. Purging a product already takes care of removing the compressed dist file, since the whole tree gets deleted.
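Something like this at the point where deleteBranchCatalogs removes a catalog should be enough (untested; catalogpath is a stand-in for whatever variable that function really uses):

# also remove the compressed copy, if one exists
gz_path = catalogpath + '.gz'
if os.path.exists(gz_path):
    os.remove(gz_path)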

@weswhet commented Sep 29, 2017 (Contributor)

@gregneagle Thoughts on merging this in if the conflicts are fixed? I'll try and give it a test run early next week.

gregneagle changed the base branch from master to main on July 1, 2020