Migration to Pillow and huge performance improvements #137

QSchulz · 2021-03-03T19:05:02Z

A notable change is that the resize option for images only accepts
percentages for now.

Another notable change is that the .copy() function actually also
applies the quality setting, unlike the implementation with
graphicsmagick.

This has been tested against Pillow from 6.0.0 to 8.1.0 and Pillow-SIMD
7.0.0.post3.

Here are the benchmarks. Setup: ~1400 photos spread across 31 galleries, everything built from scratch. "GraphicsMagick" is the current implementation in prosopopee. "Built Pillow 8.1.0" can be reproduced by installing libjpeg-turbo and then running `pip3 install --no-binary :all: --force-reinstall pillow`. Before trying to compile Pillow, follow https://pillow.readthedocs.io/en/stable/installation.html#building-from-source to make sure all the required packages are installed on your distribution. Times are given as h:mm:ss.ss or mm:ss.ss.

| Computer | GraphicsMagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
|---|---|---|---|---|
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 1:37:13.06 | 26:57.71 | 17:43.66 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 5:35:32.57 | 1:44:21.93 | 1:16:32.42 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 1:42:33.79 | 46:10.00 | 26:10.30 | 17:30.49 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 33:01.63 | 6:00.16 | 3:40.09 | 2:16.03 |
| Raspberry Pi 4 4GB RAM (Ubuntu Server 20.10) | 3:44:57.00 | 44:43.67 | 33:29.86 | N/A |

Regenerating only one gallery of 71 photos (times as mm:ss.ss, or in seconds when under a minute):

| Computer | GraphicsMagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
|---|---|---|---|---|
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 3:59.49 | 1:37.47 | 1:13.93 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 14:59.40 | 6:09.36 | 5:04.45 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 4:33.75 | 2:31.85 | 1:45.11 | 1:18.00 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 1:31.09 | 21.179 | 12.373 | 8.665 |
| Raspberry Pi 4 4GB RAM (Ubuntu Server 20.10) | 10:26.63 | 2:23.45 | 2:22.50 | N/A |

Currently, thumbnails are generated in a single thread while the galleries
are parsed, by calling GraphicsMagick once for every thumbnail. This is
suboptimal, even though GraphicsMagick spreads its workload over all
available CPU cores.

After some quick-and-dirty benchmarking, multiprocessed Pillow turned out
to be much more efficient than GraphicsMagick for generating thumbnails.

This PR adds support for generating thumbnails with multiprocessed
Pillow.

Multiple processes have to be used instead of multiple threads because
Python still uses the Global Interpreter Lock (GIL) for threads, meaning
threads cannot execute Python code in parallel, while parallel execution
is exactly what CPU-intensive tasks such as thumbnail generation need.
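A minimal sketch of the process-based approach described above, using `multiprocessing.Pool` from the standard library. `render_thumbnail` is a hypothetical stand-in for the Pillow work (a pure-Python busy loop that holds the GIL), and the POSIX-only `fork` start method is an assumption made so the example can run as a plain script.

```python
import multiprocessing


def render_thumbnail(task):
    """Hypothetical CPU-bound stand-in for opening and resizing one image."""
    name, size = task
    checksum = 0
    # Pure-Python busy loop: it holds the GIL for its whole duration.
    for i in range(size * 1000):
        checksum = (checksum * 31 + i) % 1_000_003
    return name, checksum


def render_all(tasks, processes=None):
    # Each worker is a separate interpreter with its own GIL, so the loops
    # above run in parallel, which a pool of threads could not do.
    # Assumption: the POSIX-only "fork" start method, so functions defined
    # in __main__ are inherited by the workers without being re-imported.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return dict(pool.map(render_thumbnail, tasks))


results = render_all([("a.jpg", 50), ("b.jpg", 80), ("c.jpg", 20)], processes=2)
```

Swapping `ctx.Pool` for a thread pool would keep the same API but serialize the busy loops on the GIL.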

Multiprocessing brings its own set of challenges because most data
structures, such as the cache, cannot be shared between processes. All
data modified by any of the processes must therefore be of one of the
types provided by multiprocessing.Manager.
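To illustrate the shared-data constraint above, here is a small sketch, assuming the POSIX `fork` start method, in which pool workers write into a `Manager().dict()`. The worker function and cache layout are hypothetical. With a plain `dict`, each worker would only mutate its own pickled copy and the parent would see no change.

```python
import multiprocessing


def mark_generated(args):
    """Worker: record in the shared cache that a thumbnail was produced."""
    cache, name, options = args
    cache[name] = options  # forwarded through the proxy to the manager process


def build(names):
    # Assumption: POSIX "fork" start method, as in a Linux build.
    ctx = multiprocessing.get_context("fork")
    with ctx.Manager() as manager:
        cache = manager.dict()  # DictProxy; the real dict lives in the manager
        with ctx.Pool(processes=2) as pool:
            pool.map(mark_generated, [(cache, n, {"quality": 90}) for n in names])
        return cache.copy()  # plain dict snapshot, taken before shutdown


shared = build(["a.jpg", "b.jpg"])
```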

For best performance, all thumbnails of an image should be generated at
once, so that the original image is opened only once. This requires
keeping track of the original images and attaching to each of them the
thumbnails to be created. This can be done via a factory which is passed
to the Jinja templates, so that they can request thumbnails for a given
image knowing only the original path, the name of the original image and
the parameters of the thumbnails to create.
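The "request now, generate once later" idea can be sketched as below. This `BaseImage` is a hypothetical, heavily simplified stand-in (not prosopopee's class): templates only record requested sizes, and `generate` opens the original a single time. The `open_image` and `resize` callables are assumptions; with Pillow, `open_image` could be `PIL.Image.open`.

```python
class BaseImage:
    """Hypothetical, simplified stand-in for prosopopee's BaseImage."""

    def __init__(self, path):
        self.path = path
        self.requested = set()  # (width, height) pairs asked for by templates

    def thumbnail(self, width, height):
        # Rendering a template only records the request; nothing is resized yet.
        self.requested.add((width, height))
        return f"{self.path}-{width}x{height}"

    def generate(self, open_image, resize):
        # The original is opened once and every requested size is derived from it.
        original = open_image(self.path)
        return {size: resize(original, size) for size in sorted(self.requested)}


opened = []
img = BaseImage("photo.jpg")
for size in [(300, 200), (600, 400), (300, 200)]:
    img.thumbnail(*size)
thumbs = img.generate(lambda path: opened.append(path) or path,
                      lambda original, size: (original, size))
```

Duplicate requests collapse in the set, so the (300, 200) thumbnail is produced only once.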

The ImageFactory keeps all of those original images in a dictionary
keyed by a virtual path made from the original image name and a CRC32 of
all the options that apply to its thumbnails. This lets prosopopee group
thumbnails per set of options (e.g. if options are passed in a gallery's
settings.yaml).
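One way such a keyed factory could look is sketched below. It is an assumption-laden illustration, not the PR's code: the class, the `base_imgs` attribute and the key format are hypothetical. Options are serialized with sorted keys before hashing with `zlib.crc32`, so equal option sets always map to the same virtual path.

```python
import json
import zlib


def options_crc(options):
    # Sorted keys make equal option dicts serialize to identical bytes.
    payload = json.dumps(options, sort_keys=True).encode("utf-8")
    return zlib.crc32(payload)


class ImageFactory:
    """Hypothetical sketch: one entry per (image name, option set)."""

    def __init__(self):
        self.base_imgs = {}

    def get(self, name, options):
        key = f"{name}-{options_crc(options):08x}"  # the "virtual path"
        if key not in self.base_imgs:
            self.base_imgs[key] = {"name": name, "options": options}
        return self.base_imgs[key]


factory = ImageFactory()
a = factory.get("a.jpg", {"quality": 90, "resize": "50%"})
b = factory.get("a.jpg", {"resize": "50%", "quality": 90})  # same options, reordered
c = factory.get("a.jpg", {"quality": 80})                   # different options
```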

The original image (or BaseImage) is returned by the ImageFactory and
the templates can then request .copy() or .thumbnail() for it.

The thumbnails are kept in a dictionary keyed by thumbnail name, which is
made of the original name, the thumbnail size, and the CRC32 of the
original image and the options that apply to it. This way, each
thumbnail is guaranteed to be unique, even if templates request it
multiple times.
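A possible naming scheme along those lines, as a hypothetical sketch: it assumes the CRC32 covers the original file's bytes followed by the serialized options (chained via the second argument of `zlib.crc32`). The exact format of prosopopee's names may differ. Because names are dictionary keys, repeated requests collapse into one entry.

```python
import json
import os
import zlib


def thumbnail_name(original_name, original_bytes, width, height, options):
    """Hypothetical naming scheme: name + size + CRC32(image bytes, options)."""
    crc = zlib.crc32(original_bytes)
    crc = zlib.crc32(json.dumps(options, sort_keys=True).encode("utf-8"), crc)
    stem, ext = os.path.splitext(original_name)
    return f"{stem}-{width}x{height}-{crc:08x}{ext}"


thumbnails = {}
for w, h in [(300, 200), (600, 400), (300, 200)]:  # (300, 200) requested twice
    name = thumbnail_name("photo.jpg", b"fake-jpeg-bytes", w, h, {"quality": 90})
    thumbnails[name] = (w, h)  # duplicate requests map to the same key
```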

The size is now read with imagesize.getsize() only once, the first time
the ratio property or .copy() is used on the image, so that the
performance impact is minimal.
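Lazily reading the size once can be sketched with `functools.cached_property`. The class below is hypothetical, and the `read_size` callable stands in for whatever returns `(width, height)` (the text above uses the imagesize package for that). Nothing is read until `ratio` is first accessed, and later accesses reuse the cached value.

```python
from functools import cached_property


class LazySizedImage:
    """Hypothetical sketch: the size is read on first use, then reused."""

    def __init__(self, path, read_size):
        self.path = path
        self._read_size = read_size  # callable returning (width, height)

    @cached_property
    def size(self):
        return self._read_size(self.path)  # runs at most once per instance

    @property
    def ratio(self):
        width, height = self.size
        return width / height


calls = []
img = LazySizedImage("photo.jpg", lambda path: calls.append(path) or (1500, 1000))
```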

Since multiprocessing.Pool.map splits iterables into predefined chunks
which are then assigned to processes, best performance requires each
process to get roughly the same workload, so that no process sits idle
while another is still working at 100%. For that, the original images
whose thumbnails are all cached should be removed from the list of
images to generate thumbnails from before the list is passed to
multiprocessing.Pool.map.
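A sketch of the filtering step, with a hypothetical image/cache layout. The second helper mirrors, to our understanding, CPython's default `chunksize` heuristic for `Pool.map` (about four chunks per worker). A chunk full of already-cached images would finish almost instantly, which is what filtering beforehand avoids.

```python
def pending_images(images, cache):
    """Keep only originals that still have at least one thumbnail to generate."""
    return [img for img in images if any(t not in cache for t in img["thumbnails"])]


def default_chunksize(n_items, n_processes):
    """Mirror of CPython's Pool.map default: roughly four chunks per worker."""
    chunksize, extra = divmod(n_items, n_processes * 4)
    return chunksize + 1 if extra else chunksize


images = [
    {"name": "a.jpg", "thumbnails": ["a-300.jpg", "a-600.jpg"]},  # fully cached
    {"name": "b.jpg", "thumbnails": ["b-300.jpg", "b-600.jpg"]},  # one missing
    {"name": "c.jpg", "thumbnails": ["c-300.jpg"]},                # nothing cached
]
cache = {"a-300.jpg", "a-600.jpg", "b-300.jpg"}
todo = pending_images(images, cache)
```

Only `todo` would then be handed to `Pool.map`.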

Thanks,
Quentin

This move makes sense for reusing remove_superficial_options, since it need not be specific to cache.py. This prepares prosopopee for Pillow support. Signed-off-by: Quentin Schulz <[email protected]>

Dry runs (`prosopopee test`) shouldn't dump the cache: nothing is done except creating the HTML files, so the cache is more or less meaningless in that case. Let's dump the cache only during a normal build run. Signed-off-by: Quentin Schulz <[email protected]>

For images, calls to copy() are only needed when {{ image }} is used later in the template. Remove the other copy() calls, as they trigger the creation of thumbnails that will never be used. Signed-off-by: Quentin Schulz <[email protected]>

Big gallery covers should be used for lines where only one gallery cover appears. With the current logic, if there is a prime number of galleries (except 2 and 3), the first gallery and all galleries whose index is prime (except the 2nd and 3rd) will have a big cover. In the end, all that matters is that if the galleries_line contains only one gallery, that gallery has a big cover. Signed-off-by: Quentin Schulz <[email protected]>

Loggers work by hierarchy: a logger whose level is NOTSET uses the effective level of its nearest ancestor that has a level set. This applies to the loglevel, which prosopopee changes according to the --log-level argument. Since the root logger (obtained with logger = logging.getLogger()) is the parent of ALL loggers, including those declared in any third-party module, prosopopee's loglevel also applies to those modules. This is usually not wanted, especially since prosopopee's default loglevel is the highest available, and it is very annoying with Pillow, which is pretty verbose when saving files.

Instead, let's declare a logger for prosopopee only. Unfortunately, since the package layout is unconventional (all *.py files in the same directory instead of subdirectories), the recommended logger = logging.getLogger(__name__) cannot be used: __name__ is __main__ in prosopopee.py, and the module's file name in every other file (e.g. cache in cache.py). The logging module therefore sees these loggers as unrelated, and prosopopee.py's loglevel would not apply to the other *.py files in the project. Instead, the value __name__ would have in a conventional package layout is simulated by prepending prosopopee. to __name__, except in prosopopee.py, which holds the parent logger and is thus simply named prosopopee.

Since prosopopee's logger is no longer the root logger, the NOTSET loglevel cannot be used anymore: its meaning is basically "defer to the parent logger", and the root logger has a default loglevel of WARNING, so prosopopee's default loglevel would not print anything labelled INFO or DEBUG. c.f. https://stackoverflow.com/a/50755200 Signed-off-by: Quentin Schulz <[email protected]>
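The naming scheme described in this commit can be sketched with the standard `logging` module. `get_logger` and the third-party logger name `some_library` are hypothetical. The sketch shows that a `prosopopee.cache` logger inherits the level set on `prosopopee`, while an unrelated library logger keeps following the root logger.

```python
import logging


def get_logger(module_name):
    """Hypothetical helper: simulate a conventional package layout's names."""
    if module_name in ("__main__", "prosopopee"):
        return logging.getLogger("prosopopee")  # parent of the project's loggers
    return logging.getLogger(f"prosopopee.{module_name}")


main_logger = get_logger("__main__")  # __name__ inside prosopopee.py
cache_logger = get_logger("cache")    # __name__ inside cache.py
library_logger = logging.getLogger("some_library")  # hypothetical third party

# Left at NOTSET, main_logger would defer to the root logger (WARNING by
# default), hiding INFO and DEBUG. An explicit level, e.g. from --log-level:
main_logger.setLevel(logging.DEBUG)
```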

To prepare for multiprocessing support, migrate Cache.cache from a simple dict to a Manager().dict, which is one of the data types that can be modified safely from other processes. Signed-off-by: Quentin Schulz <[email protected]>

…uration json.dumps(), which is used to write the cache dict to a file, transforms tuples into lists. With the current implementation, if a tuple is cached, the needs_to_be_generated method will always return True, even when the cached entry is actually up to date. To support tuples in cache entries, let's pass the options given as a parameter to the method through json.loads(json.dumps()) so that cached options and to-be-compared options have the same format. This will be used in a later commit which adds a (width, height) tuple to the cache. Signed-off-by: Quentin Schulz <[email protected]>
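The tuple/list mismatch and the `json.loads(json.dumps())` fix can be shown in a few lines. The `needs_to_be_generated` below is a hypothetical, simplified version of the cache check, not the method from cache.py.

```python
import json


def normalize(options):
    """Round-trip through JSON so tuples become lists, as in the cache file."""
    return json.loads(json.dumps(options))


def needs_to_be_generated(cached_options, options):
    """Hypothetical, simplified version of the cache check."""
    return cached_options != normalize(options)


cached = normalize({"size": (300, 200)})  # what reading the cache file yields
requested = {"size": (300, 200)}          # what the caller passes in
```

Without `normalize`, `[300, 200] != (300, 200)` in Python, so the comparison would always report a difference.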

Signed-off-by: Quentin Schulz <[email protected]>

…ration Generating thumbnails is done in parallel worker processes via multiprocessing.Pool. By default, Pool starts as many worker processes as there are CPU threads on the host machine. Let's allow users to select how many worker processes Pool can use. Signed-off-by: Quentin Schulz <[email protected]>
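A sketch of a user-configurable worker count. The setting key `threads` and the helper names are hypothetical; only the `processes` argument of `multiprocessing.Pool` is standard (passing `None` keeps Pool's own CPU-count-based default). The `fork` start method is assumed so the example runs as a plain script on POSIX.

```python
import multiprocessing


def resolve_processes(configured):
    """Turn a user setting into Pool's `processes` argument.

    None (setting absent) keeps multiprocessing.Pool's own default.
    """
    if configured is None:
        return None
    processes = int(configured)
    if processes < 1:
        raise ValueError(f"need at least one worker process, got {processes}")
    return processes


def square(x):
    return x * x


def run(settings, items):
    processes = resolve_processes(settings.get("threads"))  # hypothetical key
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, items)
```

Validating the value up front gives a clearer error than the `ValueError` Pool itself raises for a worker count below one.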

QSchulz force-pushed the multiprocess branch 3 times, most recently from 75cae06 to 12194b5 on March 21, 2021 13:51

QSchulz mentioned this pull request Apr 2, 2021

exposure: full-picture: fix deformed thumbnails for videos #143

Merged

QSchulz added 11 commits October 10, 2021 18:18

move remove_superficial_options into utils

2c2895c

themes: remove unnecessary calls to copy()

ef3a5ee

travis: remove now useless graphicsmagick

9566424

Signed-off-by: Quentin Schulz <[email protected]>

docs: update based on migration from GraphicsMagick to Pillow

0de1e40

Signed-off-by: Quentin Schulz <[email protected]>

QSchulz force-pushed the multiprocess branch from 12194b5 to ba91bcb on October 10, 2021 16:22

QSchulz mentioned this pull request Oct 21, 2022

AttributeError: module 'jinja2.ext' has no attribute 'with_' #147

Open

QSchulz mentioned this pull request Aug 28, 2023

EOF error when running on macOS recitale/recitale#26

Closed
