clip-gaze

An art analysis tool powered by CLIP.

Motivation

Diffusion models such as Stable Diffusion use OpenAI's CLIP to encode the text captions of their training images. What these machine learning systems actually learned from that training data is opaque. This tool helps us understand how CLIP, and therefore the models that use CLIP, see images.

What does it do?

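Given an image and a list of text phrases, it calculates the relative likelihood of each phrase being a good description of the image. The scores are relative to the other phrases supplied, so this is not the same thing as "given a text phrase, calculate the accuracy of that phrase": a phrase can score highly simply because the alternatives fit the image even worse.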
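To see why the scores are relative rather than absolute, here is a minimal sketch. It assumes the scores are normalised with a softmax, as in OpenAI's reference CLIP usage (the percentages in the example below, which roughly sum to 100% per category, are consistent with this, but that is an assumption about `gaze`). The raw similarity numbers are made up for illustration.

```python
import math


def relative_likelihoods(scores):
    """Softmax: turn raw image-text similarity scores into shares that sum to 1."""
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical similarity scores for one image against three phrases.
three = relative_likelihoods([25.0, 23.0, 20.0])
# The same three phrases plus a fourth phrase that matches the image better.
four = relative_likelihoods([25.0, 23.0, 20.0, 27.0])

print([round(p, 3) for p in three])
print([round(p, 3) for p in four])
```

Nothing about the image or the first phrase changes between the two calls, yet the first phrase drops from roughly 88% to roughly 12% once a better-fitting alternative is offered.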

An example

Let's show it the painting "Brücke über die Marne bei Creteil" by Cézanne. Download the 2,175 × 1,713 pixel version of the painting, open it as `image` (e.g. using PIL.Image.open from the pillow package), and pass it to the gaze function.

# Assumes the painting has already been loaded into a variable called `image`
import pprint
import clip_gaze

pprint.pprint(
    {
        "artist": clip_gaze.gaze(image, clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE[:200]),
        "surface": clip_gaze.gaze(image, clip_gaze.SURFACES),
        "movement": clip_gaze.gaze(image, clip_gaze.MOVEMENTS),
    }
)

# Output
{'artist': ['by paul cézanne (82%)',
            'by clyfford still (07%)',
            'by arnold böcklin (04%)',
            'by franz kline (01%)',
            'by giorgio de chirico (01%)'],
 'movement': ['tonalism movement (16%)',
              'impressionism movement (09%)',
              'american scene painting movement (09%)',
              'modern european ink painting movement (09%)',
              'post-impressionism movement (09%)'],
 'surface': ['on canvas (86%)',
             'on paperboard (11%)',
             'on vellum (01%)',
             'on wood (01%)',
             'on card stock (00%)']}

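As you can see, CLIP suggests that, of the options provided, the terms "by paul cézanne", "tonalism movement", and "on canvas" are the most likely to describe the input image, although it is far less decisive about the movement (16%) than about the artist (82%) or the surface (86%).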

gaze works by having CLIP assess the relative likelihood of the options within each category. The following lists are built into the module:

Variable	Description	Example
ARTISTS_BY_NAME	List of 6000 artists in alphabetical order.	Sandra Chevrier
ARTISTS_BY_TRAINING_PREVALENCE	List of 900 artists in the order of prevalence in the training data (most prevalent first). Sometimes the most famous artists are credited without a first name, and so you may find those as separate entries alongside their full name.	Sandra Chevrier
MOVEMENTS	Artistic movement	Afrofuturism
PAINTING_MATERIALS	Materials for creating paintings	Acrylic Paint
PRINTING_TECHNIQUES	Technique for creating an impression	Aquatint
QUALITIES	Subjective (even more so than the others) assessment of artwork	Exceptional
SCULTPURE_MATERIALS	Material that is sculpted into artwork	Bronze
SITES	Art websites, each of which has its own tastes (and phrasing)	Popular on Reddit
SURFACES	Material to which the artistic material is applied	Canvas
TOOLS	Object that is used to apply the material to the surface	Brush
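Because ARTISTS_BY_TRAINING_PREVALENCE can contain both a surname-only entry and a full-name entry for the same artist, the probability for that artist may be split across two prompts. The sketch below merges such aliases by summing their shares. It assumes you have raw scores as a dict from prompt to probability (the actual return type of clip_gaze.probabilities may differ), and `merge_aliases`, the `"by cézanne"` prompt, and all numbers are hypothetical.

```python
def merge_aliases(results, aliases):
    """Sum the probabilities of prompts that name the same artist.

    results: dict mapping prompt -> probability (hypothetical raw scores).
    aliases: dict mapping an alternative prompt -> its canonical prompt.
    """
    merged = {}
    for prompt, prob in results.items():
        canonical = aliases.get(prompt, prompt)
        merged[canonical] = merged.get(canonical, 0.0) + prob
    return merged


# Made-up scores in which one artist's share is split across two spellings.
raw = {"by paul cézanne": 0.60, "by cézanne": 0.22, "by clyfford still": 0.07}
merged = merge_aliases(raw, {"by cézanne": "by paul cézanne"})
print(merged)
```

Summing is reasonable here because, under a softmax, each prompt's score is a share of the same total.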

For example:

clip_gaze.MOVEMENTS # A list of the prompts describing art history movements
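The prompts argument is described as a list of prompts, so you are not limited to the built-in lists. A minimal sketch, assuming any list of strings is accepted, that builds custom lists using the phrasing visible in the example output ("by …", "on …", "… movement"); `make_prompts` is a hypothetical helper, not part of clip_gaze.

```python
def make_prompts(terms, template):
    """Build prompts in the style of the built-in lists, e.g. "by {}" or "on {}"."""
    return [template.format(term.strip().lower()) for term in terms]


artists = make_prompts(["Paul Cézanne", "Clyfford Still"], "by {}")
surfaces = make_prompts(["Canvas", "Paperboard"], "on {}")
movements = make_prompts(["Tonalism", "Impressionism"], "{} movement")
print(artists, surfaces, movements)
```

Each resulting list could then be passed to gaze in place of a built-in list, e.g. as the second argument alongside `image`.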

Arguments for `gaze`

Argument	Description	Default
image	The image to inspect	Required
prompts	A list of prompts, see earlier table for examples	Required
batch_size	Limit how many prompts are inspected at once. `None` means all prompts are inspected at the same time. If you run out of VRAM, try setting this to `10` to start with.	`None`
only_show_best	Show only this many of the highest-scoring results; set it to `None` for no limit	`5`
format_output	Turn the output into something easier for people to read (e.g. percentages in brackets)	`True`
device	Runs on the GPU via `"cuda"` by default, falling back to `"cpu"` if CUDA is not available.	`"cuda"`
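The sketch below shows one way batching can reduce memory use without changing the results: score the prompts a batch at a time, then normalise across all prompts at the end. This is an assumption about how batch_size could work, not clip_gaze's actual code; `batched`, `probabilities_in_batches`, and the length-based `fake_scores` stand-in for a CLIP call are all hypothetical.

```python
import math


def batched(items, batch_size):
    """Yield successive slices of at most batch_size items (everything at once if None)."""
    if batch_size is None:
        yield items
        return
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def probabilities_in_batches(score_fn, prompts, batch_size=None):
    """Score prompts batch by batch, then normalise across *all* prompts."""
    scores = []
    for batch in batched(prompts, batch_size):
        scores.extend(score_fn(batch))  # only this batch is scored at a time
    # Softmax over the full list, so batching does not change the shares.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {p: e / total for p, e in zip(prompts, exps)}


def fake_scores(batch):
    """Stand-in for a CLIP call: score each prompt by its length (illustration only)."""
    return [float(len(p)) for p in batch]


prompts = ["on canvas", "on paperboard", "on vellum", "on wood", "on card stock"]
all_at_once = probabilities_in_batches(fake_scores, prompts)
in_twos = probabilities_in_batches(fake_scores, prompts, batch_size=2)
```

Normalising per batch instead would make each batch's scores sum to 1 on their own, so prompts in different batches could no longer be compared.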

Finer control

The clip_gaze.gaze function is built on clip_gaze.probabilities: it selects the highest-probability options and formats the text. If you want the raw results, skip gaze and call clip_gaze.probabilities directly:

clip_gaze.probabilities(image, clip_gaze.ARTISTS_BY_NAME)

# Returns the probability scores for all 6000 or so artists in the list
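To turn raw scores back into gaze-style output, you can sort them, keep the top few, and label each with a two-digit percentage. This sketch assumes the raw scores are a dict from prompt to probability and that percentages are rounded to the nearest whole number; `format_top` is hypothetical and the numbers are made up, chosen to reproduce the "surface" output from the example.

```python
def format_top(probabilities, only_show_best=5):
    """Sort prompts by probability and label each with a two-digit percentage."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    if only_show_best is not None:
        ranked = ranked[:only_show_best]
    return [f"{prompt} ({round(prob * 100):02d}%)" for prompt, prob in ranked]


scores = {
    "on canvas": 0.86,
    "on paperboard": 0.11,
    "on vellum": 0.012,
    "on wood": 0.009,
    "on card stock": 0.004,
    "on glass": 0.001,
}
print(format_top(scores))
# ['on canvas (86%)', 'on paperboard (11%)', 'on vellum (01%)', 'on wood (01%)', 'on card stock (00%)']
```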

How does it work?

CLIP is a tool provided by OpenAI that calculates the similarity between an image and some text. This is a machine learning system trained on an enormous amount of data, and that data will contain biases (intentional and unintentional). It is not a source of truth, but a useful tool to give you ideas about where to search next.
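CLIP maps images and text into a shared embedding space; OpenAI's reference usage takes the cosine similarity between the image embedding and each text embedding, multiplies it by a learned logit scale (around 100 for the released models), and applies a softmax. The toy sketch below reproduces that arithmetic with made-up 3-dimensional vectors; it illustrates the idea and is not clip_gaze's code.

```python
import math


def normalise(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def clip_style_probabilities(image_embedding, text_embeddings, logit_scale=100.0):
    """Cosine similarity between one image and each text, scaled, then softmaxed."""
    img = normalise(image_embedding)
    logits = []
    for text in text_embeddings:
        txt = normalise(text)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy "embeddings"; real CLIP embeddings have hundreds of dimensions.
image_vec = [0.9, 0.1, 0.2]
texts = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.2, 0.1, 1.0]]
probs = clip_style_probabilities(image_vec, texts)
print(probs)
```

The large logit scale is why the winning option often takes most of the probability even when the cosine similarities differ only moderately.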

This tool downloads CLIP onto your computer and runs it locally, which can be demanding, especially on older machines. To reduce memory load, see the batch_size argument in "Arguments for gaze" above.

Biases

This software is built on a machine learning system, and its biases come from two sources:

CLIP itself, which has its own biases; we refer the user to OpenAI's own work on explaining and mitigating them
The lists of chosen phrases

The lists used in this software come primarily from Wikipedia and from the training data that CLIP used. Neither source is perfect, and care should be taken to account for these biases where possible. Although some lists are long (e.g. the list of 6000 artists), no claim is made about their completeness or about the relative importance of their entries.

hmillerbakewell/clip-gaze
