
Overview of results so far

Basics and local stuff

built TF from source for 2x speed increase on local machine

modifying adaptive-style-transfer

added functionality for reencoding with the following features:

  • --reencodes n reencodes the data n times
  • --reencode_steps n saves every n-th step of the reencoding process
  • --embeddings saves the embeddings as a NumPy binary file for every n-th step specified in --reencode_steps
  • --log logs interesting variables of the reencoding, such as the norm of the feature vector and the distance between consecutive feature vectors, and above all logs the feature vectors themselves for visualization in TensorBoard, which allows for PCA and t-SNE (a sketch of how these flags could be wired up follows below)
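
A minimal sketch of how these flags could drive the reencoding loop, assuming embeddings are written as .npy files; encode, decode, and the input image are stand-ins, not the actual adaptive-style-transfer code:

```python
import argparse
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument("--reencodes", type=int, default=1)       # reencode n times
parser.add_argument("--reencode_steps", type=int, default=1)  # save every n-th step
parser.add_argument("--embeddings", action="store_true")      # save embeddings as .npy
parser.add_argument("--log", action="store_true")             # log norms and distances
args = parser.parse_args()

def encode(img):
    # stand-in for the style-transfer encoder: image -> feature vector
    return img.reshape(-1)

def decode(z):
    # stand-in for the decoder: feature vector -> image
    return z.reshape(256, 256, 3)

image = np.random.rand(256, 256, 3)  # stand-in for the input image
prev_z = None
for step in range(args.reencodes):
    z = encode(image)
    image = decode(z)

    if args.embeddings and step % args.reencode_steps == 0:
        np.save(f"embedding_{step:05d}.npy", z)  # hypothetical file naming

    if args.log:
        if prev_z is not None:
            print(f"step {step}: |z| = {np.linalg.norm(z):.3f}, "
                  f"|z - z_prev| = {np.linalg.norm(z - prev_z):.3f}")
        prev_z = z
```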

results

After around 10 iterations the original image content is hardly recognisable anymore.

TODO: Insert original image and after 10 iterations, van gogh and picasso

While the abstraction is more obvious for the first few iterations, the images seem to vary less in later iterations.

[Images: s = 0, s = 1, s = 5, s = 10, s = 20]

The resampled images for different original images seem to be similar in their abstract appearance, yet they do not become more similar over time.

[Images: original, then van-gogh, picasso, cezanne, el-greco, gauguin, kandinsky, kirchner, monet, morisot, peploe, pollock, roerich, each at s=100]

Looking at 100 iterations for two styles of the same picture...

visualizing embeddings/information

the results have been visualized using the following methods:

  • For a few test images, scikit-learn's t-SNE and matplotlib give simple 2D plots which indicate that the embeddings are rather drifting apart:
    [Table: one plot per style (van-gogh, picasso, cezanne, el-greco, gauguin, kandinsky, kirchner, monet, morisot, peploe, pollock, roerich; each at s=100) for each of the following visualizations:]
    t-SNE: n_components=2, verbose=1, perplexity=50, n_iter=1000
    UMAP: n_components=2, n_neighbors=50
    Embeddings were taken from the same run as the pictures above; a sketch of the plotting code follows.
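
A sketch of how those 2D plots could be produced with the parameters above; the .npy layout and the number of saved steps are assumptions (UMAP here is the umap-learn package):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import umap  # pip install umap-learn

# embeddings saved during reencoding (assumed file layout and step count)
steps = 100
X = np.stack([np.load(f"embedding_{s:05d}.npy").ravel() for s in range(steps)])

# parameters as listed above
X_tsne = TSNE(n_components=2, verbose=1, perplexity=50, n_iter=1000).fit_transform(X)
X_umap = umap.UMAP(n_components=2, n_neighbors=50).fit_transform(X)

# color the points by reencoding step to make the drift visible
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X_tsne[:, 0], X_tsne[:, 1], c=range(steps), cmap="viridis")
ax1.set_title("t-SNE")
ax2.scatter(X_umap[:, 0], X_umap[:, 1], c=range(steps), cmap="viridis")
ax2.set_title("UMAP")
fig.savefig("embedding_drift.png")
```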
  • For more advanced (3D) plots, TensorBoard offers nice visualization for embeddings as well as scalar values, at the cost of at times very high latency and long loading times when projecting embeddings.
This image shows the distance between subsequent embeddings in feature space when resampling the van-gogh style. Intuitively, this value should decrease while resampling the same image and only jump at multiples of 100. TensorBoard gives a different picture: the distance between images in feature space appears to be random. However, TensorBoard also withholds information for many data points in its representation, so a direct comparison with the saved embeddings would be appropriate (see the sketch below). For only 100 data points all information is displayed, but the randomness remains.
[Images: PCA, t-SNE]
Looking at the feature space, TensorBoard does not give nice t-SNE plots, while the PCA plots match the appearance of the UMAP plots that follow.

TensorBoard seems to be problematic with more than 1000 data points, especially as the checkpoint files tend to reach tens of GB. It is easier and more practical to use bokeh for this, especially when TensorBoard is not running locally and has to send all of its data to the user.
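
The distances TensorBoard displays can be cross-checked directly on the saved embeddings; a minimal sketch, again assuming the .npy layout from above:

```python
import numpy as np

# one embedding per reencoding step (assumed file layout and step count)
steps = 100
Z = np.stack([np.load(f"embedding_{s:05d}.npy").ravel() for s in range(steps)])

# distances between consecutive embeddings in feature space
dists = np.linalg.norm(np.diff(Z, axis=0), axis=1)
print(f"mean: {dists.mean():.3f}  std: {dists.std():.3f}")
```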

  • For validation purposes, UMAP was used once more, this time with a different visualization tool: bokeh, which itself uses vis.js for plotting 3D graphs (n_components=3, n_neighbors=150).
[Images: UMAP 3D for van-gogh, UMAP 3D for roerich]
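
Computing the 3D embedding itself is a one-liner with the parameters above; handing it off to the bokeh/vis.js plot is sketched here as a plain .npy export, which is an assumption:

```python
import numpy as np
import umap  # pip install umap-learn

# embeddings saved during reencoding (assumed file layout and step count;
# n_neighbors=150 requires more than 150 data points)
steps = 1000
Z = np.stack([np.load(f"embedding_{s:05d}.npy").ravel() for s in range(steps)])

# parameters as stated above
Z3 = umap.UMAP(n_components=3, n_neighbors=150).fit_transform(Z)

np.save("umap3d.npy", Z3)  # input for the 3D plot (export format is an assumption)
```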

Conclusion

Contrary to my expectations, the process of reencoding image data does not tend to converge. Interestingly, after only ~10 (maybe a few more) reencodings it is almost impossible for a human to infer the original image or the next 10 reencoding steps. Looking at the scalar data and the plots, it even seems that the embeddings diverge to a certain degree, although the reencodings of different original images look similar to the human eye. Even for low-resolution pictures (30x30 px), neither the image data nor the embeddings converge within 10,000 iterations. In the t-SNE plot, the data looks like a raveled ball of wool.

This is the original picture for 10,000 iterations. It was taken from the Places365 database and cropped to 30x30 px.

This is the result after 10,000 iterations in picasso style.

[Images: 10,000 iterations in TensorBoard using t-SNE; 10,000 iterations in bokeh using UMAP; 10,000 iterations in bokeh using UMAP, keeping only every 10th data point]

Ideas for what could still be done

  • parallelize the process of inference

    This is not really practical, as the speed gain would be smaller than simply starting multiple processes on different GPUs for different images.

    Maybe this is applicable to training instead of inference? -> also difficult

  • log the discriminator's certainty for each reencoding!

    Still a good idea to a certain degree; it might show strengths/weaknesses of the network.

    Problem: the discriminator's weights are not maintained (?!), so one would have to train a new discriminator.

  • replace the transformer block with e.g. a flower-vs-plane discriminator and then see whether flowers/planes are stylized better

    How much benefit this might give should be discussed in advance. The main problem is the very long training time of the original model (300,000 iterations per style).
