Built TensorFlow from source for a ~2x speed increase on the local machine.
Added reencoding functionality with the following flags:
- `--reencodes n`
  reencode the data n times
- `--reencode_steps n`
  save every n-th step of the reencoding process
- `--embeddings`
  save the embeddings as a NumPy binary file for every n-th step specified in `--reencode_steps`
- `--log`
  log interesting variables for reencoding, such as the norm of the feature vector and the distance between consecutive feature vectors, and, most importantly, the feature vectors themselves for visualization in TensorBoard (which allows for PCA and t-SNE)
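The flag set above can be sketched as a small CLI parser. This is a minimal illustration, not the project's actual code; the flag names follow the text, while the defaults and parser wiring are assumptions.

```python
# Hedged sketch of the reencoding CLI described above.
# Flag names come from the notes; defaults are illustrative assumptions.
import argparse


def build_parser():
    p = argparse.ArgumentParser(description="Repeatedly reencode an image.")
    p.add_argument("--reencodes", type=int, default=10,
                   help="reencode the data n times")
    p.add_argument("--reencode_steps", type=int, default=1,
                   help="save every n-th step of the reencoding process")
    p.add_argument("--embeddings", action="store_true",
                   help="save embeddings as .npy for every saved step")
    p.add_argument("--log", action="store_true",
                   help="log feature-vector norms/distances and the vectors "
                        "themselves for TensorBoard")
    return p


# Example invocation: reencode 100 times and save embeddings.
args = build_parser().parse_args(["--reencodes", "100", "--embeddings"])
print(args.reencodes, args.embeddings, args.log)
```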
After around 10 iterations, the original image content is hardly recognisable anymore.
TODO: insert the original image and the result after 10 iterations (van-gogh and picasso)
While the abstraction is most obvious during the first few iterations, the images seem to vary less in later iterations.
| ![]() | ![]() | ![]() | ![]() | ![]() |
|---|---|---|---|---|
| s = 0 | s = 1 | s = 5 | s = 10 | s = 20 |
The resampled images for different original images seem to be similar in their abstract appearance, yet they do not become more similar over time.
| original | van-gogh s=100 | picasso s=100 | cezanne s=100 | el-greco s=100 | gauguin s=100 | kandinsky s=100 | kirchner s=100 | monet s=100 | morisot s=100 | peploe s=100 | pollock s=100 | roerich s=100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Looking at 100 iterations for two styles of the same picture, the results have been visualized using the following methods:
- For a few test images, scikit-learn's t-SNE together with matplotlib gives simple 2D plots which indicate that the embeddings are drifting apart rather than converging:
| visualization and parameters | van-gogh s=100 | picasso s=100 | cezanne s=100 | el-greco s=100 | gauguin s=100 | kandinsky s=100 | kirchner s=100 | monet s=100 | morisot s=100 | peploe s=100 | pollock s=100 | roerich s=100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| t-SNE `n_components=2 verbose=1 perplexity=50 n_iter=1000` | | | | | | | | | | | | |
| UMAP `n_components=2 n_neighbors=50` | | | | | | | | | | | | |
- For more advanced (3D) plots, TensorBoard offers a nice visualization of embeddings as well as scalar values, at the cost of occasionally very high latency and long loading times when projecting embeddings.
  | PCA | t-SNE |
  |---|---|
  | ![]() | ![]() |
  TensorBoard seems to be problematic with more than 1000 data points, especially as the checkpoint files tend to reach tens of GB. It is easier and more practical to use bokeh for this, especially when TensorBoard does not run locally and has to send all of its data to the user.
- For validation purposes, UMAP was used again, this time with a different visualization tool: `bokeh`, which itself uses `vis.js` for plotting 3D graphs (`n_components=3 n_neighbors=150`).
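The 2D projection step from the list above can be sketched as follows. This is a minimal illustration assuming the per-step embeddings have already been loaded from the saved `.npy` files; here random data stands in for them, and the `perplexity` is lowered to fit the small sample count.

```python
# Hedged sketch: project saved per-step embeddings down to 2D with t-SNE,
# as in the scikit-learn visualization described above. The synthetic
# embeddings below are a stand-in for the real .npy files.
import numpy as np
from sklearn.manifold import TSNE


def project_embeddings(embeddings, perplexity=30):
    """Reduce an (n_steps, dim) embedding array to 2D for plotting."""
    tsne = TSNE(n_components=2, perplexity=perplexity, verbose=0)
    return tsne.fit_transform(embeddings)


rng = np.random.default_rng(0)
emb = rng.normal(size=(120, 64)).astype(np.float32)  # stand-in embeddings
pts = project_embeddings(emb)
print(pts.shape)  # one 2D point per reencoding step
```

The resulting `pts` array can then be fed directly to matplotlib's `scatter`, or to bokeh for an interactive plot.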
Contrary to my expectations, the process of reencoding image data does not tend to converge. Interestingly, after only ~10 (maybe a few more) reencodings it is almost impossible for a human to infer the original image or the next 10 reencoding steps. Looking at the scalar data and the plots, it even looks like the embeddings diverge to a certain degree, although the reencodings of different original images look similar to the human eye. Even for low-resolution pictures (30x30 px), neither the image data nor the embeddings converge within 10,000 iterations. In the t-SNE plot the data looks like a raveled ball of wool.
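The (non-)convergence claim can be made quantitative with the distances between consecutive feature vectors that the `--log` flag records. Below is a hedged sketch of that check; the synthetic drifting sequence stands in for the saved embeddings.

```python
# Hedged sketch: measure convergence of the reencoding process via the
# L2 distance between consecutive embeddings. A converging process would
# show these distances shrinking toward 0; a diverging one keeps them large.
import numpy as np


def consecutive_distances(embeddings):
    """Return distances between step t and t+1, shape (n_steps - 1,)."""
    diffs = np.diff(embeddings, axis=0)
    return np.linalg.norm(diffs, axis=1)


rng = np.random.default_rng(1)
# Stand-in for real embeddings: a random-walk sequence that keeps drifting.
emb = np.cumsum(rng.normal(size=(50, 8)), axis=0)
d = consecutive_distances(emb)
print(d.shape, d.min() > 0)
```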
This is the original picture used for the 10,000-iteration run. It was taken from the Places365 database and cropped to 30x30 px.
This is the result after 10,000 iterations in picasso style.
- parallelize the process of inference
  This is not really practical, as the speed-up would be smaller than simply starting multiple processes on different GPUs for different images.
  Maybe this is applicable to training instead of inference? -> that is also difficult.
- log the discriminator's certainty for each reencoding!
  Still a good idea to a certain degree; it might show strengths/weaknesses of the network.
  Problem: the discriminator's weights are not maintained (?!), so one would have to train a new discriminator.
- replace the transformer block with, e.g., a flower-vs.-plane discriminator and then see if flowers/planes are stylized better
  How much benefit this might give should be discussed in advance. The main problem is the very long training time for the original model (300,000 iterations per style).