  • use a (V)AE to compress face/eyes representations

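A minimal sketch of the plain-autoencoder variant, assuming a PyTorch setup and a hypothetical 36x60 grayscale eye crop; the VAE version would add mu/logvar heads and a KL term:

```python
import torch.nn as nn

class EyeAutoencoder(nn.Module):
    """Compress an eye crop to a small latent code and reconstruct it."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Encoder: 1x36x60 crop -> latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> 16x18x30
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> 32x9x15
            nn.Flatten(),
            nn.Linear(32 * 9 * 15, latent_dim),
        )
        # Decoder mirrors the encoder back to the input resolution.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 9 * 15), nn.ReLU(),
            nn.Unflatten(1, (32, 9, 15)),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # -> 16x18x30
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # -> 1x36x60
        )

    def forward(self, x):
        z = self.encoder(x)  # z is the compressed representation
        return self.decoder(z), z
```
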
  • use a Siamese network to compress face/eyes representations

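A corresponding sketch for the Siamese variant, again assuming PyTorch: two crops pass through a shared encoder, and a standard contrastive loss pulls matching pairs together (the margin value is an assumption):

```python
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin: float = 1.0):
    """z1, z2: embeddings from the shared encoder; same: 1.0 for matching
    pairs (e.g. same person/frame), 0.0 otherwise."""
    d = F.pairwise_distance(z1, z2)
    pull = same * d.pow(2)                          # matching pairs: close
    push = (1 - same) * F.relu(margin - d).pow(2)   # others: at least `margin` apart
    return (pull + push).mean()
```
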
  • learnable codes of the environment

    • create a custom layer for the embeddings; it should be able to save and load embeddings from a folder (see the sketch after this list)
    • for training, generate a "train-context" folder and put files there named "{id}-{hash of trajectory}.bin", where the id is ContextID_global, ContextID_local, or ContextID_sublocal
    • before testing, drop the contexts/embeddings and fine-tune the embeddings alone (1k small batches, 1 epoch, LR linearly annealed from 1e-1 to 1e-5, only the embeddings trainable); save the fine-tuned embeddings to a "trained-context" folder
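
A sketch of such a layer and the pre-test fine-tuning schedule, assuming PyTorch; the folder layout follows the naming above, but the dtype and dimensions are assumptions:

```python
import os
import numpy as np
import torch
import torch.nn as nn

class ContextEmbeddings(nn.Module):
    """Embedding table with one learnable vector per context id, persisted
    as one .bin file per context."""
    def __init__(self, num_contexts: int, dim: int = 16):
        super().__init__()
        self.table = nn.Embedding(num_contexts, dim)

    def forward(self, context_ids):
        return self.table(context_ids)

    def save_to(self, folder: str) -> None:
        os.makedirs(folder, exist_ok=True)
        for i, row in enumerate(self.table.weight.detach().cpu()):
            row.numpy().tofile(os.path.join(folder, f"{i}.bin"))

    def load_from(self, folder: str) -> None:
        with torch.no_grad():
            for i in range(self.table.num_embeddings):
                path = os.path.join(folder, f"{i}.bin")
                if os.path.exists(path):
                    vec = np.fromfile(path, dtype=np.float32)
                    self.table.weight[i] = torch.from_numpy(vec)

# Pre-test fine-tuning: only the embeddings train, 1k batches, LR linearly
# annealed 1e-1 -> 1e-5, result saved to "trained-context":
# opt = torch.optim.SGD(emb.parameters(), lr=1e-1)
# sched = torch.optim.lr_scheduler.LinearLR(
#     opt, start_factor=1.0, end_factor=1e-4, total_iters=1000)
# ... 1000 train steps calling opt.step(); sched.step() ...
# emb.save_to("trained-context")
```
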
  • create a test dataset: a folder of .npz files, each file holding a single batch of clean samples

  • create a pretrain dataset: a single .npz file containing pre-sampled batches of augmented samples, i.e. effectively a batch of infinite size (see the sketch below)

This idea is partially implemented in this version.
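
A combined sketch of both dataset layouts, assuming numpy arrays for inputs and labels (file names and array keys are illustrative):

```python
import os
import numpy as np

def save_test_dataset(batches, folder="test-dataset"):
    """batches: iterable of (inputs, labels) pairs; one clean batch per .npz file."""
    os.makedirs(folder, exist_ok=True)
    for i, (x, y) in enumerate(batches):
        np.savez(os.path.join(folder, f"batch-{i:04d}.npz"), x=x, y=y)

def save_pretrain_dataset(x, y, path="pretrain.npz"):
    """All pre-sampled augmented samples in one file: one 'infinite' batch."""
    np.savez(path, x=x, y=y)

def iter_pretrain_batches(path="pretrain.npz", batch_size=64):
    """Slice the big pretrain batch back into training-sized batches."""
    data = np.load(path)
    x, y = data["x"], data["y"]
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]
```
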

  • reduce the number of face keypoints

    • only eyes
    • eyes + face "corners" (4 points?)
  • add extra derived data (the "center" between the eyes, angles, etc.)

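A sketch covering the two ideas above: keep only a subset of the keypoints and derive extra features from them. The indices follow the common 68-point layout but are assumptions about this repo's detector:

```python
import numpy as np

EYE_IDX = [36, 39, 42, 45]        # outer/inner corners of both eyes (68-pt layout)
FACE_CORNER_IDX = [0, 16, 8, 27]  # hypothetical 4 face "corner" points

def reduced_keypoints(kp):
    """kp: (N, 2) array -> only eyes plus optional face corners."""
    return kp[EYE_IDX + FACE_CORNER_IDX]

def derived_features(kp):
    """Extra data derived from the keypoints: between-eyes center, roll angle."""
    left_eye = kp[[36, 39]].mean(axis=0)
    right_eye = kp[[42, 45]].mean(axis=0)
    center = (left_eye + right_eye) / 2
    dx, dy = right_eye - left_eye
    roll = np.arctan2(dy, dx)          # in-plane head tilt
    return np.concatenate([center, [roll]])
```
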
  • normalize face keypoints

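A minimal normalization sketch, reusing the eye-corner indices above: translate to the between-eyes center and scale by the inter-ocular distance:

```python
import numpy as np

def normalize_keypoints(kp):
    """kp: (N, 2) array in pixel coordinates -> centered, scale-free coordinates."""
    left_eye = kp[[36, 39]].mean(axis=0)
    right_eye = kp[[42, 45]].mean(axis=0)
    center = (left_eye + right_eye) / 2
    scale = np.linalg.norm(right_eye - left_eye) + 1e-8  # inter-ocular distance
    return (kp - center) / scale
```
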
  • add an auxiliary loss for the face keypoints, because they currently have only a minor impact on the final result (we can set them all to -1 and get almost the same output)

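One way to realize this, assuming PyTorch: a small extra head predicts the keypoints back from the shared features, so the network cannot ignore them; `aux_weight` is a value to tune:

```python
import torch.nn.functional as F

def total_loss(gaze_pred, gaze_true, kp_pred, kp_true, aux_weight=0.1):
    main = F.mse_loss(gaze_pred, gaze_true)  # primary gaze objective
    aux = F.mse_loss(kp_pred, kp_true)       # auxiliary keypoint objective
    return main + aux_weight * aux
```
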
  • normalize eye images (by warping them according to the keypoints)

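A sketch of such a normalization using a similarity transform estimated from the two eye-corner keypoints, so the corners land at fixed positions in the crop (assumes OpenCV; target coordinates and crop size are assumptions):

```python
import cv2
import numpy as np

def normalize_eye(img, outer, inner, out_size=(60, 36)):
    """outer/inner: (x, y) eye corners in image coordinates."""
    dst = np.float32([[10, 18], [50, 18]])   # fixed corner targets in the crop
    src_vec = np.float32(inner) - np.float32(outer)
    scale = np.linalg.norm(dst[1] - dst[0]) / (np.linalg.norm(src_vec) + 1e-8)
    angle = np.arctan2(src_vec[1], src_vec[0])        # current eye-axis tilt
    c, s = scale * np.cos(-angle), scale * np.sin(-angle)
    M = np.float32([[c, -s, 0.0], [s, c, 0.0]])       # rotate + scale
    M[:, 2] = dst[0] - M[:, :2] @ np.float32(outer)   # pin `outer` onto dst[0]
    return cv2.warpAffine(img, M, out_size)           # out_size is (width, height)
```
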
  • add keypoints as an additional image channel (it may help the network learn a better representation)

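A sketch of rasterizing keypoints into an extra channel as Gaussian blobs and stacking it onto the image (sigma and layout are assumptions):

```python
import numpy as np

def keypoint_channel(kp, height, width, sigma=2.0):
    """kp: (N, 2) array of (x, y) pixel coordinates -> (height, width) heatmap."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float32)
    for x, y in kp:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, blob)  # keep the strongest blob per pixel
    return heat

# img: (H, W, C) -> (H, W, C + 1)
# stacked = np.concatenate(
#     [img, keypoint_channel(kp, *img.shape[:2])[..., None]], axis=-1)
```
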
  • dataset filtering

    • use a pretrained model to filter out frames that produce extremely divergent predictions (i.e. if we take several sets of frames, each including the "target" frame, we should get similar predictions for all of them); see the sketch after this list
    • smooth trajectories
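
A sketch of both filtering steps (the model interface, threshold, and window size are hypothetical):

```python
import numpy as np

def is_consistent(model, frame_sets, threshold=0.05):
    """frame_sets: several input sets that all include the same target frame.
    Keep the frame only if the pretrained model's predictions stay close."""
    preds = np.stack([model(frames) for frames in frame_sets])
    spread = preds.std(axis=0).mean()   # divergence across the frame sets
    return spread < threshold

def smooth_trajectory(points, window=5):
    """Moving-average smoothing of a (T, D) trajectory, column by column."""
    kernel = np.ones(window) / window
    points = np.asarray(points, dtype=np.float64)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, points)
```
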
  • split the training dataset into progressively larger subsets (take 25% of the frames for the first training iteration, then 50%, then 75%, then 100%)
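
A sketch of the progressive split (prefix selection here; it could just as well be random):

```python
def progressive_subsets(frames, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Yield growing subsets of the frames, one per training iteration."""
    for frac in fractions:
        yield frames[: int(len(frames) * frac)]

# for subset in progressive_subsets(all_frames):
#     run_training_iteration(model, subset)   # hypothetical training step
```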