WIP: Adding Domain adaptation #51

bruce-edelman · 2023-07-06T19:57:42Z

!!This PR is still a WIP!!

Adding Domain Adaptation following what was done for SIA and ReLEARN from https://www.biorxiv.org/content/10.1101/2023.03.01.529396v1 (their code lives at https://github.com/ziyimo/popgen-dom-adapt)

This requires two major changes to diploshic:

1. Forking the network architecture after feature extraction
   - The discriminator fork of the model has a gradient reversal layer (GRL), which encourages the training to do a bad job at discriminating real/fake data
   - Use masked loss functions so that each task is only trained on the data it makes sense for
2. Adjusting the data generators to enable the inclusion of empirical data (target data) that your simulated data (source data) wants to 'adapt' to
   - This involves setting up a second 'Y', i.e. a second set of target values, for the discriminator prediction outputs
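To make the two ideas above concrete, here is a minimal NumPy sketch, not the code in this PR. It assumes that rows without a label for a given task are filled with a sentinel value (`MASK_VALUE = -1` is a hypothetical choice) and it writes the GRL as an explicit forward/backward pair instead of a Keras layer. The function names are illustrative only.

```python
import numpy as np

MASK_VALUE = -1  # hypothetical sentinel meaning "no label for this task"


def grl_forward(features):
    """Gradient reversal layer, forward pass: the identity."""
    return features


def grl_backward(upstream_grad, lam=1.0):
    """Backward pass: negate (and scale by lam) the incoming gradient."""
    return -lam * upstream_grad


def masked_cce(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy averaged over unmasked rows only.

    y_true: (n, k) one-hot rows; a row filled with MASK_VALUE is skipped.
    y_pred: (n, k) predicted class probabilities.
    """
    keep = ~np.all(y_true == MASK_VALUE, axis=1)
    if not keep.any():
        return 0.0
    p = np.clip(y_pred[keep], eps, 1.0)
    per_row = -np.sum(y_true[keep] * np.log(p), axis=1)
    return float(per_row.mean())


# Two simulated (labelled) rows and one empirical (masked) row.
y_true = np.array([[1, 0], [0, 1], [MASK_VALUE, MASK_VALUE]], dtype=float)
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
loss = masked_cce(y_true, y_pred)  # the masked row does not contribute
reversed_grad = grl_backward(np.array([1.0, -2.0]), lam=0.5)
```

Because the forward pass is the identity, the discriminator still sees the extracted features unchanged; only the gradient flowing back into the feature extractor is flipped.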

That should be it for the major implementation changes. The rest of this PR is small changes to the interfacing script that handles the logic of using the original model by default and then switching to the domain adaptive model with the CLI argument --domain-adaptation
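As a rough illustration of that switch, the sketch below shows how an `argparse` flag named `--domain-adaptation` ends up under the key `domain_adaptation` in the arguments dictionary. The parser setup and the `choose_model_name` helper are hypothetical and are not taken from the diploSHIC script.

```python
import argparse


def build_parser():
    """Hypothetical parser with only the flag discussed above."""
    parser = argparse.ArgumentParser(prog="train-sketch")
    parser.add_argument(
        "--domain-adaptation",
        action="store_true",  # off by default, so the original model is used
        help="train the domain-adaptive model instead of the original one",
    )
    return parser


def choose_model_name(args_dict):
    """Hypothetical dispatch: report which model would be built."""
    return "domain_adaptive" if args_dict["domain_adaptation"] else "original"


# argparse turns the dash in the flag into an underscore in the dest name.
default_choice = choose_model_name(vars(build_parser().parse_args([])))
da_choice = choose_model_name(
    vars(build_parser().parse_args(["--domain-adaptation"]))
)
```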

Currently by default if you turn on domain adaptation then the code assumes that you have .fvec feature vector files created from your target domain data and stored in your training directory named empirical.fvec

Current steps left undone:

- Construct different simulated data that is 'mis-matched' with the current training data, to test whether domain adaptation improves performance when the simulated data is mis-specified relative to the data you want to do predictions on. This needs to be simulated data so that we have labels to evaluate any changes in performance.
- Compare the DA model with the original in the mis-specification experiment.
- Compare predictions on the REAL data from the soup to nuts example with the original and DA models.

bruce-edelman · 2023-07-07T19:27:11Z

Just added small fixes for the bugs you found @andrewkern -- one of the bugs was because train_test_split needs all of its input arrays to be the same length, so the number of observations in empirical.fvec needs to be the same as in your training sets.

For the current hack of using the neut.fvec as our fake target domain data I just copied these 2000 data points 5 times to give 10000 obs to match the simulations.

With this change and a few array shape fixes the model begins training with --domain-adaptation on just fine for me now

bruce-edelman · 2023-07-07T19:28:22Z

diploshic/diploSHIC

+    if argsDict["domain_adaptation"]:
+        empirical = np.loadtxt(trainingDir + "empirical.fvec", skiprows=1)
+        emp = np.reshape(empirical, (empirical.shape[0], nDims, numSubWins))
+        emp1 = np.concatenate((emp,emp,emp,emp,emp))


This is the copy-5x line. It should be removed in the future, once the user passes in empirical target domain data that is the same length as their training set simulations.
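Until that happens, one possible alternative to hard-coding five copies is to repeat the empirical array until it matches the training-set length. This is only a sketch: `match_length` is a hypothetical helper, and the array shape used below is a placeholder rather than a value taken from this PR.

```python
import numpy as np


def match_length(emp, n_target):
    """Repeat emp along axis 0 until it has exactly n_target rows."""
    if emp.shape[0] == 0:
        raise ValueError("empirical array is empty")
    reps = -(-n_target // emp.shape[0])  # ceiling division
    tiled = np.concatenate([emp] * reps, axis=0)
    return tiled[:n_target]  # trim when n_target is not an exact multiple


# Placeholder shape: 2000 observations, each a 12 x 11 feature matrix.
emp = np.arange(2000 * 12 * 11, dtype=float).reshape(2000, 12, 11)
emp_matched = match_length(emp, 10000)
```

With 2000 rows and a target of 10000 this produces the same array as concatenating five copies, and it also handles targets that are not a multiple of the empirical sample size.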

andrewkern · 2023-07-07T20:08:23Z

running this now! one warning I'm getting is

WARNING:tensorflow:Early stopping conditioned on metric `val_accuracy` which is not available. Available metrics are: loss,predictor_loss,discriminator_loss,predictor_accuracy,discriminator_accuracy

this has to do with the metrics on the early stopping criterion.

bruce-edelman · 2023-07-07T21:42:20Z

Fixed the callback issue -- the code now changes the monitored metric from 'val_accuracy' to 'val_predictor_accuracy' for checkpointing and early stopping when using domain adaptation
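The selection logic amounts to something like the sketch below. `monitor_metric` is a hypothetical helper name, and the snippet deliberately leaves out the Keras callbacks themselves.

```python
def monitor_metric(domain_adaptation):
    """Return the metric name the callbacks should monitor.

    With two named outputs (predictor and discriminator) the per-output
    accuracy is reported under the output's name, so the plain
    'val_accuracy' key is not available in domain adaptation mode.
    """
    return "val_predictor_accuracy" if domain_adaptation else "val_accuracy"


# The returned string would be passed as the `monitor` argument of the
# early stopping and checkpointing callbacks.
da_monitor = monitor_metric(True)
default_monitor = monitor_metric(False)
```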

bruce-edelman added 5 commits July 6, 2023 19:14

changes to enable domain adaptation

5b60f95

move imports back to save time when not in train/predict mode

6339e21

fix imports and make sure old version is still same output

1970c9e

replace imports to old locations

a081eb8

tie loss fcts to model output names for clarity

a15d1a5

bruce-edelman self-assigned this Jul 6, 2023

bruce-edelman added 2 commits July 6, 2023 21:15

try to make pypi publish only happen for changes to main

a9cbf17

Revert "try to make pypi publish only happen for changes to main"

943a14a

This reverts commit a9cbf17.

andrewkern self-requested a review July 6, 2023 23:37

bruce-edelman marked this pull request as draft July 7, 2023 18:43

fix small errors

1802e82

fix loss bug in masked cce with reduce_all

ecb54a6

bruce-edelman marked this pull request as ready for review July 7, 2023 21:44

bruce-edelman added 12 commits July 18, 2023 15:51

add ignore

4aed9a2

get Domain adaptation fully working -- some refactoring to clean things up

bb1d967

cleanup

5a351b1

remove unused import

f9a6668

rework gh actions so it will test installations correctly on PRs and only publish to PyPi with tagged versions

8a989ee

bump version number

f07537f

finish bumpign version add badges on README

3d0efac

fix action badge:

707f1f3

no dropout in discriminator branch

e58cd48

small changes

7504223

add more plots and training metrics to output

cc8a1ee

small changes -- save acc file

5dcf59d
