Release 1.1.0 (#56)
* Updated synthetic image mirror generation script, created helper function for generating images in SyntheticImageGenerator class, moved notebooks to new notebooks dir.

* Restored ensure_save_path func back to annotation_utils.py

* Added latency tracking, max images generated field for synthetic image mirror pipeline.

* Added mean synthetic image gen latency print statement

* Update example arg inputs

* Fix imports

* Fixed and reformatted args

* Suppressed TensorFlow warnings, fixed image gen from annotation

* Index and loop bugfixes

* Index, looping, args logic fixes.

* Add load diffuser function call

* Clear gpu after using synthetic image generator

* Always load from Hugging Face.

* Batch processing for memory optimization. Added optional name field to generate_image in SyntheticImageGenerator for future customization.

* Memory optimizations (saving annotation .jsons to disk), added args for chunking, pm2 examples

* Fix save as json on disk, ensure no hanging reference when gpu is cleared in SyntheticImageGenerator
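
For illustration, the GPU-clearing pattern these commits describe looks roughly like the sketch below; `generator.diffuser` is an illustrative attribute name, not the repo's actual API.

```python
import gc
import torch

def clear_gpu(generator) -> None:
    """Sketch: release the diffusion pipeline held by a generator object.

    Setting the attribute to None (rather than only del-ing a local name)
    ensures no hanging reference keeps the weights resident on the GPU.
    """
    generator.diffuser = None  # break the last reference to the pipeline
    gc.collect()               # let Python free the underlying tensors
    torch.cuda.empty_cache()   # return cached CUDA memory to the driver
```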

* Replaced generic DiffusionPipeline with StableDiffusionPipeline that inherits from it. Specified generated image dimensions in diffuser call params.

* convert diffuser to float32 before moving to CPU, fixed duplicate image count logs

* Added a testing function to save images from real image dataset, changed annotations 'index' field to 'id' for consistency, various data loading and parameter fixes

* Added pipeline for diffusion models to constants.py, and dynamic pipeline loading and image size customization to generation.

* Fixed Hugging Face authentication errors. Added instruction to authenticate with huggingface-cli login

* Fixed all annotations being used to generate mirrors regardless of start and end indices

* Added a new load_and_sort_dataset function to handle Hugging Face dataset rows being ordered by filename string-wise instead of numerically. Added generate_synthetic_images arg and updated dataset naming conventions for parallelization-friendliness. Disabled diffusion pipeline progress bars. Added const for progress updates in terminal.
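
The string-vs-numeric ordering fix could be sketched as follows (a minimal sketch assuming each row carries a `filename` column; the column name and filename pattern are assumptions):

```python
import re
from datasets import Dataset, load_dataset

def load_and_sort_dataset(repo_id: str, split: str = "train") -> Dataset:
    """Sketch: reorder rows numerically, since string-wise filename order
    puts 'img_10.png' before 'img_2.png'."""
    ds = load_dataset(repo_id, split=split)

    def numeric_key(name: str) -> int:
        match = re.search(r"(\d+)", name)
        return int(match.group(1)) if match else -1

    order = sorted(range(len(ds)), key=lambda i: numeric_key(ds[i]["filename"]))
    return ds.select(order)
```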

* Removed extra disable progress bar call. Added ceil import for progress calculation.

* Adjust Hugging Face annotations dataset name

* Reverted annotations dataset name to have data range, now requiring start_index and end_index args.

* Re-removed data range from annotations

* Update 'index' to 'id'

* Fixed loading annotations from Hugging Face and saving specified indices to disk.

* Utils refactored, smaller functions. Added resize arg. Added combine_datasets script to put together all generated splits into one Hugging Face dataset.

* Replace hardcoded name

* Fix fstring

* Fix args

* Fixed typos

* Updated combine_datasets.py to match Hugging Face dataset nomenclature.

* removing unused files

* initial validator forward pytest

* initial ci.yml

* new mock classes for ci workflow

* temporarily removing old version of generate_synthetic_data.py

* rename get_mock_image() -> create_random_image()

* adding test_mock.py

* renaming build -> test step in ci.yml

* test_rewards.py

* parameterizing fake_prob to allow intentionally testing real/synth image flows in vali fwd

* forcing vali fwd through real and synth image flows

* fake_prob -> _fake_prob
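
These three commits together enable deterministic branch coverage in tests; a minimal sketch of the idea, where the `mock_validator` fixture and the `forward()` signature are assumptions based on the mock classes mentioned above:

```python
import pytest

@pytest.mark.parametrize("fake_prob", [0.0, 1.0])
def test_forward_real_and_synth(mock_validator, fake_prob):
    # Pin the sampling probability so the validator forward deterministically
    # takes the real-image branch (0.0) or the synthetic branch (1.0).
    mock_validator._fake_prob = fake_prob
    responses = mock_validator.forward()
    assert responses is not None
```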

* using dot operator to read config in mock vali until I replace namespace cfg with bt.config

* allowing mock code to skip force_register_neuron in the case that the neuron was already registered in previous test instance

* removing unused circleci dir from template repo

* image transforms tests

* fixing setting of mock dendrite process_time

* adding test_mock.py

* reset mock chain state in between test cases

* cleaning up state management for MockSubtensor

* __init__.py

* replacing hardcoded string with random image b64

* Fixed saving synthetic images after resizing.

* new auto update implementation from sn19

* initial self heal script from sn19

* Flag for downloading annotations from HuggingFace

* fixing reference to self.config

* Enforcing no watermarking in all cases

* self heal in autoupdate script

* making autoupdate scripts executable

* self heal restart 6 -> 6.5

* typo

* allowing --no-auto-update and --no-self-heal for validators

* combining run scripts into run_neuron.py

* replacing neuron type with --validator and --miner

* documentation updates for new run script

* docs update

* adding wandb to docs

* Arg for skipping annotation generation

* Prompt truncation for annotations longer than max token length
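
The truncation might look roughly like the sketch below; the CLIP checkpoint name is an assumption (diffusion text encoders commonly cap prompts at 77 tokens):

```python
from transformers import CLIPTokenizer

# Clip an annotation to the text encoder's max token length before using it
# as a diffusion prompt; the checkpoint name is illustrative.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
annotation = "a long generated caption " * 40  # deliberately over the limit
ids = tokenizer(annotation, truncation=True,
                max_length=tokenizer.model_max_length)["input_ids"]
prompt = tokenizer.decode(ids, skip_special_tokens=True)
```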

* Suppress token max exceeded warning, cleaned up error logging

* Removed all tqdm loading bars, cleaned imports, updated fake dataset paths to parquet versions.

* Improved annotation cleanliness with inter-prompt spacing and stripped endings.

* removing fixtures reference from mock.py

* read btcli args from .env

* docs update

* Formatting

* fixing fixtures import

* adding .env file creation to install script

* moving network (test/finney) into .env, reducing script redundancy

* missing netuid arg for MockSubtensor/MockMetagraph inits in test

* adding .env to .gitignore

* AXON_PORT -> MINER_AXON_PORT env var rename

* docs updates to reflect latest run_neuron.py updates

* updating .env paths

* small docs update

* Fixed annotation json filenames not starting with start_idx arg

* locking down version numbers

* Added docstrings and comments

* fixing image_index field for wandb logging

* try except for wandb init

* adding retries for nan images

* fixing image isnan check by adding np.any
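
Combined, the retry and the `np.any` reduction sketch out as follows; the generator stub and retry count are illustrative:

```python
import numpy as np

def generate_image(prompt: str) -> np.ndarray:  # stand-in for the real generator
    return np.random.rand(64, 64, 3)

MAX_RETRIES = 3  # illustrative
image = None
for attempt in range(MAX_RETRIES):
    candidate = np.asarray(generate_image("a prompt"))
    # np.isnan is elementwise; reduce with np.any before branching, since
    # truth-testing a whole boolean array raises a ValueError.
    if not np.any(np.isnan(candidate)):
        image = candidate
        break
```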

* rename wandb fields *image_id -> *image_name

* Updated failure case for generating annotations.

* Adjusted TF logging level to include error messages. Cleaned up unnecessary imports. Simplified clear_gpu to not move tensor to CPU.

* Reverted deletion of necessary diffusion pipeline imports. Adjusted TF logging level in dataset generation script to be consistent with synthetic generation classes.

* adding a sleep to reduce metagraph resync freq

* fixing edge case that occurs when only 1 miner has nonzero weight

* bump version to 1.0.2

* fixing download_data extension

* Update fake dataset paths

* replacing conda activate with /home/user/mambaforge/envs/tensorml

* Base miner training improvements and Content-Aware Model Orchestration (CAMO) Framework (#55)

* Added DeepfakeBench submodule to base_miner dir

* Added initial adaptation of pretrained UCF inference. Refactored NPR files into new dir.

* Added setup readme and a sample image for inference.

* Added loss functions and backbone network. Updated readme.

* Enable loading model checkpoints from Hugging Face.

* migrated training scripts from DeepfakeBench

* Added package initialization, renaming configs to config.

* Added fix to missing weights directory

* Finished ucf_test on sample images.

* Added dlib shape detector for face detection and alignment

* Added face_recognition implementation of face alignment

* Fixed variable names

* Update dlib requirements

* Implemented ucf_miner and created a class for the pretrained UCF model

* Renamed files for clarity. Added unit test for pretrained UCF.

* Migrated train utils from NPR base miner, modified train_ucf.py to use BitMind datasets

* Fix image input type errors

* Added xception training backbone, logging files

* Detectors module path fix

* bug fixes for live miner

* BitMind data load and restructure for integrated DeepfakeBench train loop

* Added DeepfakeBench training logs to .gitignore

* Removed unused import, local data saving

* Fixed prediction_class referenced before assignment

* Fixed test metric using logits and not class labels

* Train source labels for learning specific forgery, added separate test and validation loops

* Corrected variable name typo

* Refactored eval in training loop, renamed test stage to validation

* Implemented source label mapping in UCF training splits

* Added test stage, source labels to training data dict for learning dataset specific forgery features

* Added gpu cache cleanup, now using configs for batch size and data loader workers.

* Batch to cpu after train loader iteration, logging cleanup

* Fixed test metrics not logging

* Added logging for train and test time

* Added image normalization for training data

* Re-added check for data label during inference.

* Adjusted UCF image normalization to be in line with config. Fixed processing of local images for UCF testing.

* Adjusted image preprocessing for experiments.

* Added face cropping and alignment to preprocess images for UCF detection.

* Typo fixes, added readme to credit face shape predictor file.

* Made face crop and align False by default.

* New miner script for running UCF-BitMind

* Added handling for the case when face_detector does not find any faces. Reduced warning messages.

* Removed duplicate function.

* Adapted face detection and extraction functions to UCF class for modularity. Updated and refactored test, miner scripts.

* First iteration of context_aware_miner.py

* Fixed ucf_miner import error by simplifying path and import statement for UCF module.

* Fixed dlib predictor path, explicitly define map_location for torch.load
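
The map_location fix follows the standard PyTorch pattern, sketched here (the checkpoint path is illustrative):

```python
import torch

# Checkpoints saved on a GPU machine default to deserializing onto CUDA;
# mapping explicitly keeps loading working on CPU-only hosts too.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load("weights/ucf_best.pth", map_location=device)
```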

* Fixed imports in ucf_bitmind_miner and removed rounding of predictions.

* Fixed imports for context aware miner.

* Fix NPR model weight variable name.

* Typo

* Added Context-Aware v2 with UCF-BitMind for general images

* Remove unused import

* Moved UCF model loading to init function of Context Aware Miner v2

* Added free memory function to manage resources for multi-model miners.

* Added script for miners to test their model loading and inference latency.

* Moved model loading to init functions to avoid reloading.

* Updated minimum miner requirements to require GPU

* Fixed indents, load UCF-DFB model in init func

* Update check for faces to be consistent with DFB preprocessing

* Release 1.0.2 (#50)

---------

Co-authored-by: Benjamin <[email protected]>
Co-authored-by: aliang322 <[email protected]>

* TrainingDatasetProcessor class for loading, generating, and uploading preprocessed face-only images into training datasets

* Fixed transform dict var name

* Removed normalize function in training dataset creation

* Changed config dict to faces_only bool for clarity, changed hf repo type to dataset

* Added splits instance variable, generalized function names, non face only processing

* Script for interfacing with TrainingDatasetProcessor to create and upload preprocessed training datasets.

* Added usage examples and explanation

* Removed unused import

* Simplified repo upload naming convention.

* Added original image index column to training datasets. Consolidated transform datasets into HF subsets.

* Added local save/load, upload repo destination options; Fixed dataset preprocessing to be performed in-place.

* Created create_splits() helper function

* Fixed not clearing dataset memory when loading pickle

* Created Context-Aware Hierarchical Mixture-of-Agents (CAMO) miner.

* Created modular helper function for loading detectors.

* Rewording comments

* Added YOLOv8 object detection for image classification

* Renamed create_splits() to split_dataset() and added support for subset loading

* Added HuggingFace subset download option

* Restructured data loading for training. Support for face only subset loading with stratified splits.

* Added YOLOv8 object detection experiments, renamed CAMO miner.

* Updated paths for new CAMO model weights

* Added object detection error checks

* Switch debug to bt.logging, formatting

* Set use_object_detection to False for current iteration of CAMO

* Fixed assertion error on last batch of train epoch when incomplete batches present by dropping last in dataloaders.
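
The fix corresponds to PyTorch's `drop_last` flag, as in this sketch (tensor shapes are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3, 256, 256), torch.zeros(100))
# drop_last=True discards the trailing incomplete batch, so fixed-batch-size
# assertions in the train loop hold on every iteration.
loader = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
assert len(loader) == 3  # three full batches; the 4-sample remainder is dropped
```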

* Added data shuffling prior to splitting, check for disjoint stratified split indices.

* Fixed faces not being used by face expert.

* Improved clarity with face processing helper functions

* Generalized to optionally include source label mapping

* Consolidated real_fake_dataset.py into bitmind directory, updated references

* Fixed docstring typo

* Parameterized shuffling before splits, generalized params to adopt expert model terminology, made source label mapping optional

* Renamed params appropriately, fixed generalist UCF not using source labels

* Reformatted long lines, fixed split size printing num batches

* Removed UCF-specific data utils, replaced with generalized utils at bitmind level

* Standardized usage of bitmind.utils.train_data for data load/split across NPR and UCF base models

* adding versions for new package

* Update Mining.md

* moving miner/validator specific dependencies to new requirements files

* setup scripts specific to miner/validator reqs

* Relocated train/predict data processing scripts, updated imports and paths

* Removed redundant face detection utils from UCF dir

* Comment crediting original source of UCF training scripts

* Removed DeepfakeBench submodule

* Added auto download for backbone weights if not locally present

* Removed redundant video metrics from validation logs

* Updated README with script usage, removed deprecated manual weight download instructions

* Cleaned up leftover DFB train configs, fixed training error when using only 1 real/fake train dataset

* Deleted whitespace

* Cleaned unused DFB config labels

* adding subtensor.chain_endpoint to startup scripts

* Added default value for specific task number for training

* Standardized UCF paths with consts across UCF neurons and training files

* Cleaned up experimental files.

* Update Mining.md

* Update Validating.md

* Standardized neuron naming

* Updated default miner to camo_miner.py

* Fixed forgery dataset/method disentangling by setting specific_task_number value to num of fake datasets + 1 for real label

* Added readme file for camo base miner

* Fixed UCF constant weight path names

* Removed non-validator dataset generation scripts, fixed camo readme.

* Weights constant name typo

* Fixed UCF miner imports

* Added missing UCF weights import

* Cleaned up utils and unnecessary files

* adding testnet chain endpoint to docs

* adding miner/vali dep installs to ci.yml

---------

Co-authored-by: Benjamin <[email protected]>
Co-authored-by: default <[email protected]>
Co-authored-by: aliang322 <[email protected]>
Co-authored-by: Ken Miyachi <[email protected]>
Co-authored-by: Dylan Uys <[email protected]>

* bump version 1.0.2 -> 1.1.0

* Removed sample images in UCF directory

* parameterizing neuron filepath

* Added base miner dir readme with CAMO information

---------

Co-authored-by: Benjamin <[email protected]>
Co-authored-by: aliang322 <[email protected]>
Co-authored-by: default <[email protected]>
Co-authored-by: aliang322 <[email protected]>
Co-authored-by: Ken Miyachi <[email protected]>
6 people committed Sep 5, 2024
1 parent 6cc7811 commit 21d65f3
Showing 83 changed files with 5,208 additions and 565 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -28,6 +28,8 @@ jobs:
        python -m pip install --upgrade pip
        pip install flake8 pytest pytest-asyncio
        pip install -r requirements.txt
        pip install -r requirements-miner.txt
        pip install -r requirements-validator.txt
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
5 changes: 4 additions & 1 deletion .gitignore
@@ -163,4 +163,7 @@ testing/
data/
checkpoints/
.requirements_installed
*.env
base_miner/UCF/weights/*
base_miner/UCF/logs/*
miner_eval.py
*.env
File renamed without changes.
File renamed without changes
90 changes: 90 additions & 0 deletions base_miner/NPR/README.md
@@ -0,0 +1,90 @@
# Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection

<hr style="border:2px solid gray">

Our base miner code is taken from the [NPR-DeepfakeDetection repository](https://github.com/chuangchuangtan/NPR-DeepfakeDetection). Huge thank you to the authors for their work on their CVPR paper and this codebase!<br>
-- Bitmind Devs

<hr style="border:2px solid gray">


<p align="center">
<br>
Beijing Jiaotong University, YanShan University, A*Star
</p>
<br>


<img src="./NPR.png" width="100%" alt="overall pipeline">

Reference github repository for the paper [Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection](https://arxiv.org/abs/2312.10461).
```
@misc{tan2023rethinking,
title={Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection},
author={Chuangchuang Tan and Huan Liu and Yao Zhao and Shikui Wei and Guanghua Gu and Ping Liu and Yunchao Wei},
year={2023},
eprint={2312.10461},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

## News 🆕
- `2024/02`: NPR is accepted by CVPR 2024! Congratulations and thanks to all my co-authors!



## Environment setup
**Classification environment:**
We recommend installing the required packages by running the command:
```sh
pip install -r requirements.txt
```
In order to ensure the reproducibility of the results, we provide the following suggestions:
- Docker image: nvcr.io/nvidia/tensorflow:21.02-tf1-py3
- Conda environment: [./pytorch18/bin/python](https://drive.google.com/file/d/16MK7KnPebBZx5yeN6jqJ49k7VWbEYQPr/view)
- Random seed during testing period: [Random seed](https://github.com/chuangchuangtan/NPR-DeepfakeDetection/blob/b4e1bfa59ec58542ab5b1e78a3b75b54df67f3b8/test.py#L14)

## Getting the data
Download dataset from [CNNDetection CVPR2020](https://github.com/peterwang512/CNNDetection), [UniversalFakeDetect CVPR2023](https://github.com/Yuheng-Li/UniversalFakeDetect) ([googledrive](https://drive.google.com/drive/folders/1nkCXClC7kFM01_fqmLrVNtnOYEFPtWO-?usp=drive_link)), [DIRE 2023ICCV](https://github.com/ZhendongWang6/DIRE) ([googledrive](https://drive.google.com/drive/folders/1jZE4hg6SxRvKaPYO_yyMeJN_DOcqGMEf?usp=sharing)), [GANGen-Detection](https://github.com/chuangchuangtan/GANGen-Detection) ([googledrive](https://drive.google.com/drive/folders/11E0Knf9J1qlv2UuTnJSOFUjIIi90czSj?usp=sharing)), Diffusion1kStep [googledrive](https://drive.google.com/drive/folders/14f0vApTLiukiPvIHukHDzLujrvJpDpRq?usp=sharing).
```
pip install gdown==4.7.1
chmod 777 ./download_dataset.sh
./download_dataset.sh
```

## Training the model
```sh
CUDA_VISIBLE_DEVICES=0 python train.py --name 4class-resnet-car-cat-chair-horse --dataroot {CNNDetection-Path} --classes car,cat,chair,horse --batch_size 32 --delr_freq 10 --lr 0.0002 --niter 50
```

## Testing the detector
Modify the dataroot in test.py.
```sh
CUDA_VISIBLE_DEVICES=0 python test.py --model_path ./NPR.pth --batch_size {BS}
```
<!--
## Detection Results
| <font size=2>Method</font>|<font size=2>ProGAN</font> | |<font size=2>StyleGAN</font>| |<font size=2>StyleGAN2</font>| |<font size=2>BigGAN</font>| |<font size=2>CycleGAN</font> | |<font size=2>StarGAN</font>| |<font size=2>GauGAN</font> | |<font size=2>Deepfake</font>| | <font size=2>Mean</font> | |
|:----------------------:|:-----:|:-----:|:------:|:---:|:-------:|:--:|:----:|:-----:|:-------:|:----:|:----: |:-----:|:---: |:-----:|:----:|:----:|:----:|:----:|
| | Acc. | A.P. | Acc. | A.P.| Acc. | A.P. | Acc.| A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. |
| CNNDetection | 91.4 | 99.4 | 63.8 | 91.4| 76.4 | 97.5 | 52.9| 73.3 | 72.7 | 88.6 | 63.8 | 90.8 | 63.9 | 92.2 | 51.7 | 62.3 | 67.1 | 86.9 |
| Frank | 90.3 | 85.2 | 74.5 | 72.0| 73.1 | 71.4 | 88.7| 86.0 | 75.5 | 71.2 | 99.5 | 99.5 | 69.2 | 77.4 | 60.7 | 49.1 | 78.9 | 76.5 |
| Durall | 81.1 | 74.4 | 54.4 | 52.6| 66.8 | 62.0 | 60.1| 56.3 | 69.0 | 64.0 | 98.1 | 98.1 | 61.9 | 57.4 | 50.2 | 50.0 | 67.7 | 64.4 |
| Patchfor | 97.8 | 100.0 | 82.6 | 93.1| 83.6 | 98.5 | 64.7| 69.5 | 74.5 | 87.2 | 100.0 | 100.0 | 57.2 | 55.4 | 85.0 | 93.2 | 80.7 | 87.1 |
| F3Net | 99.4 | 100.0 | 92.6 | 99.7| 88.0 | 99.8 | 65.3| 69.9 | 76.4 | 84.3 | 100.0 | 100.0 | 58.1 | 56.7 | 63.5 | 78.8 | 80.4 | 86.2 |
| SelfBland | 58.8 | 65.2 | 50.1 | 47.7| 48.6 | 47.4 | 51.1| 51.9 | 59.2 | 65.3 | 74.5 | 89.2 | 59.2 | 65.5 | 93.8 | 99.3 | 61.9 | 66.4 |
| GANDetection | 82.7 | 95.1 | 74.4 | 92.9| 69.9 | 87.9 | 76.3| 89.9 | 85.2 | 95.5 | 68.8 | 99.7 | 61.4 | 75.8 | 60.0 | 83.9 | 72.3 | 90.1 |
| BiHPF | 90.7 | 86.2 | 76.9 | 75.1| 76.2 | 74.7 | 84.9| 81.7 | 81.9 | 78.9 | 94.4 | 94.4 | 69.5 | 78.1 | 54.4 | 54.6 | 78.6 | 77.9 |
| FrePGAN | 99.0 | 99.9 | 80.7 | 89.6| 84.1 | 98.6 | 69.2| 71.1 | 71.1 | 74.4 | 99.9 | 100.0 | 60.3 | 71.7 | 70.9 | 91.9 | 79.4 | 87.2 |
| LGrad | 99.9 | 100.0 | 94.8 | 99.9| 96.0 | 99.9 | 82.9| 90.7 | 85.3 | 94.0 | 99.6 | 100.0 | 72.4 | 79.3 | 58.0 | 67.9 | 86.1 | 91.5 |
| Ojha | 99.7 | 100.0 | 89.0 | 98.7| 83.9 | 98.4 | 90.5| 99.1 | 87.9 | 99.8 | 91.4 | 100.0 | 89.9 | 100.0 | 80.2 | 90.2 | 89.1 | 98.3 |
| NPR(our) | 99.8 | 100.0 | 96.3 | 99.8| 97.3 | 100.0| 87.5| 94.5 | 95.0 | 99.5 | 99.7 | 100.0 | 86.6 | 88.8 | 77.4 | 86.2 | 92.5 | 96.1 |
-->

## Acknowledgments

This repository borrows partially from the [CNNDetection](https://github.com/peterwang512/CNNDetection).
File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -9,7 +9,7 @@
import torch

from bitmind.image_transforms import base_transforms, random_aug_transforms
from util.data import load_datasets, create_real_fake_datasets
from bitmind.dataset_processing.load_split_data import load_datasets, create_real_fake_datasets
from options import TrainOptions


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
91 changes: 3 additions & 88 deletions base_miner/README.md
@@ -1,90 +1,5 @@
# Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
## Base Miners

<hr style="border:2px solid gray">
This directory contains subfolders with model architectures and training loops for base miners.

Our base miner code is taken from the [NPR-DeepfakeDetection repository](https://github.com/chuangchuangtan/NPR-DeepfakeDetection). Huge thank you to the authors for their work on their CVPR paper and this codebase!<br>
-- Bitmind Devs

<hr style="border:2px solid gray">


<p align="center">
<br>
Beijing Jiaotong University, YanShan University, A*Star
</p>
<br>


<img src="./NPR.png" width="100%" alt="overall pipeline">

Reference github repository for the paper [Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection](https://arxiv.org/abs/2312.10461).
```
@misc{tan2023rethinking,
title={Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection},
author={Chuangchuang Tan and Huan Liu and Yao Zhao and Shikui Wei and Guanghua Gu and Ping Liu and Yunchao Wei},
year={2023},
eprint={2312.10461},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

## News 🆕
- `2024/02`: NPR is accepted by CVPR 2024! Congratulations and thanks to all my co-authors!



## Environment setup
**Classification environment:**
We recommend installing the required packages by running the command:
```sh
pip install -r requirements.txt
```
In order to ensure the reproducibility of the results, we provide the following suggestions:
- Docker image: nvcr.io/nvidia/tensorflow:21.02-tf1-py3
- Conda environment: [./pytorch18/bin/python](https://drive.google.com/file/d/16MK7KnPebBZx5yeN6jqJ49k7VWbEYQPr/view)
- Random seed during testing period: [Random seed](https://github.com/chuangchuangtan/NPR-DeepfakeDetection/blob/b4e1bfa59ec58542ab5b1e78a3b75b54df67f3b8/test.py#L14)

## Getting the data
Download dataset from [CNNDetection CVPR2020](https://github.com/peterwang512/CNNDetection), [UniversalFakeDetect CVPR2023](https://github.com/Yuheng-Li/UniversalFakeDetect) ([googledrive](https://drive.google.com/drive/folders/1nkCXClC7kFM01_fqmLrVNtnOYEFPtWO-?usp=drive_link)), [DIRE 2023ICCV](https://github.com/ZhendongWang6/DIRE) ([googledrive](https://drive.google.com/drive/folders/1jZE4hg6SxRvKaPYO_yyMeJN_DOcqGMEf?usp=sharing)), [GANGen-Detection](https://github.com/chuangchuangtan/GANGen-Detection) ([googledrive](https://drive.google.com/drive/folders/11E0Knf9J1qlv2UuTnJSOFUjIIi90czSj?usp=sharing)), Diffusion1kStep [googledrive](https://drive.google.com/drive/folders/14f0vApTLiukiPvIHukHDzLujrvJpDpRq?usp=sharing).
```
pip install gdown==4.7.1
chmod 777 ./download_dataset.sh
./download_dataset.sh
```

## Training the model
```sh
CUDA_VISIBLE_DEVICES=0 python train.py --name 4class-resnet-car-cat-chair-horse --dataroot {CNNDetection-Path} --classes car,cat,chair,horse --batch_size 32 --delr_freq 10 --lr 0.0002 --niter 50
```

## Testing the detector
Modify the dataroot in test.py.
```sh
CUDA_VISIBLE_DEVICES=0 python test.py --model_path ./NPR.pth --batch_size {BS}
```
<!--
## Detection Results
| <font size=2>Method</font>|<font size=2>ProGAN</font> | |<font size=2>StyleGAN</font>| |<font size=2>StyleGAN2</font>| |<font size=2>BigGAN</font>| |<font size=2>CycleGAN</font> | |<font size=2>StarGAN</font>| |<font size=2>GauGAN</font> | |<font size=2>Deepfake</font>| | <font size=2>Mean</font> | |
|:----------------------:|:-----:|:-----:|:------:|:---:|:-------:|:--:|:----:|:-----:|:-------:|:----:|:----: |:-----:|:---: |:-----:|:----:|:----:|:----:|:----:|
| | Acc. | A.P. | Acc. | A.P.| Acc. | A.P. | Acc.| A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. | Acc. | A.P. |
| CNNDetection | 91.4 | 99.4 | 63.8 | 91.4| 76.4 | 97.5 | 52.9| 73.3 | 72.7 | 88.6 | 63.8 | 90.8 | 63.9 | 92.2 | 51.7 | 62.3 | 67.1 | 86.9 |
| Frank | 90.3 | 85.2 | 74.5 | 72.0| 73.1 | 71.4 | 88.7| 86.0 | 75.5 | 71.2 | 99.5 | 99.5 | 69.2 | 77.4 | 60.7 | 49.1 | 78.9 | 76.5 |
| Durall | 81.1 | 74.4 | 54.4 | 52.6| 66.8 | 62.0 | 60.1| 56.3 | 69.0 | 64.0 | 98.1 | 98.1 | 61.9 | 57.4 | 50.2 | 50.0 | 67.7 | 64.4 |
| Patchfor | 97.8 | 100.0 | 82.6 | 93.1| 83.6 | 98.5 | 64.7| 69.5 | 74.5 | 87.2 | 100.0 | 100.0 | 57.2 | 55.4 | 85.0 | 93.2 | 80.7 | 87.1 |
| F3Net | 99.4 | 100.0 | 92.6 | 99.7| 88.0 | 99.8 | 65.3| 69.9 | 76.4 | 84.3 | 100.0 | 100.0 | 58.1 | 56.7 | 63.5 | 78.8 | 80.4 | 86.2 |
| SelfBland | 58.8 | 65.2 | 50.1 | 47.7| 48.6 | 47.4 | 51.1| 51.9 | 59.2 | 65.3 | 74.5 | 89.2 | 59.2 | 65.5 | 93.8 | 99.3 | 61.9 | 66.4 |
| GANDetection | 82.7 | 95.1 | 74.4 | 92.9| 69.9 | 87.9 | 76.3| 89.9 | 85.2 | 95.5 | 68.8 | 99.7 | 61.4 | 75.8 | 60.0 | 83.9 | 72.3 | 90.1 |
| BiHPF | 90.7 | 86.2 | 76.9 | 75.1| 76.2 | 74.7 | 84.9| 81.7 | 81.9 | 78.9 | 94.4 | 94.4 | 69.5 | 78.1 | 54.4 | 54.6 | 78.6 | 77.9 |
| FrePGAN | 99.0 | 99.9 | 80.7 | 89.6| 84.1 | 98.6 | 69.2| 71.1 | 71.1 | 74.4 | 99.9 | 100.0 | 60.3 | 71.7 | 70.9 | 91.9 | 79.4 | 87.2 |
| LGrad | 99.9 | 100.0 | 94.8 | 99.9| 96.0 | 99.9 | 82.9| 90.7 | 85.3 | 94.0 | 99.6 | 100.0 | 72.4 | 79.3 | 58.0 | 67.9 | 86.1 | 91.5 |
| Ojha | 99.7 | 100.0 | 89.0 | 98.7| 83.9 | 98.4 | 90.5| 99.1 | 87.9 | 99.8 | 91.4 | 100.0 | 89.9 | 100.0 | 80.2 | 90.2 | 89.1 | 98.3 |
| NPR(our) | 99.8 | 100.0 | 96.3 | 99.8| 97.3 | 100.0| 87.5| 94.5 | 95.0 | 99.5 | 99.7 | 100.0 | 86.6 | 88.8 | 77.4 | 86.2 | 92.5 | 96.1 |
-->

## Acknowledgments

This repository borrows partially from the [CNNDetection](https://github.com/peterwang512/CNNDetection).
Read about [CAMO (Content Aware Model Orchestration)](https://bitmindlabs.notion.site/CAMO-Content-Aware-Model-Orchestration-CAMO-Framework-for-Deepfake-Detection-43ef46a0f9de403abec7a577a45cd075), our generalized framework for creating “hard mixture of expert” models for deepfake detection. The latest and most performant iteration of our CAMO miner neuron uses finetuned expert and generalist UCF models.
14 changes: 14 additions & 0 deletions base_miner/UCF/README.md
@@ -0,0 +1,14 @@
## UCF

This model has been adapted from [DeepfakeBench](https://github.com/SCLBD/DeepfakeBench).

##

- **Train UCF model**:
- Use `train_ucf.py`, which will download necessary pretrained `xception` backbone weights from HuggingFace (if not present locally) and start a training job with logging outputs in `.logs/`.
- Customize the training job by editing `config/ucf.yaml`
- `pm2 start train_ucf.py --no-autorestart` to train a generalist detector on datasets from `DATASET_META`
- `pm2 start train_ucf.py --no-autorestart -- --faces_only` to train a face expert detector on preprocessed-face only datasets

- **Miner Neurons**:
- The `UCF` class in `pretrained_ucf.py` is used by miner neurons to load and perform inference with pretrained UCF model weights.
7 changes: 7 additions & 0 deletions base_miner/UCF/config/__init__.py
@@ -0,0 +1,7 @@
import os
import sys

# Make the UCF package and its parent importable regardless of the working
# directory the training/inference scripts are launched from.
current_file_path = os.path.abspath(__file__)
parent_dir = os.path.dirname(os.path.dirname(current_file_path))  # .../base_miner/UCF
project_root_dir = os.path.dirname(parent_dir)  # one level above UCF
sys.path.append(parent_dir)
sys.path.append(project_root_dir)
19 changes: 19 additions & 0 deletions base_miner/UCF/config/constants.py
@@ -0,0 +1,19 @@
import os

# Path to the directory containing the constants.py file
UCF_CONFIGS_DIR = os.path.dirname(os.path.abspath(__file__))

# The base directory for UCF-related files, i.e., UCF directory
UCF_BASE_PATH = os.path.abspath(os.path.join(UCF_CONFIGS_DIR, "..")) # Points to bitmind-subnet/base_miner/UCF/
# Absolute paths for the required files and directories
CONFIG_PATH = os.path.join(UCF_BASE_PATH, "config/ucf.yaml") # Path to the ucf.yaml file
WEIGHTS_PATH = os.path.join(UCF_BASE_PATH, "weights/") # Path to pretrained weights directory

WEIGHTS_HF_PATH = "bitmind/ucf"
DFB_CKPT = "ucf_best.pth"
BM_CKPT = "ucf_bitmind_best.pth"
BACKBONE_CKPT = "xception_best.pth"
BM_FACE_CKPT = "ucf_bitmind_face.pth"
BM_18K_CKPT = "ucf-bitmind-18k.pth"

DLIB_FACE_PREDICTOR_PATH = os.path.abspath(os.path.join(UCF_BASE_PATH, "../../bitmind/dataset_processing/dlib_tools/shape_predictor_81_face_landmarks.dat"))
9 changes: 9 additions & 0 deletions base_miner/UCF/config/train_config.yaml
@@ -0,0 +1,9 @@
mode: train
lmdb: True
dry_run: false
rgb_dir: './datasets/rgb'
lmdb_dir: './datasets/lmdb'
dataset_json_folder: './preprocessing/dataset_json'
SWA: False
save_avg: True
log_dir: ./logs/training/
73 changes: 73 additions & 0 deletions base_miner/UCF/config/ucf.yaml
@@ -0,0 +1,73 @@
# log dir
log_dir: ../debug_logs/ucf

# model setting
pretrained: ../weights/xception_best.pth # path to a pre-trained model, if using one
model_name: ucf # model name
backbone_name: xception # backbone name
encoder_feat_dim: 512 # feature dimension of the backbone

#backbone setting
backbone_config:
  mode: adjust_channel
  num_classes: 2
  inc: 3
  dropout: false

compression: c23 # compression-level for videos
train_batchSize: 32 # training batch size
test_batchSize: 32 # test batch size
workers: 8 # number of data loading workers
frame_num: {'train': 32, 'test': 32} # number of frames to use per video in training and testing
resolution: 256 # resolution of output image to network
with_mask: false # whether to include mask information in the input
with_landmark: false # whether to include facial landmark information in the input
save_ckpt: true # whether to save checkpoint
save_feat: true # whether to save features
specific_task_number: 5 # default num datasets in FF++ used by DFB, overwritten in training

# mean and std for normalization
mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]

# optimizer config
optimizer:
  # choose between 'adam' and 'sgd'
  type: adam
  adam:
    lr: 0.0002 # learning rate
    beta1: 0.9 # beta1 for Adam optimizer
    beta2: 0.999 # beta2 for Adam optimizer
    eps: 0.00000001 # epsilon for Adam optimizer
    weight_decay: 0.0005 # weight decay for regularization
    amsgrad: false
  sgd:
    lr: 0.0002 # learning rate
    momentum: 0.9 # momentum for SGD optimizer
    weight_decay: 0.0005 # weight decay for regularization

# training config
lr_scheduler: null # learning rate scheduler
nEpochs: 5 # number of epochs to train for
start_epoch: 0 # manual epoch number (useful for restarts)
save_epoch: 1 # interval epochs for saving models
rec_iter: 100 # interval iterations for recording
logdir: ./logs # folder to output images and logs
manualSeed: 1024 # manual seed for random number generation
save_ckpt: false # whether to save checkpoint

# loss function
loss_func:
  cls_loss: cross_entropy # loss function to use
  spe_loss: cross_entropy
  con_loss: contrastive_regularization
  rec_loss: l1loss
  losstype: null

# metric
metric_scoring: auc # metric for evaluation (auc, acc, eer, ap)

# cuda

cuda: true # whether to use CUDA acceleration
cudnn: true # whether to use CuDNN for convolution operations
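
For reference, a training script could consume this file along these lines (a minimal sketch using PyYAML; the import assumes the `sys.path` setup from `config/__init__.py` above):

```python
import yaml

from config.constants import CONFIG_PATH  # defined in constants.py above

with open(CONFIG_PATH, "r") as f:
    config = yaml.safe_load(f)

print(config["model_name"], config["optimizer"]["adam"]["lr"])  # ucf 0.0002
```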
