
Release 2.2.0 #155

Merged
123 commits merged into main on Feb 20, 2025
Conversation


@dylanuys dylanuys commented Feb 20, 2025

Release 2.2.0 introduces a new component to SN34's original "AI vs. Not" binary classification problem, allowing miners to now earn additional rewards for distinguishing between fully- and semi-synthetic data. Communication protocols and scoring are backward compatible, meaning you can still predict a float and receive most of the rewards. The amount of incentive you can receive predicting only binary outputs will diminish over time as we give more weight to multiclass performance.

Update Steps

Validators

  • No action needed if you're on autoupdate
  • If you're not, please manually pull the latest after tomorrow's release, run ./setup_env, and restart your validator. We will post here again when the release is available.

Miners

  • Upgrading is not required, but it is incentivized
  • Please see details below on how to format your responses

Multiclass Challenges Overview

Miner Responses

  • Instead of returning a single float in [0., 1.], miners will now respond with a probability vector indicating [p_real, p_synthetic, p_semisynthetic].
  • The probabilities in this vector must sum to 1, and each probability in the vector must be in [0., 1.] (as with typical softmax outputs).
  • Prediction correctness will be determined by the argmax of this vector.
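The response format above can be sketched as a small validation helper. This is an illustrative sketch only: the function names and the `CLASS_NAMES` constant are hypothetical, not the subnet's actual code. It assumes the class-index order [real, synthetic, semisynthetic] described above.

```python
import numpy as np

# Class indices assumed from the release notes:
# 0 = real, 1 = fully synthetic, 2 = semi-synthetic
CLASS_NAMES = ["real", "synthetic", "semisynthetic"]

def validate_response(probs, tol=1e-6):
    """Check that a miner response is a valid 3-class probability vector:
    each entry in [0, 1] and the entries summing to 1 (within tolerance)."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (3,):
        return False
    if np.any(probs < 0.0) or np.any(probs > 1.0):
        return False
    return abs(probs.sum() - 1.0) <= tol

def predicted_class(probs):
    """Correctness is judged by the argmax of the vector."""
    return CLASS_NAMES[int(np.argmax(probs))]

response = [0.1, 0.2, 0.7]
assert validate_response(response)
print(predicted_class(response))  # semisynthetic
```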

Backwards Compatibility

  • A float response p from a miner who hasn't upgraded their model capabilities will be interpreted by validators as [1-p, p, 0.]
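The compatibility rule above can be sketched as a small deserialization shim. This is illustrative only (the function name is hypothetical, not the validator's actual code): a legacy scalar p maps to [1-p, p, 0.], i.e. all of the "synthetic" mass goes to the fully-synthetic class, while multiclass vectors pass through unchanged.

```python
import numpy as np

def deserialize_prediction(pred):
    """Normalize a miner prediction to a 3-class probability vector.

    Legacy float p in [0, 1]  ->  [1 - p, p, 0.]
    Multiclass vector         ->  returned as-is
    """
    pred = np.asarray(pred, dtype=float)
    if pred.ndim == 0:  # legacy scalar response from a non-upgraded miner
        p = float(pred)
        return np.array([1.0 - p, p, 0.0])
    return pred
```

Note that under this mapping a legacy miner can never win a semi-synthetic challenge at the multiclass level, since its p_semisynthetic is always 0.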

Rewards

  • Given that semi-synthetic challenges (inpainting) currently make up 5% of the entire challenge distribution (10% of all image challenges, 20% of synthetic image challenges), multiclass performance will accordingly start off with a low weight.
  • As we have done with many of our releases, we will progressively increase the weight given to multiclass performance as we increase the presence of semi-synthetic data in our challenge distribution (semi-synthetic video data coming soon!)
  • Initially, the reward will be 0.9*Binary_MCC + 0.1*Multiclass_MCC
    • Binary MCC treats both fully-synthetic and semi-synthetic predictions as "synthetic", emulating our original "AI vs. Not" reward function
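The reward blend above can be sketched as follows. The 0.9/0.1 weights and the rule of collapsing both synthetic classes for the binary score come from the release notes; the functions themselves are a hypothetical illustration (the subnet's real scoring code is not shown here). The multiclass MCC is Gorodkin's R_K statistic, which reduces to the standard MCC for two classes.

```python
import numpy as np

def mcc(y_true, y_pred, n_classes):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[int(t), int(p)] += 1
    s = C.sum()                 # total samples
    c = np.trace(C)             # correctly classified samples
    t_k = C.sum(axis=1)         # true counts per class
    p_k = C.sum(axis=0)         # predicted counts per class
    num = c * s - t_k @ p_k
    den = np.sqrt(s**2 - p_k @ p_k) * np.sqrt(s**2 - t_k @ t_k)
    return num / den if den else 0.0

def combined_reward(y_true, y_pred, w_multi=0.1):
    """Sketch of the initial blend: 0.9*Binary_MCC + 0.1*Multiclass_MCC.
    Labels assumed: 0 = real, 1 = fully synthetic, 2 = semi-synthetic."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Binary MCC collapses labels 1 and 2 into one "synthetic" class,
    # emulating the original "AI vs. Not" reward
    binary = mcc((y_true > 0).astype(int), (y_pred > 0).astype(int), 2)
    multi = mcc(y_true, y_pred, 3)
    return (1 - w_multi) * binary + w_multi * multi
```

Under this sketch, a miner who separates real from synthetic perfectly but confuses the two synthetic classes still collects the full binary component of the reward.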

dylanuys and others added 30 commits November 19, 2024 17:17
* adding rich arg, adding coldkeys and hotkeys

* moving rich to payload from headers

* bump version

---------

Co-authored-by: benliang99 <[email protected]>
Adding two finetuned image models to expand validator challenges
Updated transformers version to fix tokenizer initialization error
* Made gpu id specification consistent across synthetic image generation models

* Changed gpu_id to device

* Docstring grammar

* add neuron.device to SyntheticImageGenerator init

* Fixed variable names

* adding device to start_validator.sh

* deprecating old/biased random prompt generation

* properly clear gpu of moderation pipeline

* simplifying usage of self.device

* fixing moderation pipeline device

* explicitly defining model/tokenizer for moderation pipeline to avoid accelerate auto device management

* deprecating random prompt generation

---------

Co-authored-by: benliang99 <[email protected]>
bump version
* simple video challenge implementation wip

* dummy multimodal miner

* constants reorg

* updating verify_models script with t2v

* fixing MODEL_PIPELINE init

* cleanup

* __init__.py

* hasattr fix

* num_frames must be divisible by 8

* fixing dict iteration

* dummy response for videos

* fixing small bugs

* fixing video logging and compression

* apply image transforms uniformly to frames of video

* transform list of tensor to pil for synapse prep

* cleaning up vali forward

* miner function signatures to use Synapse base class instead of ImageSynapse

* vali requirements imageio and moviepy

* attaching separate video and image forward functions

* separating blacklist and priority fns for image/video synapses

* pred -> prediction

* initial synth video challenge flow

* initial video cache implementation

* video cache cleanup

* video zip downloads

* wip fairly large refactor of data generation, functionality and form

* generalized hf zip download fn

* had claude improve video_cache formatting

* vali forward cleanup

* cleanup + turning back on randomness for real/fake

* fix relative import

* wip moving video datasets to vali config

* Adding optimization flags to vali config

* check if captioning model already loaded

* async SyntheticDataGenerator wip

* async zip download

* ImageCache wip

* proper gpu clearing for moderation pipeline

* sdg cleanup

* new cache system WIP

* image/video cache updates

* cleaning up unused metadata arg, improving logging

* fixed frame sampling, parquet image extraction, image sampling

* synth data cache wip

* Moving sgd to its own pm2 process

* synthetic data gen memory management update

* mochi-1-preview

* util cleanup, new requirements

* ensure SyntheticDataGenerator process waits for ImageCache to populate

* adding new t2i models from main

* Fixing t2v model output saving

* miner cleanup

* Moving tall model weights to bitmind hf org

* removing test video pkl

* fixing circular import

* updating usage of hf_hub_download according to some breaking huggingface_hub changes

* adding ffmpeg to vali reqs

* adding back in video models in async generation after testing

* renaming UCF directory to DFB, since it now contains TALL

* remaining renames for UCF -> DFB

* pyffmpegg

* video compatible data augmentations

* Default values for level, data_aug_params for failure case

* switching image challenges back on

* using sample variable to store data for all challenge types

* disabling sequential_cpu_offload for CogVideoX5b

* logging metadata fields to w&b

* log challenge metadata

* bump version

* adding context manager for generation w different dtypes

* variable name fix in ComposeWithTransforms

* fixing broken DFB stuff in tall_detector.py

* removing unnecessary logging

* fixing outdated variable names

* cache refactor; moving shared functionality to BaseCache

* finally automating w&b project setting

* improving logs

* improving validator forward structure

* detector ABC cleanup + function headers

* adding try except for miner performance history loading

* fixing import

* cleaning up vali logging

* pep8 formatting video_utils

* cleaning up start_validator.sh, starting validator process before data gen

* shortening vali challenge timer

* moving data generation management to its own script & added w&B logging

* run_data_generator.py

* fixing full_path variable name

* changing w&b name for data generator

* yaml > json gang

* simplifying ImageCache.sample to always return one sample

* adding option to skip a challenge if no data are available in cache

* adding config vars for image/video detector

* cleaning up miner class, moving blacklist/priority to base

* updating call to image_cache.sample()

* fixing mochi gen to 84 frames

* fixing video data padding for miners

* updating setup script to create new .env file

* fixing weight loading after detector refactor

* model/detector separation for TALL & modifying base DFB code to allow device configuration

* standardizing video detector input to a frames tensor

* separation of concerns; moving all video preprocessing to detector class

* pep8 cleanup

* reformatting if statements

* temporarily removing initial dataset class

* standardizing config loading across video and image models

* finished VideoDataloader and supporting components

* moved save config file out of train script

* backwards compatibility for ucf training

* moving data augmentation from RealFakeDataset to Dataset subclasses for video aug support

* cleaning up data augmentation and target_image_size

* import cleanup

* gitignore update

* fixing typos picked up by flake8

* fixing function name ty flake8

* fixing test fixtures

* disabling pytests for now, some are broken after refactor and its 4am
dylanuys and others added 26 commits January 26, 2025 22:38
Two new t2i generators: DeepFloyd-IF I-XL-v1.0 + II-L-v1.0 multistage pipeline and Janus Pro 7B
* Implementation of frame stitching for 2 videos

* ComposeWithParams fix

* vflip + hflip fix

* wandb video logging fix courtesy of eric

* proper arg passing for prompt moderation

* version bump

* i2i crop guardrails
Removing problematic resolution for CogVideoX5b
Multi-video threshold 0.2
* multiclass protocols

* multiclass rewards

* facilitating smooth transition from old protocol to multiclass

* DTAO: Bittensor SDK 9.0.0 (#152)

* Update requirements.txt

* version bump

* moving prediction backwards compatibility to synapse.deserialize

* mcc-based reward with rational transform

* cast predictions to np array upon loading miner history

* version bump
* improved vali proxy with video endpoint

* renaming endpoints

* Fixing vali proxy initialization

* make vali proxy async again

* handling testnet situation of low miner activity

* BytesIO import

* upgrading transformers

* switching to multipart form data
* update vali proxy to return floats instead of vectors

* removing rational transform for now
@kenobijon kenobijon left a comment
approved upon confirmation working with all our applications

@dylanuys dylanuys merged commit 091d4a1 into main Feb 20, 2025
1 check passed