JachyLikeCoding/UPicker

UPicker: a Semi-Supervised Particle Picking Transformer Method for Cryo-EM Micrographs

Automatic single particle picking is a critical step in the data processing pipeline of cryo-electron microscopy (cryo-EM) structure reconstruction. Here, we propose UPicker, a semi-supervised transformer-based particle-picking method with a two-stage training process: unsupervised pretraining and supervised fine-tuning. During the unsupervised pretraining, an Adaptive-LoG region proposal generator is proposed to obtain pseudo-labels from unlabeled data for initial feature learning. For the supervised fine-tuning, UPicker only needs a small amount of labeled data to achieve high accuracy in particle picking. To further enhance model performance, UPicker employs a contrastive denoising training strategy to reduce redundant detections and accelerate convergence, along with a hybrid data augmentation strategy to deal with limited labeled data.

Install

1. Download
git clone https://github.com/JachyLikeCoding/UPicker.git
2. Install: create the environment with conda:
conda env create \
-f freeze.yml

conda activate upicker
Alternatively, create the environment with mamba:
mamba env create \
--file freeze.yml \
--channel-priority flexible -y

mamba activate upicker

Overview

The figure below shows the particle-picking workflow of UPicker.

[Figure: UPicker particle-picking workflow]

Dataset Preparation

You can download datasets from EMPIAR or CryoPPP, or use your own dataset, and organize them as follows:

UPicker_Project/
└── data/
    └── DATASET1/
        ├── micrographs/
        │   ├── 0001.mrc
        │   ├── 0002.mrc
        │   ├── (...).mrc
        │   └── xxxx.mrc
        └── annots/
            ├── 0001.star
            ├── 0002.star
            ├── (...).star
            └── xxxx.star

Each micrograph name must match the name of its coordinate file. Coordinate files can be read and written in star, box, txt, and csv formats; see cryoEM/coord_io.py for details.
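Before preprocessing, it can help to confirm that micrograph and coordinate file names line up. The following is a minimal sketch, not part of UPicker: the function name `find_unpaired` is hypothetical, and it assumes the `micrographs/` and `annots/` layout shown above, with files matched by name without extension.

```python
from pathlib import Path

# Coordinate formats the README lists as supported.
COORD_EXTS = (".star", ".box", ".txt", ".csv")


def find_unpaired(dataset_dir):
    """Return (micrographs without coordinates, coordinates without micrographs).

    Files are matched by stem, e.g. 0001.mrc <-> 0001.star.
    Hypothetical helper; assumes the micrographs/ and annots/ layout above.
    """
    root = Path(dataset_dir)
    mic_dir, ann_dir = root / "micrographs", root / "annots"
    mrc_stems = {p.stem for p in mic_dir.glob("*.mrc")} if mic_dir.is_dir() else set()
    coord_stems = set()
    if ann_dir.is_dir():
        coord_stems = {
            p.stem
            for p in ann_dir.iterdir()
            if p.is_file() and p.suffix.lower() in COORD_EXTS
        }
    return sorted(mrc_stems - coord_stems), sorted(coord_stems - mrc_stems)


if __name__ == "__main__":
    missing, orphaned = find_unpaired("data/DATASET1")
    print("micrographs without coordinates:", missing)
    print("coordinates without micrographs:", orphaned)
```

Micrographs in the first list may simply be unlabeled; entries in the second list usually point to a naming mismatch.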

Then, you need to preprocess the micrographs and get region proposals by running:

python cryoEM/preprocess.py --box_width BOXSIZE --images data/YOUR_DATASET/micrographs/ --output_dir data/YOUR_DATASET
Optional Arguments:
  --images (str): Folder containing the micrographs to preprocess.
  --bin (int, default: 1): Downsampling bin factor.
  --output_dir (str, default: "output"): Output directory.
  --box_width (int, default: 200): Box width, usually chosen as 1.5 * particle diameter.
  --device (str, default: "cuda:0" if available, else "cpu"): Device to use (cuda:0 or cpu).
  --mode (str, default: "train", choices: ['train', 'test']): In test mode, the autopick schedule is skipped.
  --noequal (store_true): Whether histogram equalization is needed.
  --ifready (store_true): Set if the micrographs have already been preprocessed.
  --denoise (str, default: "gaussian"): Denoising filter.
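
The 1.5 * particle diameter rule of thumb for `--box_width` can be written out as a small calculation. This is a sketch under assumptions: it takes the box width to be in pixels and converts a diameter given in ångströms using the micrograph pixel size. The README states neither unit, and `suggest_box_width` is a hypothetical helper, not part of UPicker.

```python
def suggest_box_width(diameter_angstrom, pixel_size_angstrom, scale=1.5):
    """Suggest a box width in pixels from the particle diameter.

    Assumption: box width is measured in pixels, so the diameter in
    angstroms is divided by the pixel size (angstroms per pixel) first.
    The scale of 1.5 is the rule of thumb given for --box_width.
    """
    if diameter_angstrom <= 0 or pixel_size_angstrom <= 0:
        raise ValueError("diameter and pixel size must be positive")
    diameter_px = diameter_angstrom / pixel_size_angstrom
    return int(round(scale * diameter_px))


if __name__ == "__main__":
    # A 150 A particle imaged at 1.0 A/pixel.
    print(suggest_box_width(150, 1.0))
```

The resulting value would be passed as BOXSIZE in the commands of this README.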

Next, create COCO-style datasets for training and evaluation, one per phase (pretrain, train, val):

python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase pretrain --images_path data/YOUR_DATASET/micrographs/processed/

python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase train --images_path data/YOUR_DATASET/micrographs/processed/

python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase val --images_path data/YOUR_DATASET/micrographs/processed/
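
The three commands above differ only in `--phase`, so they can be generated in a loop. The sketch below is a convenience wrapper rather than part of UPicker: `coco_commands` and `run_all` are hypothetical names, and it uses only the flags shown above, assuming it is run from the repository root.

```python
import subprocess
import sys

# Phases used in the commands above.
PHASES = ("pretrain", "train", "val")


def coco_commands(dataset_dir, box_width):
    """Build one make_coco_dataset.py command per phase (hypothetical helper)."""
    dataset_dir = dataset_dir.rstrip("/")
    images_path = f"{dataset_dir}/micrographs/processed/"
    return [
        [
            sys.executable, "cryoEM/make_coco_dataset.py",
            "--coco_path", dataset_dir,
            "--box_width", str(box_width),
            "--phase", phase,
            "--images_path", images_path,
        ]
        for phase in PHASES
    ]


def run_all(dataset_dir, box_width):
    """Run the phases in order, stopping at the first failure."""
    for cmd in coco_commands(dataset_dir, box_width):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in coco_commands("data/YOUR_DATASET", 200):
        print(" ".join(cmd))
```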

(OPTIONAL) Clean region proposals for pre-training

This step requires the micrograph_cleaner_em package, which must be installed first.

python cryoEM/box_clean.py \
    --image_path data/YOUR_DATASET/micrographs \
    --boxsize BOXSIZE

📈 Pretrain with A-LoG region proposals

(NOTE: The dataset name should end with "pretrain".)

python -u main.py \
    --config_file config/UPICKER/UPICKER_4scale_50epoch.py \
    --output_dir exps/Upicker_exps/YOUR_DATASET/pretrain_YOUR_DATASET \
    --dataset YOUR_DATASET_pretrain \
    --dataset_file YOUR_DATASET \
    --strategy log \
    --box_width BOXSIZE \
    --lr_backbone 0

🗃️ Fine-tune with pretrained model

python -u main.py \
    --config_file config/UPICKER/UPICKER_4scale_50epoch.py \
    --output_dir exps/Upicker_exps/YOUR_DATASET/finetune_YOUR_DATASET \
    --dataset_file YOUR_DATASET \
    --pretrain exps/Upicker_exps/YOUR_DATASET/pretrain_YOUR_DATASET/checkpoint.pth \
    --box_width BOXSIZE
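
The two stages are linked by paths: fine-tuning loads `checkpoint.pth` from the pretraining output directory, and the pretraining dataset name carries the "pretrain" suffix. The sketch below derives both commands from a single dataset name so the paths stay consistent. `training_commands` is a hypothetical helper; it uses only the flags and path pattern shown in this README.

```python
def training_commands(
    dataset,
    box_width,
    config="config/UPICKER/UPICKER_4scale_50epoch.py",
    exp_root="exps/Upicker_exps",
):
    """Return (pretrain_cmd, finetune_cmd) as argument lists.

    Hypothetical helper that mirrors the commands in this README.
    """
    pretrain_dir = f"{exp_root}/{dataset}/pretrain_{dataset}"
    finetune_dir = f"{exp_root}/{dataset}/finetune_{dataset}"
    common = ["python", "-u", "main.py", "--config_file", config]
    pretrain_cmd = common + [
        "--output_dir", pretrain_dir,
        "--dataset", f"{dataset}_pretrain",  # name must end with "pretrain"
        "--dataset_file", dataset,
        "--strategy", "log",
        "--box_width", str(box_width),
        "--lr_backbone", "0",
    ]
    finetune_cmd = common + [
        "--output_dir", finetune_dir,
        "--dataset_file", dataset,
        # Fine-tuning starts from the checkpoint written by pretraining.
        "--pretrain", f"{pretrain_dir}/checkpoint.pth",
        "--box_width", str(box_width),
    ]
    return pretrain_cmd, finetune_cmd


if __name__ == "__main__":
    for cmd in training_commands("YOUR_DATASET", 200):
        print(" ".join(cmd))
```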

🖥️ Inference

python -u inference.py \
    --dataset_file YOUR_DATASET \
    --output_dir outputs/finetune_YOUR_DATASET/ \
    --resume exps/Upicker_exps/YOUR_DATASET/finetune_YOUR_DATASET/checkpoint_best_regular.pth \
    -sth 0.25

🔭 Future Plans

  • Tutorial for some datasets.

Citation

If you find UPicker useful, please consider citing our work:

@article{zhang2025upicker,
  title={UPicker: a semi-supervised particle picking transformer method for cryo-EM micrographs},
  author={Zhang, Chi and Cheng, Yiran and Feng, Kaiwen and Zhang, Fa and Han, Renmin and Feng, Jieqing},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={1},
  pages={bbae636},
  year={2025},
  publisher={Oxford University Press}
}
