Automatic single particle picking is a critical step in the data processing pipeline of cryo-electron microscopy (cryo-EM)
structure reconstruction. Here, we propose UPicker, a semi-supervised transformer-based particle-picking method with a two-stage training process: unsupervised pretraining and supervised fine-tuning. During the
unsupervised pretraining, an Adaptive-LoG region proposal generator is proposed to obtain pseudo-labels from unlabeled
data for initial feature learning. For the supervised fine-tuning, UPicker only needs a small amount of labeled data
to achieve high accuracy in particle picking. To further enhance model performance, UPicker employs a contrastive
denoising training strategy to reduce redundant detections and accelerate convergence, along with a hybrid data
augmentation strategy to deal with limited labeled data.
1. Download
git clone https://github.com/JachyLikeCoding/UPicker.git
2. Install
Create the environment with conda:
conda env create \
-f freeze.yml
conda activate upicker
Alternatively, create the environment with mamba:
mamba env create \
--file freeze.yml \
--channel-priority flexible -y
mamba activate upicker
The figure below illustrates the particle-picking workflow of UPicker.
You can download datasets from EMPIAR or CryoPPP, or use your own dataset, and organize them as follows:
UPicker_Project/
└── data/
    └── DATASET1/
        ├── micrographs/
        │   ├── 0001.mrc
        │   ├── 0002.mrc
        │   ├── (...).mrc
        │   └── xxxx.mrc
        └── annots/
            ├── 0001.star
            ├── 0002.star
            ├── (...).star
            └── xxxx.star
The micrograph name should be consistent with the coordinate file name. The star, box, txt, and csv formats are supported for reading and writing coordinate files; see cryoEM/coord_io.py for details.
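The naming rule above can be checked before preprocessing. The following is a minimal sketch, assuming the directory layout shown above with `.mrc` micrographs and one coordinate file per micrograph; the helper name `find_unmatched` is hypothetical and not part of UPicker.

```python
from pathlib import Path

# Coordinate formats listed in this README (star, box, txt, csv).
COORD_SUFFIXES = {".star", ".box", ".txt", ".csv"}


def find_unmatched(dataset_dir):
    """Return (micrographs without annotations, annotations without micrographs).

    Hypothetical helper, not part of UPicker: it only compares file stems
    in <dataset_dir>/micrographs and <dataset_dir>/annots.
    """
    root = Path(dataset_dir)
    mic_stems = {p.stem for p in (root / "micrographs").glob("*.mrc")}
    ann_stems = {
        p.stem
        for p in (root / "annots").iterdir()
        if p.suffix.lower() in COORD_SUFFIXES
    }
    return sorted(mic_stems - ann_stems), sorted(ann_stems - mic_stems)
```

Calling `find_unmatched("data/DATASET1")` returns two lists; if both are empty, every micrograph has a coordinate file with the same stem and vice versa.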
Then, you need to preprocess the micrographs and get region proposals by running:
python cryoEM/preprocess.py --box_width BOXSIZE --images data/YOUR_DATASET/micrographs/ --output_dir data/YOUR_DATASET
Optional Arguments:
--images (str): The folder of micrographs to be preprocessed.
--bin (int, default: 1): Downsample bin.
--output_dir (str, default: "output"): Output directory.
--box_width (int, default: 200): The box width, usually about 1.5 × the particle diameter.
--device (str, default: "cuda:0" if available, else "cpu"): Device for training (cuda:0 or cpu).
--mode (str, default: "train", choices=['train','test']): If the mode is test, the autopick schedule is skipped.
--noequal (store_true): Whether histogram equalization is needed.
--ifready (store_true): Whether the micrographs have already been preprocessed.
--denoise (str, default: "gaussian"): The denoise filter.
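The `--box_width` guideline (about 1.5 × the particle diameter) can be turned into a small calculation. The sketch below assumes the box width is expressed in pixels and that the particle diameter is known in ångströms together with the pixel size; the function name `suggest_box_width` and its parameters are illustrative and not part of UPicker.

```python
def suggest_box_width(particle_diameter_angstrom, pixel_size_angstrom, scale=1.5):
    """Suggest a box width in pixels as scale * particle diameter.

    Hypothetical helper: converts the diameter from angstroms to pixels,
    then applies the 1.5x rule of thumb from the argument list above.
    """
    if particle_diameter_angstrom <= 0 or pixel_size_angstrom <= 0:
        raise ValueError("diameter and pixel size must be positive")
    diameter_px = particle_diameter_angstrom / pixel_size_angstrom
    return round(scale * diameter_px)
```

For instance, with an illustrative 160 Å particle imaged at 1.2 Å/pixel, `suggest_box_width(160, 1.2)` gives 200, which could then be passed as BOXSIZE.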
Next, build the COCO-format dataset for each phase (pretrain, train, and val):
python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase pretrain --images_path data/YOUR_DATASET/micrographs/processed/
python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase train --images_path data/YOUR_DATASET/micrographs/processed/
python cryoEM/make_coco_dataset.py --coco_path data/YOUR_DATASET --box_width BOXSIZE --phase val --images_path data/YOUR_DATASET/micrographs/processed/
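The three commands above differ only in `--phase`, so they can be generated in a loop. This is a rough sketch that only assembles the argument lists using the flags shown above; the function name `coco_commands` is hypothetical, and nothing is executed unless the lists are passed to `subprocess.run`.

```python
import shlex


def coco_commands(dataset_dir, box_width, phases=("pretrain", "train", "val")):
    """Build one make_coco_dataset.py command per phase (nothing is executed)."""
    images_path = f"{dataset_dir}/micrographs/processed/"
    return [
        [
            "python", "cryoEM/make_coco_dataset.py",
            "--coco_path", dataset_dir,
            "--box_width", str(box_width),
            "--phase", phase,
            "--images_path", images_path,
        ]
        for phase in phases
    ]


if __name__ == "__main__":
    for cmd in coco_commands("data/YOUR_DATASET", 200):
        # Print the command; use subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Keeping the commands as lists avoids shell-quoting problems if a dataset path contains spaces.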
Install the micrograph_cleaner_em package first, then run:
python cryoEM/box_clean.py \
--image_path data/YOUR_DATASET/micrographs \
--boxsize BOXSIZE
(NOTE: The dataset name should end with "pretrain".)
python -u main.py \
--config_file config/UPICKER/UPICKER_4scale_50epoch.py \
--output_dir exps/Upicker_exps/YOUR_DATASET/pretrain_YOUR_DATASET \
--dataset YOUR_DATASET_pretrain \
--dataset_file YOUR_DATASET \
--strategy log \
--box_width BOXSIZE \
--lr_backbone 0
python -u main.py \
--config_file config/UPICKER/UPICKER_4scale_50epoch.py \
--output_dir exps/Upicker_exps/YOUR_DATASET/finetune_YOUR_DATASET \
--dataset_file YOUR_DATASET \
--pretrain exps/Upicker_exps/YOUR_DATASET/pretrain_YOUR_DATASET/checkpoint.pth \
--box_width BOXSIZE
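The fine-tuning command depends on the pretraining run through the `--pretrain` checkpoint path. The sketch below, which assumes the directory naming used in the two commands above, derives both argument lists from a single dataset name so the paths stay in sync; `training_commands` is a hypothetical helper and not part of UPicker.

```python
from pathlib import PurePosixPath

CONFIG = "config/UPICKER/UPICKER_4scale_50epoch.py"


def training_commands(dataset, box_width, exp_root="exps/Upicker_exps"):
    """Return (pretrain_cmd, finetune_cmd) as argument lists; nothing is run."""
    exp_dir = PurePosixPath(exp_root) / dataset
    pretrain_dir = exp_dir / f"pretrain_{dataset}"
    finetune_dir = exp_dir / f"finetune_{dataset}"
    common = ["python", "-u", "main.py", "--config_file", CONFIG]
    pretrain_cmd = common + [
        "--output_dir", str(pretrain_dir),
        "--dataset", f"{dataset}_pretrain",  # name ends with "pretrain"
        "--dataset_file", dataset,
        "--strategy", "log",
        "--box_width", str(box_width),
        "--lr_backbone", "0",
    ]
    finetune_cmd = common + [
        "--output_dir", str(finetune_dir),
        "--dataset_file", dataset,
        # Fine-tuning starts from the checkpoint written by pretraining.
        "--pretrain", str(pretrain_dir / "checkpoint.pth"),
        "--box_width", str(box_width),
    ]
    return pretrain_cmd, finetune_cmd
```

Each list can be handed to `subprocess.run(cmd, check=True)` in order, pretraining first.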
python -u inference.py \
--dataset_file YOUR_DATASET \
--output_dir outputs/finetune_YOUR_DATASET/ \
--resume exps/Upicker_exps/YOUR_DATASET/finetune_YOUR_DATASET/checkpoint_best_regular.pth \
-sth 0.25
- Tutorial for some datasets.
If you find UPicker useful, please consider citing our work:
@article{zhang2025upicker,
title={UPicker: a semi-supervised particle picking transformer method for cryo-EM micrographs},
author={Zhang, Chi and Cheng, Yiran and Feng, Kaiwen and Zhang, Fa and Han, Renmin and Feng, Jieqing},
journal={Briefings in Bioinformatics},
volume={26},
number={1},
pages={bbae636},
year={2025},
publisher={Oxford University Press}
}