Question on replicability across versions #401

Open
dmj6288 opened this issue Dec 19, 2024 · 0 comments
Hi All @Cellbender,

This is not a specific issue, but I would like to hear your thoughts on CellBender producing two different types of log files.

I ran CellBender's remove-background on a dataset last year, on July 31, 2023, and it produced a log file like this:

cellbender:remove-background: Command:
cellbender remove-background --input /home/dennis00/Yi_lab/raw_h5_data/Khrameeva_h5/Human.h5 --output /home/dennis00/Yi_lab/cellbent_output_RAW/Khrameeva/H$
cellbender:remove-background: 2023-07-31 16:57:14
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file /home/dennis00/Yi_lab/raw_h5_data/Khrameeva_h5/Human.h5
cellbender:remove-background: CellRanger v2 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Including 24029 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 243
cellbender:remove-background: Prior on counts for cells is 838
cellbender:remove-background: Excluding barcodes with counts below 121
cellbender:remove-background: Using 17000 probable cell barcodes, plus an additional 3000 barcodes, and 1546 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 243.0 UMI counts.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001] average training loss: 2319.0409
cellbender:remove-background: [epoch 002] average training loss: 2145.5513 (16.2 seconds per epoch)
cellbender:remove-background: [epoch 003] average training loss: 2065.6477
cellbender:remove-background: [epoch 004] average training loss: 2037.0704
cellbender:remove-background: [epoch 005] average training loss: 2019.8957
cellbender:remove-background: [epoch 005] average test loss: 2010.6216

I ran it again on the same dataset on Nov 4, 2024, after performing some updates, and it generated this log:

cellbender:remove-background: Command:
cellbender remove-background --input /home/dennis00/Yi_lab/MarkerPaper/CellBender_RawDataMarkerPaper/ACCK/ACCK.h5 --output /home/dennis00/Yi_lab/MarkerPaper/CellBender_ProcessedDataMarkerPaper/ACCK/ACCK.h5 -$
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 5796bdb679)
cellbender:remove-background: 2024-10-30 20:50:59
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /home/dennis00/Yi_lab/MarkerPaper/CellBender_RawDataMarkerPaper/ACCK/ACCK.h5
cellbender:remove-background: CellRanger v2 format
cellbender:remove-background: WARNING: Only 21546 barcodes in the input file. Ensure this is a raw (unfiltered) file with all barcodes, including the empty droplets.
cellbender:remove-background: Features in dataset: 32893 NA
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 24029 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 875
cellbender:remove-background: Prior on counts for empty droplets is 249
cellbender:remove-background: Excluding 4982 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 19047 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 124
cellbender:remove-background: Using 17000 probable cell barcodes, plus an additional 3000 barcodes, and 1546 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 243 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmpkxtqxx9w
cellbender:remove-background: Successfully unpacked tarball to /tmp/tmpkxtqxx9w
/tmp/tmpkxtqxx9w/00c8a4e804_train.loaderstate
/tmp/tmpkxtqxx9w/00c8a4e804_model.torch
/tmp/tmpkxtqxx9w/00c8a4e804_optim.pyro
/tmp/tmpkxtqxx9w/00c8a4e804_test.loaderstate
/tmp/tmpkxtqxx9w/posterior.h5
/tmp/tmpkxtqxx9w/00c8a4e804_random.pyro
/tmp/tmpkxtqxx9w/00c8a4e804_params.pyro
/tmp/tmpkxtqxx9w/00c8a4e804_random.cuda
/tmp/tmpkxtqxx9w/00c8a4e804_args.npy
/tmp/tmpkxtqxx9w/00c8a4e804_optim.torch
cellbender:remove-background: Workflow hash does not match that of checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001] average training loss: 2955.1805
cellbender:remove-background: [epoch 002] average training loss: 2792.7301 (6.1 seconds per epoch)
cellbender:remove-background: Will checkpoint every 70 epochs
cellbender:remove-background: [epoch 003] average training loss: 2693.7633
cellbender:remove-background: [epoch 004] average training loss: 2628.9629
cellbender:remove-background: [epoch 005] average training loss: 2576.8837

I can confirm that remove-background was executed on the same dataset with the same input parameters, but I am unable to recover the package/CellBender version used in the first run in 2023. Could you show me how to recover that information? The two runs also followed very different training trajectories (the loss values differ substantially), and subsequent QC yields different count matrices (not drastically different, but the differences are substantial for some low-expressing genes and for cell types that were low in count to begin with).
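
For anyone who wants to line the two logs up, here is a minimal sketch that pulls the version line (if any) and the per-epoch training loss out of a log. It assumes only the line formats quoted above; the function name `summarize_log` is made up for illustration, and other CellBender versions may format their logs differently. The 2023 log contains no version line, so the most this can do for that run is confirm that the version was not logged.

```python
import re

# Patterns are based only on the log lines quoted in this issue.
VERSION_RE = re.compile(r"CellBender (\d+(?:\.\d+)+)")
TRAIN_RE = re.compile(r"\[epoch (\d+)\] average training loss: ([\d.]+)")


def summarize_log(text):
    """Return (version or None, {epoch: average training loss})."""
    version = None
    losses = {}
    for line in text.splitlines():
        if version is None:
            match = VERSION_RE.search(line)
            if match:
                version = match.group(1)
        match = TRAIN_RE.search(line)
        if match:
            losses[int(match.group(1))] = float(match.group(2))
    return version, losses


# A few lines copied from each log above.
old_log = """\
cellbender:remove-background: 2023-07-31 16:57:14
cellbender:remove-background: [epoch 001] average training loss: 2319.0409
cellbender:remove-background: [epoch 002] average training loss: 2145.5513 (16.2 seconds per epoch)
"""
new_log = """\
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: [epoch 001] average training loss: 2955.1805
cellbender:remove-background: [epoch 002] average training loss: 2792.7301 (6.1 seconds per epoch)
"""

old_version, old_losses = summarize_log(old_log)
new_version, new_losses = summarize_log(new_log)

print("versions:", old_version, new_version)
# Compare only the epochs present in both logs.
for epoch in sorted(old_losses.keys() & new_losses.keys()):
    print(epoch, old_losses[epoch], new_losses[epoch])
```

Reading the full log files from disk and passing their contents to `summarize_log` would give the complete trajectories for both runs.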

I also observed that the output .h5 file from the 2023 run was much larger than the input .h5 file, whereas the output from the 2024 run is comparable in size to the input. Would you be able to share your thoughts on this too?
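
To put a number on "much larger", a small sketch like the following could report each output's size relative to its input. It uses only the Python standard library; the helper name `size_ratio` and the paths in the usage comment are placeholders, not the real file locations. It says nothing about why the sizes differ, only by how much.

```python
import os


def size_ratio(output_path, input_path):
    """Return the output file size divided by the input file size."""
    return os.path.getsize(output_path) / os.path.getsize(input_path)


# Hypothetical usage (placeholder paths):
# print(size_ratio("output_2023.h5", "input.h5"))
# print(size_ratio("output_2024.h5", "input.h5"))
```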

Looking forward to your reply,
Respectfully,
Dennis
