This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Bug in sciplex3 preprocessing? #4

Open
mughetto opened this issue Sep 14, 2021 · 2 comments

mughetto commented Sep 14, 2021

Hi there,

I've been trying to reproduce the training on sciplex, but I get this error with a brand-new clone, datasets, and conda env:

$ python -m compert.train --dataset_path datasets/sciplex3_new.h5ad --save_dir /tmp --max_epochs 1 --doser_type sigm

Traceback (most recent call last):
  File "/home/kcvc236/miniconda3/envs/CPAvanilla/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/kcvc236/miniconda3/envs/CPAvanilla/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/kcvc236/CPAvanilla/CPA/compert/train.py", line 303, in <module>
    train_compert(parse_arguments())
  File "/home/kcvc236/CPAvanilla/CPA/compert/train.py", line 197, in train_compert
    autoencoder, datasets = prepare_compert(args)
  File "/home/kcvc236/CPAvanilla/CPA/compert/train.py", line 167, in prepare_compert
    datasets = load_dataset_splits(
  File "/home/kcvc236/CPAvanilla/CPA/compert/data.py", line 189, in load_dataset_splits
    "training": dataset.subset("train", "all"),
  File "/home/kcvc236/CPAvanilla/CPA/compert/data.py", line 129, in subset
    return SubDataset(self, idx)
  File "/home/kcvc236/CPAvanilla/CPA/compert/data.py", line 161, in __init__
    self.ctrl_name = dataset.ctrl_name[0]
IndexError: list index out of range

I have a strong suspicion that the problem lies in the preprocessing of sciplex:
https://github.com/facebookresearch/CPA/blob/main/preprocessing/sciplex3.ipynb

Cell #6 is probably the cause: it makes it impossible for adata.obs.control to be anything other than 0, so dataset.ctrl_name ends up empty and indexing it raises the error above.
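
A quick way to confirm this before launching training is to count the cells flagged as controls. The sketch below uses a small pandas DataFrame as a stand-in for adata.obs; the helper name count_controls and the toy rows are hypothetical, and only the control and drug_dose_name column names come from this thread.

```python
import pandas as pd


def count_controls(obs: pd.DataFrame, control_key: str = "control") -> int:
    """Return the number of rows whose control flag equals 1."""
    return int((obs[control_key] == 1).sum())


# Toy stand-in for adata.obs (hypothetical values, not real sciplex data).
obs = pd.DataFrame(
    {
        "drug_dose_name": ["drugA_0.1", "drugB_1.0", "drugA_1.0"],
        "control": [0, 0, 0],
    }
)

n_ctrl = count_controls(obs)
if n_ctrl == 0:
    # With no control cells, any list of control names derived from the
    # data would be empty, so indexing its first element would fail.
    print("No control cells found; check the preprocessing step.")
```

If the count is zero on the real file, the flag is being set incorrectly during preprocessing rather than during training.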

Do you have a working version or a fix you could share, please?

Cheers

bhomass commented Aug 25, 2023

I concur. sciplex3_new.h5ad, which is created by processing sciplex_raw_chunk_{i}.h5ad, does not contain the value "Vehicle_1.0" in adata.obs.drug_dose_name.values at all, and therefore no cells have adata.obs['control'] == 1. Without the control samples, the training crashes.

What is the solution? Use some other drug_dose_name value as the control?
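
To see which labels are actually present, something like the following could help. It is a minimal sketch on a toy DataFrame standing in for adata.obs; the helper find_control_like_labels, the search patterns, and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def find_control_like_labels(
    obs: pd.DataFrame,
    key: str = "drug_dose_name",
    patterns: tuple = ("vehicle", "control"),
) -> list:
    """List the unique labels in obs[key] that contain any pattern (case-insensitive)."""
    labels = obs[key].astype(str)
    mask = labels.str.lower().str.contains("|".join(patterns))
    return sorted(labels[mask].unique())


# Toy stand-in for adata.obs (hypothetical rows).
obs = pd.DataFrame(
    {"drug_dose_name": ["drugA_0.1", "control_0.0", "drugB_1.0", "control_0.0"]}
)

# Is the label that the preprocessing looks for present at all?
has_vehicle = "Vehicle_1.0" in set(obs["drug_dose_name"])

# Which labels look like they could mark control cells?
candidates = find_control_like_labels(obs)
```

On the real file, an empty candidates list would mean the control cells were dropped entirely, while a non-empty one points to a naming mismatch.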

bhomass commented Aug 25, 2023

I think it should be control_0.0.
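
If that is right, the control flag could be recomputed from that label. This is only a sketch under that assumption: flag_controls is a hypothetical helper, the toy DataFrame stands in for adata.obs, and whether control_0.0 is the correct control label has not been confirmed by the maintainers.

```python
import pandas as pd

# Label suggested in this thread; unverified assumption.
CONTROL_LABEL = "control_0.0"


def flag_controls(
    obs: pd.DataFrame,
    key: str = "drug_dose_name",
    control_label: str = CONTROL_LABEL,
) -> pd.DataFrame:
    """Return a copy of obs with a 0/1 'control' column derived from obs[key]."""
    out = obs.copy()
    out["control"] = (out[key].astype(str) == control_label).astype(int)
    return out


# Toy stand-in for adata.obs (hypothetical rows).
obs = pd.DataFrame({"drug_dose_name": ["drugA_0.1", "control_0.0", "control_0.0"]})
fixed = flag_controls(obs)
```

The same comparison could be applied to the real adata.obs before the .h5ad file is written, after which at least some cells should carry control == 1.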
