New caps dataset #562

Open
wants to merge 41 commits into
base: refactoring

HuguesRoy
Contributor

This PR creates a new folder, caps_dataset_refactoring, containing two .py files:

  • caps_dataset.py: the low-level caps-dataset class (works only for image datasets)
  • concat_dataset.py: classes for concatenating and stacking datasets, and for creating a dataset from a control file

For the moment, these files are not connected to the rest of clinicadl.

dependabot bot and others added 3 commits April 22, 2024 10:03
Bumps [sqlparse](https://github.com/andialbrecht/sqlparse) from 0.4.4 to 0.5.0.
- [Changelog](https://github.com/andialbrecht/sqlparse/blob/master/CHANGELOG)
- [Commits](andialbrecht/sqlparse@0.4.4...0.5.0)

---
updated-dependencies:
- dependency-name: sqlparse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@HuguesRoy HuguesRoy added the refactoring ClinicaDL refactoring 2024 label Apr 22, 2024
@pep8speaks

pep8speaks commented Apr 22, 2024

Hello @HuguesRoy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-05-15 09:52:28 UTC

Collaborator

@thibaultdvx thibaultdvx left a comment

Thanks a lot @HuguesRoy! Could you please try to write the docstrings with the template I gave you?

Member

@NicolasGensollen NicolasGensollen left a comment

Thanks @HuguesRoy!
I had a quick look and made some suggestions on caps_dataset.py, mainly on type hints.
Let me know if anything is unclear.

def elem_index(self):
    pass

def label_fn(self, target: Union[str, float, int]) -> Union[float, int]:
Member

It seems like the method can return None:

Suggested change
- def label_fn(self, target: Union[str, float, int]) -> Union[float, int]:
+ def label_fn(self, target: Union[str, float, int]) -> Optional[Union[float, int]]:
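To illustrate why the return type may need `Optional`, here is a minimal sketch. The method body is not visible in this conversation, so the class `LabelMapper`, its attributes, and the three branches below are assumptions made for illustration only, not the PR's actual code.

```python
from typing import Dict, Optional, Union


class LabelMapper:
    """Hypothetical stand-in for the dataset class under review."""

    def __init__(
        self,
        label: Optional[str] = None,
        label_code: Optional[Dict[str, int]] = None,
    ) -> None:
        self.label = label            # assumed: None means "no label at all"
        self.label_code = label_code  # assumed: None means "no class mapping"

    def label_fn(self, target: Union[str, float, int]) -> Optional[Union[float, int]]:
        # Assumed branch: without a label there is nothing to return.
        if self.label is None:
            return None
        # Assumed branch: without a mapping, treat the target as a number.
        if self.label_code is None:
            return float(target)
        # Branch visible in the diff: look the target up in the mapping.
        return self.label_code[str(target)]
```

If any branch can return `None`, a type checker will only accept the signature once `Optional` is part of the annotation.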

        label: value of the label usable in criterion.
    """
    domain_code = {"t1": 0, "flair": 1}
    return domain_code[str(target)]
Member

Can target really be a float, an integer, or a string? From the dict above, it seems it can only take two values: "t1" or "flair".

else:
    return self.label_code[str(target)]

def domain_fn(self, target: Union[str, float, int]) -> Union[float, int]:
Member

I might be missing something, but it seems it can only return integers, right?

Suggested change
- def domain_fn(self, target: Union[str, float, int]) -> Union[float, int]:
+ def domain_fn(self, target: Union[str, float, int]) -> int:
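Going one step further, the input type could also be narrowed, since the `domain_code` dict quoted earlier in this review only has the keys "t1" and "flair". The sketch below is one possible way to express that with `typing.Literal`; the alias `Domain`, the module-level constant `DOMAIN_CODE`, and writing `domain_fn` as a free function are assumptions for illustration.

```python
from typing import Dict, Literal

# Assumed: "t1" and "flair" are the only valid domains, as in the quoted dict.
Domain = Literal["t1", "flair"]

DOMAIN_CODE: Dict[Domain, int] = {"t1": 0, "flair": 1}


def domain_fn(target: Domain) -> int:
    """Map a domain name to its integer code."""
    return DOMAIN_CODE[target]
```

With this signature, a static type checker can flag a call such as `domain_fn("t2")`, and the `str(target)` conversion is no longer needed.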

Args:
    idx: row number of the meta-data contained in self.df
Returns:
    dictionary with following items:
Member

If all subclasses return dictionaries with the same structure, it might be valuable to build a custom type for this instead of relying on dictionaries.

@abc.abstractmethod
def __getitem__(self, idx: int) -> Sample:
    ...

@dataclass
class Sample:
    image: torch.Tensor
    label: int | float
    participant_id: str
    ...

WDYT?
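A slightly fuller, runnable version of this idea might look like the sketch below. The names `BaseDataset` and `ToyDataset` are hypothetical, and `image` is typed as `Any` only to keep the example free of the torch dependency (the comment above uses `torch.Tensor`).

```python
import abc
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class Sample:
    image: Any  # torch.Tensor in the proposal; Any avoids importing torch here
    label: Union[int, float]
    participant_id: str


class BaseDataset(abc.ABC):
    """Hypothetical abstract base: every subclass must return a Sample."""

    @abc.abstractmethod
    def __getitem__(self, idx: int) -> Sample:
        ...


class ToyDataset(BaseDataset):
    """Hypothetical in-memory subclass, used only to exercise the interface."""

    def __init__(self, rows: List[Sample]) -> None:
        self._rows = rows

    def __getitem__(self, idx: int) -> Sample:
        return self._rows[idx]


dataset = ToyDataset([Sample(image=[[0.0]], label=1, participant_id="sub-01")])
sample = dataset[0]
```

Compared with a plain dictionary, attribute access such as `sample.participant_id` is checked by type checkers and IDEs, and a misspelled field fails immediately instead of raising a `KeyError` deep in a training loop.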



##################################
# Transformations
Member

It could be worth putting these in a separate module, as this file is already pretty large. WDYT?



################################
# TSV files loaders
Member

Same kind of comment here

################################
# TSV files loaders
################################
def load_data_test(test_path: Path, diagnoses_list, baseline=True, multi_cohort=False):
Member

Suggested change
- def load_data_test(test_path: Path, diagnoses_list, baseline=True, multi_cohort=False):
+ def load_data_test(test_path: Path, diagnoses_list: list[str], baseline: bool = True, multi_cohort: bool = False) -> pd.DataFrame:

    raise ClinicaDLArgumentError(
        "If multi_cohort is given, the TSV_DIRECTORY argument should be a path to a TSV file."
    )
else:
Member

You don't need to have an else statement here since you raise in the if.
This would spare one level of indentation for the following code block.
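As a minimal sketch of this guard-clause pattern: the helper `check_tsv_path` and its condition are hypothetical (the actual `if` condition is not shown in the diff), and `ClinicaDLArgumentError` is redefined locally as a stand-in so the example runs on its own.

```python
from pathlib import Path


class ClinicaDLArgumentError(ValueError):
    """Local stand-in for the ClinicaDL exception of the same name."""


def check_tsv_path(test_path: Path, multi_cohort: bool) -> Path:
    """Hypothetical helper showing an early raise without an else branch."""
    # Assumed condition: in multi-cohort mode the path must be a TSV file.
    if multi_cohort and test_path.suffix != ".tsv":
        raise ClinicaDLArgumentError(
            "If multi_cohort is given, the TSV_DIRECTORY argument "
            "should be a path to a TSV file."
        )
    # No else needed: this line is only reached when the check passed,
    # so the rest of the function stays at the top indentation level.
    return test_path
```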

    return test_df


def load_data_test_single(test_path: Path, diagnoses_list, baseline=True):
Member

Suggested change
- def load_data_test_single(test_path: Path, diagnoses_list, baseline=True):
+ def load_data_test_single(test_path: Path, diagnoses_list: list[str], baseline: bool = True) -> pd.DataFrame:

@thibaultdvx thibaultdvx marked this pull request as draft April 25, 2024 12:16
dependabot bot and others added 4 commits May 4, 2024 09:50
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.1 to 4.66.3.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](tqdm/tqdm@v4.66.1...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](pallets/werkzeug@3.0.1...3.0.3)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
broken path solved
@camillebrianceau camillebrianceau marked this pull request as ready for review May 15, 2024 11:38
dependabot bot and others added 6 commits May 17, 2024 09:14
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.10.1 to 2.12.1.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.10.1...v2.12.1)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [gunicorn](https://github.com/benoitc/gunicorn) from 21.2.0 to 22.0.0.
- [Release notes](https://github.com/benoitc/gunicorn/releases)
- [Commits](benoitc/gunicorn@21.2.0...22.0.0)

---
updated-dependencies:
- dependency-name: gunicorn
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* try a simple workflow first

* try running on new ubuntu VM

* fixes

* bump poetry version to 1.8.3

* try removing caching..

* add workflow for testing tsv tools
* try skipping test_tsvtools when PR is in draft mode

* trigger CI

* add a cpu tag to avoid running cpu tests on gpu machines

* run also on refactoring branch
* add test workflow on GPU for train

* fix conda path

* fix conflicting workdir

* only run on non-draft PRs

* run also on refactoring branch
NicolasGensollen and others added 19 commits May 23, 2024 15:51
* add workflow for testing interpretation task

* add workflow for testing random search task

* add workflow for testing resume task

* add workflow for testing transfer learning task

* trigger CI

* trigger CI
* add cleaning step to test_tsvtools pipeline

* add test_generate pipeline

* add test_predict pipeline

* add test_prepare_data pipeline

* add test_quality_checks pipeline

* add refactoring target branch, cpu tag, and draft PR filter

* trigger CI
…nt in README (aramis-lab#600)

* update python version used for creating conda env in README

* investigate

* fix
* add no-gpu and adapt-base-dir flag
* Update quality_check.py
* add FileNotFound error in tree
* run unit tests on refactoring

* run linter on refactoring
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4.
- [Commits](certifi/python-certifi@2024.02.02...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [zipp](https://github.com/jaraco/zipp) from 3.17.0 to 3.19.1.
- [Release notes](https://github.com/jaraco/zipp/releases)
- [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst)
- [Commits](jaraco/zipp@v3.17.0...v3.19.1)

---
updated-dependencies:
- dependency-name: zipp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.3 to 70.0.0.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v69.0.3...v70.0.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 1.40.1 to 2.8.0.
- [Release notes](https://github.com/getsentry/sentry-python/releases)
- [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md)
- [Commits](getsentry/sentry-python@1.40.1...2.8.0)

---
updated-dependencies:
- dependency-name: sentry-sdk
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Labels
refactoring ClinicaDL refactoring 2024

5 participants