
New caps dataset #562

Open

HuguesRoy wants to merge 41 commits into aramis-lab:refactoring from HuguesRoy:caps_dataset

Contributor

HuguesRoy commented Apr 22, 2024

Creates a new folder caps_dataset_refactoring containing two .py files:

caps_dataset.py: low-level CAPS dataset class (works only for image datasets)
concat_dataset.py: classes for concatenating and stacking datasets, and for creating a dataset from a control file

For now, these files are not connected to the rest of clinicadl.
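The description distinguishes concatenation from stacking. Below is a minimal stdlib-only sketch of the two ideas. It assumes that concatenation means chaining datasets end to end, as PyTorch's ConcatDataset does, and that stacking means pairing the samples that share an index. The class names are hypothetical and are not the ones in concat_dataset.py.

```python
import bisect
from itertools import accumulate
from typing import Any, Sequence


class ConcatSketch:
    """Chain several datasets end to end (hypothetical name)."""

    def __init__(self, datasets: Sequence[Sequence[Any]]) -> None:
        self.datasets = list(datasets)
        # Running totals of dataset sizes, e.g. sizes [3, 2] -> [3, 5]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx: int) -> Any:
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Locate the dataset that holds global index idx
        ds_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[ds_idx - 1] if ds_idx > 0 else 0
        return self.datasets[ds_idx][idx - offset]


class StackSketch:
    """Pair the samples that share an index across datasets (hypothetical name)."""

    def __init__(self, datasets: Sequence[Sequence[Any]]) -> None:
        if len({len(d) for d in datasets}) != 1:
            raise ValueError("Stacked datasets must be non-empty and share one length.")
        self.datasets = list(datasets)

    def __len__(self) -> int:
        return len(self.datasets[0])

    def __getitem__(self, idx: int) -> tuple:
        return tuple(d[idx] for d in self.datasets)


t1 = ["t1_a", "t1_b", "t1_c"]
flair = ["fl_a", "fl_b", "fl_c"]
concat = ConcatSketch([t1, flair])
stack = StackSketch([t1, flair])
```

With two datasets of size 3, index 4 of the concatenation resolves to position 1 of the second dataset. Stacking returns one tuple per index, which is closer to what a paired (for example multi-modality) loader would need.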

dependabot bot and others added 3 commits

April 22, 2024 10:03


          Bump sqlparse from 0.4.4 to 0.5.0 (aramis-lab#558)

a9a23b9

Bumps [sqlparse](https://github.com/andialbrecht/sqlparse) from 0.4.4 to 0.5.0.
- [Changelog](https://github.com/andialbrecht/sqlparse/blob/master/CHANGELOG)
- [Commits](andialbrecht/sqlparse@0.4.4...0.5.0)

---
updated-dependencies:
- dependency-name: sqlparse
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          concatenation and stacking of CAPS datasets + creation from control file

5e52ca8


          low level caps dataset class (CapsDatasetImage)

d56f705

HuguesRoy added the refactoring label

HuguesRoy requested review from camillebrianceau and thibaultdvx

April 22, 2024 08:28

pep8speaks commented Apr 22, 2024 (edited)

Hello @HuguesRoy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-05-15 09:52:28 UTC

HuguesRoy added 2 commits

April 22, 2024 10:56


          reformat

1b111d1


          reformat

thibaultdvx reviewed

View reviewed changes

Collaborator

thibaultdvx left a comment

Thanks a lot @HuguesRoy! Could you please try to write the docstrings with the template I gave you?

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py Outdated

NicolasGensollen reviewed

View reviewed changes

Member

NicolasGensollen left a comment

Thanks @HuguesRoy !
I had a quick look and made some suggestions on caps_dataset.py, mainly on type hints.
Let me know if some things are not clear.

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py Outdated

+                  def elem_index(self):
+                      pass
+                  def label_fn(self, target: Union[str, float, int]) -> Union[float, int]:

Member

NicolasGensollen Apr 22, 2024

It seems like the method can return None:

Suggested change

- def label_fn(self, target: Union[str, float, int]) -> Union[float, int]:
+ def label_fn(self, target: Union[str, float, int]) -> Optional[Union[float, int]]:

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

+                          label: value of the label usable in criterion.
+                      """
+                      domain_code = {"t1": 0, "flair": 1}
+                      return domain_code[str(target)]

Member

NicolasGensollen Apr 22, 2024

Can target really be a float, an integer, or a string? From the dict above, it seems it can only take two values: "t1" or "flair".

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

+                      else:
+                          return self.label_code[str(target)]
+                  def domain_fn(self, target: Union[str, float, int]) -> Union[float, int]:

Member

NicolasGensollen Apr 22, 2024

I might be missing something, but it seems it can only return integers, right?

Suggested change

- def domain_fn(self, target: Union[str, float, int]) -> Union[float, int]:
+ def domain_fn(self, target: Union[str, float, int]) -> int:
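Both observations, that the target takes only two values and that the result is always an integer, can be recorded in the signature itself. The following is a hedged sketch that assumes the target really is restricted to the two modalities in domain_code. The Domain alias and the module-level function are illustrative only and are not the PR's code.

```python
from typing import Literal

# A type checker will only accept the two values present in domain_code
Domain = Literal["t1", "flair"]

DOMAIN_CODE = {"t1": 0, "flair": 1}


def domain_fn(target: Domain) -> int:
    """Map an imaging modality to an integer domain label."""
    try:
        return DOMAIN_CODE[target]
    except KeyError:
        # Literal is not enforced at runtime, so fail with an explicit message
        raise ValueError(f"Unknown domain {target!r}; expected 't1' or 'flair'.") from None
```

With this signature, the str(target) conversion in the original body is no longer needed. Also, a call such as domain_fn("pet") is flagged statically instead of only failing with a KeyError at runtime.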

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

+                      Args:
+                          idx: row number of the meta-data contained in self.df
+                      Returns:
+                          dictionary with following items:

Member

NicolasGensollen Apr 22, 2024

If all subclasses return dictionaries with the same structure, it might be valuable to build a custom type for this instead of relying on dictionaries.

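from __future__ import annotations

import abc
from dataclasses import dataclass

import torch

@dataclass
class Sample:
    image: torch.Tensor
    label: int | float
    participant_id: str
    ...  # other fields shared by all subclasses

class CapsDataset(abc.ABC):  # base class name assumed
    @abc.abstractmethod
    def __getitem__(self, idx: int) -> Sample:
        ...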

WDYT?
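Here is a stdlib-only sketch of how such a type might be consumed. So that it runs without PyTorch, a plain nested list stands in for the image tensor. ToyDataset and the session_id field are hypothetical. dataclasses.asdict offers a bridge for any code that still expects dictionaries.

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple, Union


@dataclass
class Sample:
    image: List[List[float]]  # stand-in for torch.Tensor in this sketch
    label: Union[int, float]
    participant_id: str
    session_id: str  # hypothetical extra field


class ToyDataset:
    """Hypothetical subclass that returns a Sample instead of a dict."""

    def __init__(self, rows: List[Tuple[str, str, Union[int, float]]]) -> None:
        self.rows = rows

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Sample:
        participant_id, session_id, label = self.rows[idx]
        image = [[0.0, 0.0], [0.0, 0.0]]  # dummy 2x2 image
        return Sample(
            image=image,
            label=label,
            participant_id=participant_id,
            session_id=session_id,
        )


ds = ToyDataset([("sub-01", "ses-M000", 1), ("sub-02", "ses-M000", 0)])
sample = ds[0]
print(sample.participant_id, sample.label)  # attribute access, checked by type checkers
print(sorted(asdict(sample)))  # dictionary view for legacy dict-based code
```

A typo such as sample.lable is reported by a type checker, whereas the dictionary equivalent sample["lable"] only fails at runtime.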

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py Outdated

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py



		##################################
		# Transformations

Member

NicolasGensollen Apr 22, 2024

Could be worth putting these in a separate module, as this file is already pretty large. WDYT?

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py



		################################
		# TSV files loaders

Member

NicolasGensollen Apr 22, 2024

Same kind of comment here

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

+              ################################
+              # TSV files loaders
+              ################################
+              def load_data_test(test_path: Path, diagnoses_list, baseline=True, multi_cohort=False):

Member

NicolasGensollen Apr 22, 2024

Suggested change

- def load_data_test(test_path: Path, diagnoses_list, baseline=True, multi_cohort=False):
+ def load_data_test(test_path: Path, diagnoses_list: list[str], baseline: bool = True, multi_cohort: bool = False) -> pd.DataFrame:

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

+                          raise ClinicaDLArgumentError(
+                              "If multi_cohort is given, the TSV_DIRECTORY argument should be a path to a TSV file."
+                          )
+                      else:

Member

NicolasGensollen Apr 22, 2024

You don't need an else statement here, since you raise in the if branch.
Removing it would save one level of indentation for the following code block.
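Here is a minimal sketch of the flattened (guard-clause) version. The function name and the exact condition are hypothetical. ClinicaDLArgumentError is redefined locally only so that the snippet runs without clinicadl installed.

```python
from pathlib import Path


class ClinicaDLArgumentError(ValueError):
    """Local stand-in for clinicadl's exception (assumption for this sketch)."""


def resolve_test_path(test_path: Path, multi_cohort: bool) -> Path:
    # Guard clause: the raise exits the function, so no `else` is needed
    if multi_cohort and test_path.suffix != ".tsv":
        raise ClinicaDLArgumentError(
            "If multi_cohort is given, the TSV_DIRECTORY argument should be a path to a TSV file."
        )
    # The normal code path now sits at the top indentation level of the function
    return test_path
```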

clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

		return test_df


		def load_data_test_single(test_path: Path, diagnoses_list, baseline=True):

Member

NicolasGensollen Apr 22, 2024

Suggested change

- def load_data_test_single(test_path: Path, diagnoses_list, baseline=True):
+ def load_data_test_single(test_path: Path, diagnoses_list: list[str], baseline: bool = True) -> pd.DataFrame:

HuguesRoy and others added 7 commits

April 23, 2024 17:15


          Typing

d7c1fba


          typing: French to English...

84811a4


          compatibility for patch,roi and slices

e4875cd


          Dataloader for paired and unpaired datasets

adfc233


          Update clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

d28a50a

Co-authored-by: Gensollen <[email protected]>


          Update clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

3b9e4dc

Co-authored-by: thibaultdvx <[email protected]>


          Update clinicadl/utils/caps_dataset/caps_dataset_refactoring/caps_dataset.py

c5f27ee

Co-authored-by: Gensollen <[email protected]>

thibaultdvx marked this pull request as draft

April 25, 2024 12:16

dependabot bot and others added 4 commits

May 4, 2024 09:50


          Bump tqdm from 4.66.1 to 4.66.3 (aramis-lab#569)

36eb46f

Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.1 to 4.66.3.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](tqdm/tqdm@v4.66.1...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump werkzeug from 3.0.1 to 3.0.3 (aramis-lab#570)

fa7f0f1

Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](pallets/werkzeug@3.0.1...3.0.3)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump jinja2 from 3.1.3 to 3.1.4 (aramis-lab#571)

a05fcd5

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Update concat_dataset.py

76fc265

broken path solved

camillebrianceau marked this pull request as ready for review

May 15, 2024 11:38

dependabot bot and others added 6 commits

May 17, 2024 09:14


          Bump mlflow from 2.10.1 to 2.12.1 (aramis-lab#575)

b2fc3e6

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.10.1 to 2.12.1.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.10.1...v2.12.1)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump gunicorn from 21.2.0 to 22.0.0 (aramis-lab#576)

495d5b9

Bumps [gunicorn](https://github.com/benoitc/gunicorn) from 21.2.0 to 22.0.0.
- [Release notes](https://github.com/benoitc/gunicorn/releases)
- [Commits](benoitc/gunicorn@21.2.0...22.0.0)

---
updated-dependencies:
- dependency-name: gunicorn
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump requests from 2.31.0 to 2.32.0 (aramis-lab#578)

bdd102a

updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          [CI] Run tests through GitHub Actions (aramis-lab#573)

beccd4c

* try a simple workflow first

* try running on new ubuntu VM

* fixes

* bump poetry version to 1.8.3

* try removing caching..

* add workflow for testing tsv tools


          [CI] Skip tests when PR is in draft mode (aramis-lab#592)

2861e9d

* try skipping test_tsvtools when PR is in draft mode

* trigger CI

* add a cpu tag to avoid running cpu tests on gpu machines

* run also on refactoring branch


          [CI] Test train workflow on GPU machine (aramis-lab#590)

f5de251

* add test workflow on GPU for train

* fix conda path

* fix conflicting workdir

* only run on non-draft PRs

* run also on refactoring branch

NicolasGensollen and others added 19 commits

May 23, 2024 15:51


          [CI] Port remaining GPU tests to GitHub Actions (aramis-lab#593)

69b3538

* add workflow for testing interpretation task

* add workflow for testing random search task

* add workflow for testing resume task

* add workflow for testing transfer learning task

* trigger CI

* trigger CI


          [CI] Remove GPU pipeline from Jenkinsfile (aramis-lab#594)

c9d9252


          [CI] Port remaining non GPU tests to GitHub Actions (aramis-lab#581)

753f04e

* add cleaning step to test_tsvtools pipeline

* add test_generate pipeline

* add test_predict pipeline

* add test_prepare_data pipeline

* add test_quality_checks pipeline

* add refactoring target branch, cpu tag, and draft PR filter

* trigger CI


          [CI] Remove jenkins related things (aramis-lab#595)

c424d77


          Add flags to run CI tests locally (aramis-lab#596)

52d7561


          [CI] Remove duplicated verbose flag in test pipelines (aramis-lab#598)

39d22fd


          [DOC] Update the Python version used for creating the conda environment in README (aramis-lab#600)

571662c

* update python version used for creating conda env in README

* investigate

* fix


          Flag for local tests (aramis-lab#608)

d54d59c

* add no-gpu and adapt-base-dir flag


          Update quality_check.py (aramis-lab#609)

f20e7fb

* Update quality_check.py


          Fix issue in compare_folders (aramis-lab#610)

f6f382a

* add FileNotFound error in tree


          [INFRA] Update the Makefile check.lock target (aramis-lab#603)

52f9492


          [CI] Run unit tests and linter on refactoring branch (aramis-lab#618)

996cdd5

* run unit tests on refactoring

* run linter on refactoring


          Trigger tests when undrafted (aramis-lab#623)

d0d5cd2

* add ready_for_review event


          Bump urllib3 from 1.26.18 to 1.26.19 (aramis-lab#625)

dca3802

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump certifi from 2024.2.2 to 2024.7.4 (aramis-lab#634)

b8d402b

Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4.
- [Commits](certifi/python-certifi@2024.02.02...2024.07.04)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump zipp from 3.17.0 to 3.19.1 (aramis-lab#635)

6dc2956

Bumps [zipp](https://github.com/jaraco/zipp) from 3.17.0 to 3.19.1.
- [Release notes](https://github.com/jaraco/zipp/releases)
- [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst)
- [Commits](jaraco/zipp@v3.17.0...v3.19.1)

---
updated-dependencies:
- dependency-name: zipp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump setuptools from 69.0.3 to 70.0.0 (aramis-lab#636)

b30aac8

Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.3 to 70.0.0.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v69.0.3...v70.0.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Bump sentry-sdk from 1.40.1 to 2.8.0 (aramis-lab#638)

f6d3f25

Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 1.40.1 to 2.8.0.
- [Release notes](https://github.com/getsentry/sentry-python/releases)
- [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md)
- [Commits](getsentry/sentry-python@1.40.1...2.8.0)

---
updated-dependencies:
- dependency-name: sentry-sdk
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>


          Merge branch 'hr_caps_dataset' into caps_dataset

94d81e3
