Model updates, and some conversion logic #123

sherwoodf · 2024-07-12T12:49:55Z

Model updates from discussions with the team. Mostly minor name changes, plus removing/adding fields that were duplicates (or that I thought were duplicates but actually are not). Also created a separate signal_channel_information object to hold that information within the image acquisition.
Created conversion logic for Specimen Preparation Protocol, Specimen Growth Protocol, Image Acquisition. Removed some unused code from the test/utils.py file in the ingest shared models package (which is why the diff is so big for that file)
Consolidated all the persist artefact logic used in multiple places in the ingest package into a util function. Did not use this for the study as that has more complex logic.
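For illustration only, here is a minimal sketch of what a shared persist helper of this kind could look like. The `Artefact` dataclass, the `persist` signature and the `<root>/<type>/<accession>/<uuid>.json` layout are all assumptions made for the example, not the code in this PR.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class Artefact:
    """Hypothetical stand-in for a converted model object."""

    uuid: str
    title: str


def persist(artefacts: Iterable[Artefact], output_root: Path, accession_id: str) -> List[Path]:
    """Write each artefact to <output_root>/<type name>/<accession_id>/<uuid>.json.

    The directory layout is an assumption chosen for this sketch.
    """
    written = []
    for artefact in artefacts:
        out_dir = output_root / type(artefact).__name__.lower() / accession_id
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / f"{artefact.uuid}.json"
        out_file.write_text(json.dumps(asdict(artefact), indent=2))
        written.append(out_file)
    return written


# Example usage with a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    paths = persist([Artefact("u1", "growth protocol")], Path(tmp), "S-BIADTEST")
    print([p.name for p in paths])
```

Each conversion function could then call one helper instead of repeating the directory and file handling; the study conversion would stay separate, as noted above.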

sherwoodf · 2024-07-12T12:59:36Z

bia-ingest-shared-models/bia_ingest_sm/conversion/study.py

@@ -6,7 +6,8 @@
    get_generic_section_as_dict,
    mattributes_to_dict,
    dict_to_uuid,
-    find_sections_recursive
+    find_sections_recursive,
+    persist


Didn't end up using this - so should remove the unused import (will do it in this PR if other changes are needed)

sherwoodf · 2024-07-12T13:00:47Z

bia-ingest-shared-models/test/utils.py



-def get_template_channel() -> semantic_models.Channel:


Template methods were copied over from the shared_models code package, but aren't actually needed here, so I decided to remove them.

sherwoodf · 2024-07-18T09:11:28Z

bia-ingest-shared-models/test/data/file_list_annotations_1.json

@@ -0,0 +1,2 @@
+[


I had to create this file to get the experimental_imaging_dataset test working. It was checking the Annotation Dataset, couldn't find this file, and was therefore complaining. I don't understand why it was checking this, but I wanted to move on with this PR without figuring out the EID code right now.

This was needed because the function biostudies.find_file_lists_in_submission goes through all file lists in the submissions when trying to create file references. Since a file list was added to the "Segmentation masks" "Annotations" section in data/S-BIADTEST.json, find_file_lists_in_submission was looking for the file list ...
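A rough sketch of the behaviour described above, assuming a simplified submission structure of nested sections with `attributes` and `subsections`. The helper name `collect_file_list_names` and the first file list name are hypothetical; this is not the real `find_file_lists_in_submission`.

```python
from typing import Any, Dict, List


def collect_file_list_names(section: Dict[str, Any]) -> List[str]:
    """Recursively gather every 'File List' attribute value in a section tree."""
    names = [
        attr["value"]
        for attr in section.get("attributes", [])
        if attr.get("name") == "File List"
    ]
    for subsection in section.get("subsections", []):
        names.extend(collect_file_list_names(subsection))
    return names


# Simplified, made-up submission: one study component and one annotations section
submission_section = {
    "type": "Study",
    "attributes": [],
    "subsections": [
        {
            "type": "Study Component",
            "attributes": [{"name": "File List", "value": "file_list_study_component_1.json"}],
        },
        {
            "type": "Annotations",
            "attributes": [
                {"name": "Title", "value": "Segmentation masks"},
                {"name": "File List", "value": "file_list_annotations_1.json"},
            ],
        },
    ],
}

file_lists = collect_file_list_names(submission_section)
```

Because the walk does not filter by section type, a file list attached to an annotations section is picked up along with the others, which is why the test data needs the file to exist.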

bia-shared-datamodels/src/bia_models/semantic_models.py

bia-ingest-shared-models/bia_ingest_sm/conversion/annotation_method.py

kbab · 2024-07-20T14:50:34Z

bia-ingest-shared-models/bia_ingest_sm/conversion/experimental_imaging_dataset.py

I think we need to revisit the logic of this file in light of the fact that we are not storing the list of file_references anymore. We may have to trigger the generation of file_references after obtaining the uuid for the experimental imaging dataset, so we can pass this to the function that creates file_references, allowing them to point to their parent experimental imaging dataset.

kbab · 2024-07-20T14:53:09Z

bia-ingest-shared-models/bia_ingest_sm/conversion/file_reference.py

With the new approach (file_reference points to its parent EID) we may have to rewrite this function (see comment on assignment of submission_dataset).

kbab · 2024-07-20T14:54:06Z

bia-ingest-shared-models/bia_ingest_sm/conversion/file_reference.py

+            if persist_artefacts:
+                file_dict["uuid"] = fileref_uuid
+                file_dict["uri"] = file_uri(submission.accno, f)
+                file_dict["submission_dataset"] = fileref_uuid


This was just a placeholder - we need to pass the actual submission_dataset uuid (especially as this will now be the only link to its parent).

I've not touched the file_reference code. That wasn't the intent of this PR.

Ok - I have created a clickup ticket to fix this which is assigned to me.
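One possible shape for that fix, as a sketch only: create the dataset uuid first, then pass it into the file reference creation so each reference points at its parent. `make_uuid`, `build_file_references` and the seed strings are hypothetical names for this example and do not reflect the existing `dict_to_uuid` logic.

```python
import uuid
from typing import Any, Dict, List


def make_uuid(seed: str) -> str:
    """Deterministic uuid from a seed string (namespace chosen arbitrarily for the sketch)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, seed))


def build_file_references(
    accno: str, file_paths: List[str], submission_dataset_uuid: str
) -> List[Dict[str, Any]]:
    """Build file reference dicts that point to their parent dataset."""
    return [
        {
            "uuid": make_uuid(f"{accno}/{path}"),
            "file_path": path,
            # The parent uuid is passed in rather than reusing the file reference's own uuid
            "submission_dataset": submission_dataset_uuid,
        }
        for path in file_paths
    ]


accno = "S-BIADTEST"
dataset_uuid = make_uuid(f"{accno}/dataset-1")  # obtained before any file references are built
file_refs = build_file_references(accno, ["a.tif", "b.tif"], dataset_uuid)
```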

bia-ingest-shared-models/bia_ingest_sm/conversion/image_acquisition.py

bia-ingest-shared-models/bia_ingest_sm/conversion/specimen_growth_protocol.py

bia-ingest-shared-models/bia_ingest_sm/conversion/specimen_imaging_protocol.py

kbab · 2024-07-20T16:04:47Z

bia-ingest-shared-models/bia_ingest_sm/conversion/study.py

I don't think re module is used

It's used in:

def get_licence(study_attributes: Dict[str, Any]) -> semantic_models.LicenceType:
    """
    Return enum version of licence of study
    """
    licence = re.sub(r"\s", "_", study_attributes.get("License", "CC0"))
    return semantic_models.LicenceType(licence)

But I guess we've changed the enums now, so we don't need that?

I missed this - no I think we still need it!
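To show what the `re.sub` call does here, a self-contained sketch follows. The `LicenceType` members are made up for the example and are assumptions; the real enum lives in semantic_models and may use different values.

```python
import re
from enum import Enum
from typing import Any, Dict


class LicenceType(str, Enum):
    """Hypothetical enum whose values use underscores instead of spaces."""

    CC0 = "CC0"
    CC_BY_40 = "CC_BY_4.0"


def get_licence(study_attributes: Dict[str, Any]) -> LicenceType:
    """Return enum version of licence of study."""
    # Replace every whitespace character so "CC BY 4.0" becomes "CC_BY_4.0"
    licence = re.sub(r"\s", "_", study_attributes.get("License", "CC0"))
    return LicenceType(licence)


licence = get_licence({"License": "CC BY 4.0"})
default_licence = get_licence({})
```

Whether the substitution is still needed depends on whether the enum values contain spaces or underscores, so the import should stay for as long as the values use underscores.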

bia-shared-datamodels/src/bia_shared_datamodels/bia_data_model.py

bia-shared-datamodels/src/bia_shared_datamodels/semantic_models.py

kbab · 2024-07-20T17:19:39Z

bia-shared-datamodels/src/bia_shared_datamodels/semantic_models.py

-    #       file extension
+
+    file_path: str = Field(description="""The path (including the name) of the file.""")
+    # TODO: Clarify if this should be biostudies 'type' or derived from file extension


Does this field relate to the biostudies 'type' (e.g. fire_object, directory), which is not really related to the extension of the file, as opposed to the file type derived from the file extension?

It's not even that: it's sort of 3 different types that we care about mushed together. But I've cut a ticket to deal with that later.

kbab · 2024-07-21T21:45:30Z

bia-ingest-shared-models/test/utils.py



-def get_template_channel() -> semantic_models.Channel:
-    return semantic_models.Channel.model_validate(
+def get_test_specimen_growth_protocol() -> List[bia_data_model.ImageAcquisition]:


Should this function be get_test_image_acquisition? Or should it be deleted, as there is already a get_test_image_acquisition on line 221?

kbab

There are some type hints that need correcting and some places where deletion of code is required, otherwise LGTM.

kbab

LGTM

* model updates to standardise names further and to account for which fields will actually be generated endpoints
* model updates and added specimen growth, preparation, and image acquisition conversion logic
* added logic to generate annotation method objects
* tidied up imports
* created empty annotation file list to make tests pass
* moved file reference conversion to its own file, fixed imports of shared models, and fixed file_name -> file_path as per model change for file references
* updated models and ingest code

sherwoodf added 2 commits July 11, 2024 16:25

model updates to standardise names further and to account for which fields will actually be generated endpoints

eb05833

model updates and added specimen growth, preparation, and image acquisition conversion logic

687b69d

added logic to generate annotation method objects

76ed1ea

tidied up imports

5bc4daa

created empty annotation file list to make tests pass

2f5c33c

kbab reviewed Jul 18, 2024

sherwoodf added 2 commits July 19, 2024 09:26

Merge remote-tracking branch 'origin/main' into model_updates

204c804

moved file reference conversion to its own file, fixed imports of shared models, and fixed file_name -> file_path as per model change for file references

0dc3088
