Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model updates, and some conversion logic #123

Merged
merged 8 commits into from
Jul 22, 2024
Merged

Model updates, and some conversion logic #123

merged 8 commits into from
Jul 22, 2024

Conversation

sherwoodf
Copy link
Contributor

@sherwoodf sherwoodf commented Jul 12, 2024

  • Model updates from discussions with team. Mostly minor name changes, removing/adding fields that were dulpicates/i thought were duplicates but actually are not. Also created separate signal_channel_information object to hold that information within the image acquisition.
  • Created conversion logic for Specimen Preparation Protocol, Specimen Growth Protocol, Image Acquisition. Removed some unused code from the test/utils.py file in the ingest shared models package (which is why the diff is so big for that file)
  • Consolidated all the persist artefact logic used in multiple places in the ingest package into a util function. Did not use this for the study as that has more complex logic.

@@ -6,7 +6,8 @@
get_generic_section_as_dict,
mattributes_to_dict,
dict_to_uuid,
find_sections_recursive
find_sections_recursive,
persist
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't end up using this - so should remove the unused import (will do it in this PR if other changes are needed)



def get_template_channel() -> semantic_models.Channel:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template methods were copied over from the shared_models code package, but aren't actually needed here, so i decided to remove them.

@@ -0,0 +1,2 @@
[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create this file to get the experimental_imaging_dataset test working. It was checking the Annotation Dataset, and couldn't find this file & was therefore complaining. I don't understand why it was checking this, but i wanted to move on with this PR without figuring out the EID code right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because the function biostudies.find_file_lists_in_submission goes through all file lists in the submissions when trying to create file references. Since a file list was added to the "Segmentation masks" "Annotations" section in data/S-BIADTEST.json, find_file_lists_in_submission was looking for the file list ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to revisit the logic of this file in light of the fact that we are not storing the list of file_references anymore. We may have to trigger the generation of file_references after obtaining the uuid for the experimental dataset, so we can pass this to the function that creates file_references, allowing them to point to their parent expermental imaging dataset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With new approach (file_reference points to parent EID) we may have to re-write this function. (see comment on assignment of submission_dataset)

if persist_artefacts:
file_dict["uuid"] = fileref_uuid
file_dict["uri"] = file_uri(submission.accno, f)
file_dict["submission_dataset"] = fileref_uuid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a place holder - we need to pass the actual submission_dataset uuid (especially as this will now be the only link to its parent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not touched the file_reference code. That wasn't the intent of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I have created a clickup ticket to fix this which is assigned to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think re module is used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in:


def get_licence(study_attributes: Dict[str, Any]) -> semantic_models.LicenceType:
    """
    Return enum version of licence of study
    """
    licence = re.sub(r"\s", "_", study_attributes.get("License", "CC0"))
    return semantic_models.LicenceType(licence)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But i guess we've changed the enums now, so we don't need that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this - no I think we still need it!

# file extension

file_path: str = Field(description="""The path (including the name) of the file.""")
# TODO: Clarify if this should be biostudies 'type' or derived from file extension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this field relate to the biostudies 'type' - which is not really related to the extension of the file e.g. fire_object, directory etc as opposed to the file type derived from the file extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not even that: it's sort of 3 different types that we care about mushed together. But i've cut a ticket to deal with that later.



def get_template_channel() -> semantic_models.Channel:
return semantic_models.Channel.model_validate(
def get_test_specimen_growth_protocol() -> List[bia_data_model.ImageAcquisition]:
Copy link
Contributor

@kbab kbab Jul 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this function be get_test_image_acquisition ? Or should it be deleted as there is a get_test_image_acquisition on line 221

Copy link
Contributor

@kbab kbab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some type hints that need correcting and some places where deletion of code is required, otherwise LGTM.

Copy link
Contributor

@kbab kbab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sherwoodf sherwoodf merged commit 66527f0 into main Jul 22, 2024
9 checks passed
@sherwoodf sherwoodf deleted the model_updates branch July 23, 2024 08:06
liviuba pushed a commit that referenced this pull request Jul 31, 2024
* model updates to standardise names further and to account for which fields will actually be generated endpoints

* model updates and added specimen growth, preparation, and image acquisition conversion logic

* added logic to generate annotation method objects

* tidied up imports

* created empty annotation file list to make tests pass

* moved file reference conversion to it's own file, fixed imports of shared models, and fixed file_name -> file_path as per model change for file references

* updated models and ingest code
ctr26 pushed a commit that referenced this pull request Jul 31, 2024
* model updates to standardise names further and to account for which fields will actually be generated endpoints

* model updates and added specimen growth, preparation, and image acquisition conversion logic

* added logic to generate annotation method objects

* tidied up imports

* created empty annotation file list to make tests pass

* moved file reference conversion to it's own file, fixed imports of shared models, and fixed file_name -> file_path as per model change for file references

* updated models and ingest code
ctr26 pushed a commit that referenced this pull request Aug 1, 2024
* model updates to standardise names further and to account for which fields will actually be generated endpoints

* model updates and added specimen growth, preparation, and image acquisition conversion logic

* added logic to generate annotation method objects

* tidied up imports

* created empty annotation file list to make tests pass

* moved file reference conversion to it's own file, fixed imports of shared models, and fixed file_name -> file_path as per model change for file references

* updated models and ingest code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants