Minutes Data Working Group 26 Apr 2020
Brad edited this page Jul 21, 2020
- Review and refine the committee scope
- Task focus: identify the state of the art, and define best practices
- Plan meeting cadence
- Invited: Brad Genereaux (NVIDIA), Michael Götz (DKFZ), Carole Sudre (KCL), Stephen Aylward (Kitware), Ben Murray (KCL), Wenqi Li (NVIDIA), Jorge Cardoso (KCL), Prerna Dogra (NVIDIA)
- What is the state of the art in biomarkers?
- Preprocessing - trying to make MONAI a “Rosetta Stone” that can speak to every library, so that other libraries can be brought in. Some enabling ideas so far, but not much further
- The pre-processing pipeline is built around the NIfTI format - data array (offline) plus spacing and orientation
- Non-imaging data is currently limited to classification labels - there is no principled way to capture other non-imaging data
- Bounding boxes are being added
- NLP / communication in the EHR is a different problem space - combining imaging and non-imaging data
- Are there libraries that can be interfaced with today to bring pipelines in?
- NIfTI is well suited to storing imaging data, but not other data types
- I/O Working Group - getting things off disk and storing them in memory
- File structures - xarray in Python is an emerging standard
- Preliminary support only; it does not update meta information
- Connecting the data with the physical representation
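As a minimal sketch of “connecting the data with the physical representation”, the hypothetical `PhysicalImage` class below (not a MONAI, NIfTI, or xarray API) keeps spacing, origin, and direction cosines next to the voxel data. It maps a voxel index to world coordinates as origin + direction · (spacing × index), which is the convention ITK uses for index-to-physical-point conversion.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
IDENTITY: Tuple[Vec3, Vec3, Vec3] = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass
class PhysicalImage:
    """Hypothetical container: voxel data plus the geometry that places it in space."""
    data: list                    # stand-in for a voxel array
    spacing: Vec3                 # mm per voxel along each axis
    origin: Vec3                  # world position (mm) of voxel (0, 0, 0)
    direction: Tuple[Vec3, Vec3, Vec3] = IDENTITY  # direction-cosine matrix, row-major

    def index_to_world(self, index: Sequence[int]) -> Vec3:
        # Scale the integer index by the voxel spacing.
        scaled = [i * s for i, s in zip(index, self.spacing)]
        # Rotate by the direction matrix, then shift by the origin.
        return tuple(
            o + sum(d * v for d, v in zip(row, scaled))
            for o, row in zip(self.origin, self.direction)
        )
```

For example, with spacing (0.5, 0.5, 2.0) and origin (-100, -100, 0), voxel (10, 20, 3) lands at (-95.0, -90.0, 6.0) mm. Keeping the geometry attached to the array is what lets later metadata (bounding boxes, coordinates) be expressed in physical units rather than voxel indices.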
- Healthcare metadata - FHIR can be used, maybe CDA
- But what are the use cases? The scope could be very broad, but if it is too broad it may not be very useful
- Use cases for people using MONAI
- Bringing tabular data into MONAI
- Structured data - ontology-tagged, data dictionary and key-value pairs
- Structured data relating to images - coordinates, bounding boxes, imaging elements
- Unstructured data - free text (NLP), 1D or 2D signals, heart rate, waveforms, movement rates
- How do we make it easy for developers to feed the data to MONAI?
- Design implications for the types of data being brought in
- Use cases with data elements
- Neuroimaging
- COPD
- Gene expression
- Ultrasound guided intervention
- How do we connect data together? Imaging and non-imaging data
- E.g., a table or data dictionary element
- Define the data types of these elements
- A data manifest?
- Creating a way to represent “subjects”
- Do we pass along a “bag of content”, or do we need to indicate “prominence” of content?
- Making it too complex makes it unusable
- Leveraging DICOM specifications such as TID 1500 might be the most descriptive option, but might not work well for developers early on
- Ideally, start with something simple but support more complex transactions
- “Minimal data set” - if the use case demands “just pass along an image and a label”, that should be sufficient - developers shouldn’t have to specify unnecessary content
- Flexible data loader -> standardized internal representations
- Are there numpy representations that are used in other domains?
- What do we trust exists when things get passed along?
- We should support the case where someone passes in a set of JPGs
- Impact of digital pathology as an example - there are open challenges in the interoperability space
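One way to read the “minimal data set”, “subjects”, and “flexible data loader” points together: a loader accepts whatever the developer has (an image and a label, or just a folder of JPGs) and normalizes it into one internal record per subject, leaving optional fields empty instead of demanding them. This is a hypothetical sketch; `SubjectRecord`, `load_subjects`, and the field names are illustrative assumptions, not an agreed MONAI format.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


@dataclass
class SubjectRecord:
    """Hypothetical standardized internal representation: one record per subject."""
    subject_id: str
    images: List[str]                                     # paths to imaging data
    label: Optional[Any] = None                           # e.g. a classification label
    extra: Dict[str, Any] = field(default_factory=dict)   # optional non-imaging data


KNOWN_KEYS = {"image", "label", "subject_id"}


def load_subjects(source: Union[str, Path, List[Dict[str, Any]]]) -> List[SubjectRecord]:
    """Normalize a directory of JPGs or a list of dicts into SubjectRecords."""
    if isinstance(source, (str, Path)):
        # Minimal case: a folder of JPGs (lower-case .jpg only, for brevity),
        # one subject per file, no labels required.
        files = sorted(Path(source).glob("*.jpg"))
        return [SubjectRecord(subject_id=p.stem, images=[str(p)]) for p in files]

    records = []
    for i, entry in enumerate(source):
        # Only "image" is required; it may be a single path or a list of paths.
        image = entry["image"]
        images = image if isinstance(image, list) else [image]
        records.append(SubjectRecord(
            subject_id=str(entry.get("subject_id", i)),
            images=[str(x) for x in images],
            label=entry.get("label"),
            # Anything unrecognized is carried along rather than rejected.
            extra={k: v for k, v in entry.items() if k not in KNOWN_KEYS},
        ))
    return records
```

The required surface is a single image, in line with the “minimal data set” point; richer structures (for instance, content derived from DICOM templates) could be layered into `extra` later without burdening developers who only have images and labels.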
- A content manifest to define the content
- Reproducibility
- Provenance - where the data came from
- Notion should be shared with the challenge group
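For the content manifest, reproducibility, and provenance points, a manifest might record where each file came from together with a content hash, so that a later run can check it is using the same data. This is a hypothetical sketch using SHA-256 from the Python standard library; `build_manifest`, `verify_manifest`, and the `source` field are illustrative assumptions, not an agreed format.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(files: Dict[str, str]) -> List[Dict[str, str]]:
    """Map each local path to its hash plus a free-text provenance note (hypothetical)."""
    return [
        {"path": p, "sha256": sha256_of(Path(p)), "source": src}
        for p, src in sorted(files.items())
    ]


def verify_manifest(manifest: List[Dict[str, str]]) -> List[str]:
    """Return the paths whose current contents no longer match the recorded hash."""
    return [e["path"] for e in manifest if sha256_of(Path(e["path"])) != e["sha256"]]
```

An empty result from `verify_manifest` indicates that the files are byte-identical to those recorded; it says nothing about how they were produced, which is why the `source` note travels alongside the hash.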
- Brad: Doodle poll
- Brad: manifest
- All: collect use cases and the types of inputs and outputs (EHR, signals, sensors, genetic data with a few markers)
- Carole: put together a template and share it with the team