Minutes Data Working Group 26 Apr 2020

Brad edited this page Jul 21, 2020 · 1 revision

Agenda

  • Review and refine the committee scope
  • Task focus: identify the state of the art, and define best practices
  • Plan meeting cadence

Attendees

  • Invited: Brad Genereaux (NVIDIA), Michael Götz (DKFZ), Carole Sudre (KCL), Stephen Aylward (Kitware), Ben Murray (KCL), Wenqi Li (NVIDIA), Jorge Cardoso (KCL), Prerna Dogra (NVIDIA)

Notes

  • What is the state of the art in biomarkers?
    • Preprocessing - trying to make MONAI a “Rosetta Stone” that can speak to every library, so other libraries can be brought in. There are enabling thoughts so far, but not much further
    • Preprocessing pipeline is built around the NIFTI format - data array plus affine (spacing and orientation).
    • Non-imaging data is currently just classification labels - there is no principled way to capture it
    • Bounding box support is being added
    • NLP / communication in EHR is a different problem space - combining imaging and non-imaging data
    • Are there libraries that can be interfaced with today to bring pipelines in?
    • NIFTI is well-prepared for storing imaging data, but not for other data types
  • I/O Working Group - getting things off disk and storing them in memory
  • File structures - xarray in Python is an emerging standard
    • Preliminary support, not updating meta information
    • Connecting the data with the physical representation
  • Healthcare metadata - FHIR can be used, maybe CDA
    • But what are the use cases? The scope can be really broad, but if it is too broad it may not be very useful
  • Use cases for people using MONAI
    • Bringing tabular data into MONAI
    • Structured data - ontology tagged, data dictionary and key values
    • Structured data relating to images - coordinates, bounding boxes, imaging elements
    • Unstructured data - free text (NLP), 1D or 2D signals, heart rate, waveforms, movement rates
  • How do we make it easy for developers to feed the data to MONAI?
  • Design implications for the types of data being brought in
  • Use cases with data elements
    • Neuroimaging
    • COPD
      • Gene expression
    • Ultrasound guided intervention
  • How do we connect data together? Imaging and non-imaging data
    • E.g., a table or data dictionary element
    • Define the data types of these elements
    • A data manifest?
  • Creating a way to represent “subjects”
  • Do we pass along a “bag of content”, or do we need to indicate “prominence” of content?
    • Making it too complex makes it unusable
    • Leveraging DICOM specifications like TID1500 might be the most descriptive option, but might not work well for developers early on
    • Starting with something simple while still supporting more complex transactions is ideal
  • “Minimal data set” - if the use case demands “just pass along an image and a label”, that should be sufficient - developers shouldn’t have to specify unnecessary content
  • Flexible data loader -> standardized internal representations
    • Are there numpy representations that are used in other domains?
  • What do we trust exists when things get passed along?
  • We should support the case where someone passes in a set of JPGs
  • Impact of digital pathology as an example - there are open challenges in the interoperability space
  • A content manifest to define the content
    • Reproducibility
    • Provenance - where the data came from
    • Notion should be shared with the challenge group
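
To make the “subject”, “minimal data set” and content manifest ideas above more concrete, here is a minimal sketch in plain Python. It is not a MONAI API and was not agreed in the meeting: the names `Item`, `Subject`, `to_manifest`, the `kind` values and the file paths are all hypothetical, chosen only to illustrate how an image-plus-label subject and a richer subject could share one representation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional


@dataclass
class Item:
    """One piece of content attached to a subject (hypothetical structure)."""
    name: str                     # e.g. "image", "label", "heart_rate"
    kind: str                     # e.g. "image", "label", "tabular", "signal", "text"
    value: Any                    # a file path or a small inline value
    source: Optional[str] = None  # provenance: where the data came from


@dataclass
class Subject:
    """A "bag of content" for one subject; only the items that exist are listed."""
    subject_id: str
    items: List[Item] = field(default_factory=list)

    def add(self, name: str, kind: str, value: Any,
            source: Optional[str] = None) -> "Subject":
        self.items.append(Item(name, kind, value, source))
        return self  # returning self allows chained calls

    def get(self, name: str) -> Any:
        for item in self.items:
            if item.name == name:
                return item.value
        raise KeyError(name)


def to_manifest(subjects: List[Subject]) -> str:
    """Serialize subjects to a JSON manifest that can be stored with the data."""
    return json.dumps({"subjects": [asdict(s) for s in subjects]}, indent=2)


# Minimal data set: just an image and a label, nothing else required.
minimal = (Subject("s001")
           .add("image", "image", "s001/t1.nii.gz")
           .add("label", "label", 1))

# A richer subject: the same structure, with a signal and provenance added.
richer = (Subject("s002")
          .add("image", "image", "s002/t1.nii.gz", source="site A export")
          .add("label", "label", 0)
          .add("heart_rate", "signal", [61, 64, 63], source="bedside monitor"))

manifest_text = to_manifest([minimal, richer])
```

The point of the sketch is the design trade-off discussed above: a developer who only has an image and a label writes two lines, while more complex content and provenance can be added later without changing the structure.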

Action items

  • Brad: Doodle poll
  • Brad: manifest
  • All: collect use cases, along with their types of inputs and outputs (EHR, signals, sensors, genetic data - few markers)
  • Carole: put together a template and share with the team