Minutes Data Working Group 30 Jul 2020

Agenda

Review MONAI GitHub wiki for minutes and notes
Review previous meeting minutes
Review discussions/presentations from joint working group and steering committee
Plan next steps

Minutes

The joint effort between the Data workgroup and the Evaluation, Reproducibility & Benchmarking workgroup is to model data samples that come from challenges and papers
- E.g., review the surgical data examples from papers referenced in previous minutes page
- Data workgroup should create a prototype of structure and schema in FHIR
  - (Brad) Synthesize a FHIR resource based on the papers
- MONAI should explore which Python FHIR library can effectively convert "FHIR to Tensor"
  - (Brad) Look into Python libraries and share
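
As a starting point for the "FHIR to Tensor" item above, here is a minimal sketch using only the Python standard library. It assumes the input is a FHIR Bundle of Observation resources already parsed from JSON into dicts. The function names, the choice of feature codes (two LOINC vital-sign codes), and the one-row-per-subject layout are illustrative assumptions, not a MONAI API or a decision of the workgroup.

```python
# Hypothetical sketch: flatten FHIR Observation resources into a numeric matrix.
# Field names (code.coding[].code, valueQuantity.value, subject.reference) follow
# the FHIR Observation resource; everything else here is an illustrative assumption.

FEATURE_CODES = ["8867-4", "8310-5"]  # example LOINC codes: heart rate, body temperature


def observation_to_pair(obs):
    """Return (code, value) for an Observation dict, or None if it has no numeric value."""
    if obs.get("resourceType") != "Observation":
        return None
    codings = obs.get("code", {}).get("coding", [])
    quantity = obs.get("valueQuantity")
    if not codings or quantity is None or "value" not in quantity:
        return None
    return codings[0]["code"], float(quantity["value"])


def bundle_to_matrix(bundle, feature_codes=FEATURE_CODES, missing=float("nan")):
    """Group observations by subject and emit one row of features per subject."""
    rows = {}
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        pair = observation_to_pair(obs)
        if pair is None:
            continue
        subject = obs.get("subject", {}).get("reference", "unknown")
        rows.setdefault(subject, {})[pair[0]] = pair[1]
    subjects = sorted(rows)
    matrix = [[rows[s].get(c, missing) for c in feature_codes] for s in subjects]
    return subjects, matrix


def _obs(subject, code, value):
    """Build a small example Observation (made-up data for illustration only)."""
    return {"resource": {
        "resourceType": "Observation",
        "subject": {"reference": subject},
        "code": {"coding": [{"system": "http://loinc.org", "code": code}]},
        "valueQuantity": {"value": value},
    }}


example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        _obs("Patient/1", "8867-4", 72),
        _obs("Patient/1", "8310-5", 36.6),
        _obs("Patient/2", "8867-4", 80),  # no temperature recorded for this subject
    ],
}

subjects, matrix = bundle_to_matrix(example_bundle)
```

The resulting nested list of floats has one row per subject and one column per feature code, with NaN marking missing observations. It could then be handed to a tensor constructor in whichever array library is chosen; which FHIR library should do the parsing is still the open question above.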
MLflow
- Explore MLflow as a potential model lifecycle management tool; details from the website (https://mlflow.org/) include:
  - MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
  - MLflow Tracking: Record and query experiments: code, data, config, and results
  - MLflow Projects: Package data science code in a format to reproduce runs on any platform
  - MLflow Models: Deploy machine learning models in diverse serving environments
  - Model Registry: Store, annotate, discover, and manage models in a central repository
- Question: How does MLflow intersect with MONAI?
- Question: Does MONAI plug into MLflow, and how does it integrate with the ecosystem?
- Question: Which group should explore this? It might be something for the reproducibility group
Integrations and Partners
- What about H2O.ai? H2O.ai has AutoML and reproducibility tooling
- Should there be a "partners" ad-hoc workgroup for MONAI to look at integrations within the broader community (e.g., what about AWS)?
Feedback from engineering
- Dev team should look at slide 7 of the joint working group content
- Should look at cross-validation: how to stratify the data and repeat the training workflow
  - Validation here means reporting the results of the trained model
  - E.g., repeat validation 5 times for an unbiased validation of model quality
- MONAI 0.2 currently supports only a validation fraction of the data, which is the only parameter available for now; this needs to be expanded so users can generate a fixed set (not just a random seed)
- MSD (Medical Segmentation Decathlon): only the JSON file originally provided by the challenge provider is used
  - Look at the proposal for a FHIR specification; e.g., a converter is needed to take MSD to a FHIR format (could be a utility library)
  - Can the underlying data representation be normalized? This needs an experiment
- Evaluation: MSD is read-only; how do you filter studies based on predictive match?
  - Search terms which would subset the data into a virtual collection (e.g., by patient age range)
  - What about holdout data? How can this be done consistently?
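
To make the cross-validation and fixed-set points above concrete, here is a minimal sketch of a deterministic, stratified k-fold split in plain Python. The function names and parameters are hypothetical and are not part of MONAI 0.2. The idea is that the fold assignment depends only on the labels and the seed, so the same folds can be regenerated, or written to a file once and reused as a fixed set.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to n_folds folds, spreading each label evenly across folds."""
    rng = random.Random(seed)  # local RNG: the split does not depend on global random state
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(n_folds)]
    next_fold = 0
    for label in sorted(by_label):  # fixed label order keeps the result reproducible
        indices = by_label[label]
        rng.shuffle(indices)
        for index in indices:  # deal indices round-robin, like cards
            folds[next_fold % n_folds].append(index)
            next_fold += 1
    return [sorted(fold) for fold in folds]


def cross_validation_splits(labels, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        validation = folds[k]
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, validation


# Example: 10 samples of class 0 and 5 of class 1, repeated validation 5 times.
example_labels = [0] * 10 + [1] * 5
for train, validation in cross_validation_splits(example_labels, n_folds=5, seed=42):
    pass  # train on `train`, then report metrics on `validation`
```

A holdout set could be handled the same way under this assumption: reserve one fold as the holdout before training and run cross-validation on the remaining folds, storing the index lists alongside the dataset so every run uses the same partition.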

Action Items

(Brad) Synthesize a FHIR resource based on the papers
(Brad) Look into Python libraries and share
(Brad) Share the PowerPoint slides and create GitHub issues to represent directions to grow

Copyright (c) MONAI Consortium
