
Minutes Data Working Group 30 Jul 2020

Brad edited this page Jul 31, 2020 · 2 revisions

Agenda

  • Review MONAI GitHub wiki for minutes and notes
  • Review previous meeting minutes
  • Review discussions/presentations from joint working group and steering committee
  • Plan next steps

Minutes

  • The joint effort between the Data workgroup and the Evaluation, Reproducibility & Benchmarking workgroup is to model samples that come from challenges and papers
    • E.g., review the surgical data examples from papers referenced in previous minutes page
    • Data workgroup should create a prototype of structure and schema in FHIR
      • (Brad) Synthesize a FHIR resource based on the papers
    • MONAI should determine which Python FHIR library can effectively convert "FHIR to Tensor"
      • (Brad) Look into Python libraries and share
  • MLflow
    • Explore MLflow as a potential model lifecycle management tool; details from the website (https://mlflow.org/) include:
      • MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
      • MLflow Tracking: Record and query experiments: code, data, config, and results
      • MLflow Projects: Package data science code in a format to reproduce runs on any platform
      • MLflow Models: Deploy machine learning models in diverse serving environments
      • Model Registry: Store, annotate, discover, and manage models in a central repository
    • Question: How does MLflow intersect with MONAI?
    • Question: Does MONAI plug into MLflow, and how does it integrate with the ecosystem?
    • Which group should explore this? Might be something for the reproducibility group
  • Integrations and Partners
    • What about H2O.ai? - H2O.ai has AutoML and reproducibility tooling
    • Should there be a "partners" ad-hoc workgroup for MONAI to look at integrations within the broader community? - e.g., what about AWS?
  • Feedback from engineering
    • Dev team should look at slide 7 of the joint working group content
    • Should look at cross-validation: how to stratify the data and repeat the training workflow
      • Validation here means reporting the results of the trained model
      • E.g., repeat validation 5 times for an unbiased estimate of model quality
    • MONAI 0.2 currently supports only a validation fraction of the data, which is the only parameter available for now; this needs to be expanded so users can generate a fixed set (not just set a random seed)
    • MSD (Medical Segmentation Decathlon) - only used via the JSON file originally provided by the challenge provider
      • Look at proposal of a FHIR specification; e.g., need a converter to take MSD to a FHIR format (could be a utility library)
      • Can the underlying data representation be normalized? Needs an experiment
    • Evaluation: MSD is read-only - how do you filter studies based on predictive match?
      • Search terms which would subset the data into a virtual collection (e.g., by patient age range)
      • What about holdout data? How can this be done consistently?
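
The cross-validation and holdout questions above can be illustrated with a minimal sketch. It is not MONAI's API and was not presented at the meeting; the function names (`is_holdout`, `stratified_folds`), the case IDs, and the "tumor"/"no_tumor" strata are all hypothetical, and only the Python standard library is assumed. The idea is that a hash of the case ID gives a holdout assignment that never changes between runs, while a seeded, stratified round-robin gives a fixed set of folds that can be stored and reused instead of relying on a validation fraction alone.

```python
import hashlib
import random
from collections import defaultdict


def is_holdout(case_id, holdout_fraction=0.2):
    """Hypothetical helper: decide holdout membership from a hash of the
    case ID, so the assignment is identical on every run and machine."""
    digest = hashlib.sha256(case_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10_000 < holdout_fraction * 10_000


def stratified_folds(case_ids, strata, num_folds=5, seed=0):
    """Hypothetical helper: split case IDs into num_folds folds, spreading
    each stratum as evenly as possible across the folds."""
    rng = random.Random(seed)  # local generator, so the split is repeatable
    by_stratum = defaultdict(list)
    for case_id, stratum in zip(case_ids, strata):
        by_stratum[stratum].append(case_id)

    folds = [[] for _ in range(num_folds)]
    offset = 0  # continue the round-robin across strata to balance fold sizes
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])  # sort first for a stable order
        rng.shuffle(members)
        for i, case_id in enumerate(members):
            folds[(offset + i) % num_folds].append(case_id)
        offset += len(members)
    return folds


# Made-up example data: 20 cases, every fourth one labelled "tumor".
case_ids = [f"case_{i:03d}" for i in range(20)]
strata = ["tumor" if i % 4 == 0 else "no_tumor" for i in range(20)]

folds = stratified_folds(case_ids, strata, num_folds=5, seed=42)
for k, val_ids in enumerate(folds):
    # Fold k is the validation set; the remaining folds form the training set.
    train_ids = [c for fold in folds if fold is not val_ids for c in fold]
    print(f"fold {k}: {len(train_ids)} training cases, {len(val_ids)} validation cases")
```

Writing the resulting fold lists to a file alongside the dataset would let every user repeat the same 5-fold validation. The hash-based holdout rule only depends on the case ID, so it stays stable even when new cases are added later.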

Action Items

  • (Brad) Synthesize a FHIR resource based on the papers
  • (Brad) Look into Python libraries and share
  • (Brad) Share PowerPoint slides and create GitHub issues to represent directions to grow