Compact course: AI in research software

Is your research based on data? Do you use and/or train machine-learning models in your research? Then this course may be of interest to you!

This is a joint compact course held by Dr. Georg Schwesinger/Dr. Sebastian Zangerle (Research Data Unit), Peter Lippmann (Scientific AI group) and Dr. Inga Ulusoy (Scientific Software Center).

Context: The AI revolution is moving even more rapidly than the digital revolution and is leading to the emergence of completely new tools and technologies that affect the scientific process. In this course, we will learn about data-based research software, about the tools and communities that are relevant for creating and sharing such software, and about best practices in preparing and sharing data and in training, sharing and using machine-learning models. Legal and ethical considerations will also be discussed, as well as software security and possible pitfalls.

Learning objectives

After the course, participants will be able to:

Understand and follow best practices for preparing a dataset for training and testing
Understand and follow best practices in training ML models
Include appropriate tests in machine-learning based research software (MLBRS)
Apply software engineering best practices to machine-learning based research software (MLBRS)
Avoid negative impact from legal, ethical and security issues
Make results more generally applicable by using appropriate checklists for ML approaches

Prerequisites

Basic Python knowledge and knowledge of data processing, ML models and model training are required.

Course content

The slides for the complete course can be found here.

1. Requirements of "ML-based science"

What this course is not
What this course is about
The intersection of data, software and engineering: Key aspects

Slides for this section

2. Research Data Management

Data management and the Research Data Unit
Data availability and sharing: Open Research Data
Data findability and publication
Data licensing

Slides for this section

3. Research Data Quality

How to prepare your data
Understand your data
Data transformations: When and how

Slides for this section
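
To illustrate the "when" of data transformations, here is a minimal sketch that is not taken from the course slides. It assumes a toy one-dimensional dataset and uses hypothetical helper names (`train_test_split`, `fit_standardizer`, `standardize`). The point it tries to show is a common convention: split first, estimate the transformation parameters on the training split only, then apply the same parameters to the test split so that no information from the test data leaks into preparation.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(values):
    """Estimate mean and standard deviation from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def standardize(values, mean, std):
    """Apply a previously fitted standardization to any split."""
    return [(v - mean) / std for v in values]


# Toy data (an assumption for illustration): the numbers 1.0 to 20.0.
measurements = [float(v) for v in range(1, 21)]

train, test = train_test_split(measurements)

# Fit on the training split only, then reuse the parameters for the test split.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

Fitting the standardizer on the full dataset before splitting would also run, but the test split would then no longer be independent of the preparation step.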

4. Modeling of Research Data

Choosing a model
Evaluating a model: Underfitting and overfitting
Tooling
Making predictions
Model cards, sharing and publishing your model
How to (unit-)test machine-learning based research software
Model and software deployment

Slides for this section
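
As a rough idea of what unit tests for machine-learning based research software can look like, the sketch below tests a deliberately tiny nearest-centroid classifier. It is an assumption-laden illustration rather than course material: the model, the toy data generator and all function names (`fit_centroids`, `predict`, `make_toy_data`) are hypothetical. The three tests check reproducibility under a fixed seed, the shape and label set of predictions, and a sanity threshold on well-separated data.

```python
import random


def fit_centroids(points, labels):
    """Compute the mean of each class; returns {label: centroid}."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, points):
    """Assign each point the label of its nearest centroid."""
    return [min(centroids, key=lambda y: abs(x - centroids[y])) for x in points]


def make_toy_data(seed, n=50):
    """Two well-separated 1D Gaussian clusters (hypothetical test fixture)."""
    rng = random.Random(seed)
    points = [rng.gauss(-2.0, 0.5) for _ in range(n)]
    points += [rng.gauss(2.0, 0.5) for _ in range(n)]
    labels = [0] * n + [1] * n
    return points, labels


def test_training_is_reproducible():
    # Same seed, same data, same fitted parameters.
    a = fit_centroids(*make_toy_data(seed=1))
    b = fit_centroids(*make_toy_data(seed=1))
    assert a == b


def test_predictions_have_valid_shape_and_labels():
    points, labels = make_toy_data(seed=2)
    model = fit_centroids(points, labels)
    preds = predict(model, points)
    assert len(preds) == len(points)
    assert set(preds) <= set(labels)


def test_model_learns_separable_data():
    # Sanity check: on an easy problem the model should be nearly perfect.
    points, labels = make_toy_data(seed=3)
    model = fit_centroids(points, labels)
    preds = predict(model, points)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    assert accuracy > 0.95
```

Functions named `test_*` with plain `assert` statements follow the convention that pytest discovers automatically; they can also be called directly.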

5. Machine-learning based research software: Software engineering best practices

Version control
Development workflows
Requirements and project management
Quality control
Packaging
Containerisation
Software licensing

Slides for this section

6. Making your work public: Considerations of more general use and prominent failures

Publishing checklists
REFORMS
MLBRS security and best practices
Ethical considerations
Legal considerations
Prominent failures: AI in general
Prominent failures: AI in research software
Common pitfalls and mistakes

Slides for this section

