Skip to content

Latest commit

 

History

History
278 lines (223 loc) · 18.7 KB

curriculum.md

File metadata and controls

278 lines (223 loc) · 18.7 KB

RSE- Master curriculum

Preamble

The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes.

For the purpose of creating an RSE-Master Programm we identify the RSE as a person who creates or improves research software and/or the structures that the software interacts with in the computational environment of a research domain. In this spectrum we see skilled team member who may also choose to conduct own research as part of their role. But on the other end we also see paths for an RSE to specifically focus on a technical role as an alternative to a traditional research role because they enjoy and wish to focus on the development of research software.

For this task, to support research with/in the creation of digital tools, we structure this sample curriculum along three pillars:

  • research skills: these are competencies that enable an RSE to effectively participate in the research domain.
  • technical skills: these are compteencies, that enable an RSE to create effective tools for research
  • communication skills: these are skills that enable an RSE to effectively work and communicate with its peers and stakeholders across multiple domains.

Possible job roles for an RSE

Open science RSE

Open science and FAIRness of data and software are increasingly important topics in research, as exemplified by the demand of an increasing amount of research funding agencies requiring openness. Hence, an open science RSE is required to have a deeper knowledge in (\gls{RC}) and how to distribute software publicly (\gls{SRU}, \gls{SP}). Open Science RSEs can help researchers navigate the technical questions that come up when practising Open Science, such as "How do I make my code presentable?", "How do I make my code citable?", "What do I need to do to make my software \ac{FAIR}?", or "How do I sustainably work with an (international) team on a large code base?". Like the Data-focused RSE, they have a deep understanding of \ac{RDM} topics.

Project/community manager RSEs

When research software projects become larger, they need someone who manages processes and people. In practice, this concerns change management for code and documentation and community work to safeguard usability and adaptability, but also handling project governance and scalable decision-making processes. This gap can be filled by people who invest in the (\gls{PM}), (\gls{USERS}), and (\gls{TEAM}) skills, as exemplified in @subsec:examplecareer. Building a community around a research project is an important building block in building sustainable software [@Segal2009], so these RSEs play an important role, even if they do not necessarily touch much of the code themselves.

Teaching RSEs

RSEs interested in developing their (\gls{TEACH}) skill can focus on teaching the next generation of researchers and/or RSEs and will play a vital role in improving the quality of research software. They need to have a good understanding of all RSE competencies relevant to their domain and additionally should have experience or training in the educational field.

User interface/user experience designers for research software

Scientific software is a complex product that often needs to be refined in order to be usable even by other scientists. To facilitate this, there are people required that specialise in the (\gls{DOCBB}) and probably the (\gls{DIST}) competency with a focus on making end-user facing software really reusable and hence \ac{FAIR}. This task is supported by strong (\gls{MOD}) skills to reason about the behaviour of potential users of the software.

Specialisations outside the core RSE competencies

${DOMAIN}-RSE

While software is the lingua franca of all RSEs, there will be RSEs that have specialised in the intricacies of one particular research domain, such as medical RSEs, digital humanities RSEs, or physics RSEs. This can often serve as a base domain for RSE specialisation as in @subsec:examplemaster.

Data-focused RSE

RSEs working at the flourishing intersection between data science and RSE. They are skilled in cleaning data and/or running data analyses and can help researchers in setting up their analysis pipeline and/or \ac{RDM} solutions. When the field requires research on sensitive data or information, e.g., patient information in medicine, this RSE should have knowledge about secure transfer methods and/or ways to anonymise the data. As part of \ac{RDM}, this RSE profile is able to support all stages of the research data life cycle [@Nieva2020], with synchronous data management processes. Those processes implement established best practices for planning and documenting of data acquisition in a \ac{DMP}, as well as for management, storage, and preservation of data, and publication and sharing of data in repositories according to the \ac{FAIR} principles [@FAIR].

Research infrastructure RSE

This RSE is interested in \glspl{SysOp} and system administration and sets up \ac{IT} infrastructures for and with researchers. Therefore, this specialisation on the one hand requires a deep knowledge of physical computer and network hardware and on the other hand knowledge about setup and configuration of particular server software, e.g., setup of virtual machines on hypervisors or the planning and setup of compute server clusters for \ac{ML}. As an interface between the researchers and the infrastructure, they take care of user management, access permissions, and configuration of required services.

Legacy RSEs

Research software may be used and evolved over generations of research without change management or governance processes, while software "ecosystems" (e.g., programming languages, frameworks, operating systems) constantly evolve. This may lead to the emergence of legacy code that is actively used. To safeguard continued usability and adoption, these RSEs have experience in working with legacy code, and are competent in the application of software stacks that are no longer part of the general curricula (e.g., \gls{COBOL} or \gls{Fortran}). Adaption of existing, large-scale codebases to evolving dependencies (\gls{DIST}) or changing hardware (\ac{HPC}; see the HPC-RSE point below) may require mastery in refactoring techniques and in the usage of specialised code transformation tools.

HPC-RSE

RSEs with a focus on \ac{HPC} have specialist knowledge about programming models that can be used to efficiently undertake large-scale computations on parallel computing clusters. They may have knowledge of (automatic) code optimisation tools and methods and will understand how to write code that is optimised for different types of computing platforms, leveraging various efficiency related features of the target hardware. They are familiar with \acrshort{HPC}-specific package managers and can build dependencies from sources. They also understand the process of interacting with job scheduling systems that are often used on \ac{HPC} clusters to manage the queuing and running of computational tasks. \acrshort{HPC}-focused RSEs may be involved with managing \ac{HPC} infrastructure at the hardware or software level (or both) and understand how to calculate the environmental impact of large-scale computations. Their knowledge of how to run \ac{HPC} jobs and write successful \ac{HPC} access proposals can be vitally important to researchers wanting to make use of \ac{HPC} infrastructure.

ML-RSE

The development of research software based on \ac{ML} requires specialised theoretical background and experienced handling of appropriate software in order to produce meaningful results. This involves knowledge about data analysis and feature engineering, metrics that are involved in \ac{ML}, \ac{ML} algorithm selection and cross validation, and knowledge in mathematical optimisation methods and statistics. Here, we use \ac{ML} in a broad sense of machine-based learning including deep learning, reinforcement learning, neuro-symbolic learning and similar.

ML-RSEs analyse and check the suitability of an algorithm if it fulfils the needs of a certain task and they play a main role in deciding and selecting \ac{ML} libraries for a given task. The increasing usage of \ac{ML} in numerous scientific areas with social impact involves an emphasised awareness and consideration of possible influences and biases. At the intersection of data science [@SSIDataScience] and data-focused RSEs, the complex way of solving problems utilising \ac{ML} calls for this separate specialisation.

Web-development RSE

This RSE is skilled in web applications, front- and/or backend, and/or building and using APIs, for example for research data portals or big research projects. Ideally, this RSE should also have knowledge about (web) accessibility to allow a broad range of researchers or even the public to use the resulting applications. Therefore, a deep knowledge of web development skills is a required additional skill for this RSE.

Legal-RSE

All RSEs are a go-to person for questions about licensing, in particular when mixing software components with different licences. But with the rising requirements from legislation, we foresee the need for RSEs that still have a background in RSE but extend it with a knowledge of legal processes, that cover corner cases and go beyond applying Best Practice guides. These requirements may arise in the area of publication of research software, as this also requires knowledge about particular laws or regulatory frameworks concerning data protection, like the \ac{GDPR} within the \ac{EU} [@GDPR]. Another area are legal aspects of cybersecurity and export control in science and research (see @ExportControl for Germany). Legal-RSEs focus on facilitating the achievement of technically feasible solutions, while adhering to regulatory mandates. They are able to communicate and collaborate with lawyers.

A curriculum

Semester 1

This is the harmonization semester. We assume physicists, bioinformatics and computer scientists

Technical Skills (13 ECTS, 16 SWS):

  • Lecture: Data-centric language: e.g. R/Python 2SWS -> 1 ECTS
  • Exercise: Data-science with a Data-centric language(Also with NFDI repo) 4 SWS -> 4 ECTS
  • Lecture: Computing Environment Module - 1 (unix shell, Versionskontrolle, documentation, tests, IDE?) (2SWS) -> 1 ECTS
  • Lecture: theoretical Foundations of Computer Science 2SWS -> 1 ECTS
  • Exercise: Computing Environment Module applied with an interpreted language 2 SWS -> 2 ECTS
  • Software Engineering lecture (Requirements Engineering, Software Architecture) 4 SWS -> 4 ECTS

Research Skills(5 ECTS, 8SWS):

  • Lecture: Research + Data Ethics (4SWS) -> 2 ECTS
  • Lecture: Data science/analysis/statistics (2SWS) -> 1ECTS
  • Lecture: Interaction with Data repositories in NFDI (2 SWS) -> 1 ECTS

Communication Skills (4 ECTS, 4 SWS):

  • (*) Lecture + Exercise + Praktikum: collaborative Software Development with a platform(issues, MRs, project management, branches) (2SWS), 2 ECTS -> this can be offered as a service course to other domains
  • Seminar: Science Slam! Present your Bachelor thesis to a an audience of non-experts! (2SWS) 2ECTS Preparation: (Elements of Carpentry Instructor Training?)

Domain?

  • Exercise: Interaction with Data repositories in NFDI (2 SWS) -> 1 ECTS approximately 9 ECTS per Semester

Semester 2

Technical Skills (8 SWS, 4 ECTS):

  • Lecture: Machine-oriented language: E.g. Fortran/C++ 2SWS -> 1ECTS
  • Lecture: Hardware Module: (FP Arithmetik, Neumann-Architektur, Computer-Netzwerke(4 Wochen: from LAN upto WAN), Speicherlayout(Caches, NUMA), Heterogene Compute Architectures) 2SWS - 1 ECTS
  • Exercise: Machine-oriented languages on actual hardware 2 SWS -> 1ECTS
  • Lecture: Computing Environment Module - 2 ()

Research Skills: (14 SWS, 8ECTS)

  • Lecture: Urheberrecht, Verwertung, Patentierbarkeit (Vorlesung, 2 SWS) -> 1ECTS
  • Lecture: Software Publishing, Scientific Publishing, Technical Documentation (2SWS) -> 1 ECTS
  • Praktikum: Publishing des Projekts aus (*) mit gegenseitigem review (2SWS, aber eher 6x 4SWS, oder als Block) -> 1ECTS
  • Block-Praktikum: ReproHack, eigene Domäne (4 SWS, 4 ECTS), gerne auch als Service Veranstaltung: Nachwuchsgewinnung
  • 4 SWS Domäne

Communication Skills (6 SWS)

  • Lecture: Software-Management (Project Management focussed) (2SWS)
  • Lecture + Praktischer Teil: Lehre mit Erwachsenen: Carpentry mit Erwachsen (Carpentry Instructor training ist 1/3 Semester mit kleinem Praktikum ) , Erwachsenenbildung aus dem Angebot der Universität. (4 SWS)

Semester 3

  • Introduction to thesis topic

Technical Skills:

  • optional SE lectures

Research Skills:

  • optional: more domain lectures.
  • Lecture: Meta-Research(2 SWS) -> 1 ECTS
  • Block-Praktikum: ReproHack, fremde Domäne (4 SWS, 4 ECTS), gerne auch als Service Veranstaltung

Communication Skills

Domain?

Semester 4

  • Thesis

Ideas:

Electronic Lab course. heard of this in Erlangen for physics. Talks about ELN among other things.

Original Motivation:

The target audience for such a master's programme would be students holding a bachelor's degree from a domain science, which we will call "home domain" in the following. There is explicitly no restriction on the candidates' home domain: it may be from the \ac{STEM} disciplines, life sciences, humanities or social sciences. Candidates with a bachelor's degree in computer science are also explicitly included, although we acknowledge that their master's programme should include adaptations to make their interaction effective with other domain scientists. In order to give the future RSE the necessary breadth, we expect this to be a four semester curriculum.

The curriculum is formed from a combination of modules, some of which are core modules teaching essential skills that must be completed by all students. Other modules introduce more specialised concepts and skills. During the master's programme, students should pick an RSE specialisation from the list in this paper and attend these additional modules to deepen their knowledge in the field.

Core modules are of course drawn from the three pillars of the RSE and can be categorised accordingly.

  • Software/Technical skills:

    • Foundational module: Here we have an introduction to programming: Emphasising use cases over programming paradigms, students learn at least two languages: a language that facilitates prototyping and data processing (e.g., \gls{Python} or \gls{R}) and a language for designing complex, performance-critical systems (e.g., \gls{C}/\gls{Cpp}). This exposes them to computers in a hands-on fashion and is the foundation for (\gls{DOCBB}, \gls{DIST}).
    • Computing environment module: Programming languages are not enough to work in a landscape of many interconnected software components; hence we require something like software craftsmanship, where tools such as the Unix shell, version control systems, build systems, documentation generators, package distribution platforms, and software discovery systems are taught to strengthen skills in (\gls{DIST}, \gls{DOCBB}, \gls{SWREPOS}, \gls{SRU}).
    • Software engineering module: Here we develop foundational software engineering competencies (basic knowledge and skill regarding requirements engineering, software architecture and design, implementation, quality assurance, software evolution), again emphasising and strengthening (\gls{DOCBB}, \gls{DIST}) on a more abstract level.
  • Research skills:

    • Optional domain mastery module: Additional minor research courses, but students with a home-domain already have the research part well-covered.
    • Research tools module: Here we teach tools used to distribute and publish software, as well as introducing students to domain specific data repositories. Thereby gaining foundational knowledge in (\gls{SRU}, \gls{SP}, \gls{DOMREP}).
    • Meta-research module: Here we teach people how research works. The research life cycle is introduced, as well as the data life cycle and the software life cycle are abstractly introduced.
  • Communication skills:

    • Project management methods: Here we teach project management methods that are useful in science, such as agile ones (\gls{PM}).
    • Communication skills module: Here we have courses focusing on interdisciplinary communication, interacting across cultures, communication in hierarchies, supporting end users effectively. These are all facets of the (\gls{USERS}) skill.
    • Teaching module: This module covers topics to effectively design courses and teaching material for the various digital tools, thereby strengthening the (\gls{TEACH}) skill.

Given that RSE work also involves a lot of craftsmanship skills, hands-on practice is an integral part of the curriculum. At least two lab projects are required within the mandatory curriculum. These should be executed as a team and involve a question from a domain science. We recommend covering both the candidate's home domain and another domain of science. Ideally, projects stem from collaborations with scientists within the institution and RSE students take the role of a consultant. This setup strengthens the (\gls{TEAM}, \gls{TEACH}, \gls{USERS}) skill and most likely also the (\gls{MOD}) skill through interaction.

To emphasise the exposure to domains outside their bachelor's degree domain, we recommend that RSEs also support their non-home-domain project by supporting it with introductory courses from this discipline. We support the idea of broadening the interaction with other domains even more. This schools their ability to quickly adapt their vocabulary and thinking to other disciplines. This is an aspect of (\gls{MOD}).

To align with the specialisations listed in this paper, example optional modules include topics on \ac{HPC} engineering/parallel programming, numerical mathematics/scientific computing, web technologies, data stewardship, AI models/statistics, and community management/training.

The programme is finalised with a master's thesis which should be dual-supervised by an RSE supervisor from an actual project, and a domain supervisor. The thesis should answer a relevant research question (strengthening (\gls{NEW})) from the domain using computational methods. Software development is required, and the code is part of the gradable deliverables. The RSE supervisor ensures and grades the software craftsmanship aspects of the project. This setup ensures that we are grading the effectiveness of applying RSE skills in an actual research environment.