Labs

The labs are divided into basic labs and advanced labs. The experimental courses in this section are designed from the perspective of system research.

Students are encouraged to implement and optimize system modules by working with mainstream and recent frameworks, platforms and tools, so that they improve their ability to solve practical problems rather than only learning how to use the tools.

Target users

  • Junior and senior undergraduate students
  • Graduate students

Experimental design goals

This experimental course is designed from the perspective of system research. By working with mainstream and recent frameworks, platforms and tools, students implement and optimize system modules themselves, which builds their ability to solve practical problems rather than only teaching them how to use the tools.

Experimental design features

  1. Provide a unified framework, platform and tools.

  2. Design experiment content that students can actually carry out.

  3. Keep the experiment content general, so that each university can deepen and extend it according to its own characteristics.

  4. Get started with practical engineering projects and deepen the understanding of AI systems.

Contents

Basic Labs

| Lab No. | Lab Name | Remarks |
| --- | --- | --- |
| Prerequisites | Setup Environment | Set up the environment for the experiments |
| Lab 1 | A simple end-to-end AI example, from a system perspective | Understand the systems from debug info and system logs |
| Lab 2 | Customize operators | Design and implement a customized operator (both forward and backward) in Python |
| Lab 3 | CUDA implementation | Add a CUDA implementation for the customized operator |
| Lab 4 | AllReduce implementation | Improve AllReduce on Horovod: implement a lossy compression (3LC) on GPU for low-bandwidth networks |
| Lab 5 | Configure containers for customized training and inference | Configure containers |
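
To give a feel for what "both forward and backward" means in Lab 2, here is a minimal, framework-free sketch of a customized operator. It is not the lab's actual assignment or solution: the operator (an elementwise square) and the names `SquareOp`, `numerical_grad` and `loss` are hypothetical and chosen only for illustration. The sketch implements the forward pass, a hand-written backward pass, and a finite-difference check that the two agree.

```python
# Hypothetical example: a framework-free "square" operator, y_i = x_i ** 2.
# The names SquareOp, numerical_grad and loss are illustrative only and are
# not taken from the lab material.

class SquareOp:
    """Elementwise square with a hand-written backward pass."""

    def forward(self, x):
        # Save the input; the backward pass needs it to compute dy/dx = 2x.
        self.saved_x = list(x)
        return [v * v for v in x]

    def backward(self, grad_out):
        # Chain rule: dL/dx_i = dL/dy_i * 2 * x_i.
        return [g * 2.0 * v for g, v in zip(grad_out, self.saved_x)]


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of the scalar function f at x."""
    grads = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


def loss(x):
    # Scalar loss L = sum(y), so dL/dy_i = 1 for every i.
    return sum(SquareOp().forward(x))


x = [1.0, -2.0, 3.0]
op = SquareOp()
y = op.forward(x)
analytic = op.backward([1.0] * len(y))   # gradient from the backward pass
numeric = numerical_grad(loss, x)        # gradient from finite differences
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(y, analytic, max_err)
```

Comparing the analytic gradient against a numerical one is a common way to validate a backward pass before plugging the operator into a real framework; the same check can be reused when the operator is later reimplemented in CUDA.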

Advanced Labs

| Lab No. | Lab Name | Remarks |
| --- | --- | --- |
| Lab 6 | Scheduling and resource management system | Get familiar with OpenPAI or KubeFlow |
| Lab 7 | Distributed training | Try different kinds of all reduce implementations |
| Lab 8 | AutoML | Search for a new neural network structure for Image/NLP tasks |
| Lab 9 | RL Systems | Configure and get familiar with one of the following RL Systems: RLlib, … |