This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Add release note for v1.8.0 (#5564)
yiyione authored Jul 15, 2021
1 parent ba25509 commit d60cba4
Showing 6 changed files with 38 additions and 22 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@
[![Join the chat at https://gitter.im/Microsoft/pai](https://badges.gitter.im/Microsoft/pai.svg)](https://gitter.im/Microsoft/pai?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Version](https://img.shields.io/github/release/Microsoft/pai.svg)](https://github.com/Microsoft/pai/releases/latest)

-**OpenPAI [v1.7.0](./RELEASE_NOTE.md#April-2021-version-170) has been released!**
+**OpenPAI [v1.8.0](./RELEASE_NOTE.md#July-2021-version-180) has been released!**

With the release of v1.0, OpenPAI switched to a more robust, more powerful, and lightweight architecture. OpenPAI is also becoming more and more modular, so the platform can be easily customized and extended to suit new needs. OpenPAI provides many user-friendly AI features, making it easier for end users and administrators to complete daily AI tasks.

18 changes: 17 additions & 1 deletion RELEASE_NOTE.md
@@ -1,5 +1,21 @@
# OpenPAI Release Note

## July 2021 (version 1.8.0)

- Marketplace-related updates
- Please see [Marketplace](https://github.com/microsoft/openpaimarketplace/releases/tag/v1.8.0) for more details

- Alert manager
  - Send alerts to users when the job status changes #5337

- Webportal
  - Support UX for job priority #5417

- Others
- Customizable Autoscaler #5412
  - Add custom SSL port support #5386
  - Clean up the repo and remove obsolete code #5489

## April 2021 (version 1.7.0)

- Marketplace-related updates
@@ -11,7 +27,7 @@
- In the new submission page, the sidebar can be shrunk to give the main area more visual space.
- The new submission page moves the YAML editor into a single page, which allows users to focus on setting the config or editing the YAML protocol.
- The new submission page improves the responsive design at small and medium resolutions.

> Known issue: The TensorBoard tool is not implemented in the new submission page yet. If you need to use it, please use the old version.
- Alert system enhancement
2 changes: 1 addition & 1 deletion contrib/kubespray/config/config.yaml
@@ -1,6 +1,6 @@
user: forexample
password: forexample
-docker_image_tag: v1.7.0
+docker_image_tag: v1.8.0

# Optional

24 changes: 12 additions & 12 deletions docs/manual/cluster-admin/installation-guide.md
@@ -8,7 +8,7 @@ To install OpenPAI >= `v1.0.0`, please first check [Installation Requirements](#

The deployment of OpenPAI requires you to have **at least 3 separate machines**: one dev box machine, one master machine, and one worker machine.

The dev box machine controls the masters and workers through SSH during installation, maintenance, and uninstallation. There should be one, and only one, dev box.

The master machine is used to run core Kubernetes components and core OpenPAI services. Currently, OpenPAI does not support high availability and you can only specify one master machine.

@@ -27,7 +27,7 @@ We recommend you to use CPU-only machines for dev box and master. The detailed r
<td>Dev Box Machine</td>
<td>
<ul>
<li>It can communicate with all other machines (master and worker machines).</li>
<li>It is separate from the cluster which contains the master machine and worker machines.</li>
<li>It can access the internet; in particular, it needs access to the Docker Hub registry service or a mirror of it, because the deployment process pulls Docker images.</li>
</ul>
@@ -38,7 +38,7 @@
<li>SSH service is enabled.</li>
<li>Passwordless ssh to all other machines (master and worker machines).</li>
<li>Docker is installed.</li>
</ul>
</td>
</tr>
<tr>
@@ -66,16 +66,16 @@ We recommend you to use CPU-only machines for dev box and master. The detailed r

The worker machines are used to run jobs. You can use multiple workers during installation.

We support various types of workers: CPU workers, GPU workers, and workers with other computing devices (e.g. TPU, NPU).

We also support two schedulers: the Kubernetes default scheduler and [hivedscheduler](https://github.com/microsoft/hivedscheduler).

Hivedscheduler is the default scheduler for OpenPAI. It supports virtual cluster division, topology-aware resource guarantees, and optimized gang scheduling, none of which are supported by the k8s default scheduler.


For now, the support for CPU/NVIDIA GPU workers and workers with other computing devices is different:

- For CPU workers and NVIDIA GPU workers, both the k8s default scheduler and the hived scheduler can be used.
- For workers with other types of computing devices (e.g. TPU, NPU), we currently only support the k8s default scheduler, and the cluster may only contain workers with the same type of computing device. For example, you can use TPU workers, but then all workers must be TPU workers; you cannot mix TPU workers with GPU workers in one cluster.

Please check the following requirements for different types of worker machines:
@@ -116,7 +116,7 @@ Please check the following requirements for different types of worker machines:
<ul>
<li><b>NVIDIA GPU Driver is installed.</b> You may use <a href="./installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed">a command</a> to check it. Refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-gpu-driver">the installation guidance</a> in FAQs if the driver is not successfully installed. If you are wondering which version of GPU driver you should use, please also refer to <a href="./installation-faqs-and-troubleshooting.html#which-version-of-nvidia-driver-should-i-install">FAQs</a>.</li>
<li><b><a href="https://github.com/NVIDIA/nvidia-container-runtime">nvidia-container-runtime</a> is installed and configured as the default runtime of Docker.</b> Please configure it in the <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">Docker config file (daemon.json)</a>, not in systemd's config. You can use the command <code>sudo docker run --rm nvidia/cuda:10.0-base nvidia-smi</code> to check it; this command should output information about the available GPUs if everything is set up properly. Refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime">the installation guidance</a> if it is not successfully set up. We don't recommend using <code>nvidia-docker2</code>; for a detailed comparison between <code>nvidia-container-runtime</code> and <code>nvidia-docker2</code>, please refer to <a href="https://github.com/NVIDIA/nvidia-docker/issues/1268#issuecomment-632692949">this discussion</a>. A sanity-check sketch for these requirements follows the table below.</li>
</ul>
</td>
</tr>
<tr>
@@ -139,7 +139,7 @@
<li>The driver of the device is installed.</li>
<li>The container runtime of the device is installed and configured as the default runtime of Docker. Please configure it in the <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">Docker config file</a>, because systemd's env will be overwritten during installation.</li>
<li>You should have a deployable <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/">device plugin</a> for the computing device. After Kubernetes is set up, you should manually deploy it in the cluster.</li>
</ul>
</td>
</tr>
</tbody>
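
As a quick way to verify the GPU-worker requirements above, the sketch below runs the three checks on a worker before installation. It is not part of OpenPAI itself: the commands are standard NVIDIA/Docker tooling, the CUDA image tag is the one the guide itself uses, and the exact driver and runtime versions on your machines may differ.

```bash
#!/usr/bin/env bash
# Pre-installation sanity check for an NVIDIA GPU worker (illustrative, not part of OpenPAI).
set -e

# 1. The NVIDIA driver should be installed: nvidia-smi lists the GPUs.
nvidia-smi

# 2. Docker should report nvidia as its default runtime,
#    configured in /etc/docker/daemon.json rather than in the systemd unit.
docker info 2>/dev/null | grep -i 'default runtime'

# 3. End-to-end check from the guide: a CUDA container should see the GPUs.
sudo docker run --rm nvidia/cuda:10.0-base nvidia-smi
```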
@@ -163,7 +163,7 @@
cd pai
Choose a version to install by checking out a specific tag:

```bash
-git checkout v1.7.0
+git checkout v1.8.0
```

Please edit the `layout.yaml` and `config.yaml` files under the `<pai-code-dir>/contrib/kubespray/config` folder; an illustrative `layout.yaml` sketch is shown below, and a `config.yaml` example follows in the diff.
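
Since this diff only touches `config.yaml`, the `layout.yaml` content is not shown here. For orientation, a minimal `layout.yaml` for one master and one NVIDIA GPU worker might look like the sketch below. The hostnames, IPs, SKU names, and hardware numbers are placeholders, `PAI_CODE_DIR` is an assumed path, and the field names follow the layout example in the OpenPAI cluster-admin documentation; check them against the full guide for your version before use.

```bash
# Illustrative sketch only: write a minimal layout.yaml for one master and one GPU worker.
# PAI_CODE_DIR is a placeholder for the directory where the pai repository was cloned.
PAI_CODE_DIR="${HOME}/pai"

cat > "${PAI_CODE_DIR}/contrib/kubespray/config/layout.yaml" <<'EOF'
machine-sku:
  master-machine:          # SKU for the master node
    mem: 60Gi
    cpu:
      vcore: 24
  gpu-machine:             # SKU for an NVIDIA GPU worker
    computing-device:
      type: nvidia.com/gpu
      model: K80
      count: 4
    mem: 220Gi
    cpu:
      vcore: 24

machine-list:
  - hostname: pai-master
    hostip: 10.0.0.1
    machine-type: master-machine
    pai-master: "true"
  - hostname: pai-worker01
    hostip: 10.0.0.2
    machine-type: gpu-machine
    pai-worker: "true"
EOF
```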
@@ -220,7 +220,7 @@ machine-list:
``` yaml
user: forexample
password: forexample
-docker_image_tag: v1.7.0
+docker_image_tag: v1.8.0
# Optional
@@ -236,7 +236,7 @@ docker_image_tag: v1.7.0
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_fs_mount_path: "/var/lib/registry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: "true"
#############################################
@@ -362,7 +362,7 @@ You can run the following commands to set up kubectl on your localhost:
ansible-playbook -i ${HOME}/pai-deploy/kubespray/inventory/pai/hosts.yml set-kubectl.yml --ask-become-pass
```

By default, we don't set up `kubeconfig` or install the `kubectl` client on the dev box machine, but we do put the Kubernetes config file in `~/pai-deploy/kube/config`. You can use this config with any Kubernetes client to verify the installation.
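
As a concrete example of that verification step, any `kubectl` can be pointed at the generated config. The path is the one mentioned above; the node names in the output will depend on your own `layout.yaml`.

```bash
# Verify the installation with the kubeconfig produced by the deployment
# (path taken from the paragraph above; read-only checks, nothing is modified).
export KUBECONFIG="${HOME}/pai-deploy/kube/config"

kubectl get nodes -o wide   # all masters and workers should show up as Ready
kubectl get pods -A         # cluster and OpenPAI service pods across all namespaces
```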

Also, you can use the command `ansible-playbook -i ${HOME}/pai-deploy/kubespray/inventory/pai/hosts.yml set-kubectl.yml --ask-become-pass` to set up `kubeconfig` and `kubectl` on the dev box machine. It will copy the config to `~/.kube/config` and set up the `kubectl` client. After it is executed, you can use `kubectl` on the dev box machine directly.

12 changes: 6 additions & 6 deletions docs_zh_CN/manual/cluster-admin/installation-guide.md
@@ -27,7 +27,7 @@ The master machine is used to run core Kubernetes components and core OpenPAI services. Currently,
<td>Dev Box Machine</td>
<td>
<ul>
<li>It can communicate with all other machines (master and worker machines).</li>
<li>It is a machine separate from the master and worker machines.</li>
<li>It can access the internet, in particular Docker Hub, since the deployment process pulls Docker images from Docker Hub.</li>
</ul>
@@ -38,7 +38,7 @@
<li>The SSH service is enabled.</li>
<li>Passwordless SSH to all master and worker machines is set up.</li>
<li>Docker is installed.</li>
</ul>
</td>
</tr>
<tr>
@@ -116,7 +116,7 @@ hivedscheduler is OpenPAI's default scheduler. It supports virtual cluster division, topology-
<ul>
<li><b>The GPU driver is installed.</b> You can use <a href="./installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed">this command</a> to check it. If the GPU driver is not installed correctly, refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-gpu-driver">how to install the GPU driver</a>. If you are unsure which version of the GPU driver to install, read <a href="./installation-faqs-and-troubleshooting.html#which-version-of-nvidia-driver-should-i-install">this document</a>.</li>
<li><b><a href="https://github.com/NVIDIA/nvidia-container-runtime">nvidia-container-runtime</a> is installed and set as Docker's default runtime.</b> Because systemd's configuration will be overwritten during the installation, please do not set the Docker default runtime in systemd; configure it in the <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">Docker config file (daemon.json)</a> instead. You can use the command <code>sudo docker run --rm nvidia/cuda:10.0-base nvidia-smi</code> to check this; if it prints information about the currently available GPUs, the setup is fine. If it is not set up correctly, refer to <a href="./installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime">how to install nvidia-container-runtime</a>. We don't recommend using <code>nvidia-docker2</code>; for a detailed comparison between <code>nvidia-container-runtime</code> and <code>nvidia-docker2</code>, see <a href="https://github.com/NVIDIA/nvidia-docker/issues/1268#issuecomment-632692949">here</a>.</li>
</ul>
</td>
</tr>
<tr>
@@ -139,7 +139,7 @@
<li>The driver of the device is installed.</li>
<li>The container runtime of the device is installed and set as Docker's default runtime. Because systemd's configuration will be overwritten during the installation, please do not set the Docker default runtime in systemd; configure it in the <a href="https://docs.docker.com/config/daemon/#configure-the-docker-daemon">Docker config file</a> instead.</li>
<li>You need a <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/">device plugin</a> for the device. After Kubernetes is installed, you need to manually deploy the device plugin in the cluster.</li>
</ul>
</td>
</tr>
</tbody>
@@ -163,7 +163,7 @@
cd pai
Check out a specific tag to choose the OpenPAI version to install:

```bash
-git checkout v1.7.0
+git checkout v1.8.0
```

Next, please edit the `layout.yaml` and `config.yaml` files under the `<pai-code-dir>/contrib/kubespray/config` directory.
@@ -221,7 +221,7 @@ machine-list:
``` yaml
user: forexample
password: forexample
-docker_image_tag: v1.7.0
+docker_image_tag: v1.8.0
# Optional
2 changes: 1 addition & 1 deletion version/PAI.VERSION
@@ -1 +1 @@
-v1.7.0
+v1.8.0
