To run MP-BERT on Ascend hardware with the MindSpore framework, we recommend using Docker, an open-source container engine that lets developers package an application together with its dependencies into a lightweight, portable container.
Docker enables rapid deployment of MindSpore while keeping it isolated from the host system environment.
Download the Ascend training base image for the MindSpore framework from AscendHub:
https://ascendhub.huawei.com/public-ascendhub/mindspore-modelzoo#/detail/ascend-mindspore
Note: the Ascend and CANN firmware and drivers need to be installed in advance.
Confirm that an ARM-based Ubuntu 18.04 or CentOS 7.6 64-bit operating system is installed.
For a Linux server with a GPU, MindSpore supports Docker, conda and pip installation environments; see:
https://www.mindspore.cn/install
Pre-training MP-BERT on CPU is not supported, and fine-tuning on large datasets with CPU is not recommended. For running predictions on CPU, installing MindSpore with conda or pip is recommended:
https://www.mindspore.cn/install
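Whichever backend you install for, a quick sanity check can confirm that MindSpore sees the intended device. The snippet below is only a minimal sketch and not part of the MP-BERT code base; set `device_target` to "Ascend", "GPU" or "CPU" to match your installation (`run_check` is available in recent MindSpore versions).

```python
# Minimal MindSpore environment check (a sketch, not part of the MP-BERT code base).
# Set device_target to "Ascend", "GPU" or "CPU" to match your installation.
import mindspore
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

# run_check() builds and runs a tiny network to verify that MindSpore
# is installed correctly and can reach the selected device.
mindspore.run_check()
```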
Huawei provides an online graphical training platform, ModelArts, which can be used for pre-training and fine-tuning MP-BERT; see: https://www.huaweicloud.com/product/modelarts.html
MP-BERT is pre-trained on publicly available, unlabelled protein sequences by self-supervised learning, as shown in Figure a.
We train and provide several pre-trained models with different MP-BERT hidden layer sizes, different training data and different data compositions.
Fine-tuning frameworks for classification, regression and site prediction are currently available, as shown in Figures b and c.
MP-BERT is based on the BERT implementation in MindSpore ModelZoo, which has been deeply modified to make it more suitable for protein tasks. Visit the ModelZoo page to learn more.
Because MP-BERT needs to be pre-trained on a very large dataset, we recommend using one of the provided pre-trained models or contacting us.
In our study, we completed training on a Huawei Atlas 800-9000 training server with 8 Ascend 910 32 GB NPUs and 768 GB of memory.
The data processing and pre-training code is stored under Pretrain_code and the training data is taken from the UniRef dataset.
For the pre-training task on sequence pairs, links between sequences are currently established using Pfam families, as predicted with ProtENN.
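As an illustration of this pairing step, the sketch below groups sequences by a predicted Pfam family and emits pairs of sequences from the same family. The input layout and the `build_family_pairs` helper are hypothetical and do not correspond to files in Pretrain_code; they only show the idea of using family labels to link sequences.

```python
# Hypothetical sketch of building sequence pairs linked by a shared Pfam family.
# Assumes a tab-separated file with one "<sequence>\t<pfam_family>" record per line;
# this layout is illustrative and is not the actual Pretrain_code input format.
import csv
import itertools
from collections import defaultdict


def build_family_pairs(tsv_path, max_pairs_per_family=10):
    """Yield (seq_a, seq_b) pairs of sequences sharing a predicted Pfam family."""
    families = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for sequence, family in csv.reader(handle, delimiter="\t"):
            families[family].append(sequence)

    for family, sequences in families.items():
        # Limit the number of pairs per family to keep the pair set balanced.
        pairs = itertools.islice(itertools.combinations(sequences, 2), max_pairs_per_family)
        for seq_a, seq_b in pairs:
            yield seq_a, seq_b
```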
Pre-trained models currently available:
model | url |
---|---|
UniRef50 1024 max | zenodo |
UniRef50 2048 base | zenodo |
See the Pretrain_code section for more information on the use of pre-training.
Fine-tuning can be performed on a single NPU or GPU card. Please load a pre-trained model for fine-tuning according to your needs; see the Finetune_code section for details.
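As a starting point, a pre-trained checkpoint can be loaded into a network with MindSpore's standard checkpoint utilities. The snippet below is only a sketch: `MPBertFinetuneNet` and the checkpoint file name are placeholders for the actual fine-tuning network from Finetune_code and the checkpoint you download from Zenodo.

```python
# Sketch of loading a pre-trained MP-BERT checkpoint before fine-tuning.
# "MPBertFinetuneNet" and the checkpoint file name are placeholders; substitute
# the network class from Finetune_code and the checkpoint downloaded from Zenodo.
from mindspore import nn, load_checkpoint, load_param_into_net


class MPBertFinetuneNet(nn.Cell):
    """Placeholder network; replace with the actual fine-tuning network."""

    def __init__(self):
        super().__init__()
        self.dense = nn.Dense(1024, 2)  # stand-in layer only

    def construct(self, x):
        return self.dense(x)


net = MPBertFinetuneNet()

# Read the checkpoint into a parameter dict and copy it into the network;
# parameters missing from the checkpoint (e.g. a new task head) stay randomly initialised.
param_dict = load_checkpoint("mpbert_uniref50_base.ckpt")  # placeholder file name
not_loaded = load_param_into_net(net, param_dict)
print("Parameters not found in checkpoint:", not_loaded)
```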
For more information, see: MPB-PPI and MPB-PPISP