v1.0.0 beta: Complete Refactor, Reduced Error Rate, Advanced Training Features, Matching Mode

qiuqiao released this on 09 Dec 16:11

New Features

  • Complete Refactor: Refactored nearly all the code to improve maintainability.
  • Reduced Error Rate: Optimized the model architecture, loss function, and decoding methods, lowering the phoneme alignment error rate.
  • Advanced Training Features:
    • Automatic Mixed Precision Training: Uses PyTorch Lightning's built-in mixed precision training.

      Simply set the precision hyperparameter in train_config.yaml (default: bf16-mixed).

    • Pretrained Models: Allows fine-tuning with a pretrained model.

      After downloading a pretrained model compatible with the current version from the Releases page, make sure the model hyperparameters in train_config.yaml match those of the pretrained model (they are usually listed on the release page). Then pass the checkpoint when starting training: python train.py -p path_to_your_pretrained_ckpt.

      For more advanced features, see train_config.yaml.

  • Matching Mode: During inference, finds the contiguous segment of the given phoneme sequence that maximizes the probability, so not every phoneme has to be used; this is similar to LyricFA. Enable it by passing -m at inference time.
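For the mixed precision training feature: in PyTorch Lightning, mixed precision is selected through the Trainer's precision argument, and bf16-mixed is one of the strings it accepts. The helper below is a hypothetical sketch of how a value read from train_config.yaml could be validated and handed to the Trainer. The config key name "precision", the helper name trainer_kwargs, and the short list of allowed values are assumptions for illustration, not this project's code.

```python
# Hypothetical sketch: turn parsed train_config.yaml settings into Trainer kwargs.
# Assumptions: the YAML is already loaded into a dict (e.g. with PyYAML) and the
# mixed precision setting lives under the key "precision".

# A few of the precision strings PyTorch Lightning 2.x accepts (not the full list).
ALLOWED_PRECISIONS = ("bf16-mixed", "16-mixed", "32-true")


def trainer_kwargs(config: dict) -> dict:
    """Return keyword arguments intended for lightning.pytorch.Trainer(...)."""
    precision = str(config.get("precision", "bf16-mixed"))  # release default
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(
            f"unsupported precision {precision!r}; expected one of {ALLOWED_PRECISIONS}"
        )
    # "auto" lets Lightning pick whatever hardware is available.
    return {"precision": precision, "accelerator": config.get("accelerator", "auto")}
```

With Lightning installed, the result would be used as Trainer(**trainer_kwargs(config)). bf16-mixed generally needs hardware with bfloat16 support (for example NVIDIA Ampere or newer GPUs); on older GPUs, 16-mixed or 32-true are the usual alternatives.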
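For the pretrained model feature, the key requirement is that the model hyperparameters in train_config.yaml match the checkpoint's. The hypothetical helper below sketches that check on two plain dicts; the function name and the example keys (hidden_dims, num_layers) are made up for illustration. When a LightningModule calls save_hyperparameters(), Lightning stores them in the checkpoint under "hyper_parameters", so the checkpoint side could come from loading the .ckpt with torch.load and reading that entry; where this project keeps its model settings inside that entry is an assumption to verify.

```python
# Hypothetical sketch: report model hyperparameters that differ between
# train_config.yaml and a pretrained checkpoint before fine-tuning.
MISSING = "<missing>"


def model_config_mismatches(config_model: dict, ckpt_model: dict) -> dict:
    """Map each differing key to (value in config, value in checkpoint)."""
    mismatches = {}
    for key in sorted(config_model.keys() | ckpt_model.keys()):
        ours = config_model.get(key, MISSING)
        theirs = ckpt_model.get(key, MISSING)
        if ours != theirs:
            mismatches[key] = (ours, theirs)
    return mismatches


# Example with made-up keys: the config asks for 3 layers, the checkpoint has 4.
config_model = {"hidden_dims": 128, "num_layers": 3}
ckpt_model = {"hidden_dims": 128, "num_layers": 4}
print(model_config_mismatches(config_model, ckpt_model))  # {'num_layers': (3, 4)}
```

An empty result suggests the two agree; any entry points to a hyperparameter that should be copied from the release page into train_config.yaml before fine-tuning.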
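Matching Mode can be pictured as a monotonic, Viterbi-style alignment whose first and last phonemes are left free. The sketch below is a simplified illustration of that idea, not this project's decoder. It assumes a matrix of per-frame log-probabilities for each position in the given phoneme sequence, lets the path begin at any phoneme and end at any later one, and requires every phoneme in between to cover at least one frame. The function name match_segment and the input layout are assumptions.

```python
import math
from typing import List, Tuple


def match_segment(emissions: List[List[float]]) -> Tuple[float, int, int, List[int]]:
    """Find the contiguous phoneme segment that best explains all frames.

    emissions[t][j] is the log-probability that frame t belongs to the j-th
    phoneme of the given sequence. Returns (score, first, last, path), where
    path[t] is the phoneme index assigned to frame t; every phoneme from
    first to last receives at least one frame, and the others are skipped.
    """
    T, N = len(emissions), len(emissions[0])
    # best[t][j]: best total log-probability of frames 0..t with frame t on j.
    # Row 0 is just the emissions, so the path may start at any phoneme.
    best = [row[:] for row in emissions]
    back = [[0] * N for _ in range(T)]  # 0 = stayed on j, 1 = came from j - 1
    for t in range(1, T):
        for j in range(N):
            stay = best[t - 1][j]
            advance = best[t - 1][j - 1] if j > 0 else -math.inf
            if advance > stay:
                best[t][j] += advance
                back[t][j] = 1
            else:
                best[t][j] += stay
    # The path may also end at any phoneme: take the best final state.
    last = max(range(N), key=lambda j: best[T - 1][j])
    path = [last]
    j = last
    for t in range(T - 1, 0, -1):
        j -= back[t][j]  # step back to the phoneme used at frame t - 1
        path.append(j)
    path.reverse()
    return best[T - 1][last], path[0], last, path


# 4 phonemes, 4 frames; the audio only contains phonemes 1 and 2.
emissions = [
    [-10.0, 0.0, -10.0, -10.0],
    [-10.0, 0.0, -10.0, -10.0],
    [-10.0, -10.0, 0.0, -10.0],
    [-10.0, -10.0, 0.0, -10.0],
]
print(match_segment(emissions))  # (0.0, 1, 2, [1, 1, 2, 2])
```

Ordinary forced alignment would instead pin the start to the first phoneme (best[0][j] = -inf for j > 0) and read the result only from the last phoneme; relaxing both ends is what lets the decoder skip unmatched phonemes at either end of the sequence.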

Removed Features

  • Aspiration (Breath) Detection: Because it is complex to implement, breath detection is not included in this version. It may be added in the future.
