Releases: qiuqiao/SOFA
v1.0.3: Improved Inference Speed, Better Model Performance Evaluation Metrics
The v1.0.3 version is compatible with models from v1.0.
Improved Features
- Significantly improved inference speed.
- More robust and realistic evaluation metrics in `evaluate.py`.
- SPs that are too short are now merged into the adjacent AP.
Bug Fixes
- Fixed many bugs.
v1.0.2 beta: Multiple input and output formats, saving confidence scores, and model performance evaluation
The v1.0.2 version is compatible with models from v1.0.
New Features
- Saving Confidence Scores: During inference, specify the `--save_confidence` parameter to save the confidence score for each sample (see the sketch after this list).
- Model Evaluation: A new `evaluate.py` script has been added for evaluating model performance. See readme.md for usage instructions.
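A minimal usage sketch for confidence saving. Only `--save_confidence` is confirmed by these notes; the `infer.py` entry point and the `--ckpt`/`--folder` arguments are assumptions based on the readme.

```bash
# Hypothetical invocation: the script name and the --ckpt/--folder arguments
# are assumptions from the readme, not from these release notes.
python infer.py \
    --ckpt checkpoints/your_model.ckpt \
    --folder path/to/your/wavs_and_labels \
    --save_confidence   # saves a confidence score for each sample

# evaluate.py can then score predictions against ground-truth annotations;
# see readme.md for its actual arguments.
```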
Improved Features
- Increased Support for Output Formats During Inference: Specify the `--out_formats` parameter during inference to freely choose the output formats. Refer to the readme for usage instructions.
- Customizable Input File Extension for Inference: Specify the `--in_format` parameter during inference to choose the file extension to read (see the sketch after this list).
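A sketch combining the two new flags. As above, the `infer.py` entry point and `--ckpt`/`--folder` arguments are assumptions from the readme; the format names and the comma-separated value syntax below are illustrative placeholders, not a confirmed list of supported values.

```bash
# Read input transcriptions with a .lab extension and write TextGrid plus
# htk-style labels. Check the readme for the formats your version supports.
python infer.py \
    --ckpt checkpoints/your_model.ckpt \
    --folder path/to/your/wavs_and_labels \
    --in_format lab \
    --out_formats textgrid,htk
```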
Bug Fixes
- Fixed many bugs.
v1.0.1 beta: Aspiration Detection, Reduced Model Size
Version 1.0.1 is compatible with the models from version 1.0.
New Features
- Aspiration Detection: Introduced the `ap_detector` module, which automatically detects aspiration (breath) noises. The module is customizable, similar to the `g2p` module.
- Reduced Disk Space Usage for Model Weights: After training completes, a copy of the model without optimizer parameters is saved, significantly reducing the model's size.
Improved Features
- Improved Exception Messages: Refined the messages shown when the program raises an exception, making it easier for users to troubleshoot errors.
- Post-Processing: Added post-processing steps during inference to ensure that the final annotations are more in line with annotation standards.
Bug Fixes
- Fixed many bugs.
v1.0.0 beta: Complete Refactor, Reduced Error Rates, Support for Advanced Training Features, Matching Mode
New Features
- Complete Refactor: Refactored nearly all the code to improve maintainability.
- Reduced Error Rate: Optimizations were made to the model structure, loss function, and decoding methods, reducing the error rate of phoneme alignment.
- Advanced Training Features:
  - Automatic Mixed Precision Training: Uses PyTorch Lightning's built-in mixed precision training. Simply specify the `accelerator` hyperparameter in `train_config.yaml` (default is `bf16-mixed`).
  - Pretrained Models: Allows fine-tuning from a pretrained model. After downloading a pretrained model compatible with the current version from the release page, make sure the `model` hyperparameter in `train_config.yaml` matches the pretrained model you want to use (this is generally given on the release page). During training, run `python train.py -p path_to_your_pretrained_ckpt` to specify the pretrained model. For more advanced features, refer to the contents of `train_config.yaml`.
- Matching Mode: During inference, finds the continuous segment of the given phoneme sequence that maximizes the probability, instead of requiring all phonemes to be used, similar to LyricFA. Specify `-m` during inference to enable it (see the sketch after this list).
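A sketch putting the training and inference options above together. The `python train.py -p ...` command and the `-m` flag are taken from these notes; the `infer.py` entry point and its `--ckpt`/`--folder` arguments are assumptions based on the readme.

```bash
# Fine-tune from a pretrained checkpoint. Before running, make sure the `model`
# hyperparameter in train_config.yaml matches the pretrained model, and adjust
# the `accelerator` hyperparameter there if you do not want bf16-mixed.
python train.py -p path_to_your_pretrained_ckpt

# Matching mode: -m lets inference pick the most probable continuous segment of
# the given phoneme sequence instead of consuming every phoneme.
python infer.py \
    --ckpt checkpoints/your_model.ckpt \
    --folder path/to/your/wavs_and_labels \
    -m
```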
Removed Features
- Aspiration Detection: Because the implementation is relatively complex, aspiration detection was not included in this release. It may be added in the future.
v0.0.1: Pretrained Model for Mandarin Singing Voice
Code version: v0.0.1
Language: Mandarin Chinese
Dictionary: opencpop-extension
Intended use: singing voice
Release date: 2023-10-18
Training data: see data_provider.md attached to this release