Is the evaluation metric for training not loss or accuracy, but rather EER? #396

Open
NathanJHLee opened this issue Jan 3, 2025 · 2 comments

Comments

@NathanJHLee

Hi, wespeaker team.
I followed your config and got the loss and accuracy shown below, but I think the accuracy is too low, so I searched the issues.
I found a similar one:
#165

According to that issue, the most important thing is to check the EER, and loss and accuracy are less important during training. Is that right?
I would also like to know what loss and accuracy you get at 150 epochs for ResNet34 in your own training runs.
Please also check my training log and tell me whether training is proceeding normally.
Thank you!!

My training setup is as follows.
Dataset: VoxCeleb
Model: ResNet34

[ INFO : 2024-12-24 21:51:06,518 ] - | 149| 1000| 0.01728| 0.2| 1.4961| 71.647|
[ INFO : 2024-12-24 21:51:06,528 ] - | 149| 1000| 0.01728| 0.2| 1.4996| 71.495|
[ INFO : 2024-12-24 21:51:06,536 ] - | 149| 1000| 0.01728| 0.2| 1.4984| 71.522|
[ INFO : 2024-12-24 21:51:06,541 ] - | 149| 1000| 0.01728| 0.2| 1.5064| 71.708|
[ INFO : 2024-12-24 21:51:06,549 ] - | 149| 1000| 0.01728| 0.2| 1.4848| 71.882|
[ INFO : 2024-12-24 21:51:47,574 ] - | 149| 1066| 0.017247| 0.2| 1.5084| 71.395|
[ INFO : 2024-12-24 21:51:47,630 ] - | 149| 1066| 0.017247| 0.2| 1.4973| 71.537|
[ INFO : 2024-12-24 21:51:47,815 ] - | 149| 1066| 0.017247| 0.2| 1.4936| 71.602|
[ INFO : 2024-12-24 21:51:47,855 ] - | 149| 1066| 0.017247| 0.2| 1.4988| 71.583|
[ INFO : 2024-12-24 21:51:47,857 ] - | 149| 1066| 0.017247| 0.2| 1.4966| 71.579|
[ INFO : 2024-12-24 21:51:47,860 ] - | 149| 1066| 0.017247| 0.2| 1.4868| 71.852|
[ INFO : 2024-12-24 21:51:47,862 ] - | 149| 1066| 0.017247| 0.2| 1.5065| 71.697|
[ INFO : 2024-12-24 21:51:47,884 ] - | 149| 1066| 0.017247| 0.2| 1.4921| 71.735|
[ INFO : 2024-12-24 21:53:27,802 ] - | 150| 100| 0.017198| 0.2| 1.5211| 71.57|
[ INFO : 2024-12-24 21:53:27,803 ] - | 150| 100| 0.017198| 0.2| 1.4884| 71.703|
[ INFO : 2024-12-24 21:53:27,813 ] - | 150| 100| 0.017198| 0.2| 1.4869| 71.93|
[ INFO : 2024-12-24 21:53:27,826 ] - | 150| 100| 0.017198| 0.2| 1.4878| 71.688|
[ INFO : 2024-12-24 21:53:27,834 ] - | 150| 100| 0.017198| 0.2| 1.4782| 72.234|
[ INFO : 2024-12-24 21:53:27,846 ] - | 150| 100| 0.017198| 0.2| 1.4912| 71.219|
[ INFO : 2024-12-24 21:53:27,851 ] - | 150| 100| 0.017198| 0.2| 1.4765| 71.727|
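
A minimal sketch for summarizing rows like the ones above. It assumes the columns are epoch, batch, learning rate, margin, loss and accuracy, in that order; the log itself has no header, so this order is inferred from the values in the config. Each step appears several times, presumably once per GPU process, so the rows are averaged per (epoch, batch). The `summarize` helper is hypothetical and not part of wespeaker.

```python
import re
from collections import defaultdict

# Assumed column order: epoch | batch | lr | margin | loss | acc
ROW = re.compile(
    r"\|\s*(\d+)\|\s*(\d+)\|\s*([-+.\deE]+)\|\s*([-+.\deE]+)\|"
    r"\s*([-+.\deE]+)\|\s*([-+.\deE]+)\|"
)

def summarize(lines):
    """Average loss and accuracy over all rows sharing (epoch, batch)."""
    groups = defaultdict(list)
    for line in lines:
        match = ROW.search(line)
        if match is None:
            continue  # skip lines that are not metric rows
        epoch, batch = int(match.group(1)), int(match.group(2))
        loss, acc = float(match.group(5)), float(match.group(6))
        groups[(epoch, batch)].append((loss, acc))
    summary = {}
    for key, rows in groups.items():
        mean_loss = sum(r[0] for r in rows) / len(rows)
        mean_acc = sum(r[1] for r in rows) / len(rows)
        summary[key] = (mean_loss, mean_acc, len(rows))
    return summary

# Two rows copied from the log above.
sample = [
    "[ INFO : 2024-12-24 21:53:27,802 ] - | 150| 100| 0.017198| 0.2| 1.5211| 71.57|",
    "[ INFO : 2024-12-24 21:53:27,803 ] - | 150| 100| 0.017198| 0.2| 1.4884| 71.703|",
]
for key, (loss, acc, n) in summarize(sample).items():
    print(key, round(loss, 4), round(acc, 3), n)
```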

And this is my 'config.yaml':
data_type: raw
dataloader_args:
  batch_size: 128
  drop_last: true
  num_workers: 16
  pin_memory: false
  prefetch_factor: 8
dataset_args:
  aug_prob: 0.6
  fbank_args:
    dither: 1.0
    frame_length: 25
    frame_shift: 10
    num_mel_bins: 80
  filter: true
  filter_args:
    max_num_frames: 800
    min_num_frames: 100
  num_frms: 200
  resample_rate: 16000
  sample_num_per_epoch: 0
  shuffle: true
  shuffle_args:
    shuffle_size: 2500
  spec_aug: false
  spec_aug_args:
    max_f: 8
    max_t: 10
    num_f_mask: 1
    num_t_mask: 1
    prob: 0.6
  speed_perturb: true
enable_amp: false
exp_dir: RESNET-TSTP-emb256-fbank80-num_frms200-aug0.6-spTrue-saFalse-ArcMargin-SGD-epoch150_20241223
gpus:
- 0
- 1
- 2
- 3
- 4
- 5
- 6
- 7
log_batch_interval: 100
loss: CrossEntropyLoss
loss_args: {}
margin_scheduler: MarginScheduler
margin_update:
  epoch_iter: 1066
  final_margin: 0.2
  fix_start_epoch: 40
  increase_start_epoch: 20
  increase_type: exp
  initial_margin: 0.0
  update_margin: true
model: ResNet34
model_args:
  embed_dim: 256
  feat_dim: 80
  pooling_func: TSTP
  two_emb_layer: false
model_init: null
noise_data: data/musan/lmdb
num_avg: 10
num_epochs: 250
optimizer: SGD
optimizer_args:
  lr: 0.1
  momentum: 0.9
  nesterov: true
  weight_decay: 0.0001
projection_args:
  do_lm: false
  easy_margin: false
  embed_dim: 256
  num_class: 17982
  project_type: arc_margin
  scale: 32.0
reverb_data: data/rirs/lmdb
save_epoch_interval: 5
scheduler: ExponentialDecrease
scheduler_args:
  epoch_iter: 1066
  final_lr: 5.0e-05
  initial_lr: 0.1
  num_epochs: 250
  scale_ratio: 16.0
  warm_from_zero: true
  warm_up_epoch: 6
seed: 42
train_data: data/vox2_dev/raw.list
train_label: data/vox2_dev/utt2spk
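
As a sanity check on the learning-rate column of the log, here is a minimal sketch of an exponential decay built from the `scheduler_args` above. The formula is an assumption inferred from the config and the logged values, not a copy of the wespeaker scheduler: it ignores warm-up, multiplies by `scale_ratio`, and treats the logged epoch and batch numbers as 1-based. The function name `lr_after_warmup` is hypothetical.

```python
import math

# Values copied from scheduler_args in the config above.
INITIAL_LR = 0.1
FINAL_LR = 5.0e-05
NUM_EPOCHS = 250
EPOCH_ITER = 1066
SCALE_RATIO = 16.0

def lr_after_warmup(epoch, batch):
    """Learning rate at a logged (epoch, batch) position, ignoring warm-up.

    Assumption: epochs and logged batch numbers are 1-based, so the
    0-based iteration index is (epoch - 1) * EPOCH_ITER + (batch - 1).
    """
    cur_iter = (epoch - 1) * EPOCH_ITER + (batch - 1)
    max_iter = NUM_EPOCHS * EPOCH_ITER
    # Exponential interpolation from INITIAL_LR towards FINAL_LR.
    decay = math.exp(cur_iter / max_iter * math.log(FINAL_LR / INITIAL_LR))
    return SCALE_RATIO * INITIAL_LR * decay

for epoch, batch in [(149, 1000), (149, 1066), (150, 100)]:
    print(epoch, batch, f"{lr_after_warmup(epoch, batch):.5g}")
```

Under these assumptions the three values agree with the logged learning rates (0.01728, 0.017247, 0.017198) to the printed precision, which suggests the schedule is running as configured.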
@wsstriving
Collaborator

Since this is ArcMargin loss with margins, the current loss behavior is expected. If you switch to the standard softmax criterion, you can achieve significantly higher accuracy more easily.
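
To illustrate the point, here is a minimal sketch of an additive angular margin in plain Python. It is a simplification, not the wespeaker implementation: it skips the `easy_margin` handling and the guard for angles beyond pi, and the cosine values are hypothetical. It also assumes the logged accuracy is computed on the margin-penalized logits. `scale=32.0` and `margin=0.2` match `projection_args` and `final_margin` in the config.

```python
import math

def arc_margin_logits(cosines, target, scale=32.0, margin=0.2):
    """Scaled logits with an additive angular margin on the target class."""
    logits = []
    for idx, cosine in enumerate(cosines):
        if idx == target:
            # Penalize only the true class: cos(theta) -> cos(theta + margin).
            theta = math.acos(max(-1.0, min(1.0, cosine)))
            cosine = math.cos(theta + margin)
        logits.append(scale * cosine)
    return logits

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

def predicted_class(logits):
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical cosines between one embedding and three class weights;
# class 0 is the true speaker and already has the highest cosine.
cosines = [0.60, 0.50, 0.10]
target = 0

plain = arc_margin_logits(cosines, target, margin=0.0)
penalized = arc_margin_logits(cosines, target, margin=0.2)

print("plain:    ", predicted_class(plain), cross_entropy(plain, target))
print("penalized:", predicted_class(penalized), cross_entropy(penalized, target))
```

In this toy case the sample is classified correctly by plain cosine similarity, yet counts as an error and has a larger loss once the margin is applied to the target logit. This is one way a model can show modest training accuracy under ArcMargin while still producing well-separated embeddings.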

@NathanJHLee
Author

Thank you for your explanation. I have one more question. If I use ArcMargin for projection during training, is there a way to determine the optimal number of epochs during Stage 3? Or is the only option to check the EER by evaluating the saved epoch models?
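
For context on what checking the EER of a saved model involves, here is a minimal sketch of an EER computation over trial scores. It is not the wespeaker scoring code: it uses a simple threshold sweep, the function name `compute_eer` is hypothetical, and the scores are toy values rather than results from any model.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) using a sweep over all observed scores.

    The EER is taken at the threshold where the false acceptance rate
    and the false rejection rate are closest, as the mean of the two.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for threshold in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]

# Toy scores, for illustration only.
target_scores = [0.9, 0.8, 0.7, 0.4]
nontarget_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(target_scores, nontarget_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Running the same computation on the scores produced by each saved checkpoint (every 5 epochs with `save_epoch_interval: 5`) gives one EER per checkpoint to compare.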
