SVAR calculation and dataset split #19

Open
enhuiz opened this issue Aug 3, 2021 · 1 comment
enhuiz commented Aug 3, 2021

Thanks for sharing the code and model. I'm trying to reproduce the SVAR results in the paper and found I can bring the EER down to 1.57% with a threshold of 0.7896. However, with this threshold, I only get an SVAR of around 25% on the conversion results.
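For reference, the SVAR number above is the fraction of converted utterances accepted as the target speaker under that threshold. A minimal sketch of how I compute it (`converted_ge2es` and `target_anchors` are hypothetical names for the converted utterances' features and the corresponding target-speaker anchors):

```python
import numpy as np

def svar(converted_ge2es, target_anchors, threshold=0.7896):
    """Fraction of converted utterances whose cosine similarity to the
    target speaker's anchor exceeds the verification threshold."""
    x = converted_ge2es / np.linalg.norm(converted_ge2es, axis=-1, keepdims=True)
    y = target_anchors / np.linalg.norm(target_anchors, axis=-1, keepdims=True)
    sims = (x * y).sum(axis=-1)  # row-wise cosine similarity, shape (n,)
    return (sims > threshold).mean()
```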

My procedure for building the SV system is:

  1. Extract the GE2E features for the VCTK dataset using the model provided here: https://github.com/resemble-ai/Resemblyzer (a sketch of this step follows the list).
  2. For each of the 110 speakers (a newer version of VCTK was adopted, which contains 110 speakers), calculate a mean GE2E feature as the anchor by averaging all the features of that speaker.
  3. For every utterance, calculate the cosine similarity between its GE2E feature and each of the 110 anchors.
  4. Do a binary search for the threshold where FAR equals FRR (the EER point).
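
For step 1, a minimal sketch of the extraction, saving one `.npz` per utterance next to its wav (the encoder calls are from the Resemblyzer README; the VCTK directory layout is an assumption):

```python
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()
# assumed layout: VCTK-Corpus/wav48/<speaker>/<utterance>.wav
for wav_path in Path("VCTK-Corpus/wav48").rglob("*.wav"):
    wav = preprocess_wav(wav_path)
    embed = encoder.embed_utterance(wav)  # 256-dim GE2E embedding
    np.savez(wav_path.with_suffix(".npz"), embed)  # stored under the key "arr_0"
```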

After that, I got an EER of 1.57%, which is lower than the paper's 5.6%. The script I used is as follows:

```python
import argparse
import glob
from collections import defaultdict
from pathlib import Path

import numpy as np
import tqdm


def cosine_similarity(x, y):
    """Cosine similarity between each row of x and each row of y."""
    x = np.asarray(x)  # accept a list of feature vectors
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    y = y / np.linalg.norm(y, axis=-1, keepdims=True)
    return np.einsum("m d, ... d -> m ...", x, y)


def binary_search(pp, pn, l=0, u=1, ε=1e-4):
    """Find the threshold where FRR equals FAR (the EER point)."""
    m = (l + u) / 2
    frr = (pp <= m).astype(np.float32).mean()  # false rejection rate
    far = (pn > m).astype(np.float32).mean()   # false acceptance rate
    if abs(frr - far) < ε or u - l < ε:
        # stop when the rates meet or the interval is exhausted
        return m, frr, far
    if frr > far:
        # too many positives rejected: lower the threshold
        return binary_search(pp, pn, l, m, ε)
    else:
        # too many negatives accepted: raise the threshold
        return binary_search(pp, pn, m, u, ε)


def read_speaker_from_path(path):
    # the parent directory name is the speaker id, e.g. wav48/p225/p225_001.npz
    return Path(path).parts[-2]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("glob")
    args = parser.parse_args()

    ge2e_paths = glob.glob(args.glob)
    speakers = list(map(read_speaker_from_path, ge2e_paths))
    ge2es = [np.load(path)["arr_0"] for path in tqdm.tqdm(ge2e_paths, "loading ...")]

    ge2es_by_speaker = defaultdict(list)
    for speaker, ge2e in zip(speakers, ge2es):
        ge2es_by_speaker[speaker].append(ge2e)

    # use the mean GE2E feature of each speaker as the anchor
    anchor_by_speaker = {
        speaker: np.mean(ge2es, axis=0) for speaker, ge2es in ge2es_by_speaker.items()
    }

    pps = []  # positive similarities: utterance vs. its own speaker's anchor
    pns = []  # negative similarities: utterance vs. other speakers' anchors

    for speaker, ge2es in tqdm.tqdm(ge2es_by_speaker.items()):
        for anchor_speaker, anchor_ge2e in anchor_by_speaker.items():
            anchor_ge2e = anchor_ge2e[None]
            if speaker == anchor_speaker:
                # positive anchor
                pps.append(cosine_similarity(ge2es, anchor_ge2e))
            else:
                # negative anchor
                pns.append(cosine_similarity(ge2es, anchor_ge2e))

    pp = np.concatenate(pps)
    pn = np.concatenate(pns)

    print("pp, pn shapes:")
    print(pp.shape, pn.shape)
    print("pp, pn means:")
    print(pp.mean(), pn.mean())

    print("pp quantiles: 5%, 10%, 25%, 50%, 75%, 90%, 95%")
    print(np.quantile(pp, [0.05, 0.10, 0.25, 0.5, 0.75, 0.9, 0.95]))

    print("pn quantiles: 5%, 10%, 25%, 50%, 75%, 90%, 95%")
    print(np.quantile(pn, [0.05, 0.10, 0.25, 0.5, 0.75, 0.9, 0.95]))

    thres, frr, far = binary_search(pp, pn)

    print("thres:", thres, "frr:", frr, "far:", far)
    print("EER(%)", frr * 100)


if __name__ == "__main__":
    main()
```
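
The script takes a glob over the saved feature files, e.g. `python eer.py "VCTK-Corpus/wav48/*/*.npz"` (the script name and path are assumptions; quote the glob so the shell passes it through to `glob.glob` unexpanded).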

Would you mind sharing how you built the SV system, or the threshold adopted in the paper? It would be helpful.

The other question is how you split the training/testing data for the seen-to-seen setting. In my case, I only sampled 1000 utterances from the dataset randomly, so some of them may be covered in training. It would be good to have the same test set. Many thanks.
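
In case it helps, a minimal sketch of a reproducible random sample (the seed and the `all_utterance_paths` name are arbitrary; this is not necessarily the paper's split):

```python
import random

utterances = sorted(all_utterance_paths)  # sort first so the order is deterministic
random.seed(0)  # arbitrary fixed seed for reproducibility
random.shuffle(utterances)
test_set = utterances[:1000]   # 1000 held-out test utterances
train_set = utterances[1000:]  # the rest can be used for training
```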

yistLin (Owner) commented Aug 4, 2021

@howard1337 can you describe in detail how you built the ASV system?
