SVAR calculation and dataset split #19

Open
enhuiz opened this issue Aug 3, 2021 · 1 comment
enhuiz commented Aug 3, 2021

Thanks for sharing the code and model. I'm trying to reproduce the SVAR results in the paper and found I can bring the EER down to 1.57% with a threshold of 0.7896. However, with this threshold, I only get an SVAR of around 25% on the conversion results.
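For reference, the SVAR number above is the fraction of converted utterances accepted as the target speaker under that threshold. A minimal sketch of how I compute it (`converted_ge2es` and `target_anchors` are hypothetical names for the converted utterances' features and the corresponding target-speaker anchors):

```python
import numpy as np

def svar(converted_ge2es, target_anchors, threshold=0.7896):
    """Fraction of converted utterances whose cosine similarity to the
    target speaker's anchor exceeds the verification threshold."""
    x = converted_ge2es / np.linalg.norm(converted_ge2es, axis=-1, keepdims=True)
    y = target_anchors / np.linalg.norm(target_anchors, axis=-1, keepdims=True)
    sims = (x * y).sum(axis=-1)  # row-wise cosine similarity, shape (n,)
    return (sims > threshold).mean()
```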

My procedure for building the SV system is:

  1. Extract the GE2E features for the VCTK dataset using the model provided here: https://github.com/resemble-ai/Resemblyzer (a sketch of this step follows the list).
  2. For each of the 110 speakers (a newer version of VCTK was adopted, which contains 110 speakers), calculate a mean GE2E feature as the anchor by averaging all the features of that speaker.
  3. For every utterance, calculate the cosine similarity between its GE2E feature and each of the 110 anchors.
  4. Do a binary search for the threshold where FAR equals FRR (the EER point).
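
For step 1, a minimal sketch of the extraction, saving one `.npz` per utterance next to its wav (the encoder calls are from the Resemblyzer README; the VCTK directory layout is an assumption):

```python
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()
# assumed layout: VCTK-Corpus/wav48/<speaker>/<utterance>.wav
for wav_path in Path("VCTK-Corpus/wav48").rglob("*.wav"):
    wav = preprocess_wav(wav_path)
    embed = encoder.embed_utterance(wav)  # 256-dim GE2E embedding
    np.savez(wav_path.with_suffix(".npz"), embed)  # stored under the key "arr_0"
```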

After that, I got an EER of 1.57%, which is lower than the paper's 5.6%. The script I used is as follows:

```python
import argparse
import glob
from collections import defaultdict
from pathlib import Path

import numpy as np
import tqdm


def cosine_similarity(x, y):
    """Cosine similarity between each row of x and each row of y."""
    x = np.asarray(x)  # accept a list of feature vectors
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    y = y / np.linalg.norm(y, axis=-1, keepdims=True)
    return np.einsum("m d, ... d -> m ...", x, y)


def binary_search(pp, pn, l=0, u=1, ε=1e-4):
    """Find the threshold where FRR equals FAR (the EER point)."""
    m = (l + u) / 2
    frr = (pp <= m).astype(np.float32).mean()  # false rejection rate
    far = (pn > m).astype(np.float32).mean()   # false acceptance rate
    if abs(frr - far) < ε or u - l < ε:
        # stop when the rates meet or the interval is exhausted
        return m, frr, far
    if frr > far:
        # too many positives rejected: lower the threshold
        return binary_search(pp, pn, l, m, ε)
    else:
        # too many negatives accepted: raise the threshold
        return binary_search(pp, pn, m, u, ε)


def read_speaker_from_path(path):
    # the parent directory name is the speaker id, e.g. wav48/p225/p225_001.npz
    return Path(path).parts[-2]


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("glob")
    args = parser.parse_args()

    ge2e_paths = glob.glob(args.glob)
    speakers = list(map(read_speaker_from_path, ge2e_paths))
    ge2es = [np.load(path)["arr_0"] for path in tqdm.tqdm(ge2e_paths, "loading ...")]

    ge2es_by_speaker = defaultdict(list)
    for speaker, ge2e in zip(speakers, ge2es):
        ge2es_by_speaker[speaker].append(ge2e)

    # use the mean GE2E feature of each speaker as the anchor
    anchor_by_speaker = {
        speaker: np.mean(ge2es, axis=0) for speaker, ge2es in ge2es_by_speaker.items()
    }

    pps = []  # positive similarities: utterance vs. its own speaker's anchor
    pns = []  # negative similarities: utterance vs. other speakers' anchors

    for speaker, ge2es in tqdm.tqdm(ge2es_by_speaker.items()):
        for anchor_speaker, anchor_ge2e in anchor_by_speaker.items():
            anchor_ge2e = anchor_ge2e[None]
            if speaker == anchor_speaker:
                # positive anchor
                pps.append(cosine_similarity(ge2es, anchor_ge2e))
            else:
                # negative anchor
                pns.append(cosine_similarity(ge2es, anchor_ge2e))

    pp = np.concatenate(pps)
    pn = np.concatenate(pns)

    print("pp, pn shapes:")
    print(pp.shape, pn.shape)
    print("pp, pn means:")
    print(pp.mean(), pn.mean())

    print("pp quantiles: 5%, 10%, 25%, 50%, 75%, 90%, 95%")
    print(np.quantile(pp, [0.05, 0.10, 0.25, 0.5, 0.75, 0.9, 0.95]))

    print("pn quantiles: 5%, 10%, 25%, 50%, 75%, 90%, 95%")
    print(np.quantile(pn, [0.05, 0.10, 0.25, 0.5, 0.75, 0.9, 0.95]))

    thres, frr, far = binary_search(pp, pn)

    print("thres:", thres, "frr:", frr, "far:", far)
    print("EER(%)", frr * 100)


if __name__ == "__main__":
    main()
```
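
The script takes a glob over the saved feature files, e.g. `python eer.py "VCTK-Corpus/wav48/*/*.npz"` (the script name and path are assumptions; quote the glob so the shell passes it through to `glob.glob` unexpanded).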

Would you mind sharing how you built the SV system, or the threshold adopted in the paper? It would be helpful.

The other question is how you split the training/testing data for the seen-to-seen setting. In my case, I only sampled 1000 utterances from the dataset randomly, so some of them may be covered in training. It would be good to have the same test set. Many thanks.
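
In case it helps, a minimal sketch of a reproducible random sample (the seed and the `all_utterance_paths` name are arbitrary; this is not necessarily the paper's split):

```python
import random

utterances = sorted(all_utterance_paths)  # sort first so the order is deterministic
random.seed(0)  # arbitrary fixed seed for reproducibility
random.shuffle(utterances)
test_set = utterances[:1000]   # 1000 held-out test utterances
train_set = utterances[1000:]  # the rest can be used for training
```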

yistLin (Owner) commented Aug 4, 2021

@howard1337 can you describe in detail how you built the ASV system?
