
data sets #119

Open
wwyl2000 opened this issue Oct 19, 2022 · 2 comments

Comments

@wwyl2000

Dear Author,
Thanks for sharing your package. In your example for generating the dataset, "fire" has two parts: positive and negative. What is the positive data? Was it pre-recorded? Also, if I have a new word to detect, for example "hakunamatata", how do I obtain the datasets?

Thanks,
WWY

@ljj7975
Member

ljj7975 commented Nov 5, 2022

Positive refers to audio clips that contain the target keyword (fire).
Negative refers to audio clips that do not contain the target keyword (fire).

Training on the negative set helps decrease the false positive rate.
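As a rough illustration of the positive/negative split, here is a minimal sketch that assumes each clip comes with a text transcript. It labels a clip positive when the transcript contains the keyword as a whole word and negative otherwise. The helper `split_by_keyword` and the sample clips are hypothetical and do not reproduce the package's own generation script.

```python
import re


def split_by_keyword(clips, keyword):
    """Split (path, transcript) pairs into positive and negative sets.

    Hypothetical helper: a clip is positive if its transcript contains
    `keyword` as a whole word (case-insensitive), otherwise negative.
    """
    # \b word boundaries keep "fire" from matching inside "Firefighters"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    positive, negative = [], []
    for path, transcript in clips:
        if pattern.search(transcript):
            positive.append(path)
        else:
            negative.append(path)
    return positive, negative


# Toy transcripts, made up for illustration only
clips = [
    ("a.mp3", "The fire is spreading fast."),
    ("b.mp3", "She went to the store."),
    ("c.mp3", "Firefighters arrived."),
]
pos, neg = split_by_keyword(clips, "fire")
print(pos)  # ['a.mp3']
print(neg)  # ['b.mp3', 'c.mp3']
```

Whole-word matching is a design choice here: a clip like "Firefighters arrived." ends up in the negative set, which is the kind of near-miss example that helps the model avoid false positives.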

Unfortunately, there isn't a good way to generate a dataset for a custom wakeword.
If it is made up of common words such as hey, hi, or cat, data generation using the Mozilla Common Voice dataset should work.

However, generating a dataset for a non-standard word such as hakunamatata is not yet supported.
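One way to see why common words work and made-up words do not: a transcribed speech corpus can only supply positive clips for words that speakers actually said. The sketch below counts how many transcripts mention each candidate word. It assumes a Common Voice-style tab-separated file with `path` and `sentence` columns; the helper `count_keyword_clips` and the sample rows are hypothetical, invented for illustration.

```python
import csv
import io
import re


def count_keyword_clips(tsv_text, keyword):
    """Count rows whose 'sentence' column contains `keyword` as a whole word.

    Hypothetical helper; assumes a tab-separated file with a header row
    that includes a 'sentence' column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # QUOTE_NONE so quotation marks inside sentences are read literally
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return sum(1 for row in reader if pattern.search(row["sentence"]))


# Made-up rows for illustration only
sample_tsv = (
    "path\tsentence\n"
    "1.mp3\tHey, can you hear me?\n"
    "2.mp3\tHi there.\n"
    "3.mp3\tHey look at that cat.\n"
)

for word in ["hey", "cat", "hakunamatata"]:
    print(word, count_keyword_clips(sample_tsv, word))
# hey 2
# cat 1
# hakunamatata 0
```

A count of zero means the corpus offers no positive examples for that word, so the positives would have to be recorded or synthesized some other way.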

@wwyl2000
Author

wwyl2000 commented Nov 5, 2022

Hi ljj7975,
Many thanks for your information.
Best,
wwy
