Stable hosting (+long-term archiving) of generated models (acoustic models etc) #228

Open
alexis-michaud opened this issue Mar 1, 2020 · 4 comments

Comments

@alexis-michaud

This issue and #228 both stem from a related concern: stable storage (and long-term archiving) not just of primary data, but also of 'intermediate' states of the data sets (preprocessed data sets) and of 'computational outputs' such as acoustic models trained on a given data set.

Even if the tool and the data are available online, when training takes days, wouldn't it make good sense to make the models available for download, too? (To the extent that the colleagues who produced them wish to make them available, of course.)

Possible benefits:

  • facilitating experiments on transfer learning
  • opening up possibilities for smaller companies doing Natural Language Processing that need acoustic models but cannot easily invest in the data acquisition campaigns needed to create them
@alexis-michaud
Author

An interesting example is set by a release of 1,008 machine translation models, covering 140 different languages. (From Hugging Face.)

@oadams
Collaborator

oadams commented May 23, 2020

I think this is definitely the way to go. As part of the Elpis-ESPnet integration it'll be good to prepare a multilingual model that can be fine-tuned to target languages. Making such a model pip-installable, or easy to get by other means, would be useful for the reasons you mentioned.

@alexis-michaud
Author

alexis-michaud commented May 24, 2020

Great. Laurent Besacier (@besacier) will look into GitLab possibilities at 'his' place, and we will also look with @sguillaume at GitLab possibilities at Huma-Num.

@alexis-michaud
Author

@benfoley notes that ESPnet uses Zenodo: see https://colab.research.google.com/drive/1gnSuuFMNHvg1Tfli0bhhOMyfgQKkU3bu

see a list of deposits here

An example, fresh from this month (October 2020):
https://doi.org/10.5281/zenodo.4062451
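
For anyone who wants to script against such a deposit, here is a minimal sketch of how a Zenodo DOI like the one above could be mapped to a record in Zenodo's public REST API. The function names are hypothetical (they are not part of ESPnet or Elpis), and the layout of the JSON response (a `files` list whose entries carry a `key` with the file name) is an assumption that should be checked against the Zenodo API documentation before relying on it.

```python
import json
import re
import urllib.request

# Zenodo DOIs have the form 10.5281/zenodo.<record id>.
ZENODO_DOI_PATTERN = re.compile(r"10\.5281/zenodo\.(\d+)$")


def zenodo_record_id(doi_or_url: str) -> str:
    """Extract the numeric record id from a Zenodo DOI or doi.org URL."""
    match = ZENODO_DOI_PATTERN.search(doi_or_url.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi_or_url!r}")
    return match.group(1)


def zenodo_api_url(doi_or_url: str) -> str:
    """Build the REST API URL for the record behind a Zenodo DOI."""
    return f"https://zenodo.org/api/records/{zenodo_record_id(doi_or_url)}"


def list_deposit_files(doi_or_url: str, timeout: float = 30.0) -> list:
    """Return the file names attached to a deposit (needs network access).

    Assumption: the record JSON has a "files" list whose entries
    each have a "key" field holding the file name.
    """
    with urllib.request.urlopen(zenodo_api_url(doi_or_url), timeout=timeout) as response:
        record = json.load(response)
    return [entry["key"] for entry in record.get("files", [])]
```

Calling `zenodo_api_url("https://doi.org/10.5281/zenodo.4062451")` only builds a URL and needs no network; `list_deposit_files` performs the actual request. Resolving by DOI rather than by a hard-coded download link keeps scripts pointed at the archived, citable version of a model.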
