
Issues: oscar-project/corpus

Missing pages in Common Crawl
#22 opened Nov 2, 2022 by hadiasghari
OSCAR 22.XX scope
#21 opened Sep 7, 2022 by Uinelj
2 of 6 tasks
Milestone: OSCAR 22.XX
Dataset name issue in model card at Huggingface
Labels: bug
#17 opened Feb 11, 2022 by ibraheem-moosa
Low size of Swahili Oscar
#16 opened Nov 11, 2021 by hadyelsahar
Vietnamese language: text and meta/warc-target-uri mismatched
Labels: bug, lang:vi (Vietnamese), ver:21.09 (OSCAR 21.09)
#15 opened Nov 6, 2021 by Luvata
Scots language corpus is non-linguistic?
Labels: lang:sco (Scots), quality, ver:21.09 (OSCAR 21.09)
#14 opened Nov 4, 2021 by Uinelj
Quality warning: Neapolitan
Labels: lang:nap (Neapolitan), quality, ver:21.09 (OSCAR 21.09), ver:2019 (OSCAR 2019)
#13 opened Nov 4, 2021 by Uinelj
Quality warning: Somali
Labels: lang:so (Somali), quality, ver:21.09 (OSCAR 21.09), ver:2019 (OSCAR 2019)
#12 opened Nov 4, 2021 by Uinelj
Quality warning: Northern Frisian
Labels: lang:frr (Northern Frisian), quality, ver:21.09 (OSCAR 21.09), ver:2019 (OSCAR 2019)
#11 opened Nov 4, 2021 by Uinelj
Quality warning: Chavacano
Labels: lang:cbk (Chavacano), quality, ver:21.09 (OSCAR 21.09), ver:2019 (OSCAR 2019)
#10 opened Nov 4, 2021 by Uinelj
West Flemish contains only two words
Labels: lang:vls (West Flemish), quality, ver:21.09 (OSCAR 21.09)
#7 opened Nov 2, 2021 by Uinelj
Wu Chinese dataset is of bad quality.
Labels: lang:wuu (Wu Chinese), quality, ver:21.09 (OSCAR 21.09)
#5 opened Nov 2, 2021 by Uinelj
3835 records full of backslashes
Labels: bug, lang:en (English), ver:2019 (OSCAR 2019)
#4 opened Oct 27, 2021 by stas00
[BUG] Encoding errors in OSCAR 21.09
Labels: lang:tr (Turkish), ver:21.09 (OSCAR 21.09)
#2 opened Sep 29, 2021 by stefan-it
Strange datasets for Yue Chinese corpus
Labels: lang:yue (Yue Chinese), ver:2019 (OSCAR 2019)
#1 opened Jun 17, 2021 by cosmeowpawlitan
Support for Tigrinya
Labels: lang:tir (Tigrinya), suggestion (new languages, metadata, etc.)
#3 opened Jul 27, 2020 by tadeze