Add information about other langID models #107

Merged: 1 commit merged on Jul 25, 2023

@Uinelj (Member) commented Jul 13, 2023

see #106

@Uinelj (Member, Author) commented Jul 13, 2023

CI is failing because of CC (Common Crawl) 503 errors :(
We should maybe embed simpler CC shards in the repository.

@chris-ha458 commented
This is almost exactly what I expected (I had forgotten all the other details).

> CI is failing because of CC 503 errors :(
> We should maybe embed simpler CC shards.

So the build test is pulling CC shards that are unavailable due to HTTP 503 errors (likely from server overload), right?
It is a bit unfortunate that issues like this are happening for a "benign documentation change", but I guess it is set up this way for a reason.
Is there anything I could do to help?

@Uinelj (Member, Author) commented Jul 25, 2023

Downloading shards during CI runs might be a smell, since it makes CI depend on the network. My goal was to test on real-world data to be sure nothing breaks :)

We could split testing into two parts: one that uses tiny custom shards (which we can build by filtering existing ones and keeping only Wikipedia content, to avoid copyright issues), and one that can be run locally with shards downloaded from the internet.

I'm interested in keeping the "real-world shards" because they contain a lot of noisy/bad data (such as images encoded as text), and that data can break things, so it's important to test on it.
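As an illustration of the filtering step for building tiny shards, here is a minimal Python sketch. It assumes shard records have already been parsed into (URL, content) pairs; the `Record` type and the `is_wikipedia` and `keep_wikipedia` helpers are hypothetical and not part of this project's code, and a real shard would first need a WARC reader to produce such pairs.

```python
from typing import Iterable, Iterator, NamedTuple, Optional
from urllib.parse import urlsplit


class Record(NamedTuple):
    """Hypothetical, simplified shard record: the target URL and its text payload."""
    url: str
    content: str


def is_wikipedia(url: str) -> bool:
    """Return True if the URL's host is wikipedia.org or one of its subdomains."""
    # urlsplit().hostname is already lowercased, or None if the URL has no host.
    host = urlsplit(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def keep_wikipedia(records: Iterable[Record], limit: Optional[int] = None) -> Iterator[Record]:
    """Yield only Wikipedia records, stopping once `limit` of them have been kept."""
    kept = 0
    for record in records:
        if limit is not None and kept >= limit:
            return
        if is_wikipedia(record.url):
            kept += 1
            yield record


if __name__ == "__main__":
    sample = [
        Record("https://en.wikipedia.org/wiki/Language_identification", "article text"),
        Record("https://example.com/page", "other text"),
        Record("https://notwikipedia.org/", "lookalike host"),
    ]
    for rec in keep_wikipedia(sample, limit=100):
        print(rec.url)
```

Matching on the parsed host rather than a substring of the whole URL avoids keeping lookalike domains such as `notwikipedia.org`, and the optional `limit` is one way to keep the resulting shard small enough to bundle with the repository.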

@Uinelj Uinelj merged commit 888f6ff into main Jul 25, 2023
@Uinelj Uinelj deleted the fix-#106 branch July 25, 2023 08:12

2 participants