-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial German support #69
Conversation
Awesome work @ischender! When this is ready for review later, let me know and I'll assign a reviewer. |
@yosukehigashi I am supposed to assign this to you, but I don't seem to be able to... |
Assigned to Yosuke! (Sorry, I assumed Alex would be able to set the reviewer) |
Thanks for the massive contribution @ischender! I'll review this ASAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work!! Just a few minor comments, but overall looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be great if we could follow the format in the English LangCheck Quickstart
notebook for this! I.e.
- Cell 1: !pip install langcheck
- Cell 2: Define a list of generated outputs (in German), and call
fluency
- Cell 3: Show the results of running a comparison (
fluency > 0.5
) - (and so on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, that actually made me realize that the Parrot model did not work as well as I previously thought, so now it's wrapped in a translation layer that gives seemingly good results (I used different translations engines not to pullute the tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting!
Intuitively though, I'd expect something like fluency
to be quite language-specific, and it seems hard to accurately assess fluency if you translate it to another language first? (On the other hand, something like factual_consistency
feels more language-agnostic).
From a quick browse on huggingface, https://huggingface.co/EIStakovskii/bert-base-german-cased_fluency seems like it might be ok (although it's quite old so likely not state-of-the-art) - what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the tests I ran, the fluency seems to match what I would expect, at least in the case of German, when the language structure is similar enough (the basic grammar of English is of a Germanic language, after all).
I will give this bert one a go too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosukehigashi , I tried a quick and dirty implementation using "EIStakovskii/bert-base-german-cased_fluency"
in a version of langcheck.metrics.de._fluency_local
and ran a quick test...
It seems pretty bad.
While with the the translation we get what we expect:
Please note that the first 2 sentences are pretty much as bad in German as in English...
Meaning:
generated_outputs = [
'Black cat the',
'The black cat is.',
'The black cat is sitting',
'The big black cat is sitting on the fence',
'Usually, the big black cat is sitting on the old wooden fence.'
]
As you can see, they get mostly the same value:
(interestingly, the translation was done with different engines)
Parrot, that has SOME multilingual properties since it's based on T5 (not mT5 though) does worse than translation (the reason I switched to translations) but better than the bert-based model

Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this analysis! As we said on Discord, let's go with this approach for now since it's performing the best.
Could you create an issue (something like "Consider computing German fluency without translating to English") and add a link to this comment thread, so that this doesn't get buried?
thank you, I will be on the suggestions as of next week |
Co-authored-by: Yosuke Higashi <[email protected]>
Co-authored-by: Yosuke Higashi <[email protected]>
Thanks for the updates! I'll take a look later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting!
Intuitively though, I'd expect something like fluency
to be quite language-specific, and it seems hard to accurately assess fluency if you translate it to another language first? (On the other hand, something like factual_consistency
feels more language-agnostic).
From a quick browse on huggingface, https://huggingface.co/EIStakovskii/bert-base-german-cased_fluency seems like it might be ok (although it's quite old so likely not state-of-the-art) - what do you think?
Co-authored-by: Yosuke Higashi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few questions about the Translate class, but otherwise LGTM!
Co-authored-by: Yosuke Higashi <[email protected]>
…nto pr/ischender/69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all of your work on this!!🚀 Looks good to merge once the last two comments are resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this analysis! As we said on Discord, let's go with this approach for now since it's performing the best.
Could you create an issue (something like "Consider computing German fluency without translating to English") and add a link to this comment thread, so that this doesn't get buried?
Looks like all comments have been resolved so I'll merge this now. Thanks @ischender!🇩🇪 |
Adding preliminary support for German (and potentially other Western European languages).
Supported: reference_based, source_based, reference_free.
Missing only the factual_consistency notebooks, since I will need to find good data for that.