Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial German support #69

Merged
merged 56 commits into from
Jan 22, 2024
Merged

Initial German support #69

merged 56 commits into from
Jan 22, 2024

Conversation

ischender
Copy link
Contributor

@ischender ischender commented Dec 6, 2023

Adding preliminary support for German (and potentially other Western European languages).
Supported: reference_based, source_based, reference_free.
Missing only the factual_consistency notebooks, since I will need to find good data for that.

@kennysong
Copy link
Contributor

Awesome work @ischender! When this is ready for review later, let me know and I'll assign a reviewer.

@ischender
Copy link
Contributor Author

@yosukehigashi I am supposed to assign this to you, but I don't seem to be able to...

@kennysong
Copy link
Contributor

Assigned to Yosuke! (Sorry, I assumed Alex would be able to set the reviewer)

@yosukehigashi
Copy link
Contributor

Thanks for the massive contribution @ischender! I'll review this ASAP

Copy link
Contributor

@yosukehigashi yosukehigashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work!! Just a few minor comments, but overall looks great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be great if we could follow the format in the English LangCheck Quickstart notebook for this! I.e.

  • Cell 1: !pip install langcheck
  • Cell 2: Define a list of generated outputs (in German), and call fluency
  • Cell 3: Show the results of running a comparison (fluency > 0.5)
  • (and so on)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, that actually made me realize that the Parrot model did not work as well as I previously thought, so now it's wrapped in a translation layer that gives seemingly good results (I used different translations engines not to pullute the tests)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting!
Intuitively though, I'd expect something like fluency to be quite language-specific, and it seems hard to accurately assess fluency if you translate it to another language first? (On the other hand, something like factual_consistency feels more language-agnostic).

From a quick browse on huggingface, https://huggingface.co/EIStakovskii/bert-base-german-cased_fluency seems like it might be ok (although it's quite old so likely not state-of-the-art) - what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the tests I ran, the fluency seems to match what I would expect, at least in the case of German, when the language structure is similar enough (the basic grammar of English is of a Germanic language, after all).

I will give this bert one a go too...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosukehigashi , I tried a quick and dirty implementation using "EIStakovskii/bert-base-german-cased_fluency" in a version of langcheck.metrics.de._fluency_local and ran a quick test...
It seems pretty bad.
image
While with the the translation we get what we expect:
image
Please note that the first 2 sentences are pretty much as bad in German as in English...
Meaning:

generated_outputs = [
    'Black cat the',
    'The black cat is.',
    'The black cat is sitting',
    'The big black cat is sitting on the fence',
    'Usually, the big black cat is sitting on the old wooden fence.'
]

As you can see, they get mostly the same value:
image

(interestingly, the translation was done with different engines)

Parrot, that has SOME multilingual properties since it's based on T5 (not mT5 though) does worse than translation (the reason I switched to translations) but better than the bert-based model

image

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this analysis! As we said on Discord, let's go with this approach for now since it's performing the best.
Could you create an issue (something like "Consider computing German fluency without translating to English") and add a link to this comment thread, so that this doesn't get buried?

@ischender
Copy link
Contributor Author

Thanks for the great work!! Just a few minor comments, but overall looks great!

thank you, I will be on the suggestions as of next week

@yosukehigashi
Copy link
Contributor

@yosukehigashi I should have implemented all the suggestions/changes, unless something managed to slip by (sorry if to). Can you have a look when you can?

Thanks for the updates! I'll take a look later today

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting!
Intuitively though, I'd expect something like fluency to be quite language-specific, and it seems hard to accurately assess fluency if you translate it to another language first? (On the other hand, something like factual_consistency feels more language-agnostic).

From a quick browse on huggingface, https://huggingface.co/EIStakovskii/bert-base-german-cased_fluency seems like it might be ok (although it's quite old so likely not state-of-the-art) - what do you think?

Copy link
Contributor

@yosukehigashi yosukehigashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions about the Translate class, but otherwise LGTM!

Copy link
Contributor

@yosukehigashi yosukehigashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work on this!!🚀 Looks good to merge once the last two comments are resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this analysis! As we said on Discord, let's go with this approach for now since it's performing the best.
Could you create an issue (something like "Consider computing German fluency without translating to English") and add a link to this comment thread, so that this doesn't get buried?

@yosukehigashi
Copy link
Contributor

Looks like all comments have been resolved so I'll merge this now. Thanks @ischender!🇩🇪

@yosukehigashi yosukehigashi merged commit 6eba3bb into citadel-ai:main Jan 22, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants