
Initial German support #69

Merged
merged 56 commits into from
Jan 22, 2024
f5758a6
first basic implementation of reference free - DE
ischender Dec 5, 2023
5f7efac
fixing tokenizers and adding a rougeL example
ischender Dec 5, 2023
34a6911
adding reference based German tests
ischender Dec 5, 2023
d2aa437
fixing tests, and fixing import bugs in DE
ischender Dec 5, 2023
20f02d3
Merge branch 'main' into de-support
ischender Dec 6, 2023
7fdbb58
adding notes re: case sensitivity
ischender Dec 6, 2023
e61da5b
removing autoformatting
ischender Dec 6, 2023
a0824e4
Merge branch 'citadel-ai:main' into de-support
ischender Dec 19, 2023
07225f8
adapting source based reference from ZH
ischender Dec 19, 2023
f11bd02
fixing error in typing in python 3.8
ischender Dec 20, 2023
2b5cb31
fixing formatting issues
ischender Dec 20, 2023
d3634cd
adding source based tests + context_relevance
ischender Dec 20, 2023
5bb01b2
first implementation reference free, no tests
ischender Dec 20, 2023
a30d2b6
small fixes
ischender Dec 20, 2023
abfc265
style fixes
ischender Dec 20, 2023
0a4d6db
first round of tests for reference free
ischender Dec 26, 2023
08cdb4c
finalizing tests for reference free
ischender Dec 26, 2023
1cfe04b
Merge branch 'citadel-ai:main' into de-support
ischender Dec 27, 2023
4cb5628
Merge branch 'main' into de-support
ischender Jan 4, 2024
32baf5b
moving and unifying Detoxify to metrics from /de and /en
ischender Jan 9, 2024
f0795fd
fixing typos and commented out code from the PR
ischender Jan 9, 2024
1cbe428
Update src/langcheck/metrics/de/reference_based_text_quality.py
ischender Jan 9, 2024
306bbea
Update src/langcheck/metrics/de/reference_based_text_quality.py
ischender Jan 9, 2024
de3af68
Update src/langcheck/metrics/de/source_based_text_quality.py
ischender Jan 9, 2024
650d0ca
Update src/langcheck/metrics/de/source_based_text_quality.py
ischender Jan 9, 2024
6f5995a
adding translation to fluency check
ischender Jan 10, 2024
35f08f7
adding test_tokenizers
ischender Jan 10, 2024
ef8b712
moving German model name for semantic similarity inside semantic_simi…
ischender Jan 10, 2024
63d1890
Merge branch 'de-support' of github.com:ischender/langcheck into de-s…
ischender Jan 10, 2024
ee79354
removing tokenizer parameters
ischender Jan 10, 2024
a12ca6e
fixing formatting
ischender Jan 10, 2024
71f2b62
fixing German sentence in `ai_disclaimer_similarity`
ischender Jan 10, 2024
117509a
fixing types
ischender Jan 10, 2024
47b4ee0
updated table for metrics
ischender Jan 10, 2024
8b52857
fixing types problems
ischender Jan 10, 2024
97468b3
fix notebook name
yosukehigashi Jan 11, 2024
14b549c
use de fluency
yosukehigashi Jan 11, 2024
458ec4d
fix docstring
yosukehigashi Jan 11, 2024
bdf638c
Update docs/metrics.md
ischender Jan 11, 2024
e154723
German documentation + screenshots
ischender Jan 11, 2024
285752a
adding German (ドイツ語???) to the Japanese documentation
ischender Jan 11, 2024
b8a1b8e
Merge branch 'de-support' of github.com:ischender/langcheck into de-s…
ischender Jan 11, 2024
a419d2b
adding a translation function to wrap up longer texts
ischender Jan 17, 2024
09adbc9
translating factual consistency data
ischender Jan 17, 2024
443ca7b
fixing translation wrapper
ischender Jan 17, 2024
0dca8ba
fixing translation wrapper
ischender Jan 17, 2024
4eb0676
moving translation wrapper to a file, adding tests
ischender Jan 17, 2024
bf5cd21
removing debugging print statements
ischender Jan 17, 2024
7a3fbff
format json documents
yosukehigashi Jan 18, 2024
d2a7acf
changes from PR comments
ischender Jan 18, 2024
cacf3bf
Update src/langcheck/metrics/de/_translation.py
ischender Jan 18, 2024
2e79a93
fixing flake8 formatting issue
ischender Jan 18, 2024
8ea3a4f
Merge branch 'de-support' of https://github.com/ischender/langcheck i…
yosukehigashi Jan 18, 2024
2c66b5a
small corrections to make pyright happy
ischender Jan 18, 2024
82b947f
changing block size as per PR suggestion
ischender Jan 18, 2024
e21a2d7
updating comment to describe min block size
ischender Jan 21, 2024
169 changes: 169 additions & 0 deletions README_de.md
@@ -0,0 +1,169 @@
<div align="center">

<img src="docs/_static/LangCheck-Logo-square.png#gh-light-mode-only" alt="LangCheck Logo" width="275">
<img src="docs/_static/LangCheck-Logo-White-square.png#gh-dark-mode-only" alt="LangCheck Logo" width="275">

[![](https://dcbadge.vercel.app/api/server/Bkndx9RXqw?compact=true&style=flat)](https://discord.gg/Bkndx9RXqw)
[![Pytest Tests](https://github.com/citadel-ai/langcheck/actions/workflows/pytest.yml/badge.svg)](https://github.com/citadel-ai/langcheck/actions/workflows/pytest.yml)
[![Downloads](https://static.pepy.tech/badge/langcheck)](https://pepy.tech/project/langcheck)
![GitHub](https://img.shields.io/github/license/citadel-ai/langcheck)

Simple, Pythonic building blocks to evaluate LLM applications.

[Install](#install) •
[Examples](#examples) •
[Quickstart](https://langcheck.readthedocs.io/en/latest/quickstart.html) •
[Docs](https://langcheck.readthedocs.io/en/latest/index.html) •
[English](README.md) •
[日本語](README_ja.md)

</div>

## Install

```shell
pip install langcheck
```

## Examples

### Evaluate text

Use LangCheck's suite of metrics to evaluate LLM-generated text.

```python
import langcheck

# Generate text with any LLM library
generated_outputs = [
    'Schwarze Katze die',
    'Die schwarze Katze ist.',
    'Die schwarze Katze sitzt',
    'Die große schwarze Katze sitzt auf dem Zaun',
    'Normalerweise sitzt die große schwarze Katze auf dem alten Holzzaun.'
]

# Check text quality and get the results as a DataFrame
langcheck.metrics.de.fluency(generated_outputs)
```

![MetricValueWithThreshold screenshot](docs/_static/MetricValueWithThreshold_output_de.png)

Turning LangCheck metrics into unit tests is simple, just use `assert`:

```python
assert langcheck.metrics.de.fluency(generated_outputs) > 0.5
```

LangCheck includes several types of metrics for evaluating LLM applications. Some examples:

| Type of Metric | Examples | Languages |
| ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- | ------------- |
| [Reference-Free Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-free-text-quality-metrics) | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)`<br>`ai_disclaimer_similarity(generated_outputs)` | EN, JA, DE |
| [Reference-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)` | EN, JA, DE |
| [Source-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#source-based-text-quality-metrics) | `factual_consistency(generated_outputs, sources)` | EN, JA, DE |
| [Text Structure Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#text-structure-metrics) | `is_float(generated_outputs, min=0, max=None)`<br>`is_json_object(generated_outputs)` | All Languages |

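As a rough illustration of what the reference-based `rouge2` metric measures, here is a minimal, self-contained sketch of bigram-overlap F1 scoring. This is a deliberate simplification (whitespace tokenization, unique bigrams only), not LangCheck's actual implementation:

```python
def _bigrams(text):
    # Naive whitespace tokenization; real implementations tokenize more carefully
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))

def rouge2_f1_sketch(generated, reference):
    # F1 over unique bigrams (a simplification: real ROUGE counts multiplicities)
    gen, ref = _bigrams(generated), _bigrams(reference)
    if not gen or not ref:
        return 0.0
    overlap = len(gen & ref)
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)

score = rouge2_f1_sketch('Die schwarze Katze sitzt',
                         'Die schwarze Katze sitzt auf dem Zaun')
print(round(score, 3))  # → 0.667
```

All three bigrams of the generated text appear in the reference (precision 1.0), but only three of the reference's six bigrams are covered (recall 0.5), giving an F1 of 2/3.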
### Visualize metrics

LangCheck provides built-in, interactive visualizations of metrics.

```python
# Compute some metrics
fluency_values = langcheck.metrics.de.fluency(generated_outputs)
sentiment_values = langcheck.metrics.de.sentiment(generated_outputs)

# Interactive scatter plot of one metric
fluency_values.scatter()
```

![Scatter plot for one metric](docs/_static/scatter_one_metric_de.gif)

```python
# Interactive scatter plot of two metrics
langcheck.plot.scatter(fluency_values, sentiment_values)
```

![Scatter plot for two metrics](docs/_static/scatter_two_metrics_de.png)

```python
# Interactive histogram of a single metric
fluency_values.histogram()
```

![Histogram for one metric](docs/_static/histogram_de.png)

### Augment data

Note: text augmentations are not yet implemented for German.

Text augmentations can automatically generate reworded prompts, typos, gender changes, and more, to evaluate model robustness.

For example, to measure how the model responds to different genders:

```python
male_prompts = langcheck.augment.gender(prompts, to_gender='male')
female_prompts = langcheck.augment.gender(prompts, to_gender='female')

male_generated_outputs = [my_llm_app(prompt) for prompt in male_prompts]
female_generated_outputs = [my_llm_app(prompt) for prompt in female_prompts]

langcheck.metrics.sentiment(male_generated_outputs)
langcheck.metrics.sentiment(female_generated_outputs)
```
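Augmentations are not yet available for German, but the idea can be sketched in plain Python. The helper below is a hypothetical stand-in, not part of `langcheck.augment`; it injects typos by swapping adjacent characters:

```python
import random

def swap_typo(text, n_swaps=1, seed=0):
    # Swap adjacent characters to simulate typos (illustrative only)
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_swaps):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return ''.join(chars)

print(swap_typo('Die schwarze Katze', n_swaps=2))
```

Feeding such perturbed prompts back into the app and re-scoring the outputs is the core robustness-testing loop the augmentation API automates.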

### Unit Testing

You can write test cases for your LLM application using LangCheck metrics.

For example, if you only have a list of prompts to test:

```python
import json

from langcheck.utils import load_json

# Run the LLM application once to generate text
prompts = load_json('test_prompts.json')
generated_outputs = [my_llm_app(prompt) for prompt in prompts]

# Unit tests
def test_toxicity(generated_outputs):
    assert langcheck.metrics.toxicity(generated_outputs) < 0.1

def test_fluency(generated_outputs):
    assert langcheck.metrics.fluency(generated_outputs) > 0.9

def test_json_structure(generated_outputs):
    assert langcheck.metrics.validation_fn(
        generated_outputs, lambda x: 'myKey' in json.loads(x)).all()
```
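The `validation_fn` pattern above amounts to applying a predicate to every output. A pure-Python equivalent (just a sketch of the idea; `all_outputs_valid` is a hypothetical helper, not LangCheck's API) looks like this:

```python
import json

def all_outputs_valid(outputs, predicate):
    # True only if the predicate holds for every output
    return all(predicate(out) for out in outputs)

outputs = ['{"myKey": 1}', '{"myKey": "a", "other": 2}']
print(all_outputs_valid(outputs, lambda x: 'myKey' in json.loads(x)))  # → True
```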

### Monitoring

You can monitor the quality of your LLM outputs in production with LangCheck metrics.

Just save the outputs and pass them into LangCheck.

```python
production_outputs = load_json('llm_logs_2023_10_02.json')['outputs']

# Evaluate and display toxic outputs in the production logs
langcheck.metrics.toxicity(production_outputs) > 0.75

# Or, if your app outputs structured text
langcheck.metrics.is_json_array(production_outputs)
```
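Conceptually, the monitoring step scores each logged output and flags those above a threshold. A minimal sketch with hand-written, purely illustrative scores (real scores would come from `langcheck.metrics.toxicity`):

```python
# (output text, toxicity score) pairs; the scores here are made up for illustration
logged = [('harmless reply', 0.05), ('rude reply', 0.92), ('neutral reply', 0.30)]

# Flag any output whose score exceeds the threshold
flagged = [text for text, score in logged if score > 0.75]
print(flagged)  # → ['rude reply']
```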

### Guardrails

You can provide guardrails on LLM outputs with LangCheck metrics.

Just filter candidate outputs through LangCheck.

```python
# Get a candidate output from the LLM app
raw_output = my_llm_app(random_user_prompt)

# Filter the output before it reaches the user
while langcheck.metrics.contains_any_strings(raw_output, blacklist_words).any():
    raw_output = my_llm_app(random_user_prompt)
```
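The guardrail loop above hinges on a blacklist check. A minimal stand-in for such a check (a sketch only; LangCheck's `contains_any_strings` operates on metric result objects) could be:

```python
def contains_blacklisted(text, blacklist):
    # Case-insensitive substring check against a list of blocked words
    lowered = text.lower()
    return any(word.lower() in lowered for word in blacklist)

print(contains_blacklisted('This reply leaks a SECRET token', ['secret']))  # → True
```

In the real loop, regeneration continues until the candidate output passes the check, so the user never sees a blacklisted string.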
6 changes: 3 additions & 3 deletions README_ja.md
@@ -56,9 +56,9 @@

| Type of Metric | Key Metrics | Languages |
| ------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------- | ------------ |
-| [Reference-Free Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-free-text-quality-metrics) | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)` | English, Japanese |
-| [Reference-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)` | English, Japanese |
-| [Source-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#source-based-text-quality-metrics) | `factual_consistency(generated_outputs, sources)` | English, Japanese |
+| [Reference-Free Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-free-text-quality-metrics) | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)` | English, Japanese, German |
+| [Reference-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)` | English, Japanese, German |
+| [Source-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#source-based-text-quality-metrics) | `factual_consistency(generated_outputs, sources)` | English, Japanese, German |
| [Text Structure Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#text-structure-metrics) | `is_float(generated_outputs, min=0, max=None)`<br>`is_json_object(generated_outputs)` | All Languages |

### Visualizing metric values
1 change: 1 addition & 0 deletions benchmarking/data/qags_cnndm-de.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions benchmarking/data/qags_xsum-de.json

Large diffs are not rendered by default.

Binary file added docs/_static/histogram_de.png
Binary file added docs/_static/scatter_one_metric_de.gif
Binary file added docs/_static/scatter_two_metrics_de.png
14 changes: 11 additions & 3 deletions docs/metrics.md
@@ -29,15 +29,23 @@
from langcheck.metrics.ja import fluency  # Japanese fluency metric
from langcheck.metrics import is_json_array # Language-agnostic metric
```

The same is true for German text:

```python
from langcheck.metrics.de import fluency # German fluency metric
from langcheck.metrics import is_json_array # Language-agnostic metric
```


## Metric Types

LangCheck metrics are categorized by metric type, which correspond to the kind of ground truth data that's required.

| Type of Metric | Examples | Languages |
| ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | ------------- |
-| [Reference-Free Text Quality Metrics](#reference-free-text-quality-metrics) | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)`<br>`ai_disclaimer_similarity(generated_outputs)` | EN, JA |
-| [Reference-Based Text Quality Metrics](#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)` | EN, JA |
-| [Source-Based Text Quality Metrics](#source-based-text-quality-metrics) | `factual_consistency(generated_outputs, sources)` | EN, JA |
+| [Reference-Free Text Quality Metrics](#reference-free-text-quality-metrics) | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)`<br>`ai_disclaimer_similarity(generated_outputs)` | EN, JA, DE |
+| [Reference-Based Text Quality Metrics](#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)` | EN, JA, DE |
+| [Source-Based Text Quality Metrics](#source-based-text-quality-metrics) | `factual_consistency(generated_outputs, sources)` | EN, JA, DE |
| [Text Structure Metrics](#text-structure-metrics) | `is_float(generated_outputs, min=0, max=None)`<br>`is_json_object(generated_outputs)` | All Languages |

(reference-free-text-quality-metrics)=