# HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

This repository contains the reference code and dataset for the paper [HWD: A Novel Evaluation Score for Styled Handwritten Text Generation](https://papers.bmvc2023.org/0007.pdf).
If you find it useful, please cite it as:
```
@inproceedings{pippi2023hwd,
  title={{HWD}: A Novel Evaluation Score for Styled Handwritten Text Generation},
  author={Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Cucchiara, Rita},
  booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
  year={2023}
}
```

# Installation
To install the package, run:
```
pip install .
```

# Styled Handwritten Text Generation
Detailed instructions for generating styled handwritten text will be added in a future update.

# Evaluation
This section describes how to evaluate the quality of styled handwritten text generation using various scores.

## Dataset
Organize your data in the following folder structure:
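```
/path/to/dataset
├── author1
│   ├── image1.png
│   └── image2.png
└── author2
    ├── image1.png
    └── image2.png
```
The author and file names above are illustrative. The `fakes` and `reals` datasets used throughout the examples below can then be loaded from such folders; a minimal sketch, assuming a `FolderDataset` helper in `hwd.datasets` (the class name and paths here are assumptions):

```python
from hwd.datasets import FolderDataset  # assumed helper; adjust to the actual API

# One dataset of reference (real) images and one of generated (fake) images
reals = FolderDataset('/path/to/real_dataset')
fakes = FolderDataset('/path/to/fake_dataset')
```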
Some evaluation scores depend on whether the dataset is folded or unfolded. The unfold operation works as follows.

For an image of height $h$ and width $w$, the unfold operation splits the image into $n = \lfloor w/h \rfloor$ square images of size $h \times h$.

![unfold_5](https://github.com/user-attachments/assets/f49da3d9-692c-45cd-be86-c05928410a20)
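As an illustration of this definition (a sketch only, not the library's implementation), unfolding a single image tensor might look like:

```python
import torch

def unfold_image(img: torch.Tensor) -> list[torch.Tensor]:
    """Split a (channels, h, w) line image into n = w // h square h x h crops."""
    _, h, w = img.shape
    return [img[:, :, i * h:(i + 1) * h] for i in range(w // h)]

# A 3-channel line image of height 32 and width 160 yields floor(160 / 32) = 5 crops
crops = unfold_image(torch.rand(3, 32, 160))
print(len(crops))  # 5
```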

```python
fakes = fakes.unfold()
reals = reals.unfold()
```
For FID and KID, images are cropped by default, as described in the paper. If you wish to evaluate using the entire line instead of cropping, you can unfold the dataset using the above method.
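For example, a minimal sketch of both options, assuming the `FIDScore` usage shown in the FID section below and the datasets defined above:

```python
from hwd.scores import FIDScore

fid = FIDScore()
score_cropped = fid(fakes, reals)                  # default: cropped images, as in the paper
score_lines = fid(fakes.unfold(), reals.unfold())  # whole lines, unfolded into square crops
```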

## HWD (Handwriting Distance)
The HWD is the primary evaluation score introduced in the paper. It compares two datasets (reference and generated) by resizing images to a height of 32 pixels and computing the Euclidean distance between their features.

![HWD](https://github.com/user-attachments/assets/c64152c6-3414-4cb1-b4ab-a31202fe8fb4)

```python
from hwd.scores import HWDScore
hwd = HWDScore()  # default settings; exact constructor arguments may differ
score = hwd(fakes, reals)
print(f"HWD Score: {score}")
```

## FID (Fréchet Inception Distance)
The FID compares the distributions of two datasets in the feature space of an Inception network pretrained on ImageNet. By default, images are cropped before evaluation.

![FID](https://github.com/user-attachments/assets/bd4e4538-0508-4f52-835d-4371c5e968ac)

```python
from hwd.scores import FIDScore
fid = FIDScore()  # default settings; exact constructor arguments may differ
score = fid(fakes, reals)
print(f"FID Score: {score}")
```

## BFID (Binarized FID)
The BFID is a variant of the FID that operates on binarized images. This score is computed by applying Otsu's thresholding before evaluation.
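As background, Otsu's method automatically picks a global threshold separating ink from background. A minimal sketch of this preprocessing step using OpenCV (an illustration, not necessarily how the library binarizes internally):

```python
import cv2

gray = cv2.imread('line.png', cv2.IMREAD_GRAYSCALE)
# Otsu's method selects the threshold that best separates the two intensity classes
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('line_binarized.png', binary)
```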

```python
from hwd.scores import BFIDScore
bfid = BFIDScore()  # images are binarized with Otsu's thresholding internally
score = bfid(fakes, reals)
print(f"BFID Score: {score}")
```

## KID (Kernel Inception Distance)
The KID measures differences between sets of images by using the maximum mean discrepancy (MMD). By default, images are cropped before evaluation.
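For background, the standard KID formulation (from the original KID paper, stated here for context; this repository's exact implementation may differ) is the squared unbiased MMD estimator over Inception features with a polynomial kernel:

$$\mathrm{KID}(X, Y) = \mathrm{MMD}^2_u(X, Y), \qquad k(x, y) = \left(\tfrac{1}{d}\, x^\top y + 1\right)^3,$$

where $d$ is the feature dimension.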

```python
from hwd.scores import KIDScore
kid = KIDScore()  # default settings; exact constructor arguments may differ
score = kid(fakes, reals)
print(f"KID Score: {score}")
```

## BKID (Binarized KID)
The BKID is a variant of the KID that operates on binarized images. This score is computed by applying Otsu's thresholding before evaluation.

```python
from hwd.scores import BKIDScore
bkid = BKIDScore()  # images are binarized with Otsu's thresholding internally
score = bkid(fakes, reals)
print(f"BKID Score: {score}")
```

## CER (Character Error Rate)
The CER evaluates the character-level accuracy of generated handwritten text images by comparing their contained text against the ground-truth transcriptions. By default, the model `microsoft/trocr-base-handwritten` is used.

```python
from hwd.scores import CERScore
cer = CERScore()  # uses microsoft/trocr-base-handwritten unless another checkpoint is given
score = cer(fakes)  # assumes the ground-truth transcriptions come with the dataset
print(f"CER Score: {score}")
```
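For reference, the CER is the edit distance between the recognized text and the ground truth, normalized by the ground-truth length. A minimal sketch (using the third-party `editdistance` package, which is an assumption of this sketch and not a stated dependency of this repo):

```python
import editdistance  # pip install editdistance (assumed helper for this sketch)

def cer(predicted: str, ground_truth: str) -> float:
    # Levenshtein distance normalized by the ground-truth length
    return editdistance.eval(predicted, ground_truth) / max(len(ground_truth), 1)

print(cer("hel1o world", "hello world"))  # 1 substitution / 11 characters ≈ 0.091
```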

## LPIPS (Learned Perceptual Image Patch Similarity)
The LPIPS measures perceptual differences between images using feature activations from a deep network. The LPIPS score in this repo uses a custom implementation with the same backbone as HWD.

```python
from hwd.scores import LPIPSScore
lpips = LPIPSScore()  # default settings; exact constructor arguments may differ
score = lpips(fakes, reals)
print(f"LPIPS Score: {score}")
```

## I-LPIPS (Intra-LPIPS)
The I-LPIPS evaluates intra-image consistency by comparing style coherence between crops within each sample. This is also a custom implementation using the same backbone as HWD.

```python
from hwd.scores import IntraLPIPSScore
ilpips = IntraLPIPSScore()  # default settings; exact constructor arguments may differ
score = ilpips(fakes)  # compares crops within each generated sample
print(f"I-LPIPS Score: {score}")
```
