# HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

This repository contains the reference code and dataset for the paper [HWD: A Novel Evaluation Score for Styled Handwritten Text Generation](https://papers.bmvc2023.org/0007.pdf).
If you find it useful, please cite it as:
```
@inproceedings{pippi2023hwd,
  title={{HWD}: A Novel Evaluation Score for Styled Handwritten Text Generation},
  author={Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Cucchiara, Rita},
  booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
  year={2023}
}
```

# Installation
To install the package, run:
```
pip install .
```

# Styled Handwritten Text Generation
Detailed instructions for generating styled handwritten text will be added in a future update.

# Evaluation
This section describes how to evaluate the quality of styled handwritten text generation using various scores.

## Dataset
Organize your data in the following folder structure:
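```
/path/to/dataset
├── author1
│   ├── image1.png
│   └── image2.png
└── author2
    ├── image1.png
    └── image2.png
```
The author and file names above are illustrative. The `fakes` and `reals` datasets used throughout the examples below can then be loaded from such folders; a minimal sketch, assuming a `FolderDataset` helper in `hwd.datasets` (the class name and paths here are assumptions):

```python
from hwd.datasets import FolderDataset  # assumed helper; adjust to the actual API

# One dataset of reference (real) images and one of generated (fake) images
reals = FolderDataset('/path/to/real_dataset')
fakes = FolderDataset('/path/to/fake_dataset')
```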
Some evaluation scores depend on whether the dataset is folded or unfolded. The unfold operation works as follows.

For an image of height $h$ and width $w$, the unfold operation splits the image into $n = \lfloor w/h \rfloor$ square images of size $h \times h$.

![unfold_5](https://github.com/user-attachments/assets/f49da3d9-692c-45cd-be86-c05928410a20)
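As an illustration of this definition (a sketch only, not the library's implementation), unfolding a single image tensor might look like:

```python
import torch

def unfold_image(img: torch.Tensor) -> list[torch.Tensor]:
    """Split a (channels, h, w) line image into n = w // h square h x h crops."""
    _, h, w = img.shape
    return [img[:, :, i * h:(i + 1) * h] for i in range(w // h)]

# A 3-channel line image of height 32 and width 160 yields floor(160 / 32) = 5 crops
crops = unfold_image(torch.rand(3, 32, 160))
print(len(crops))  # 5
```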

```python
fakes = fakes.unfold()
reals = reals.unfold()
```
For FID and KID, images are cropped by default, as described in the paper. If you wish to evaluate using the entire line instead of cropping, you can unfold the dataset using the above method.
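For example, a minimal sketch of both options, assuming the `FIDScore` usage shown in the FID section below and the datasets defined above:

```python
from hwd.scores import FIDScore

fid = FIDScore()
score_cropped = fid(fakes, reals)                  # default: cropped images, as in the paper
score_lines = fid(fakes.unfold(), reals.unfold())  # whole lines, unfolded into square crops
```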

## HWD (Handwriting Distance)
The HWD is the primary evaluation score introduced in the paper. It compares two datasets (reference and generated) by resizing images to a height of 32 pixels and computing the Euclidean distance between their features.

![HWD](https://github.com/user-attachments/assets/c64152c6-3414-4cb1-b4ab-a31202fe8fb4)

```python
from hwd.scores import HWDScore
hwd = HWDScore()  # default settings; exact constructor arguments may differ
score = hwd(fakes, reals)
print(f"HWD Score: {score}")
```

## FID (Fréchet Inception Distance)
The FID compares the distributions of two datasets in the feature space of an Inception network pretrained on ImageNet. By default, images are cropped before evaluation.

![FID](https://github.com/user-attachments/assets/bd4e4538-0508-4f52-835d-4371c5e968ac)

```python
from hwd.scores import FIDScore
fid = FIDScore()  # default settings; exact constructor arguments may differ
score = fid(fakes, reals)
print(f"FID Score: {score}")
```

## BFID (Binarized FID)
The BFID is a variant of the FID that operates on binarized images. This score is computed by applying Otsu's thresholding before evaluation.
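As background, Otsu's method automatically picks a global threshold separating ink from background. A minimal sketch of this preprocessing step using OpenCV (an illustration, not necessarily how the library binarizes internally):

```python
import cv2

gray = cv2.imread('line.png', cv2.IMREAD_GRAYSCALE)
# Otsu's method selects the threshold that best separates the two intensity classes
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('line_binarized.png', binary)
```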

```python
from hwd.scores import BFIDScore
bfid = BFIDScore()  # images are binarized with Otsu's thresholding internally
score = bfid(fakes, reals)
print(f"BFID Score: {score}")
```

## KID (Kernel Inception Distance)
The KID measures differences between sets of images by using the maximum mean discrepancy (MMD). By default, images are cropped before evaluation.
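For background, the standard KID formulation (from the original KID paper, stated here for context; this repository's exact implementation may differ) is the squared unbiased MMD estimator over Inception features with a polynomial kernel:

$$\mathrm{KID}(X, Y) = \mathrm{MMD}^2_u(X, Y), \qquad k(x, y) = \left(\tfrac{1}{d}\, x^\top y + 1\right)^3,$$

where $d$ is the feature dimension.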

```python
from hwd.scores import KIDScore
kid = KIDScore()  # default settings; exact constructor arguments may differ
score = kid(fakes, reals)
print(f"KID Score: {score}")
```

## BKID (Binarized KID)
The BKID is a variant of the KID that operates on binarized images. This score is computed by applying Otsu's thresholding before evaluation.

```python
from hwd.scores import BKIDScore
bkid = BKIDScore()  # images are binarized with Otsu's thresholding internally
score = bkid(fakes, reals)
print(f"BKID Score: {score}")
```

## CER (Character Error Rate)
The CER evaluates the character-level accuracy of generated handwritten text images by comparing their contained text against the ground-truth transcriptions. By default, the model `microsoft/trocr-base-handwritten` is used.

```python
from hwd.scores import CERScore
cer = CERScore()  # uses microsoft/trocr-base-handwritten unless another checkpoint is given
score = cer(fakes)  # assumes the ground-truth transcriptions come with the dataset
print(f"CER Score: {score}")
```
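For reference, the CER is the edit distance between the recognized text and the ground truth, normalized by the ground-truth length. A minimal sketch (using the third-party `editdistance` package, which is an assumption of this sketch and not a stated dependency of this repo):

```python
import editdistance  # pip install editdistance (assumed helper for this sketch)

def cer(predicted: str, ground_truth: str) -> float:
    # Levenshtein distance normalized by the ground-truth length
    return editdistance.eval(predicted, ground_truth) / max(len(ground_truth), 1)

print(cer("hel1o world", "hello world"))  # 1 substitution / 11 characters ≈ 0.091
```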

## LPIPS (Learned Perceptual Image Patch Similarity)
The LPIPS measures perceptual differences between images using feature activations from a deep network. The LPIPS score in this repo uses a custom implementation with the same backbone as HWD.

```python
from hwd.scores import LPIPSScore
lpips = LPIPSScore()  # default settings; exact constructor arguments may differ
score = lpips(fakes, reals)
print(f"LPIPS Score: {score}")
```

## I-LPIPS (Intra-LPIPS)
The I-LPIPS evaluates intra-image consistency by comparing style coherence between crops within each sample. This is also a custom implementation using the same backbone as HWD.

```python
from hwd.scores import IntraLPIPSScore
ilpips = IntraLPIPSScore()  # default settings; exact constructor arguments may differ
score = ilpips(fakes)  # compares crops within each generated sample
print(f"I-LPIPS Score: {score}")
```
