This is an open-ended NLP project for CSE517 that evaluates the capability of GPT-4V to recover the text in scientific figures reconstructed by a diffusion model. The main branch works with random samples from the real Paper2Fig100k dataset, and the code2fig branch works with a flowchart-style synthetic dataset. We first draw red circles on the pictures, then call the GPT-4V API to guess the text inside the red circles with/without context, and compare these results with the vanilla OCR recognition results. The evaluation metrics we chose include edit distance, ROUGE score, and BERT similarity score.
To run the program, you need the original pictures (`ori/`), the pictures after applying diffusion models (`reconstructions/`), the captions (`texts/`), and the reference images (`references/`, which can be empty). Before running the code, the directory should be set up like:
```
.
├── ori/
│   └── p0.png
├── reconstructions/
│   └── p0_70per.png
├── references/
│   ├── p0/
│   │   ├── 0.jpg
│   │   └── 1.jpg
│   └── ...
└── texts/
    └── p0.txt
```
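If you are setting this up from scratch, a minimal sketch like the following can create and verify the layout. Only the directory names and the `pX_YYper.png` naming pattern come from this README; the check itself is not part of the repo.

```python
# Sketch: create the expected directories and verify that every original
# picture has at least one reconstruction and a caption file.
from pathlib import Path

for d in ["ori", "reconstructions", "references", "texts"]:
    Path(d).mkdir(exist_ok=True)

for pic in Path("ori").glob("*.png"):
    stem = pic.stem  # e.g. "p0"
    recons = list(Path("reconstructions").glob(f"{stem}_*per.png"))
    caption = Path("texts") / f"{stem}.txt"
    if not recons:
        print(f"missing reconstruction(s) for {stem}")
    if not caption.exists():
        print(f"missing caption {caption}")
```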
You need to install MMOCR:

```bash
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate open-mmlab
pip3 install openmim
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
mim install -e .
```
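A quick sanity check that MMOCR is importable inside the new environment:

```python
# Run inside the open-mmlab environment; should print the installed version.
import mmocr
print(mmocr.__version__)
```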
You also need the packages for the metrics, and you need to set your API key:

```bash
pip install evaluate bert_score rouge_score
export NLP_API_KEY='YOUR_API_KEY'
```
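The scripts read the key from the environment variable exported above. A sketch of how that presumably looks; the `NLP_API_KEY` name comes from this README, but the exact client call depends on your `openai` package version, so treat it as an assumption rather than the repo's actual code:

```python
# Sketch (openai>=1.0 style client; this is an assumption, not repo code).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["NLP_API_KEY"])
```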
Just run:

```bash
python draw_red_ellipse_and_recognize.py --random_sample_num 5
```

`--random_sample_num 5` means the script randomly selects 5 red circles to draw, rather than drawing all the red circles.
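Internally, the sampling boils down to something like the following sketch (hypothetical function and variable names, not the actual repo code):

```python
# Hypothetical sketch of --random_sample_num: circle at most n of the OCR
# boxes instead of circling every detected box.
import random

def sample_bboxes(bboxes, n):
    if n is None or n >= len(bboxes):
        return bboxes                # draw all red circles
    return random.sample(bboxes, n)  # draw only n randomly chosen circles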
NOTE: The script first obtains the OCR bounding boxes from the original pictures and then applies them to the pictures produced by the diffusion model, so please make sure that the size and configuration of the two pictures are almost matched. (The function resizes the pictures, but you should still check the configurations manually.)
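For the size check in the note above, a small sketch (assuming PIL; the file paths are examples from the tree above) of resizing the reconstruction to match the original so that the OCR boxes stay valid:

```python
# Sketch: make the reconstruction the same size as the original so that
# bounding boxes detected on the original can be reused on it.
from PIL import Image

ori = Image.open("ori/p0.png")
recon = Image.open("reconstructions/p0_70per.png")
if recon.size != ori.size:
    recon = recon.resize(ori.size)  # the script also resizes automatically
    recon.save("reconstructions/p0_70per.png")
```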
After running this script, the file structure will look like:
```
.
├── ori/
│   └── p0.png
├── reconstructions/
│   └── p0_70per.png
├── references/
│   ├── p0/
│   │   ├── 0.jpg
│   │   └── 1.jpg
│   └── ...
├── texts/
│   └── p0.txt
├── ori_red_circles/
│   ├── p0/
│   │   ├── 0.png
│   │   ├── 0.txt
│   │   ├── 1.png
│   │   ├── 1.txt
│   │   └── ...
│   └── ...
└── red_circles/
    ├── p0_70per/
    │   ├── 0.jpg
    │   ├── 1.jpg
    │   └── ...
    └── ...
```
NOTE: You should double-check the ground-truth results in `ori_red_circles/pi/j.txt`, since the text recognition model does not always perform perfectly.
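To eyeball all of the ground-truth transcriptions quickly, something like this helps (paths follow the tree above; this loop is not part of the repo):

```python
# Print every ground-truth transcription under ori_red_circles/ for
# manual inspection.
from pathlib import Path

for txt in sorted(Path("ori_red_circles").rglob("*.txt")):
    print(txt, "->", txt.read_text().strip())
```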
NOTE: If you want to ignore some red circles, just delete the corresponding items in `red_circles/`; you don't need to delete anything in `ori_red_circles/` (keeping the two directories aligned is fine, but it's exhausting).
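For example, to ignore one circle of `p0_70per` you would only delete its crop under `red_circles/` (the index 3 below is a hypothetical example):

```python
# Remove one circle from the evaluation; the matching entry in
# ori_red_circles/ can stay in place.
from pathlib import Path

Path("red_circles/p0_70per/3.jpg").unlink(missing_ok=True)
```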
When debugging, you can run:

```bash
python gpt4v_recover.py --select_pic_strength_name p0_50per p1_50per p0_70per --debug 1
```

In this mode, the program does not call the OpenAI API but uses some predefined content to test the pipeline. After making sure that the program works well, you can run:

```bash
python gpt4v_recover.py --debug 0
```
Here `--drop_pic_strength_name` denotes the pictures that you want to skip, and `--select_pic_strength_name` denotes that you want to run on these examples only.
This script outputs the recovered text and the metric values (edit distance, GLEU, semantic similarity using BERT), and stores them in `evaluation_results/`.
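With the packages installed above, the metrics can be computed roughly like this (a sketch; `gpt4v_recover.py` may differ in details such as normalization, and the two strings are made-up examples):

```python
# Sketch of the three reported metrics using the packages installed above.
import difflib
import evaluate

prediction = "attention mechanism"   # GPT-4V's guess (example string)
reference = "attention mechanisms"   # ground truth from ori_red_circles/

# Edit-distance-like similarity via the standard library (1.0 = identical).
print(difflib.SequenceMatcher(None, prediction, reference).ratio())

# GLEU (Google BLEU) from the evaluate hub.
gleu = evaluate.load("google_bleu")
print(gleu.compute(predictions=[prediction], references=[[reference]]))

# Semantic similarity with BERTScore.
bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=[prediction], references=[reference], lang="en"))
```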
If you run the experiments several times and want to average the scores across all the results, just run:

```bash
python final_report.py --evaluate_path evaluation_results
```
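Conceptually, the averaging is just the following (a sketch that assumes each run stores its scores as a JSON dict of metric name to value; the actual file format in `evaluation_results/` may differ):

```python
# Hypothetical sketch of averaging scores across runs. Assumes each result
# file is a JSON dict like {"edit_distance": 0.9, ...}; final_report.py's
# real format may differ.
import json
from collections import defaultdict
from pathlib import Path

totals, counts = defaultdict(float), defaultdict(int)
for f in Path("evaluation_results").glob("*.json"):
    for metric, value in json.loads(f.read_text()).items():
        totals[metric] += value
        counts[metric] += 1

for metric in totals:
    print(metric, totals[metric] / counts[metric])
```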