Add wildreceipt dataset #1359

Merged
merged 33 commits into from
Oct 27, 2023
Changes from 1 commit
Commits
7ae18dc
[ADD] wildreceipt init
HamzaGbada Oct 12, 2023
f4c4895
[ADD] wildreceipt init
HamzaGbada Oct 12, 2023
a883ed0
[ADD] wildreceipt _convert_xmin_ymin
HamzaGbada Oct 12, 2023
ddb4d67
[ADD] wildreceipt _convert_xmin_ymin
HamzaGbada Oct 12, 2023
15abe0d
[ADD] wildreceipt _convert_xmin_ymin
HamzaGbada Oct 12, 2023
dcb63cb
[ADD] wildreceipt test
HamzaGbada Oct 14, 2023
87bf015
[ADD] wildreceipt test
HamzaGbada Oct 14, 2023
17c1112
[UPDATE] wildreceipt use_polygon
HamzaGbada Oct 15, 2023
f197337
[UPDATE] wildreceipt img_folder
HamzaGbada Oct 15, 2023
3c7ce8d
[ADD] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 18, 2023
b7d8cb7
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 18, 2023
a1f09b0
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 18, 2023
e3b9bdc
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 18, 2023
8c57b75
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 18, 2023
630437d
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 19, 2023
15804df
[BUG] mock_wildreceipt_dataset in conftest.py
HamzaGbada Oct 24, 2023
275afa5
[FIX] mock_wildreceipt_dataset labels
HamzaGbada Oct 25, 2023
82ed210
[FIX] mock_wildreceipt_dataset labels
HamzaGbada Oct 25, 2023
1e06371
[FIX] mock_wildreceipt_dataset labels
HamzaGbada Oct 25, 2023
a968db4
remove todos
HamzaGbada Oct 25, 2023
e42c71e
remove todos
HamzaGbada Oct 25, 2023
4ec3bf5
[UPDATE] wildreceipt_image_folder
HamzaGbada Oct 26, 2023
ff4b399
[ADD] test_wildreceipt_dataset tf
HamzaGbada Oct 26, 2023
2a7d1e0
[UPDATE] WILDRECEIPT optimize imports
HamzaGbada Oct 26, 2023
bffca24
[FIX] WILDRECEIPT self.data
HamzaGbada Oct 26, 2023
954b8b0
[UPDATE] save fata in RAM
HamzaGbada Oct 26, 2023
edbcaf2
[UPDATE] docs
HamzaGbada Oct 26, 2023
e257a29
[UPDATE] box wildreceipt
HamzaGbada Oct 26, 2023
2b3a578
[UPDATE] docs
HamzaGbada Oct 27, 2023
c18175b
[UPDATE] filter empty and whitespace
HamzaGbada Oct 27, 2023
6c33799
[UPDATE] filter empty and whitespace
HamzaGbada Oct 27, 2023
fcedaba
[FIX] format
HamzaGbada Oct 27, 2023
478a420
[FIX] format
HamzaGbada Oct 27, 2023
2 changes: 1 addition & 1 deletion doctr/datasets/wildreceipt.py
@@ -99,7 +99,7 @@ def __init__(
img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
)
for crop, label in zip(crops, list(text_targets)):
Contributor

Do you know if there is any text inside that we need to filter out?
For example, text that contains whitespace?

Ref.:

if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):

Contributor Author

@HamzaGbada HamzaGbada Oct 27, 2023

It's worth noting that this dataset contains small text elements that might not be conducive to the recognition task. For instance, we could consider filtering out text elements that are empty or consist of characters such as "-", "*", "/", "=", "#", or "@" to enhance the quality of the recognition process.
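
A minimal sketch of the filter proposed here, assuming "consist of" means the label is made up entirely of the listed characters. The helper name `is_noise_label` and the sample labels are hypothetical and not part of docTR.

```python
# Hypothetical helper illustrating the proposed filter; not part of docTR.
NOISE_CHARS = set("-*/=#@")


def is_noise_label(label: str) -> bool:
    """Return True for empty labels or labels made up only of NOISE_CHARS."""
    # all() over an empty string is True, so empty labels count as noise too.
    return all(char in NOISE_CHARS for char in label)


labels = ["TOTAL", "", "---", "12.50", "#", "A-1"]
kept = [label for label in labels if not is_noise_label(label)]
print(kept)  # ['TOTAL', '12.50', 'A-1']
```

Labels such as "A-1" are kept because they mix a listed character with other text; only labels built purely from the listed characters are dropped.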

Contributor

@HamzaGbada
Hm, in this case I think it would be enough to filter out labels that are empty or that contain whitespace.
We can handle all of the punctuation characters above :)

- if not any(char in label for char in ["", "-", "*", "/", "=", "#", "@"]):
+ if not any(char in label for char in ["", " "]):
Contributor

if label and " " not in label:
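
A short sketch of why this form is likely safer than the `any(...)` version, using made-up labels. In Python the empty string is a substring of every string, so `"" in label` is always true, and a check that lists `""` among the characters rejects every label.

```python
labels = ["TOTAL", "", "12.50", "THANK YOU", " "]

# Listing "" as a "character": "" in label is True for every string,
# so any(...) is always True and nothing passes the filter.
kept_any = [label for label in labels if not any(char in label for char in ["", " "])]

# Suggested check: keep non-empty labels that contain no space.
kept_suggested = [label for label in labels if label and " " not in label]

print(kept_any)        # []
print(kept_suggested)  # ['TOTAL', '12.50']
```

Note that the suggested check only looks for the space character. Catching tabs or newlines as well would need something like `any(c.isspace() for c in label)`, which goes beyond what this thread proposes.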

self.data.append((crop, label))
else:
self.data.append(