GitHub - Spartan-71/Amazon-ML-Challenge-2024: A machine learning solution for extracting key entity values (weight, volume, dimensions) from product images.

Problem Statement:

Feature Extraction from Images

The goal of this hackathon is to build a machine learning model that extracts entity values from product images. This capability matters in fields such as healthcare, e-commerce, and content moderation, where precise product information is vital. As digital marketplaces expand, many products lack detailed textual descriptions, so key details must be obtained directly from the images. Those images carry information such as weight, volume, voltage, wattage, and dimensions, among others, all of which are critical for digital stores.

Dataset:

The dataset is divided into two main files:

train.csv: Contains over 310,000 image links along with metadata.
test.csv: Contains over 130,000 image links along with metadata.

Each dataset has the following columns:

index: A unique identifier (ID) for the data sample.
image_link: Public URL from which the product image can be downloaded. Example link: https://m.media-amazon.com/images/I/71XfHPR36-L.jpg
group_id: Category code of the product.
entity_name: Product entity name, for example “item_weight”.
entity_value: Product entity value, for example “34 gram”. Note: test.csv does not include the entity_value column, because it is the target variable.
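
To make the column layout concrete, here is a minimal sketch that reads one row in this format and splits an entity_value string such as “34 gram” into a number and a unit. It is an illustration, not the code used in this repository: the sample row (including its index and group_id values and its pairing of the example link with “34 gram”) is made up, the helper name `split_entity_value` is hypothetical, and the pattern assumes values always look like a number followed by a unit.

```python
import csv
import io
import re

# Hypothetical sample in the train.csv layout described above.
# The index, group_id, and link/value pairing are invented for illustration.
SAMPLE_CSV = """index,image_link,group_id,entity_name,entity_value
0,https://m.media-amazon.com/images/I/71XfHPR36-L.jpg,12345,item_weight,34 gram
"""

# Assumed shape: an integer or decimal number, whitespace, then a unit made of words.
_VALUE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+([a-zA-Z][a-zA-Z ]*?)\s*$")


def split_entity_value(value):
    """Split a string such as '34 gram' into (34.0, 'gram').

    Returns None when the string does not match the assumed shape.
    """
    match = _VALUE_RE.match(value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Read the sample rows as dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["entity_name"], split_entity_value(row["entity_value"]))
```

For the real files, the same `csv.DictReader` loop would read from an opened train.csv instead of the in-memory sample.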

Model Pipeline

Results

Our model achieved the following results:

F1 Score: 0.03 (our highest)
Rank: 503 out of 2500+ teams
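
For reference, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition only. How the challenge counted true positives, false positives, and false negatives is not described here, so the function name `f1_score` and its count arguments are hypothetical inputs rather than the official evaluation code.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    predicted = true_positives + false_positives  # everything the model predicted as positive
    actual = true_positives + false_negatives     # everything that is truly positive
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```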

Conclusion

This four-day hackathon was the longest we’ve participated in so far. Our main challenge was the computational resources needed to process such a large volume of data. Processing test.csv meant downloading over 52 GB of images and running them through an OCR model, which took us more than 20 hours collectively. Despite these obstacles, working on a real-world problem faced by Amazon was an invaluable experience.

We are extremely grateful to Amazon for hosting such an engaging hackathon and look forward to competing again next year!

Contributors

Anish Dabhane – Text Postprocessing & Unit Extraction
Kshitij Aucharmal – Image Processing & OCR Integration
Ajinkya Bogle – Image Preprocessing & Unit Extraction

