Product Search

Reproduction - Dense Retrieval Methods

Note

The original code has been refactored to be more concise and clean. As a result, the product search results could be slightly different from the numbers in our paper.

(Optional, only if you'd like to reproduce our results on ESCI)

Download the processed data from Google Drive.
Unzip the archive and place sampled_item_metadata_esci.jsonl and test.csv under AmazonReviews2023/product_search_results/cache/esci/.

First, generate the dense query/item representations and cache them:

python generate_emb.py --dataset McAuley-Lab/Amazon-C4 --plm_name hyp1231/blair-roberta-base --feat_name blair-base

Then, evaluate the product search performance on the cached representations:

python eval_search.py --dataset McAuley-Lab/Amazon-C4 --suffix blair-baseCLS --domain
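
For readers unfamiliar with dense retrieval, the two steps above amount to embedding queries and items once, then ranking items by vector similarity. The snippet below is a minimal sketch of the ranking step only, using toy vectors in pure Python. The choice of cosine similarity and the names normalize and rank_items are assumptions made for illustration; they are not taken from generate_emb.py or eval_search.py.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_items(query_emb, item_embs, top_k=3):
    """Return indices of the top_k items by cosine similarity to the query.

    Hypothetical helper: the similarity function used by the actual
    scripts is not stated in this README.
    """
    q = normalize(query_emb)
    scored = []
    for idx, item in enumerate(item_embs):
        v = normalize(item)
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    # Sort by descending score; break ties by the lower item index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]

# Toy 3-dimensional vectors standing in for cached embeddings.
item_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_emb = [0.9, 0.1, 0.0]
ranking = rank_items(query_emb, item_embs, top_k=2)
print(ranking)
```

Caching the item vectors matters because they do not depend on the query, so they only need to be computed once per model.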

Arguments

--dataset
- McAuley-Lab/Amazon-C4
- esci
--plm_name
- roberta-base
- roberta-large
- princeton-nlp/sup-simcse-roberta-base
- princeton-nlp/sup-simcse-roberta-large
- hyp1231/blair-roberta-base
- hyp1231/blair-roberta-large

Note

When you change --plm_name, update --feat_name and --suffix to match. In the commands above, --feat_name blair-base pairs with --suffix blair-baseCLS.
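
If you run several models, a small helper can keep the two flags in sync. The sketch below only assembles the command strings shown earlier; it does not run them. It assumes that --suffix is always --feat_name followed by CLS, which is inferred from the single example in this README and may not hold for every configuration. The function name build_commands and the pooling_tag parameter are hypothetical.

```python
import shlex

def build_commands(dataset, plm_name, feat_name, pooling_tag="CLS"):
    """Build the embedding and evaluation commands for one model.

    Assumption: the evaluation suffix is feat_name + pooling_tag,
    as in the documented blair-base / blair-baseCLS pair.
    """
    generate = [
        "python", "generate_emb.py",
        "--dataset", dataset,
        "--plm_name", plm_name,
        "--feat_name", feat_name,
    ]
    evaluate = [
        "python", "eval_search.py",
        "--dataset", dataset,
        "--suffix", feat_name + pooling_tag,
        "--domain",
    ]
    return shlex.join(generate), shlex.join(evaluate)

gen_cmd, eval_cmd = build_commands(
    "McAuley-Lab/Amazon-C4", "hyp1231/blair-roberta-base", "blair-base"
)
print(gen_cmd)
print(eval_cmd)
```

With the arguments above, the helper reproduces the two documented commands; for other models, pick a --feat_name of your own and check the cached file names before evaluating.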

Baseline - BM25

(Optional, only if you'd like to reproduce our results on ESCI)

Download the processed data from Google Drive.
Unzip the archive and place sampled_item_metadata_esci.jsonl and test.csv under AmazonReviews2023/product_search_results/cache/esci/.

python bm25.py --dataset McAuley-Lab/Amazon-C4

Arguments

--dataset
- McAuley-Lab/Amazon-C4
- esci
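
As background on what this baseline computes, here is a minimal, self-contained sketch of Okapi BM25 scoring over a toy corpus. It is not the code in bm25.py: the whitespace tokenizer, the parameter values k1=1.5 and b=0.75, the smoothed IDF variant, and the function name bm25_scores are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Illustrative only: lowercased whitespace tokenization and a
    smoothed, always-positive IDF are assumed here.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes longer-than-average documents.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "red running shoes for men",
    "wireless bluetooth headphones",
    "blue running shorts",
]
scores = bm25_scores("running shoes", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Unlike the dense methods, BM25 relies on exact term overlap, which is why the second toy document, sharing no query terms, scores zero.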

Data Preprocessing - ESCI

python dataset/process_esci.py
