Product Search

Reproduction - Dense Retrieval Methods

Note

The original code has been refactored to be more concise, so the product search results may differ slightly from the numbers reported in our paper.

(Optional, only if you'd like to reproduce our results on ESCI)

  • Download the processed data from Google Drive.
  • Unzip the archive and place sampled_item_metadata_esci.jsonl and test.csv under AmazonReviews2023/product_search_results/cache/esci/.
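
Before running the ESCI experiments, it can help to confirm that both files ended up in the expected directory. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_esci_files` is hypothetical, and it assumes you run it from the directory that contains `AmazonReviews2023/`.

```python
from pathlib import Path

# File names and target directory are taken from the instructions above.
# Assumption: the path is relative to the current working directory.
ESCI_CACHE_DIR = Path("AmazonReviews2023/product_search_results/cache/esci")
ESCI_FILES = ("sampled_item_metadata_esci.jsonl", "test.csv")


def missing_esci_files(cache_dir=ESCI_CACHE_DIR):
    """Return the expected ESCI files that are not present in cache_dir."""
    cache_dir = Path(cache_dir)
    return [name for name in ESCI_FILES if not (cache_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_esci_files()
    if missing:
        print("Missing ESCI files:", ", ".join(missing))
    else:
        print("ESCI files are in place.")
```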

First, generate dense query/item representations and cache them:

python generate_emb.py --dataset McAuley-Lab/Amazon-C4 --plm_name hyp1231/blair-roberta-base --feat_name blair-base

Then, evaluate the product search performance:

python eval_search.py --dataset McAuley-Lab/Amazon-C4 --suffix blair-baseCLS --domain
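
For readers new to dense retrieval, the two steps above amount to embedding queries and items, then ranking items by their similarity to each query. The toy example below illustrates that idea only. It is not the code in eval_search.py: the cosine similarity, the Recall@k metric, and all function names are assumptions chosen for illustration, and the 2-d vectors stand in for real cached embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_items(query_emb, item_embs, k=10):
    """Return the ids of the k items most similar to the query, best first."""
    scored = sorted(
        item_embs.items(),
        key=lambda pair: cosine(query_emb, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:k]]


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy 2-d embeddings standing in for the cached representations.
items = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.7, 0.7]}
query = [0.9, 0.1]
top = rank_items(query, items, k=2)
print(top, recall_at_k(top, ["C"], k=2))
```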

Arguments

  • --dataset

    • McAuley-Lab/Amazon-C4
    • esci
  • --plm_name

    • roberta-base
    • roberta-large
    • princeton-nlp/sup-simcse-roberta-base
    • princeton-nlp/sup-simcse-roberta-large
    • hyp1231/blair-roberta-base
    • hyp1231/blair-roberta-large

Note

Please update --feat_name and --suffix to match the chosen --plm_name. In the example above, --feat_name blair-base pairs with --suffix blair-baseCLS.
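
One way to keep the two arguments in sync is to build both commands from a single feature name. The sketch below assumes that --suffix is the feature name followed by `CLS`, which is inferred only from the single example in this README; check the scripts if you change anything else. The helper `search_commands` and the feature name `roberta-base` in the usage line are hypothetical.

```python
def search_commands(dataset, plm_name, feat_name, tag="CLS"):
    """Build the embedding and evaluation commands with matching names.

    Assumption: the suffix is feat_name + tag, as in the README example
    (blair-base -> blair-baseCLS). The --domain flag is copied from that
    example as-is.
    """
    generate = (
        f"python generate_emb.py --dataset {dataset} "
        f"--plm_name {plm_name} --feat_name {feat_name}"
    )
    evaluate = (
        f"python eval_search.py --dataset {dataset} "
        f"--suffix {feat_name}{tag} --domain"
    )
    return [generate, evaluate]


# Hypothetical feature name for another model from the list above.
for command in search_commands("McAuley-Lab/Amazon-C4", "roberta-base", "roberta-base"):
    print(command)
```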

Baseline - BM25

(Optional, only if you'd like to reproduce our results on ESCI)

  • Download the processed data from Google Drive.
  • Unzip the archive and place sampled_item_metadata_esci.jsonl and test.csv under AmazonReviews2023/product_search_results/cache/esci/.

python bm25.py --dataset McAuley-Lab/Amazon-C4

Arguments

  • --dataset
    • McAuley-Lab/Amazon-C4
    • esci
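
As background on the baseline, the following is a minimal sketch of Okapi BM25 scoring over whitespace-tokenized text. It is not the implementation in bm25.py, which may use a library, a different tokenizer, or different parameters. The values k1=1.5 and b=0.75, the smoothed IDF variant, and the toy product titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    # Fall back to 1.0 so that a corpus of empty documents does not divide by zero.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy item titles, invented for this example.
docs = [
    "wireless noise cancelling headphones".split(),
    "stainless steel water bottle".split(),
    "wired headphones with microphone".split(),
]
scores = bm25_scores("wireless headphones".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```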

Data Preprocessing - ESCI

python dataset/process_esci.py