This repository contains the code for our ECIR 2023 accepted paper: Towards Effective Paraphrasing for Information Disguise.
code/beam_search_code/Disguise Text.ipynb: shows the disguise of a true sentence (query) via our model
code/beam_search_code/beam_helper: contains all the helper modules for our model
  beam_utils.py: contains the code for single-level phrase substitution, beam search, constituency parse tree creation, etc.
  synonyms_store.py: contains the code to get the synonyms of a term in the counter-fitted synonym vector space
  faiss_fetch.py: contains the code for initializing DPR and fetching the top K relevant documents
  perplexity_calculation.py: contains the code that initiates the perplexity calculation
  fetch_use_scores.py: contains the code to create the Universal Sentence Encoding for a given piece of text
code/beam_search_code/counter-fitted-vectors.txt: counter-fitted vectors used for fetching synonyms
data/all_syns.json: contains the 10 nearest neighbours for all terms in the dictionary (the nearest neighbours were calculated using Facebook AI Similarity Search (FAISS) on the vectors in counter-fitted-vectors.txt)
sql_lite_dbs/<name>.db: expects the database containing the metadata and contents of the document store (to be used by DPR)
code/faiss_indexes/<name>.faiss: expects the vectors for the documents in the document store
code/faiss_indexes/exp_with_two_thou_short.json: expects the configuration file containing the parameters describing how to read the ".faiss" file
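
To illustrate how a nearest-neighbour synonym table such as data/all_syns.json could be derived from word vectors, here is a minimal sketch. It is not the repository's code: it swaps FAISS for a brute-force cosine-similarity search in pure Python, assumes each vector line has the layout "word v1 v2 ...", and uses made-up toy vectors. The function names parse_vectors, cosine and nearest_neighbours are hypothetical.

```python
import math


def parse_vectors(lines):
    """Parse lines of the assumed form 'word v1 v2 ...' into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(vectors, k):
    """Brute-force stand-in for a FAISS search: top-k most similar words per word."""
    table = {}
    for word, vec in vectors.items():
        scored = [(cosine(vec, other), w) for w, other in vectors.items() if w != word]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        table[word] = [w for _, w in scored[:k]]
    return table


# Toy 3-dimensional vectors, invented for this example only.
toy_lines = [
    "happy 0.9 0.1 0.0",
    "glad 0.8 0.2 0.0",
    "joyful 0.7 0.3 0.1",
    "car 0.0 0.1 0.9",
]
toy_table = nearest_neighbours(parse_vectors(toy_lines), k=2)
print(toy_table["happy"])
```

The brute-force loop is quadratic in the vocabulary size, which is why an index library such as FAISS is the practical choice for a full vocabulary; the sketch only shows the shape of the resulting table (a word mapped to a ranked list of neighbours).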
Details of the conda environment for the above codebase are in adversarial_search.yaml.
We use Haystack's DPR implementation.
Parameter Name | Description |
MAX_DEPTH | Number of levels in the beam search tree, i.e. the maximum number of phrase substitutions allowed in the query |
ALPHA_VAL | |
NUM_PERPLEXITY_NODES_TO_EXPAND | |
BeamWidth | Maximum number of nodes at each level of the beam tree |
NUM_FAISS_DOCS_TO_RETRIEVE | Maximum number of relevant documents to fetch for the query; the source document's presence is checked among these documents |
SIMILARITY_CUT_OFF_THRESHOLD | |
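
As a rough illustration of how MAX_DEPTH and BeamWidth bound the search, the following sketch runs a beam search over single-word substitutions. It is a simplification under stated assumptions, not the model's implementation: substitutions are made at the word level rather than on phrases from a constituency parse, the synonym table TOY_SYNONYMS is invented, and toy_score is a placeholder that only counts changed positions (the other parameters in the table are not modelled here).

```python
MAX_DEPTH = 2   # maximum number of substitutions (levels of the beam tree)
BEAM_WIDTH = 2  # maximum number of nodes kept at each level

# Hypothetical synonym table, invented for this example.
TOY_SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "joyful"]}


def expand(words):
    """Yield every sentence reachable from `words` by one word substitution."""
    for i, word in enumerate(words):
        for synonym in TOY_SYNONYMS.get(word, []):
            yield words[:i] + (synonym,) + words[i + 1:]


def toy_score(candidate, original):
    """Placeholder score: number of positions that differ from the original."""
    return sum(a != b for a, b in zip(candidate, original))


def beam_search(query, max_depth=MAX_DEPTH, beam_width=BEAM_WIDTH):
    """Keep the `beam_width` best candidates at each of `max_depth` levels."""
    original = tuple(query.split())
    beam = [original]
    for _ in range(max_depth):
        candidates = {c for node in beam for c in expand(node)}
        if not candidates:
            break  # nothing left to substitute
        # Best score first; ties broken lexicographically for determinism.
        ranked = sorted(candidates, key=lambda c: (-toy_score(c, original), c))
        beam = ranked[:beam_width]
    return [" ".join(c) for c in beam]


result = beam_search("the quick dog is happy")
print(result)
```

Each level applies at most one further substitution to every node in the beam, so no returned sentence differs from the query in more than MAX_DEPTH positions, and at most BEAM_WIDTH sentences survive each level.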