The script query-independent-priors.py calculates query-independent priors from training data. To simply print the computed repository priors, use the --no-update option; otherwise, the script will attempt to update an existing ElasticSearch index with the priors it has computed.
In general, query-independent priors can be computed and written to the index with the following command:
python3 query-independent-priors.py -q <qrels> -d <update_json_folder>
There are several options. The ElasticSearch update defaults assume a basic ElasticSearch installation with an index called "biocaddie"; these defaults may be overridden with the -u, -p, and -i options. See all available options with -h.
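For example, a couple of illustrative invocations (the qrels file and data folder paths are placeholders, and the -i usage assumes it selects the target index name; check -h for the exact option semantics):
# Print the computed repository priors without updating any index
python3 query-independent-priors.py -q qrels.txt -d update_json_folder --no-update
# Compute the priors and push them to a non-default index
python3 query-independent-priors.py -q qrels.txt -d update_json_folder -i my_biocaddie_index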
Once your ElasticSearch index is updated to include priors, they may be incorporated into document scoring using the following query format (with an example curl command):
curl -XGET '<host>:<port>/<index>/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "<query string>"
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "inline": "_score * params._source.prior"
        }
      }
    }
  }
}'
Be sure to replace the bracketed placeholders with your own values. This query adjusts each raw document score by multiplying it by the stored prior value computed earlier.
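Because the scoring script reads params._source.prior, every updated document should carry a prior field in its stored _source. You can spot-check this with a plain document GET; the dataset type and <doc_id> below are illustrative, and the host/index placeholders are the same as above:
# Fetch only the stored prior from one document's _source
curl -XGET '<host>:<port>/<index>/dataset/<doc_id>?_source=prior&pretty'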
For query-dependent priors, we provide a simple ElasticSearch plugin that wraps the search endpoint. Once installed, query-dependent priors may be computed and incorporated into scoring on the fly by directing searches to the /<index>/_priorsearch REST endpoint.
Prerequisites (one of the following):
- Docker
OR
- Git + Maven
For now, cloning the source is required to run the plugin:
git clone https://github.com/gtsherman/biocaddie-priors && cd biocaddie-priors
To use:
Make sure that the biocaddie benchmark test dataset exists somewhere on disk:
cd $HOME
wget https://biocaddie.org/sites/default/files/update_json_folder.zip && unzip update_json_folder.zip
Run an ElasticSearch 5.3.2 container using the helper script:
./scripts/start.sh
Then, set up an index with the required parameters ("store": true):
./scripts/create-index.sh
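For reference, the mapping this script creates looks roughly like the sketch below. The dataset type name matches the log output later in this README, but the field name and exact term_vector option are illustrative assumptions; ./scripts/create-index.sh is the authoritative definition:
curl -XPUT '<host>:<port>/biocaddie' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "dataset": {
      "properties": {
        "description": {
          "type": "text",
          "store": true,
          "term_vector": "with_positions_offsets_payloads"
        }
      }
    }
  }
}'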
NOTE: You may need to modify dataset_path in ./scripts/add-docs.sh if your benchmark data is not located within $HOME.
Finally, use the helper script to add the documents to the index:
./scripts/add-docs.sh
NOTE: Indexing the full benchmark set can take a long time. If you only need a small subset of the documents, you can always Ctrl+C once you have indexed the desired number of records.
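To see how many records have made it into the index so far (using the same placeholder host and port as above), the standard count API works:
# Count the documents currently in the biocaddie index
curl -XGET '<host>:<port>/biocaddie/_count?pretty'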
A helper script has been included to ease building:
./scripts/build.sh
This will attempt to build the source using Maven (or Docker, if Maven is not available).
Either way, the build should produce a target/releases/ directory containing the necessary .zip file. The .zip that ElasticSearch needs should be found at ./target/releases/queryexpansion-5.3.2-SNAPSHOT.zip.
Once the artifacts are built, we just need to install them and restart ElasticSearch:
./scripts/install.sh
./scripts/restart.sh
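For reference, ./scripts/install.sh presumably boils down to the stock ElasticSearch 5.x plugin CLI; a rough manual equivalent is sketched below, where the container name and installation path are assumptions:
# Copy the built artifact into the container and install it with the standard plugin CLI
docker cp target/releases/queryexpansion-5.3.2-SNAPSHOT.zip <container>:/tmp/
docker exec <container> /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///tmp/queryexpansion-5.3.2-SNAPSHOT.zip
ElasticSearch must be restarted for the plugin to load, which is what ./scripts/restart.sh handles.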
You should now be able to test the new endpoint using the helper script or via raw curl:
$ ./test.sh
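Passing search as the first argument makes the script run the full prior-weighted search rather than only the expansion step:
$ ./test.sh search
A raw curl equivalent would look roughly like the request below. The parameter names mirror those reported in the plugin log output further down, but the exact request format is an assumption; ./test.sh remains the authoritative example:
$ curl -XGET 'localhost:9200/biocaddie/_priorsearch?query=multiple%20sclerosis&field=_all&fbDocs=50&pretty'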
You can check the container logs to see what happened under the covers:
$ ./logs.sh
...
[2017-07-25T23:47:58,861][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] Starting QueryDocPriorSearch (index=biocaddie, query=multiple sclerosis, type=dataset, field=_all, fbDocs=50, stoplist=null)
[2017-07-25T23:47:58,923][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] Parameters:
[2017-07-25T23:47:58,923][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] bioproject_021116: 0.0821917808219178
[2017-07-25T23:47:58,924][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] arrayexpress_020916: 0.1095890410958904
[2017-07-25T23:47:58,924][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] __DEFAULT_PRIOR__: 0.0136986301369863
[2017-07-25T23:47:58,924][INFO ][o.n.e.p.QueryDocPriorSearchRestAction] [cb4q1Rk] Running query with prior against: biocaddie
...
A few other helper scripts are included to ease testing of the plugin:
./scripts/start.sh # Runs or starts your elasticsearch container
./scripts/stop.sh # Stops your elasticsearch container
./scripts/restart.sh # Restarts your elasticsearch container
./scripts/create-index.sh # Creates a test index with the proper settings to enable storing term vectors
./scripts/add-docs.sh [-v] # Adds documents from the biocaddie benchmark set to your index (assumes correct paths)
./scripts/delete-index.sh # Deletes your container's test index and the records within
./scripts/build.sh # Builds the elasticsearch plugin artifacts
./scripts/install.sh # Installs the elasticsearch plugin into your running container
./scripts/remove.sh # Removes your container's installed queryexpansion plugin
./rebuild.sh # Removes the current plugin, builds the artifacts, installs the new plugin, and restarts elasticsearch to facilitate rapid development and testing
./logs.sh # View your elasticsearch container logs (tail=100)
./test.sh [search] # Performs a test query against our REST API endpoint (only expands by default, but searches if first parameter is "search")
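A typical development loop therefore needs nothing beyond the helper scripts above:
# Rebuild, reinstall, restart, and exercise the prior-aware search in one pass
./rebuild.sh && ./test.sh search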
The plugin code is pseudo-forked from https://github.com/bodom0015/elasticsearch-queryexpansion-plugin. Much of that code remains unchanged (including the queryexpansion naming seen above), so some strangeness may occur.