RJMillerLab/opendata-keyword-search

Keyword search of Open Data

This is the repository for a keyword search engine for Open Data repositories.

The implementation is based on:

  • Xapian: a high-performance, scalable text search engine.
  • Gensim: a Python NLP library. This is used to access pretrained word vectors.

The search engine indexes and searches both metadata and data values of Socrata data sets.

Prerequisites:

  • Socrata datasets:

    • Data files: ./data/usertables_data/<package_id>.json.gz
    • Metadata files: ./data/usertables_schema/<package_id>_schema.json
  • Pretrained GloVe word vectors
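
As a quick check that the prerequisite files are in place, the following minimal sketch loads the data file and the metadata file for one package. It follows the directory layout listed above; the function name `load_package` is hypothetical, and the sketch assumes each data file holds a single JSON document (the internal structure of the JSON is not assumed).

```python
import gzip
import json
from pathlib import Path


def load_package(package_id, root="./data"):
    """Load the data file and the schema file for one package.

    Hypothetical helper: only the file layout comes from the
    prerequisites; the JSON structure is returned as-is.
    """
    root = Path(root)
    data_path = root / "usertables_data" / f"{package_id}.json.gz"
    schema_path = root / "usertables_schema" / f"{package_id}_schema.json"

    # gzip.open in text mode decompresses and decodes in one step.
    # Assumption: the file contains one JSON document, not JSON lines.
    with gzip.open(data_path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    with open(schema_path, encoding="utf-8") as f:
        schema = json.load(f)
    return data, schema
```

If either file is missing, the call raises `FileNotFoundError`, which makes it easy to spot an incomplete download before building the index.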

Makefile

make ./data/wordvec/glove.6B.txt

Downloads the GloVe vectors.

make ./data/wordvec_50d.txt

Converts the GloVe vectors to the word2vec format.
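
For readers curious about what this conversion involves: the word2vec text format is the GloVe text format preceded by a header line of the form `<vocabulary size> <dimension>`. The sketch below illustrates that idea with the standard library only. It is not necessarily how the Makefile target is implemented, and the function name `glove_to_word2vec` is hypothetical.

```python
def glove_to_word2vec(glove_path, word2vec_path):
    """Copy a GloVe text file, prepending the word2vec header line.

    Illustrative sketch only; the repository's Makefile may use a
    different tool for this step.
    """
    # First pass: count the vectors and read the dimensionality.
    count = 0
    dim = 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                # Each line is "<word> <v1> ... <vd>".
                dim = len(line.rstrip().split(" ")) - 1
            count += 1

    # Second pass: write "<count> <dim>", then the unchanged vectors.
    with open(glove_path, encoding="utf-8") as src, \
         open(word2vec_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim
```

A file in this format can then be read by Gensim with `KeyedVectors.load_word2vec_format(path, binary=False)`.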

make index

Creates the Xapian index for both metadata and data values.

make backend

Starts the backend search API server.

make frontend

Starts the frontend web application on port 8997. You can start searching at "http://localhost:8997".
