    Repositories list

    • BMP

      Public
      Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024.
      Rust
      Apache License 2.0
      Updated Sep 17, 2024
    • wapopp

      Public
      A C++ parser for the Washington Post (WaPo) format.
      C++
      Apache License 2.0
      Updated Jul 28, 2024
    • pisa

      Public
      PISA: Performant Indexes and Search for Academia
      C++
      Apache License 2.0
      Updated Jul 20, 2024
    • ciff-hub

      Public
      Hosting some useful CIFFs
      Apache License 2.0
      Updated Jun 3, 2024
    • Porter2

      Public
      Porter2 stemming library
      C++
      Apache License 2.0
      Updated Apr 8, 2024
    • taily

      Public
      Implementation of Taily algorithm as described by Aly et al. in the 2013 paper "Taily: shard selection using the tail of score distributions."
      C++
      MIT License
      Updated Aug 4, 2023
    • Standard speed regression test for PISA
      Rust
      Updated Jan 20, 2023
    • ciff

      Public
      The inverted index exchange format as defined as part of the Open-Source IR Replicability Challenge (OSIRRC) initiative
      Rust
      Apache License 2.0
      Updated Aug 9, 2022
    • pyciff

      Public
      Python bindings for CIFF library at https://github.com/pisa-engine/ciff
      Python
      Apache License 2.0
      Updated Mar 7, 2022
    • pypisa

      Public
      A Python interface to the PISA IR engine
      Python
      Apache License 2.0
      Updated Sep 26, 2021
    • warcpp

      Public
      A C++ parser for the Web Archive (WARC) format.
      C++
      Apache License 2.0
      Updated Aug 29, 2021
    • Experiments for "A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing"
      Apache License 2.0
      Updated Oct 20, 2020
    • Benchmarking several score accumulators used in IR
      C++
      Updated Apr 2, 2020
    • Rust
      Updated Apr 1, 2020
    • trecpp

      Public
      A C++ parser for the TREC document format.
      C++
      Apache License 2.0
      Updated Mar 18, 2020
    • mln

      Public
      An implementation of the Most-Likely-Next algorithm
      C++
      Apache License 2.0
      Updated Mar 4, 2020
    • tokenizer

      Public
      C++
      Apache License 2.0
      Updated Feb 28, 2020
    • pisa-jr

      Public
      Minimal implementation of PISA in Rust
      Rust
      Apache License 2.0
      Updated Feb 26, 2020
    • TREC Text collection format parser
      Rust
      Apache License 2.0
      Updated Feb 25, 2020
    • A parser and MongoDB backed store for searching the New York Times Annotated Corpus (LDC2008T19)
      Python
      Apache License 2.0
      Updated May 4, 2019
    • nytpp

      Public
      A C++ parser for the New York Times (NYT) format.
      Apache License 2.0
      Updated Apr 30, 2019
    • HTML
      Updated Apr 24, 2019
    • Krovetz stemming library
      C++
      Apache License 2.0
      Updated Apr 10, 2019
    • Experiments for "Compressing Inverted Indexes with Recursive Graph Bisection: A Reproducibility Study".
      C++
      Updated Apr 9, 2019
    • docker

      Public
      Docker image for PISA
      Dockerfile
      Apache License 2.0
      Updated Apr 3, 2019
    • raxpp

      Public
      C++ bindings for rax: https://github.com/antirez/rax
      CMake
      Apache License 2.0
      Updated Jan 28, 2019