Enhance case law search relevancy #7

Open
mlissner opened this issue Sep 19, 2024 · 0 comments
Labels
courtlistener enhancement Making an existing feature better

Comments

@mlissner
Member

Feature Request Template

Please include the following information in your feature request:

Headline

  • "CiteGeist gets better again: CourtListener's search engine is now enhanced to provide the best results to all your queries"

What is the Feature?

"CiteGeist" is low-key the name of our search engine's ranking algorithm. There are a number of ways we can make it better:

Each of these uses the metadata of the case to provide better query relevancy and together they should make results more relevant.

What Problem Might it Solve?

Currently, search orders results using field-based boosting, phrase boosting, and TF-IDF relevance scoring. It's fine, but sometimes you really have to wonder why it didn't find the case you were looking for.
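To make the current approach concrete, here is a minimal Python sketch of TF-IDF scoring with field-based and phrase boosting. It is a simplified illustration, not CourtListener's actual query: the field names (`caseName`, `text`), the boost values, and the helper functions are hypothetical, and real engines such as Lucene add length normalization and other refinements that are omitted here.

```python
import math
from collections import Counter

# Hypothetical per-field weights; the real boosts are not stated in this issue.
FIELD_BOOSTS = {"caseName": 4.0, "text": 1.0}
# Hypothetical multiplier applied when the query terms appear as an exact phrase.
PHRASE_BOOST = 2.0


def tokenize(s: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (real analyzers do much more).
    return s.lower().split()


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    # True if `phrase` occurs as a contiguous run of whole tokens.
    k = len(phrase)
    return k > 0 and any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))


def idf(term: str, docs: list[dict[str, str]], field: str) -> float:
    # Smoothed inverse document frequency of `term` within one field.
    n = len(docs)
    df = sum(1 for d in docs if term in tokenize(d.get(field, "")))
    return math.log((1 + n) / (1 + df)) + 1.0


def score(query: str, doc: dict[str, str], docs: list[dict[str, str]]) -> float:
    terms = tokenize(query)
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        tokens = tokenize(doc.get(field, ""))
        counts = Counter(tokens)
        # Raw term frequency times IDF, summed over the query terms.
        field_score = sum(counts[t] * idf(t, docs, field) for t in terms)
        # Phrase boosting: reward fields where the query appears verbatim.
        if contains_phrase(tokens, terms):
            field_score *= PHRASE_BOOST
        # Field-based boosting: weight each field's contribution.
        total += boost * field_score
    return total
```

With these weights, a match in the case name counts four times as much as the same match in the opinion text, which is the kind of field-based boosting described above. Because every signal here comes from the words in the document, nothing in this scheme knows whether a case is important, which is one way to read the complaint about missing cases.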

Describe a Scenario in Which the Feature Might be Used

Whenever people search, this would improve the results they get.

Technical Requirements

  • How hard is it to make, subjectively? Medium

  • Best guess, how long would it take to make, roughly? 4 weeks

  • What would it require that we do technically?

    We would have to choose among the candidate ranking algorithms and figure out how to apply them to our results in a performant way that can be updated as new content comes in (difficult with network-based algorithms). We would then have to calculate the network scores and associate them with the 10M cases we have.
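The network-score step can be sketched roughly as follows. The issue does not say which network algorithm would be used, so this Python sketch uses PageRank, the algorithm CourtListener used in the past, purely as a stand-in to show the pipeline (score the citation graph, then fold the score into keyword relevance). The in-memory graph format, the `blended_score` formula, and its `weight` parameter are hypothetical choices for illustration.

```python
import math


def pagerank(citations: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph.

    `citations` maps each case ID to the IDs of the cases it cites
    (a hypothetical in-memory stand-in for the real citation network).
    """
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Cases that cite nothing ("dangling" nodes) spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each citing case splits its rank equally among the cases it cites.
        for citing, cited_list in citations.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def blended_score(text_score: float, network_score: float, n_cases: int,
                  weight: float = 1.0) -> float:
    # Hypothetical blend: network_score * n_cases is about 1.0 for an average
    # case, and log1p dampens the boost so the network signal can't swamp
    # keyword relevance.
    return text_score * (1.0 + weight * math.log1p(network_score * n_cases))
```

Because scores like these depend on the whole graph, every batch of new opinions shifts them, which is why incremental updates are hard for network-based algorithms. A common mitigation is to rerun the power iteration starting from the previous scores rather than from a uniform vector, since they are usually close to the new result.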

Existing Systems or Alternatives?

There are plenty of case law search tools, but I think the target we currently have is Google Scholar. If we do this, I'd expect our results to compete with theirs.

Any Additional Information?

  1. We're also working on semantic search, which would partly obviate this, but I think we'll have keyword search for the foreseeable future.

  2. We're in a cool position with the network-based search because one of the folks in our orbit is doing PhD-level research on that topic. If we can make it work, that'd be special.

  3. In the past, we used PageRank for this, but we stopped a few years ago and nobody noticed. It's not the best ranking algo for something like court cases.

@mlissner mlissner changed the title Enhancement: Make search result ordering better Enhance case law search relevancy Sep 25, 2024
@mlissner mlissner added enhancement Making an existing feature better courtlistener labels Sep 25, 2024