Docket entry classifier #75

Open
anseljh opened this issue Nov 22, 2024 · 0 comments

anseljh commented Nov 22, 2024

Headline

FLP's RECAP Archive Categorizes Docket Entries to Help You Find the Needle in the Haystack

What is the Feature?

Build classifiers for docket entry text to label them as, e.g., Complaint, Answer, Pleading (a superset encompassing both Complaint and Answer), Motion, Memorandum, Order, Judgment, Motion for Summary Judgment (a subset of Motion), Claim Construction Order, etc.
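To make the superset/subset idea concrete, here is a minimal sketch of how a label hierarchy might be represented so that a filter on a broad label (Pleading) also catches its narrower labels (Complaint, Answer). The `PARENT` table, `expand`, and `matches_filter` are hypothetical names for illustration, not anything that exists in the RECAP codebase, and placing Claim Construction Order under Order is my assumption.

```python
# Hypothetical controlled vocabulary: each label maps to its parent label
# (None means top level). Only the Pleading and Motion relationships come
# from the issue text; "Claim Construction Order" under "Order" is assumed.
PARENT = {
    "Pleading": None,
    "Complaint": "Pleading",
    "Answer": "Pleading",
    "Motion": None,
    "Motion for Summary Judgment": "Motion",
    "Memorandum": None,
    "Order": None,
    "Claim Construction Order": "Order",  # assumption
    "Judgment": None,
}


def expand(label):
    """Return the label followed by all of its ancestors, most specific first."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def matches_filter(entry_labels, wanted):
    """True if any label on the entry is `wanted` or a descendant of it."""
    return any(wanted in expand(label) for label in entry_labels)


print(expand("Motion for Summary Judgment"))
print(matches_filter({"Answer"}, "Pleading"))
```

With something like this, a classifier only has to emit the most specific label; the broader labels used for facets and filters fall out of the hierarchy.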

These can then become:

  • search facets
  • filters on the docket view (just show me the pleadings!)
  • labels visible on the docket entry, to help you see at a glance if it's what you want
  • (those labels could be localized to other languages to help users whose first language is not English)

What Problem Might it Solve?

A lot of dockets are hella long: hundreds or thousands of entries across many pages. This stinks if you're only interested in a subset of the documents. Today, you have to either read a zillion docket entries or ctrl-F your way through each page, and that may not work because wording varies across courts. If we classified docket entries, users could filter on our labels instead of relying on inaccurate text searches.

Describe a Scenario in Which the Feature Might be Used

As a lawyer, I'm working on a summary judgment motion in a patent case. I know my opposing party has been in some other cases, and I want to see what arguments they raised in cases that reached the summary judgment phase. I can find the cases, but those dockets are crazy long.

Enter docket entry classification! Now, I can filter by label and easily get to all the summary judgment motions and orders.

Technical Requirements

  • This is medium hard at least! We'd need to come up with (or adopt) a decent controlled vocabulary for our labels, and then train models to recognize them across the corpus of RECAP docket entries.
  • If this is supervised ML, we need to produce a solid training set.
  • If we use LLMs, it could take much less time to build, but we'd need to test a lot to gain confidence, and it would be computationally and financially expensive because of how many docket entries we have.
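Whichever approach we pick, "test a lot to gain confidence" probably means scoring predictions against a hand-labeled sample, label by label. A minimal sketch of that, assuming one label per entry and `None` for "no label"; `per_label_scores` is a made-up helper name and the sample data is invented purely for illustration:

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Compute (precision, recall) per label from parallel lists of labels.

    `gold` holds the hand-assigned labels, `predicted` the classifier output.
    None means "no label" and is not scored as a label of its own.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g is not None:
                tp[g] += 1
        else:
            if p is not None:
                fp[p] += 1  # predicted this label, but it was wrong
            if g is not None:
                fn[g] += 1  # missed the correct label
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_n = tp[label] + fp[label]
        actual_n = tp[label] + fn[label]
        precision = tp[label] / predicted_n if predicted_n else 0.0
        recall = tp[label] / actual_n if actual_n else 0.0
        scores[label] = (precision, recall)
    return scores


# Invented toy data, just to show the shape of the inputs.
gold = ["Motion", "Order", "Motion", "Answer"]
predicted = ["Motion", "Motion", "Motion", "Answer"]
for label, (p, r) in sorted(per_label_scores(gold, predicted).items()):
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

The same harness would work for a supervised model, an LLM, or a rules engine, which makes it easier to compare them on equal footing.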

Existing Systems or Alternatives?

Back in the day, I did this with a rules engine that evaluated zillions of regexes for proto-Lex Machina. It was hard because there was so much variation in wording across courts, which made it complex and brittle. However, at least while I was there, it outperformed what PhD students were able to do with ML. But that was a long time ago, and a lot has changed!
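For anyone who hasn't seen the regex approach, here is a toy sketch of the general shape: an ordered list of (label, pattern) rules where the first match wins. This is not the old system's code; the rules, their order, and the sample docket text are all made up for illustration.

```python
import re

# Ordered rules: the first pattern that matches wins, so order matters.
# These few patterns are illustrative only; a real rule set would be far larger.
RULES = [
    ("Claim Construction Order",
     r"\bclaim\s+construction\s+order\b|^\s*order\b.*\bclaim\s+construction\b"),
    ("Order", r"^\s*order\b"),
    ("Motion for Summary Judgment", r"\bmotion\b.*\bsummary\s+judgment\b"),
    ("Motion", r"\bmotion\b"),
    ("Answer", r"^\s*answer\b"),  # must precede Complaint: "ANSWER to Complaint"
    ("Complaint", r"\bcomplaint\b"),
    ("Judgment", r"^\s*judgment\b"),
]
COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in RULES]


def classify(description):
    """Return the first matching label for a docket entry description, or None."""
    for label, pattern in COMPILED:
        if pattern.search(description):
            return label
    return None


for text in [
    "MOTION for Summary Judgment by Acme Corp.",
    "ORDER denying 42 Motion for Summary Judgment",
    "ANSWER to Complaint by Acme Corp.",
    "Summons Issued as to Acme Corp.",
]:
    print(f"{classify(text)!r:32} <- {text}")
```

Even in this tiny example the ordering is doing a lot of work (an order on a motion mentions the motion; an answer mentions the complaint), which hints at why the real thing got complex and brittle as court-specific wording piled up.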

Any Additional Information?

This is also important because it's an enabler for other things:

  • More targeted alerts (alert only on new pleadings; never alert on pro hac vice materials; etc.)
  • Targeted further treatment of important docs
  • Domain-specific research