Reconcile citations to statutes #73

Open
lmullen opened this issue Aug 26, 2022 · 3 comments

@lmullen
Owner

lmullen commented Aug 26, 2022

We have likely found statutory citations in the format 1 Reporter 123. (Are there other forms of statutory citation?) We need a source of data and a way to reconcile the statutory citations, parallel to reconciling against CAP.
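A generic detector for the `1 Reporter 123` form might look roughly like the minimal Python sketch below. The pattern and the `find_generic` name are hypothetical, not the project's actual detector; it assumes a volume of 1 to 3 digits (the cap mentioned later in the thread), a reporter made of one or more capitalized tokens that may contain periods, and a page of 1 to 4 digits.

```python
import re

# Hypothetical generic detector for "volume Reporter page" citations.
# Assumptions: the volume has 1 to 3 digits, the reporter is one or more
# capitalized tokens that may contain periods ("Wend.", "N.Y. Laws"),
# and the page has 1 to 4 digits.
GENERIC_CITE = re.compile(
    r"\b(?P<volume>\d{1,3})\s+"
    r"(?P<reporter>[A-Z][A-Za-z.]*(?:\s+[A-Z][A-Za-z.]*)*)\s+"
    r"(?P<page>\d{1,4})\b"
)


def find_generic(text):
    """Return (volume, reporter, page) tuples for every match in text."""
    return [
        (m["volume"], m["reporter"], m["page"])
        for m in GENERIC_CITE.finditer(text)
    ]


print(find_generic("See 12 Wend. 345 and 3 N.Y. Laws 67."))
# [('12', 'Wend.', '345'), ('3', 'N.Y. Laws', '67')]
```

With the 3-digit cap, a year-form cite such as `1848 N.Y. Laws 497` produces no match at all, which is the gap raised in the reply about capping the opening term at 4 digits.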

@kfunk074
Collaborator

There is no open-source library for statutes that I’m aware of. The proprietary HeinOnline might be open to collaboration once we can prove the concept with CAP.

For now, if we just want to compile a database of citations, a couple of notes:

  • I think I told you to cap the regular expression at 3 digits for the opening term. It would need to be capped at 4 to capture years. Statutory cites commonly take the form 1848 N.Y. Laws 497 (recognize that one?).
  • Statutes might include a section indicator in place of or in addition to the page. So the above might also be rendered 1848 N.Y. Laws § 1 or 1848 N.Y. Laws 497 Sec. 1 and the obvious permutations in between.
  • If you truly want to be comprehensive, you’ll need enough alpha characters to accommodate 1853 Code of Civil Procedure tit. 9 sec. 124. If that sweeps in too much junk to make the game worth the candle, we might need a different approach to sniff out Act, Code, Constitution, Chapter, Title, Article, Section and common abbreviations of the same.
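The notes above could translate into a statute-specific pattern along these lines. This is a minimal Python sketch under stated assumptions, not a tested detector: `STATUTE_CITE` and `find_statutes` are hypothetical names, the opening term is assumed to be a 4-digit year, the source is assumed to be capitalized tokens (e.g. `N.Y. Laws`), and the cite must end in a page, a section (`§` or `Sec.`/`sec.`), or a page plus a section. It deliberately does not cover `1853 Code of Civil Procedure tit. 9 sec. 124`, since accepting lowercase words like "of" is exactly the widening that risks sweeping in junk.

```python
import re

# Hypothetical statutory-citation pattern (assumptions in the lead-in).
SECTION = r"(?:§|Sec\.|sec\.)"
# A capitalized token; "Sec." is excluded so it is never absorbed into the source.
TOKEN = r"(?!Sec\.)[A-Z][A-Za-z.]*"

STATUTE_CITE = re.compile(
    rf"\b(?P<year>\d{{4}})\s+"
    rf"(?P<source>{TOKEN}(?:\s+{TOKEN})*)\s+"
    # Either a page (optionally followed by a section) or a section alone.
    rf"(?:(?P<page>\d{{1,4}})\b(?:\s*{SECTION}\s*(?P<sec_after_page>\d+))?"
    rf"|{SECTION}\s*(?P<sec_only>\d+))"
)


def find_statutes(text):
    """Return (year, source, page, section) tuples; missing parts are None."""
    results = []
    for m in STATUTE_CITE.finditer(text):
        section = m["sec_after_page"] or m["sec_only"]
        results.append((m["year"], m["source"], m["page"], section))
    return results


print(find_statutes("See 1848 N.Y. Laws 497 Sec. 1."))
# [('1848', 'N.Y. Laws', '497', '1')]
```

The same pattern returns `('1848', 'N.Y. Laws', None, '1')` for `1848 N.Y. Laws § 1`, so page-only, section-only, and page-plus-section forms end up in one normalized shape.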

@lmullen
Owner Author

lmullen commented Aug 27, 2022

I think this is going to be sufficiently complicated that I don't want to get bogged down here until we've done the CAP cases.

The approach, I think, would be to write a series of more targeted detectors, one for each pattern. So I don't want to introduce more noise into the generic detector by modifying it.

I am guessing, OCR-wise, that 1848 N.Y. Laws § 1 is unlikely to appear that often because of the §. But 1848 N.Y. Laws 497 Sec. 1 is just number, words, number, words, number, which will catch a lot of stuff.
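The idea of a series of targeted detectors, kept separate from the generic one, could be sketched as a small registry of narrow patterns that each run independently. This is a hypothetical Python illustration: `DETECTORS` and `run_detectors` are invented names, and requiring the literal words `Laws` and `Sec.` is an assumption made to keep the number-words-number shape from matching arbitrary text (it would miss sources named some other way).

```python
import re

# Hypothetical targeted detectors, one per citation pattern, kept apart
# from any generic "volume reporter page" detector so they add no noise to it.
DETECTORS = {
    # "1848 N.Y. Laws § 1": the section symbol makes this form distinctive.
    "year_laws_section_symbol": re.compile(
        r"\b(\d{4})\s+((?:[A-Z][A-Za-z.]*\s+)*Laws)\s+§\s*(\d+)"
    ),
    # "1848 N.Y. Laws 497 Sec. 1": number-words-number-words-number is a
    # generic shape, so this detector insists on the words "Laws" and "Sec.".
    "year_laws_page_sec": re.compile(
        r"\b(\d{4})\s+((?:[A-Z][A-Za-z.]*\s+)*Laws)\s+(\d{1,4})\s+[Ss]ec\.\s*(\d+)"
    ),
}


def run_detectors(text):
    """Return (detector_name, matched_text) pairs from every detector."""
    hits = []
    for name, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)))
    return hits


sample = "1848 N.Y. Laws § 1; 1848 N.Y. Laws 497 Sec. 1; 1900 people 12 items 3"
for hit in run_detectors(sample):
    print(hit)
```

Each hit is tagged with the detector that produced it, so a noisy pattern can be audited or dropped on its own; the junk phrase `1900 people 12 items 3` matches neither detector.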

@kfunk074
Collaborator

As the world's premier expert on OCR detection of section symbols, I'd say it does better than you might expect. The trick is usually making sure the regex finder isn't excluding weird characters or treating them as a stop term.
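To illustrate how a regex finder can end up excluding a character like §, here is a small Python check (an editorial illustration, not code from the project) showing two common ways it happens: tokenizing on `\w`, which does not match punctuation such as §, and anchoring with `\b`, which needs a word character on one side.

```python
import re

text = "1848 N.Y. Laws § 1"

# A word-character tokenizer silently drops "§": it is punctuation,
# not a letter or digit, so \w does not match it.
naive_tokens = re.findall(r"\w+", text)

# Listing "§" explicitly keeps it as its own token.
section_aware_tokens = re.findall(r"§|[\w.]+", text)

# \b needs a word character on one side, so "\b§" cannot match a "§"
# that follows a space; a whitespace/start-of-string guard works instead.
boundary_hit = re.search(r"\b§", text)
guarded_hit = re.search(r"(?:^|\s)§", text)

print(naive_tokens)             # ['1848', 'N', 'Y', 'Laws', '1']
print(section_aware_tokens)     # ['1848', 'N.Y.', 'Laws', '§', '1']
print(boundary_hit)             # None
print(guarded_hit is not None)  # True
```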

I agree with tabling statutes for now, but with keeping the issue live to target down the line.
