This is a WIP set of scripts for pulling data from the New York Unified Court System website and putting it into a readable, searchable format.
This was originally created to make it easier to aggregate information during the Child Victims Act look-back period, for people who do not have a way to log in to NYSCEF or who want to play with the data.
The data is currently hosted in a sqlite database with datasette here.
A huge challenge of getting data from NYSCEF is that, in order to aggregate all data on a certain case type over a time period, you must use the "Case Search" option. This allows you to search by `Court` and a specific `Date`. Once you have all the cases that were submitted to that court on that date, you must go through the entire list and pick out only the cases under your subject. If anyone has a better way to search through this, please let me know.
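The day-by-day search and filter described above can be sketched roughly as follows. This is only an illustration, not the code in this repo: the helper names, the `case_type` and `docket_id` keys, and the sample rows are all made up, and the real search results will be shaped differently.

```python
from datetime import datetime, timedelta


def dates_in_window(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def filter_by_case_type(cases, case_type):
    """Keep only the rows whose case type matches exactly."""
    return [case for case in cases if case.get("case_type") == case_type]


# One "Case Search" query is needed per court per date in the window.
start = datetime.strptime("08/14/2019", "%m/%d/%Y").date()
end = datetime.strptime("08/16/2019", "%m/%d/%Y").date()
search_dates = [d.strftime("%m/%d/%Y") for d in dates_in_window(start, end)]

# Hypothetical rows standing in for one day's search results.
sample_rows = [
    {"docket_id": "123", "case_type": "Torts - Child Victims Act"},
    {"docket_id": "456", "case_type": "Some Other Case Type"},
]
matching = filter_by_case_type(sample_rows, "Torts - Child Victims Act")
```

The exact-match comparison is deliberate: the case type string has to match NYSCEF's wording character for character.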
Then there is the issue that all use of "Case Search" is protected by a captcha. With a little testing, it seemed like the session cookie associated with the captcha, `JSESSIONID`, is limited to fewer than 100 requests per session id. I chose 60 just to be safe. This means that after 60 requests you must input a new `JSESSIONID` for the script to work. I've sat for a while inputting new ids from fresh, captcha'd sessions into the script.
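One way to keep track of that 60-request budget is a small counter that refuses to hand out the session id once the limit is reached. This is a minimal sketch, not the script's actual code: `SessionBudget` and `SessionExhausted` are hypothetical names, and the limit of 60 is just the safety margin mentioned above.

```python
class SessionExhausted(Exception):
    """Raised when the current session id has used up its request budget."""


class SessionBudget:
    """Counts requests made with one JSESSIONID and stops at a fixed limit."""

    def __init__(self, session_id, limit=60):
        self.session_id = session_id
        self.limit = limit
        self.used = 0

    def spend(self):
        """Record one request and return the session id to send with it."""
        if self.used >= self.limit:
            raise SessionExhausted("pass a new captcha and supply a fresh JSESSIONID")
        self.used += 1
        return self.session_id

    def renew(self, new_session_id):
        """Swap in a fresh session id and reset the counter."""
        self.session_id = new_session_id
        self.used = 0
```

Calling `spend()` before each search request and `renew()` after each new captcha keeps the count in one place instead of scattered through the fetching code.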
However, if you already have a set of docket ids, there is no captcha requirement for going directly to a docket id's case information, so the other scripts run much faster.
The first step to using the `docket_id_fetcher.py` script is going to https://iapps.courts.state.ny.us/nyscef/CaseSearch?TAB=courtDateRange and passing the captcha test. Then, inspect the cookies and grab the `JSESSIONID` cookie. You need this to run the script, and you need a fresh one for every subsequent 60 queries.
The script takes the following arguments, some of which are required:
argument | required | description |
---|---|---|
start-date | true | the start date of your search window |
session-id | true | the JSESSIONID generated by completing captcha test |
case-type | true | the case type you are looking to aggregate, default: no type |
end-date | false | the end date of your search window, default: today |
court | false | the court you want to search, default: all courts |
output | false | the file you want to output your ids to |
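For readers who want to see how the table maps onto a command line, here is a rough `argparse` sketch. It is an assumption about how such a parser could look, not the script's actual code: the help strings paraphrase the table, the `mm/dd/yyyy` date format is inferred from the example command, and `EXAMPLE_SESSION_ID` is a placeholder.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser mirroring the argument table (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Collect docket ids from NYSCEF Case Search")
    parser.add_argument("--start-date", required=True, help="start of the search window")
    parser.add_argument("--session-id", required=True, help="JSESSIONID from a captcha'd session")
    parser.add_argument("--case-type", required=True, help="case type, in NYSCEF's exact wording")
    # Assumed format: mm/dd/yyyy, defaulting to today as the table describes.
    parser.add_argument("--end-date", default=date.today().strftime("%m/%d/%Y"),
                        help="end of the search window")
    parser.add_argument("--court", default=None, help="court to search (default: all courts)")
    parser.add_argument("--output", default=None, help="file to write the ids to")
    return parser


args = build_parser().parse_args([
    "--start-date", "08/14/2019",
    "--case-type", "Torts - Child Victims Act",
    "--session-id", "EXAMPLE_SESSION_ID",
    "--output", "ids.txt",
])
```

`argparse` turns the dashes into underscores, so the values end up on `args.start_date`, `args.case_type`, and so on.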
Here is an example of how I ran the script for the Child Victims Act:
python3 docket_id_fetcher.py --start-date "08/14/2019" --case-type "Torts - Child Victims Act" --session-id "37B5BE431C206047654303BE1BE00F70.server2037" --output "ids.txt"
Note that you will need to use the exact language NYSCEF uses for the case type.
If you have a list of all docket ids you want to inspect, you can then use a script to generate data about the cases themselves.
All you need to do is either pass a list of docket ids like so:
python3 case_data_fetcher.py --docket-ids "123,456"
or a file of the docket ids like this:
python3 case_data_fetcher.py --docket-ids-file "ids.txt"
It captures all the metadata for each docket, formats it as JSON, and saves it in an output file.
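Both input styles boil down to the same list of ids. The sketch below shows one way to normalize them, assuming ids are separated by commas or newlines. That file layout is a guess and `parse_docket_ids` is a hypothetical helper, so check what `ids.txt` actually looks like before relying on it.

```python
import re


def parse_docket_ids(text):
    """Split on commas or whitespace, drop blanks, and remove duplicates in order."""
    seen = set()
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


# The same helper handles the --docket-ids string and the contents of a file.
from_flag = parse_docket_ids("123,456")
from_file = parse_docket_ids("123\n456\n\n123\n")
```

Removing duplicates up front avoids fetching the same docket twice when ids from several search runs are pasted into one file.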