
apark2020/nyscef-downloader


NYSCEF Downloader

This is a WIP set of scripts for pulling data from the New York Unified Court System website and putting it into readable, searchable formats.

This was originally created to make it easier to aggregate information during the Child Victims Act look-back period, for people who do not have a way to log in to NYSCEF or who want to play with the data.

The data is currently hosted in a sqlite database with datasette here.

Challenges

A huge challenge of getting data from NYSCEF is that, in order to aggregate all data on a certain case type over a time period, you must use the "Case Search" option. This only lets you search by court and a single, specific date. Once you have all the cases that were submitted to that court on that date, you must go through the entire list and pick out only the cases of the type you care about. If anyone has a better way to search through this, please let me know.
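The date-by-date approach described above can be sketched as follows. This is only an illustration, not the code from this repository: the helper names and the `docket_id` / `case_type` keys are hypothetical, and the sample records are made up.

```python
from datetime import date, timedelta


def dates_in_window(start, end):
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def filter_by_case_type(cases, case_type):
    """Keep only the cases whose type matches the given string exactly."""
    return [case for case in cases if case.get("case_type") == case_type]


# Made-up search results for one court on one date (hypothetical field names).
sample = [
    {"docket_id": "a1", "case_type": "Torts - Child Victims Act"},
    {"docket_id": "b2", "case_type": "Some Other Type"},
]

# One search would be needed per day in the window.
window = list(dates_in_window(date(2019, 8, 14), date(2019, 8, 16)))
matches = filter_by_case_type(sample, "Torts - Child Victims Act")

print([day.strftime("%m/%d/%Y") for day in window])
print([case["docket_id"] for case in matches])
```

Because every day in the window costs at least one search per court, the number of captcha-protected requests grows quickly with the size of the window.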

Then there is the issue that all use of "Case Search" is protected by a captcha. With a little testing, it seemed that the session cookie associated with the captcha, JSESSIONID, is limited to fewer than 100 requests per session id. I chose 60 just to be safe. This means that after 60 requests you must input a new JSESSIONID for the script to keep working. I've sat for a while feeding new ids from fresh, captcha'd sessions into the script.
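One way to enforce the 60-request budget is a small counter that refuses to go past the limit. This is a minimal sketch under assumptions: the `SessionBudget` class is hypothetical and is not taken from the scripts in this repository; only the limit of 60 comes from the text above.

```python
class SessionBudget:
    """Tracks how many requests a single JSESSIONID has left."""

    def __init__(self, session_id, limit=60):
        self.session_id = session_id
        self.limit = limit
        self.used = 0

    @property
    def remaining(self):
        """Number of requests still allowed for this session id."""
        return self.limit - self.used

    def spend(self):
        """Record one request; raise once the session is used up."""
        if self.used >= self.limit:
            raise RuntimeError(
                "Session exhausted: solve a new captcha and supply a fresh JSESSIONID"
            )
        self.used += 1


budget = SessionBudget("EXAMPLE-SESSION-ID", limit=60)
for _ in range(60):
    budget.spend()  # call this before each Case Search request

print(budget.remaining)
```

Calling `spend()` a 61st time raises, which is the point where a fresh, captcha'd session id has to be supplied.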

However, if you already have a set of docket ids, there is no captcha requirement for going directly to a docket id's case information, so the other scripts run much faster.

Fetching Docket Ids

The first step in using the docket_id_fetcher.py script is to go to https://iapps.courts.state.ny.us/nyscef/CaseSearch?TAB=courtDateRange and pass the captcha test. Then inspect the site's cookies and copy the value of the JSESSIONID cookie. You need this value to run the script, and you need a fresh one after every 60 queries.
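For illustration, this is roughly how a copied JSESSIONID can be attached to a request using only the standard library. It is a sketch, not the repository's implementation: `build_request` is a hypothetical helper, the session id is a placeholder, and the form fields the search itself expects are not shown. Nothing is sent over the network here.

```python
from urllib.request import Request

SEARCH_URL = "https://iapps.courts.state.ny.us/nyscef/CaseSearch?TAB=courtDateRange"


def build_request(session_id, url=SEARCH_URL):
    """Return a Request carrying the JSESSIONID cookie (it is only built, not sent)."""
    return Request(url, headers={"Cookie": f"JSESSIONID={session_id}"})


# Placeholder value; a real one comes from a browser session that passed the captcha.
request = build_request("EXAMPLE-SESSION-ID")
print(request.get_header("Cookie"))
```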

The script takes the following arguments, some of which are required:

| argument   | required | description                                                          |
|------------|----------|----------------------------------------------------------------------|
| start-date | true     | the start date of your search window                                 |
| session-id | true     | the JSESSIONID generated by completing captcha test                  |
| case-type  | true     | the case type you are looking to aggregate, default: no type         |
| end-date   | false    | the end date of your search window, default: today                   |
| court      | false    | the court you want to search, default: all courts                    |
| output     | false    | the file you want to output your ids to                              |
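To make the argument list concrete, here is a sketch of a command-line parser that accepts the same flags. It is an assumption about how such a parser could look, not a copy of docket_id_fetcher.py; in particular, the MM/DD/YYYY date format is inferred from the example invocation in this README, and representing "all courts" as `None` is a choice made for this sketch.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a docket id fetcher CLI")
    parser.add_argument("--start-date", required=True,
                        help="start of the search window, e.g. 08/14/2019")
    parser.add_argument("--session-id", required=True,
                        help="JSESSIONID from a session that passed the captcha")
    parser.add_argument("--case-type", required=True,
                        help="case type, worded exactly as NYSCEF words it")
    parser.add_argument("--end-date", default=date.today().strftime("%m/%d/%Y"),
                        help="end of the search window (default: today)")
    parser.add_argument("--court", default=None,
                        help="court to search (default: all courts)")
    parser.add_argument("--output", default=None,
                        help="file to write the docket ids to")
    return parser


args = build_parser().parse_args([
    "--start-date", "08/14/2019",
    "--case-type", "Torts - Child Victims Act",
    "--session-id", "EXAMPLE-SESSION-ID",
    "--output", "ids.txt",
])
print(args.start_date, args.case_type, args.output)
```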

Here is an example of how I ran the script for the Child Victims Act:

python3 docket_id_fetcher.py --start-date "08/14/2019" --case-type "Torts - Child Victims Act" --session-id "37B5BE431C206047654303BE1BE00F70.server2037" --output "ids.txt"

Note that the case type must match the exact wording NYSCEF uses for that type, so check it on the site first.

Fetching Case Data

If you have a list of all docket ids you want to inspect, you can then use a script to generate data about the cases themselves.

All you need to do is either pass a list of docket ids like so:

python3 case_data_fetcher.py --docket-ids "123,456"

or a file of the docket ids like this:

python3 case_data_fetcher.py --docket-ids-file "ids.txt"
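Both input styles can be reduced to the same list of ids. The sketch below shows one way to do that; the helper names are hypothetical, and it assumes the ids file holds ids separated by commas, spaces, or newlines, which may differ from the format the actual script expects.

```python
import re


def parse_docket_ids(text):
    """Split a string on commas and whitespace, dropping empty pieces."""
    return [piece for piece in re.split(r"[,\s]+", text) if piece]


def load_docket_ids(ids=None, ids_file=None):
    """Return docket ids from a file if one is given, else from the inline string."""
    if ids_file is not None:
        with open(ids_file, encoding="utf-8") as handle:
            return parse_docket_ids(handle.read())
    return parse_docket_ids(ids or "")


print(load_docket_ids(ids="123,456"))
print(parse_docket_ids("123\n456\n"))
```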

The script writes its results to an output file: formatted JSON that captures all the metadata for each docket.

About

Data grabbing from iapps.courts.state.ny.us
