This project aims to improve access to public data by developing and integrating an API + crawler that retrieves public lawsuit details from TJAL (Tribunal de Justiça de Alagoas) and TJCE (Tribunal de Justiça do Ceará).
git clone <this repository>
pip install virtualenv
virtualenv juscrawler
. juscrawler/bin/activate
pip install -r requirements.txt
python run.py
Open an API client such as Insomnia or Postman and send a request to localhost on port 5000 at the /consult endpoint with the following JSON body:
127.0.0.1:5000/consult
# Expected JSON format:
{"process_number": "0000000-00.0000.0.02.0000"}   # TJAL (court code 02)
{"process_number": "0000000-00.0000.0.06.0000"}   # TJCE (court code 06)
or simply:
00000000000000020000 -> 20 digits, no punctuation
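For example, a consultation can be made from Python with the requests library (this sketch assumes the /consult route accepts POST; adjust if it is registered for another HTTP method):

```python
# Example request to the local API (assumes /consult accepts POST).
import requests

response = requests.post(
    "http://127.0.0.1:5000/consult",
    json={"process_number": "0000000-00.0000.0.02.0000"},  # TJAL pattern
)
print(response.status_code)
print(response.json())
```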
pytest tests/test_crawler.py
- Scrapy
- Selenium
- Flask
- Pytest
---
To improve crawler performance, multiprocessing is applied in the API call so that the two spiders run in parallel.
This roughly halved the total scraping time; the average time per consultation is now between 6 and 7 seconds.
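A minimal sketch of this idea, assuming hypothetical spider classes FirstInstanceSpider and SecondInstanceSpider (the project's real spider names and settings may differ):

```python
# Sketch: run two Scrapy spiders in parallel, one OS process each.
# Scrapy's Twisted reactor cannot be restarted inside a long-lived
# Flask process, so each spider gets its own process.
from multiprocessing import Process, Queue

from scrapy import signals
from scrapy.crawler import CrawlerProcess

def run_spider(spider_cls, process_number, queue):
    """Run one spider to completion and push its scraped items back."""
    items = []
    runner = CrawlerProcess(settings={"LOG_ENABLED": False})
    crawler = runner.create_crawler(spider_cls)

    def collect(item, response, spider):
        items.append(dict(item))

    crawler.signals.connect(collect, signal=signals.item_scraped)
    runner.crawl(crawler, process_number=process_number)
    runner.start()  # blocks until this spider finishes
    queue.put(items)

def consult(process_number):
    """Scrape both court instances in parallel and merge the results."""
    # FirstInstanceSpider / SecondInstanceSpider are hypothetical names.
    q1, q2 = Queue(), Queue()
    p1 = Process(target=run_spider, args=(FirstInstanceSpider, process_number, q1))
    p2 = Process(target=run_spider, args=(SecondInstanceSpider, process_number, q2))
    p1.start(); p2.start()
    first, second = q1.get(), q2.get()
    p1.join(); p2.join()
    return {"first_instance": first, "second_instance": second}
```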
Some input checks were implemented (see the sketch after this list):
- The API infers the court (Tribunal de Justiça de Alagoas or Tribunal de Justiça do Ceará) from the input pattern.
- The API handles:
  - too many digits in the input
  - not enough digits in the input
  - numbers out of the expected format
  - badly formatted JSON requests
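A minimal sketch of how such validation could look (the function name and error messages are illustrative, not the project's exact ones):

```python
# Sketch: validate a CNJ-style process number and infer the court.
import re

def validate_process_number(number: str) -> str:
    """Return 'TJAL' or 'TJCE', or raise ValueError on bad input."""
    digits = re.sub(r"\D", "", number)  # strip punctuation
    if len(digits) > 20:
        raise ValueError("too many digits in input")
    if len(digits) < 20:
        raise ValueError("not enough digits in input")
    court_code = digits[14:16]  # the TR segment of the CNJ number
    if court_code == "02":
        return "TJAL"  # Tribunal de Justiça de Alagoas
    if court_code == "06":
        return "TJCE"  # Tribunal de Justiça do Ceará
    raise ValueError("number out of expected format")
```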
---
Some of the target web pages are tricky to scrape with static requests alone, so a combination of Scrapy and Selenium was used to crawl and scrape the four web pages in this project (two court instances for each court); see the sketch below.
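A minimal sketch of the Scrapy + Selenium combination (the URL and CSS selector are hypothetical placeholders, not the real TJAL/TJCE ones):

```python
# Sketch: let Selenium render the page, then parse with Scrapy selectors.
import scrapy
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

class CourtSpider(scrapy.Spider):
    name = "court"

    def __init__(self, process_number=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.process_number = process_number
        opts = Options()
        opts.add_argument("--headless")
        self.driver = webdriver.Chrome(options=opts)

    def start_requests(self):
        # Hypothetical search URL; each court/instance has its own.
        url = f"https://example-court.jus.br/search?num={self.process_number}"
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        self.driver.get(response.url)  # render the JavaScript-heavy page
        page = Selector(text=self.driver.page_source)
        yield {"subject": page.css("span#subject::text").get()}  # hypothetical selector

    def closed(self, reason):
        self.driver.quit()  # release the browser when the spider finishes
```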
---
Regular expressions were used extensively, since the content extracted via CSS selectors can come in many different formats.
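For example, monetary values scraped from the pages might be normalized like this (the patterns are illustrative, not the project's actual ones):

```python
# Sketch: regex normalization of scraped text in varying formats.
import re

def extract_amount(raw: str) -> str:
    """Pull a monetary value out of strings like 'Valor da ação  R$ 1.000,50'."""
    text = re.sub(r"\s+", " ", raw).strip()      # collapse whitespace
    match = re.search(r"R\$\s*([\d.,]+)", text)  # capture the numeric part
    return match.group(1) if match else text
```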