Document Classification

Document classification with PyTorch. This repository was made using the practicalAI boilerplate template.

Set up with virtualenv

cd src
virtualenv -p python3 venv
source venv/bin/activate
python3 setup.py develop
python3 -m pytest tests
gunicorn --log-level ERROR --workers 4 --timeout 60 --graceful-timeout 30 --bind 0.0.0.0:5000 --access-logfile - --error-logfile - --reload wsgi

tensorboard --logdir="tensorboard" --port=6006

Set up with docker

docker build -t document-classification:latest -f Dockerfile .
docker run -d -p 5000:5000 --name document-classification document-classification:latest

Train a model

Training POST /train

curl --request POST \
     --url http://localhost:5000/document-classification/train \
     --header "Content-Type: application/json" \
     --data '{
        "config_file": "training.json"
        }'
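The same training request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the repository: the helper names `build_train_request` and `send` are hypothetical, and the URL and payload are copied from the curl example above. Sending the request assumes the service is already running on localhost:5000.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust if the service runs elsewhere.
BASE_URL = "http://localhost:5000/document-classification"


def build_train_request(config_file="training.json", base_url=BASE_URL):
    """Build (but do not send) the POST /train request (hypothetical helper)."""
    body = json.dumps({"config_file": config_file}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response.

    Assumes the service replies with a JSON body; requires a running server.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(build_train_request())` would issue the same call as the curl command.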

Usage

Inference POST /predict

curl --request POST \
     --url http://localhost:5000/document-classification/predict/latest \
     --header "Content-Type: application/json" \
     --data '{
        "X": "Global warming is inevitable, scientists warn."
        }'

Python package

from api.utils import predict
X = "Global warming is inevitable, scientists warn."
prediction = predict(experiment_id="latest", X=X)["data"]["prediction"]

>>> print(prediction)
[{'y': 'Sci/Tech', 'probability': 0.6540133357048035}, {'y': 'Business', 'probability': 0.339420884847641}, {'y': 'World', 'probability': 0.003702996065840125}, {'y': 'Sports', 'probability': 0.002862769179046154}]
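Since the prediction is a list of class/probability dictionaries, picking the most likely class takes one line. The sketch below copies the output shown above into a literal so that it runs without the service. The 0.01 cutoff is an arbitrary value chosen for illustration.

```python
# Prediction list copied verbatim from the example output above.
prediction = [
    {"y": "Sci/Tech", "probability": 0.6540133357048035},
    {"y": "Business", "probability": 0.339420884847641},
    {"y": "World", "probability": 0.003702996065840125},
    {"y": "Sports", "probability": 0.002862769179046154},
]

# Select the entry with the highest probability.
top = max(prediction, key=lambda item: item["probability"])
print(top["y"], round(top["probability"], 3))  # prints: Sci/Tech 0.654

# Keep only the classes above an illustrative cutoff.
likely = [item["y"] for item in prediction if item["probability"] >= 0.01]
print(likely)  # prints: ['Sci/Tech', 'Business']
```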

API endpoints

Health check GET /api

curl --request GET \
     --url http://localhost:5000/document-classification

Training POST /train

curl --request POST \
     --url http://localhost:5000/document-classification/train \
     --header "Content-Type: application/json" \
     --data '{
        "config_file": "training.json"
        }'

Inference POST /predict

curl --request POST \
     --url http://localhost:5000/document-classification/predict/latest \
     --header "Content-Type: application/json" \
     --data '{
        "X": "Global warming is inevitable, scientists warn."
        }'

List of experiments GET /experiments

curl --request GET \
     --url http://localhost:5000/document-classification/experiments

Experiment info GET /info/<experiment_id>

curl --request GET \
     --url http://localhost:5000/document-classification/info

Get classes for a model GET /classes/<experiment_id>

curl --request GET \
     --url http://localhost:5000/document-classification/classes

Delete an experiment GET /delete/<experiment_id>

curl --request GET \
     --url http://localhost:5000/document-classification/delete/2019-03-14T01:05:49.989428_fafe6eb4-462f-11e9-bfe0-f0189887caab
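The experiment ID in the delete example looks like an ISO 8601 timestamp joined to a UUID by an underscore. Assuming that layout holds for all generated IDs (the README does not state it), a client could recover the creation time before deciding what to delete. The helper `split_experiment_id` below is hypothetical and not part of the repository.

```python
from datetime import datetime
from uuid import UUID


def split_experiment_id(experiment_id):
    """Split '<timestamp>_<uuid>' into (datetime, UUID).

    The layout is an assumption inferred from the example ID above.
    """
    timestamp_text, separator, uuid_text = experiment_id.partition("_")
    if not separator:
        raise ValueError(f"unexpected experiment ID layout: {experiment_id!r}")
    return datetime.fromisoformat(timestamp_text), UUID(uuid_text)


created_at, unique_part = split_experiment_id(
    "2019-03-14T01:05:49.989428_fafe6eb4-462f-11e9-bfe0-f0189887caab"
)
print(created_at.date(), unique_part.version)  # prints: 2019-03-14 1
```

An alias such as `latest` has no underscore, so the helper raises `ValueError` for it rather than guessing.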

Directory structure

document-classification/
├── src/
|   ├── api/                      - holds all API scripts
|   |   ├── endpoints.py            - API endpoint definitions
|   |   └── utils.py                - utility functions for endpoints
|   ├── configs/                  - configuration files
|   |   ├── logging.json            - logger configuration
|   |   └── training.json           - training configuration
|   ├── datasets/                 - directory to hold datasets
|   |   └── news.csv                - data file
|   ├── document_classification/  - ML files
|   |   ├── dataset.py              - dataset
|   |   ├── model.py                - model functions
|   |   ├── utils.py                - utility functions
|   |   ├── vectorizer.py           - vectorize the processed data
|   |   └── vocabulary.py           - vocabulary to vectorize data
|   ├── tests/                    - tests
|   |   ├── e2e/                    - integration tests
|   |   └── unit/                   - unit tests
|   ├── application.py            - application script
|   ├── config.py                 - application configuration
|   ├── requirements.txt          - python package requirements
|   ├── setup.py                  - custom package setup
|   └── wsgi.py                   - application initialization
├── .dockerignore             - dockerignore file
├── .gitignore                - gitignore file
├── Dockerfile                - Dockerfile for the application
├── CODE_OF_CONDUCT.md        - code of conduct
├── CODEOWNERS                - code owner assignments
├── LICENSE                   - license description
└── README.md                 - repository readme

Bunny10/document-classification
