Document Classification

Document classification with PyTorch. This repository was made using the practicalAI boilerplate template.

Set up with virtualenv

cd src
virtualenv -p python3 venv
source venv/bin/activate
python3 setup.py develop
python3 -m pytest tests
gunicorn --log-level ERROR --workers 4 --timeout 60 --graceful-timeout 30 --bind 0.0.0.0:5000 --access-logfile - --error-logfile - --reload wsgi
tensorboard --logdir="tensorboard" --port=6006

Set up with docker

docker build -t document-classification:latest -f Dockerfile .
docker run -d -p 5000:5000 --name document-classification document-classification:latest

Train a model

  • Training POST /train
curl --request POST \
     --url http://localhost:5000/document-classification/train \
     --header "Content-Type: application/json" \
     --data '{
        "config_file": "training.json"
        }'
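
The same training request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the API is reachable at the address used in the curl call above. The names `build_train_request` and `train` are hypothetical helpers, not part of this repository, and the assumption that the response body is JSON is not confirmed by this README.

```python
import json
import urllib.request

# Assumption: the API is served locally, as in the curl example.
BASE_URL = "http://localhost:5000/document-classification"


def build_train_request(config_file="training.json", base_url=BASE_URL):
    """Build (but do not send) the POST /train request (hypothetical helper)."""
    payload = json.dumps({"config_file": config_file}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/train",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def train(config_file="training.json"):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_train_request(config_file)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, calling `train()` sends the request. Building the request separately from sending it makes the payload easy to inspect without a live server.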

Usage

  • Inference POST /predict
curl --request POST \
     --url http://localhost:5000/document-classification/predict/latest \
     --header "Content-Type: application/json" \
     --data '{
        "X": "Global warming is inevitable, scientists warn."
        }'
  • Python package
from api.utils import predict

# Classify one document using the "latest" experiment
X = "Global warming is inevitable, scientists warn."
prediction = predict(experiment_id="latest", X=X)["data"]["prediction"]

>>> print(prediction)
[{'y': 'Sci/Tech', 'probability': 0.6540133357048035}, {'y': 'Business', 'probability': 0.339420884847641}, {'y': 'World', 'probability': 0.003702996065840125}, {'y': 'Sports', 'probability': 0.002862769179046154}]
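
The prediction is a list of label/probability pairs, so picking a single class takes one extra step. The sketch below works on the sample output shown above. `top_class` and its `threshold` parameter are hypothetical and not part of the package.

```python
# Sample output copied from the example above.
prediction = [
    {"y": "Sci/Tech", "probability": 0.6540133357048035},
    {"y": "Business", "probability": 0.339420884847641},
    {"y": "World", "probability": 0.003702996065840125},
    {"y": "Sports", "probability": 0.002862769179046154},
]


def top_class(prediction, threshold=0.5):
    """Return the most probable label, or None if it falls below threshold.

    Hypothetical helper: the 0.5 default is an illustrative choice.
    """
    best = max(prediction, key=lambda p: p["probability"])
    return best["y"] if best["probability"] >= threshold else None


print(top_class(prediction))  # Sci/Tech
```

Using `max` with a key avoids relying on the list being sorted by probability, which the sample output suggests but this README does not state.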

API endpoints

  • Health check GET /api
curl --request GET \
     --url http://localhost:5000/document-classification
  • Training POST /train
curl --request POST \
     --url http://localhost:5000/document-classification/train \
     --header "Content-Type: application/json" \
     --data '{
        "config_file": "training.json"
        }'
  • Inference POST /predict
curl --request POST \
     --url http://localhost:5000/document-classification/predict/latest \
     --header "Content-Type: application/json" \
     --data '{
        "X": "Global warming is inevitable, scientists warn."
        }'
  • List of experiments GET /experiments
curl --request GET \
     --url http://localhost:5000/document-classification/experiments
  • Experiment info GET /info/<experiment_id>
curl --request GET \
     --url http://localhost:5000/document-classification/info
  • Get classes for a model GET /classes/<experiment_id>
curl --request GET \
     --url http://localhost:5000/document-classification/classes
  • Delete an experiment GET /delete/<experiment_id>
curl --request GET \
     --url http://localhost:5000/document-classification/delete/2019-03-14T01:05:49.989428_fafe6eb4-462f-11e9-bfe0-f0189887caab
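
The GET endpoints above all share one base URL and differ only in the resource name and an optional experiment ID. A small Python sketch of that pattern follows, assuming the server address from the curl examples and JSON responses. `endpoint_url` and `get_json` are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the API is served locally, as in the curl examples.
BASE_URL = "http://localhost:5000/document-classification"


def endpoint_url(resource="", experiment_id=None, base_url=BASE_URL):
    """Compose an endpoint URL from a resource name and optional experiment ID."""
    parts = [base_url.rstrip("/")]
    if resource:
        parts.append(resource)
    if experiment_id is not None:
        # Experiment IDs contain ':'; keep it unescaped to match the curl URL.
        parts.append(quote(experiment_id, safe=":"))
    return "/".join(parts)


def get_json(resource="", experiment_id=None):
    """Send a GET request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(endpoint_url(resource, experiment_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `endpoint_url("experiments")` yields the URL used for the experiment list, and `get_json("experiments")` would fetch it from a running server.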

Directory structure

document-classification/
├── src/
|   ├── api/                      - holds all API scripts
|   |   ├── endpoints.py            - API endpoint definitions
|   |   └── utils.py                - utility functions for endpoints
|   ├── configs/                  - configuration files
|   |   ├── logging.json            - logger configuration
|   |   └── training.json           - training configuration
|   ├── datasets/                 - directory to hold datasets
|   |   └── news.csv                - data file
|   ├── document_classification/  - ML files
|   |   ├── dataset.py              - dataset
|   |   ├── model.py                - model functions
|   |   ├── utils.py                - utility functions
|   |   ├── vectorizer.py           - vectorize the processed data
|   |   └── vocabulary.py           - vocabulary to vectorize data
|   ├── tests/                    - tests
|   |   ├── e2e/                    - integration tests
|   |   └── unit/                   - unit tests
|   ├── application.py            - application script
|   ├── config.py                 - application configuration
|   ├── requirements.txt          - python package requirements
|   ├── setup.py                  - custom package setup
|   └── wsgi.py                   - application initialization
├── .dockerignore             - dockerignore file
├── .gitignore                - gitignore file
├── Dockerfile                - Dockerfile for the application
├── CODE_OF_CONDUCT.md        - code of conduct
├── CODEOWNERS                - code owner assignments
├── LICENSE                   - license description
└── README.md                 - repository readme
