weni-ai/bothub-nlp-api

Bothub NLP - Natural Language Processing services

Version 2.2.0 | Python 3.6 | License AGPL-3.0

Check the main Bothub project repository.

Services

bothub-nlp-api

Packages

bothub-backend (python 3.6)

bothub-nlp-celery (python 3.6)

Requirements

  • Python (3.6)
  • Docker
  • Docker-Compose

Development

Use the make command below to initialize the development environment.

| Command | Description |
|---|---|
| make init_development_env | Create the .env file with the environment variables |

Environment Variables

General

| Variable | Type | Default | Description |
|---|---|---|---|
| ENVIRONMENT | str | production | |
| BOTHUB_ENGINE_URL | str | https://api.bothub.it | Web service api url |
| BOTHUB_NLP_API_HOST | str | 0.0.0.0 | Web service ip |
| BOTHUB_NLP_API_PORT | int | 2657 | Web service port |
| BOTHUB_NLP_API_WEB_CONCURRENCY | int | None | |
| BOTHUB_NLP_API_WORKERS_PER_CORE | int | 3 | |
| BOTHUB_NLP_API_LOG_LEVEL | str | info | |
| BOTHUB_NLP_API_KEEPALIVE | int | 120 | |
| BOTHUB_NLP_SENTRY_CLIENT | bool | False | Enable Sentry Client |
| BOTHUB_NLP_SENTRY | str | None | Sentry Client URL |
| SUPPORTED_LANGUAGES | str | `en pt` | |
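
As an illustration of how the general variables above could be consumed, here is a minimal sketch that reads them from a mapping and falls back to the documented defaults. The function name `load_general_settings`, the dictionary keys, and the accepted spellings for boolean values are assumptions made for this example, not the service's actual code. `SUPPORTED_LANGUAGES` is kept as a raw string because its separator is not stated here.

```python
import os
from typing import Mapping, Optional


def load_general_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the general variables, using the documented defaults when unset."""

    def as_bool(value: str) -> bool:
        # Assumption: common truthy spellings are accepted.
        return value.strip().lower() in {"1", "true", "yes", "on"}

    def as_optional_int(value: Optional[str]) -> Optional[int]:
        # An unset or empty variable maps to None (documented default: None).
        return int(value) if value not in (None, "") else None

    return {
        "environment": env.get("ENVIRONMENT", "production"),
        "engine_url": env.get("BOTHUB_ENGINE_URL", "https://api.bothub.it"),
        "host": env.get("BOTHUB_NLP_API_HOST", "0.0.0.0"),
        "port": int(env.get("BOTHUB_NLP_API_PORT", "2657")),
        "web_concurrency": as_optional_int(env.get("BOTHUB_NLP_API_WEB_CONCURRENCY")),
        "workers_per_core": int(env.get("BOTHUB_NLP_API_WORKERS_PER_CORE", "3")),
        "log_level": env.get("BOTHUB_NLP_API_LOG_LEVEL", "info"),
        "keepalive": int(env.get("BOTHUB_NLP_API_KEEPALIVE", "120")),
        "sentry_enabled": as_bool(env.get("BOTHUB_NLP_SENTRY_CLIENT", "False")),
        "sentry_url": env.get("BOTHUB_NLP_SENTRY") or None,
        # Kept as a raw string: the separator is not specified in this README.
        "supported_languages": env.get("SUPPORTED_LANGUAGES", "en pt"),
    }
```

Passing an explicit mapping instead of relying on `os.environ` keeps the sketch easy to test, for example `load_general_settings({"BOTHUB_NLP_API_PORT": "8000"})`.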

QA tasks

| Variable | Type | Default | Description |
|---|---|---|---|
| BOTHUB_NLP_API_ENABLE_QA_ROUTE | bool | False | Enable the QA api route |
| BOTHUB_NLP_API_QA_TEXT_LIMIT | int | 25000 | Maximum number of characters allowed in the QA text |
| BOTHUB_NLP_API_QA_QUESTION_LIMIT | int | 500 | Maximum number of characters allowed in the QA question |
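
The QA limits above could be enforced along the following lines. This is only a sketch under stated assumptions: the helper `check_qa_request`, the exception types, and the `true`/`false` spelling of the flag are hypothetical, and the real route may validate requests differently.

```python
from typing import Mapping


def check_qa_request(text: str, question: str, env: Mapping[str, str]) -> None:
    """Raise if the QA route is disabled or an input exceeds its limit."""
    # Assumption: the flag is spelled "true"/"false", case-insensitively.
    flag = env.get("BOTHUB_NLP_API_ENABLE_QA_ROUTE", "False")
    if flag.strip().lower() != "true":
        raise PermissionError("QA route is disabled")

    # Documented defaults: 25000 characters of text, 500 of question.
    text_limit = int(env.get("BOTHUB_NLP_API_QA_TEXT_LIMIT", "25000"))
    question_limit = int(env.get("BOTHUB_NLP_API_QA_QUESTION_LIMIT", "500"))

    if len(text) > text_limit:
        raise ValueError(f"text exceeds {text_limit} characters")
    if len(question) > question_limit:
        raise ValueError(f"question exceeds {question_limit} characters")
```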

Training

| Variable | Type | Default | Description |
|---|---|---|---|
| BOTHUB_SERVICE_TRAIN | str | celery | Use `celery` to train on a Celery worker, or `ai-platform` to use the GCP service |
| BOTHUB_GOOGLE_PROJECT_ID | int | None | GCP project id |
| BOTHUB_GOOGLE_CREDENTIALS_REFRESH_TOKEN | str | None | GCP credentials |
| BOTHUB_GOOGLE_CREDENTIALS_TOKEN_URI | str | None | GCP credentials |
| BOTHUB_GOOGLE_CREDENTIALS_CLIENT_ID | str | None | GCP credentials |
| BOTHUB_GOOGLE_CREDENTIALS_CLIENT_SECRET | str | None | GCP credentials |
| BOTHUB_GOOGLE_AI_PLATFORM_REGISTRY | str | us.gcr.io/bothub/bothub-nlp-ai-platform | Google Container Registry (GCR) project url |
| BOTHUB_GOOGLE_AI_PLATFORM_IMAGE_VERSION | str | 1.0.0 | String that must match the version of the image built in GCR |
| BOTHUB_GOOGLE_AI_PLATFORM_PACKAGE_URI | list | ["gs://poc-training-ai-platform/bothub-nlp-ai-platform/bothub-nlp-ai-platform-0.1.tar.gz"] | |
| BOTHUB_GOOGLE_AI_PLATFORM_JOB_TIMEOUT | int | None | Time limit (in seconds) a job can run before a cancel signal is sent to GCP |

Celery connection

Variables required to connect to Celery running on the workers

| Variable | Type | Default | Description |
|---|---|---|---|
| BOTHUB_NLP_CELERY_BROKER_URL | str | redis://localhost:6379/0 | Celery broker URL; see the Celery docs for usage instructions |
| BOTHUB_NLP_CELERY_BACKEND_URL | str | Value of BOTHUB_NLP_CELERY_BROKER_URL | Celery backend URL; see the Celery docs for usage instructions |
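
The backend URL defaults to whatever the broker URL resolves to. A minimal sketch of that fallback, assuming a hypothetical helper named `celery_urls` that takes the environment as a mapping:

```python
from typing import Mapping, Tuple


def celery_urls(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (broker_url, backend_url) following the documented defaults."""
    broker = env.get("BOTHUB_NLP_CELERY_BROKER_URL", "redis://localhost:6379/0")
    # The backend falls back to the broker value when it is not set.
    backend = env.get("BOTHUB_NLP_CELERY_BACKEND_URL", broker)
    return broker, backend
```

With only `BOTHUB_NLP_CELERY_BROKER_URL` set, both values point at the same URL; setting `BOTHUB_NLP_CELERY_BACKEND_URL` overrides the second one only.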

Celery queue

Variables that define which queues are available on the workers

| Variable | Type | Default | Description |
|---|---|---|---|
| AVAILABLE_SPACY_MODELS | string | `en\|pt_br\|es\|fr\|ru` | SPACY models available on the worker nodes |
| AVAILABLE_BERT_MODELS | string | `en\|pt_br\|xx` | BERT models available on the worker nodes |
| AVAILABLE_QA_MODELS | string | `en\|pt_br\|xx` | QA models available on the worker nodes |
| AVAILABLE_SPECIFIC_SPACY_QUEUES | string | `en\|pt_br\|es\|fr\|ru` | Languages with word2vec models. This means there are workers listening to the en-SPACY, pt_br-SPACY, ... queues |
| AVAILABLE_SPECIFIC_BERT_QUEUES | string | `en\|pt_br` | Languages with BERT models. This means there are workers listening to the en-BERT and pt_br-BERT queues. Other languages are sent to the multilang-BERT queue |
| AVAILABLE_SPECIFIC_QA_QUEUES | string | `en\|pt_br` | Languages with QA models. This means there are workers listening to the en-QA and pt_br-QA queues. Other languages are sent to the multilang-QA queue |
| AVAILABLE_SPECIFIC_QUEUES | string | "" | Languages without a model that need to be handled in exclusive queues. This means there are workers listening to the "" queue. Other languages are sent to the multilang queue |
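
The queue naming described above can be sketched as a small lookup: a language listed in the matching `AVAILABLE_SPECIFIC_*_QUEUES` variable gets its own `<language>-<MODEL>` queue, and other languages fall back to `multilang-<MODEL>` for BERT and QA. The function `queue_for` is hypothetical and is not the service's actual routing code. The fallback for SPACY is not described here, so the sketch returns `None` in that case rather than guessing.

```python
from typing import Mapping, Optional

# Documented defaults for the language-specific queue variables.
DEFAULT_SPECIFIC_QUEUES = {
    "SPACY": "en|pt_br|es|fr|ru",
    "BERT": "en|pt_br",
    "QA": "en|pt_br",
}


def queue_for(language: str, model_type: str, env: Mapping[str, str]) -> Optional[str]:
    """Pick a queue name for a language and model type (SPACY, BERT or QA)."""
    variable = f"AVAILABLE_SPECIFIC_{model_type}_QUEUES"
    raw = env.get(variable, DEFAULT_SPECIFIC_QUEUES[model_type])
    # Values are pipe-separated; empty items are ignored.
    specific = [item for item in raw.split("|") if item]

    if language in specific:
        return f"{language}-{model_type}"
    if model_type in ("BERT", "QA"):
        return f"multilang-{model_type}"
    # Assumption: no fallback is documented for SPACY, so none is invented.
    return None
```

For example, with the default values `queue_for("en", "BERT", {})` yields `en-BERT`, while a language outside the list, such as `de`, yields `multilang-BERT`.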