Release 1.7.0
IgnatovFedor authored Aug 12, 2024
2 parents 6e1036d + ab737ee commit aff2748
Showing 8 changed files with 260 additions and 63 deletions.
91 changes: 31 additions & 60 deletions README.md
@@ -1,62 +1,29 @@
# DeepPavlov 1.0

[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
![Python 3.6, 3.7, 3.8, 3.9, 3.10, 3.11](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-green.svg)
[![Downloads](https://pepy.tech/badge/deeppavlov)](https://pepy.tech/project/deeppavlov)
<img align="right" height="27%" width="27%" src="docs/_static/deeppavlov_logo.png"/>
[![Static Badge](https://img.shields.io/badge/DeepPavlov%20Community-blue)](https://forum.deeppavlov.ai/)
[![Static Badge](https://img.shields.io/badge/DeepPavlov%20Demo-blue)](https://demo.deeppavlov.ai/)

DeepPavlov is an open-source conversational AI library built on [PyTorch](https://pytorch.org/).

DeepPavlov is designed for
* development of production-ready chat-bots and complex conversational systems,
* research in the area of NLP and, particularly, of dialog systems.
DeepPavlov 1.0 is an open-source NLP framework built on [PyTorch](https://pytorch.org/) and [transformers](https://github.com/huggingface/transformers). DeepPavlov 1.0 is created for modular and configuration-driven development of state-of-the-art NLP models and supports a wide range of NLP model applications. DeepPavlov 1.0 is designed for practitioners with limited knowledge of NLP/ML.

## Quick Links

* Demo [*demo.deeppavlov.ai*](https://demo.deeppavlov.ai/)
* Documentation [*docs.deeppavlov.ai*](http://docs.deeppavlov.ai/)
* Model List [*docs:features/*](http://docs.deeppavlov.ai/en/master/features/overview.html)
* Contribution Guide [*docs:contribution_guide/*](http://docs.deeppavlov.ai/en/master/devguides/contribution_guide.html)
* Issues [*github/issues/*](https://github.com/deeppavlov/DeepPavlov/issues)
* Forum [*forum.deeppavlov.ai*](https://forum.deeppavlov.ai/)
* Blogs [*medium.com/deeppavlov*](https://medium.com/deeppavlov)
* [Extended colab tutorials](https://github.com/deeppavlov/dp_tutorials)
* Docker Hub [*hub.docker.com/u/deeppavlov/*](https://hub.docker.com/u/deeppavlov/)
* Docker Images Documentation [*docs:docker-images/*](http://docs.deeppavlov.ai/en/master/intro/installation.html#docker-images)

Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we can improve the DeepPavlov framework.

**Models**

[Named Entity Recognition](http://docs.deeppavlov.ai/en/master/features/models/NER.html) | [Intent/Sentence Classification](http://docs.deeppavlov.ai/en/master/features/models/classification.html) |

[Question Answering over Text (SQuAD)](http://docs.deeppavlov.ai/en/master/features/models/SQuAD.html) | [Knowledge Base Question Answering](http://docs.deeppavlov.ai/en/master/features/models/KBQA.html)

[Sentence Similarity/Ranking](http://docs.deeppavlov.ai/en/master/features/models/neural_ranking.html) | [TF-IDF Ranking](http://docs.deeppavlov.ai/en/master/features/models/tfidf_ranking.html)

[Syntactic Parsing](http://docs.deeppavlov.ai/en/master/features/models/syntax_parser.html) | [Morphological Tagging](http://docs.deeppavlov.ai/en/master/features/models/morpho_tagger.html)

[Automatic Spelling Correction](http://docs.deeppavlov.ai/en/master/features/models/spelling_correction.html) | [Entity Extraction](http://docs.deeppavlov.ai/en/master/features/models/entity_extraction.html)

[Open Domain Questions Answering](http://docs.deeppavlov.ai/en/master/features/models/ODQA.html) | [Russian SuperGLUE](http://docs.deeppavlov.ai/en/master/features/models/superglue.html)

[Relation Extraction](http://docs.deeppavlov.ai/en/master/features/models/relation_extraction.html)

**Embeddings**

[BERT embeddings for the Russian, Polish, Bulgarian, Czech, and informal English](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert)
|Name|Description|
|--|--|
| ⭐️ [*Demo*](https://demo.deeppavlov.ai/)|Check out our NLP models in the online demo|
| 📚 [*Documentation*](http://docs.deeppavlov.ai/)|How to use DeepPavlov 1.0 and its features|
| 🚀 [*Model List*](http://docs.deeppavlov.ai/en/master/features/overview.html)|Find the NLP model you need in the list of available models|
| 🪐 [*Contribution Guide*](http://docs.deeppavlov.ai/en/master/devguides/contribution_guide.html)|Please read the contribution guidelines before making a contribution|
| 🎛 [*Issues*](https://github.com/deeppavlov/DeepPavlov/issues)|If you have an issue with DeepPavlov, please let us know|
|[*Forum*](https://forum.deeppavlov.ai/)|Please let us know if you have a problem with DeepPavlov|
| 📦 [*Blogs*](https://medium.com/deeppavlov)|Read about our current development|
| 🦙 [Extended colab tutorials](https://github.com/deeppavlov/dp_tutorials)|Check out the code tutorials for our models|
| 🌌 [*Docker Hub*](https://hub.docker.com/u/deeppavlov/)|Check out the Docker images for rapid deployment|
| 👩‍🏫 [*Feedback*](https://forms.gle/i64fowQmiVhMMC7f9)|Please leave us your feedback to make DeepPavlov better|

[ELMo embeddings for the Russian language](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#elmo)

[FastText embeddings for the Russian language](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#fasttext)

**Auto ML**

[Tuning Models](http://docs.deeppavlov.ai/en/master/features/hypersearch.html)

**Integrations**

[REST API](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html) | [Socket API](http://docs.deeppavlov.ai/en/master/integrations/socket_api.html)

[Amazon AWS](http://docs.deeppavlov.ai/en/master/integrations/aws_ec2.html)

## Installation

@@ -65,11 +32,14 @@ Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we c

1. Create and activate a virtual environment:
* `Linux`

```
python -m venv env
source ./env/bin/activate
```
2. Install the package inside the environment:
```
pip install deeppavlov
```
@@ -122,7 +92,7 @@ Dataset will be downloaded regardless of whether there was `-d` flag or not.

To train on your own data, you need to modify the dataset reader path as described in the
[train config doc](http://docs.deeppavlov.ai/en/master/intro/config_description.html#train-config).
The data format is specified in the corresponding model doc page.

There are even more actions you can perform with configs:

@@ -131,20 +101,19 @@ python -m deeppavlov <action> <config_path> [-d] [-i]
```

* `<action>` can be
    * `install` to install model requirements (same as `-i`),
    * `download` to download model's data (same as `-d`),
    * `train` to train the model on the data specified in the config file,
    * `evaluate` to calculate metrics on the same dataset,
    * `interact` to interact via CLI,
    * `riseapi` to run a REST API server (see
      [doc](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html)),
    * `predict` to get prediction for samples from *stdin* or from
      *<file_path>* if `-f <file_path>` is specified.
* `<config_path>` specifies path (or name) of model's config file
* `-d` downloads required data
* `-i` installs model requirements


### Python

To get predictions from a model interactively through Python, run
@@ -157,7 +126,9 @@ model = build_model(<config_path>, install=True, download=True)
# get predictions for 'input_text1', 'input_text2'
model(['input_text1', 'input_text2'])
```

where

* `install=True` installs model requirements (optional),
* `download=True` downloads required data from the web: pretrained model files and embeddings (optional),
* `<config_path>` is model name (e.g. `'ner_ontonotes_bert_mult'`), path to the chosen model's config file (e.g.
@@ -174,7 +145,7 @@ model = train_model(<config_path>, install=True, download=True)

To train on your own data, you need to modify the dataset reader path as described in the
[train config doc](http://docs.deeppavlov.ai/en/master/intro/config_description.html#train-config).
The data format is specified in the corresponding model doc page.

You can also calculate metrics on the dataset specified in your config file:

2 changes: 1 addition & 1 deletion deeppavlov/_meta.py
@@ -1,4 +1,4 @@
__version__ = '1.6.0'
__version__ = '1.7.0'
__author__ = 'Neural Networks and Deep Learning lab, MIPT'
__description__ = 'An open source library for building end-to-end dialog systems and training chatbots.'
__keywords__ = ['NLP', 'NER', 'SQUAD', 'Intents', 'Chatbot']
134 changes: 134 additions & 0 deletions deeppavlov/configs/ner/ner_conll2003_deberta_crf.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
{
"dataset_reader": {
"class_name": "conll2003_reader",
"data_path": "{DOWNLOADS_PATH}/conll2003/",
"dataset_name": "conll2003",
"provide_pos": false
},
"dataset_iterator": {
"class_name": "data_learning_iterator"
},
"chainer": {
"in": [
"x"
],
"in_y": [
"y"
],
"pipe": [
{
"class_name": "torch_transformers_ner_preprocessor",
"vocab_file": "{TRANSFORMER}",
"do_lower_case": false,
"max_seq_length": 512,
"max_subword_length": 15,
"token_masking_prob": 0.0,
"in": [
"x"
],
"out": [
"x_tokens",
"x_subword_tokens",
"x_subword_tok_ids",
"startofword_markers",
"attention_mask",
"tokens_offsets"
]
},
{
"id": "tag_vocab",
"class_name": "simple_vocab",
"unk_token": [
"O"
],
"pad_with_zeros": true,
"save_path": "{MODEL_PATH}/tag.dict",
"load_path": "{MODEL_PATH}/tag.dict",
"fit_on": [
"y"
],
"in": [
"y"
],
"out": [
"y_ind"
]
},
{
"class_name": "torch_transformers_sequence_tagger",
"n_tags": "#tag_vocab.len",
"pretrained_bert": "{TRANSFORMER}",
"attention_probs_keep_prob": 0.5,
"use_crf": true,
"encoder_layer_ids": [
-1
],
"save_path": "{MODEL_PATH}/model",
"load_path": "{MODEL_PATH}/model",
"in": [
"x_subword_tok_ids",
"attention_mask",
"startofword_markers"
],
"in_y": [
"y_ind"
],
"out": [
"y_pred_ind",
"probas"
]
},
{
"ref": "tag_vocab",
"in": [
"y_pred_ind"
],
"out": [
"y_pred"
]
}
],
"out": [
"x_tokens",
"y_pred"
]
},
"train": {
"metrics": [
{
"name": "ner_f1",
"inputs": [
"y",
"y_pred"
]
},
{
"name": "ner_token_f1",
"inputs": [
"y",
"y_pred"
]
}
],
"evaluation_targets": [
"valid",
"test"
],
"class_name": "torch_trainer"
},
"metadata": {
"variables": {
"ROOT_PATH": "~/.deeppavlov",
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
"MODELS_PATH": "{ROOT_PATH}/models",
"TRANSFORMER": "microsoft/deberta-v3-base",
"MODEL_PATH": "{MODELS_PATH}/ner_conll2003_deberta_crf"
},
"download": [
{
"url": "http://files.deeppavlov.ai/v1/ner/ner_conll2003_deberta_crf.tar.gz",
"subdir": "{MODEL_PATH}"
}
]
}
}
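
The `metadata.variables` block in this config defines paths in terms of one another: `MODEL_PATH` refers to `MODELS_PATH`, which in turn refers to `ROOT_PATH`. As a rough illustration of how such `{NAME}` placeholders can be expanded, here is a minimal standard-library sketch. The helper `resolve_variables` is hypothetical and is not DeepPavlov's own implementation; it only mirrors the substitution pattern visible in the config.

```python
import re

# Variables copied from the "metadata" section of ner_conll2003_deberta_crf.json.
VARIABLES = {
    "ROOT_PATH": "~/.deeppavlov",
    "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
    "MODELS_PATH": "{ROOT_PATH}/models",
    "TRANSFORMER": "microsoft/deberta-v3-base",
    "MODEL_PATH": "{MODELS_PATH}/ner_conll2003_deberta_crf",
}

# Matches placeholders such as {ROOT_PATH}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def resolve_variables(variables):
    """Hypothetical helper: expand {NAME} placeholders until none change."""
    resolved = dict(variables)
    # Bound the number of passes so that a cyclic definition cannot loop forever.
    for _ in range(len(resolved)):
        changed = False
        for key, value in resolved.items():
            # Unknown names are left untouched rather than raising an error.
            new_value = PLACEHOLDER.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


paths = resolve_variables(VARIABLES)
print(paths["MODEL_PATH"])  # ~/.deeppavlov/models/ner_conll2003_deberta_crf
```

The sketch leaves `~` unexpanded; a real loader would presumably also expand the home directory before touching the file system.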
86 changes: 86 additions & 0 deletions deeppavlov/configs/ner/ner_ontonotes_deberta_crf.json
@@ -0,0 +1,86 @@
{
"dataset_reader": {
"class_name": "conll2003_reader",
"data_path": "{DOWNLOADS_PATH}/ontonotes/",
"dataset_name": "ontonotes",
"provide_pos": false
},
"dataset_iterator": {
"class_name": "data_learning_iterator"
},
"chainer": {
"in": ["x"],
"in_y": ["y"],
"pipe": [
{
"class_name": "torch_transformers_ner_preprocessor",
"vocab_file": "{TRANSFORMER}",
"do_lower_case": false,
"max_seq_length": 512,
"max_subword_length": 15,
"token_masking_prob": 0.0,
"in": ["x"],
"out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"]
},
{
"id": "tag_vocab",
"class_name": "simple_vocab",
"unk_token": ["O"],
"pad_with_zeros": true,
"save_path": "{MODEL_PATH}/tag.dict",
"load_path": "{MODEL_PATH}/tag.dict",
"fit_on": ["y"],
"in": ["y"],
"out": ["y_ind"]
},
{
"class_name": "torch_transformers_sequence_tagger",
"n_tags": "#tag_vocab.len",
"pretrained_bert": "{TRANSFORMER}",
"attention_probs_keep_prob": 0.5,
"use_crf": true,
"encoder_layer_ids": [-1],
"save_path": "{MODEL_PATH}/model",
"load_path": "{MODEL_PATH}/model",
"in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"],
"in_y": ["y_ind"],
"out": ["y_pred_ind", "probas"]
},
{
"ref": "tag_vocab",
"in": ["y_pred_ind"],
"out": ["y_pred"]
}
],
"out": ["x_tokens", "y_pred"]
},
"train": {
"metrics": [
{
"name": "ner_f1",
"inputs": ["y", "y_pred"]
},
{
"name": "ner_token_f1",
"inputs": ["y", "y_pred"]
}
],
"evaluation_targets": ["valid", "test"],
"class_name": "torch_trainer"
},
"metadata": {
"variables": {
"ROOT_PATH": "~/.deeppavlov",
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
"MODELS_PATH": "{ROOT_PATH}/models",
"TRANSFORMER": "microsoft/deberta-v3-base",
"MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_deberta_crf"
},
"download": [
{
"url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_deberta_crf.tar.gz",
"subdir": "{MODEL_PATH}"
}
]
}
}
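
Each component in `chainer.pipe` reads named values through `in` and `in_y` and publishes new ones through `out`. One way to sanity-check a config like this is to confirm that every name a component consumes was produced earlier in the pipe. The sketch below does that with the standard library only. `check_pipe_wiring` is a hypothetical helper, not part of DeepPavlov, and the embedded pipe keeps only the wiring keys of the config (other keys such as `fit_on` and `save_path` are dropped).

```python
# Trimmed copy of the "chainer" section: only the wiring keys are kept.
CHAINER = {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
        {"class_name": "torch_transformers_ner_preprocessor",
         "in": ["x"],
         "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids",
                 "startofword_markers", "attention_mask", "tokens_offsets"]},
        {"id": "tag_vocab", "class_name": "simple_vocab",
         "in": ["y"], "out": ["y_ind"]},
        {"class_name": "torch_transformers_sequence_tagger",
         "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"],
         "in_y": ["y_ind"],
         "out": ["y_pred_ind", "probas"]},
        {"ref": "tag_vocab", "in": ["y_pred_ind"], "out": ["y_pred"]},
    ],
    "out": ["x_tokens", "y_pred"],
}


def check_pipe_wiring(chainer):
    """Hypothetical helper: list names that are consumed before being produced."""
    # Names supplied from outside the pipe: model inputs and training targets.
    available = set(chainer["in"]) | set(chainer.get("in_y", []))
    problems = []
    for index, component in enumerate(chainer["pipe"]):
        label = component.get("class_name") or component.get("ref") or str(index)
        for name in component.get("in", []) + component.get("in_y", []):
            if name not in available:
                problems.append(f"{label}: '{name}' is not available")
        # Outputs become visible to every later component.
        available.update(component.get("out", []))
    for name in chainer["out"]:
        if name not in available:
            problems.append(f"chainer.out: '{name}' is never produced")
    return problems


print(check_pipe_wiring(CHAINER))  # []
```

An empty list means the wiring is consistent. Removing the preprocessor from the pipe, for instance, would report the tagger's three inputs and the final `x_tokens` output as missing.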
4 changes: 3 additions & 1 deletion deeppavlov/core/common/requirements_registry.json
@@ -148,7 +148,9 @@
],
"torch_transformers_ner_preprocessor": [
"{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
"{DEEPPAVLOV_PATH}/requirements/transformers.txt"
"{DEEPPAVLOV_PATH}/requirements/transformers.txt",
"{DEEPPAVLOV_PATH}/requirements/sentencepiece.txt",
"{DEEPPAVLOV_PATH}/requirements/protobuf.txt"
],
"torch_transformers_nll_ranker": [
"{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
1 change: 1 addition & 0 deletions deeppavlov/requirements/protobuf.txt
@@ -0,0 +1 @@
protobuf<=3.20
1 change: 1 addition & 0 deletions deeppavlov/requirements/sentencepiece.txt
@@ -0,0 +1 @@
sentencepiece==0.2.0
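
The registry change in this commit adds two requirement files (`sentencepiece.txt` and `protobuf.txt`) to the `torch_transformers_ner_preprocessor` entry, so configs that use this component presumably pick them up when model requirements are installed. The sketch below shows, with the standard library only, how such a registry entry can be expanded into concrete file paths. The `requirement_files` helper and the `/opt/deeppavlov` location are assumptions made for illustration; neither is taken from the DeepPavlov code base.

```python
from pathlib import PurePosixPath

# Registry entry as it reads after this commit.
REGISTRY = {
    "torch_transformers_ner_preprocessor": [
        "{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
        "{DEEPPAVLOV_PATH}/requirements/transformers.txt",
        "{DEEPPAVLOV_PATH}/requirements/sentencepiece.txt",
        "{DEEPPAVLOV_PATH}/requirements/protobuf.txt",
    ],
}


def requirement_files(class_names, registry, deeppavlov_path):
    """Hypothetical helper: collect requirement files for the given components,
    keeping first-seen order and dropping duplicates."""
    files = []
    for class_name in class_names:
        # Components without a registry entry contribute no files.
        for template in registry.get(class_name, []):
            path = PurePosixPath(template.format(DEEPPAVLOV_PATH=deeppavlov_path))
            if path not in files:
                files.append(path)
    return files


# "/opt/deeppavlov" is a made-up install location used only for this example.
files = requirement_files(
    ["torch_transformers_ner_preprocessor", "simple_vocab"],
    REGISTRY,
    "/opt/deeppavlov",
)
for path in files:
    print(path)
```

Here `simple_vocab` has no entry in the trimmed registry, so only the four files of the preprocessor are listed.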