Release 0.0.6
seliverstov authored Jul 10, 2018
2 parents 1da5fd3 + 8feebcd commit ea94139
Showing 131 changed files with 8,472 additions and 467 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -11,7 +11,7 @@ node('gpu') {
sh """
virtualenv --python=python3 ".venv-$BUILD_NUMBER"
. .venv-$BUILD_NUMBER/bin/activate
sed -ri 's/^ *tensorflow *(=|<|>|\$)/tensorflow-gpu\\1/g' requirements.txt
sed -ri 's/^\\s*tensorflow\\s*(=|<|>|;|\$)/tensorflow-gpu\\1/g' requirements.txt
sed -i "s/stream=True/stream=False/g" deeppavlov/core/data/utils.py
python setup.py develop
pip install http://lnsigo.mipt.ru/export/en_core_web_sm-2.0.0.tar.gz
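The second `sed` line above widens the pattern so that a `tensorflow` requirement followed by whitespace or an environment marker (`;`) is also rewritten. A minimal Python sketch of the same substitution, assuming the Groovy and shell escaping reduces `\\s` to `\s` and `\$` to `$`; the function name `use_gpu_build` is hypothetical and not part of the repository:

```python
import re

# Same pattern as the sed expression, after unescaping (an assumption):
# optional leading whitespace, the bare package name, optional whitespace,
# then a version operator, an environment-marker separator, or end of line.
TENSORFLOW_LINE = re.compile(r"^\s*tensorflow\s*(=|<|>|;|$)")


def use_gpu_build(requirement: str) -> str:
    """Rewrite a single requirements.txt line to the tensorflow-gpu package."""
    return TENSORFLOW_LINE.sub(r"tensorflow-gpu\1", requirement)


if __name__ == "__main__":
    for line in ["tensorflow==1.8.0", "tensorflow; python_version >= '3.6'", "tensorflow-hub==0.1.0"]:
        print(use_gpu_build(line))
```

Because the captured delimiter must follow the package name directly, other packages that merely start with `tensorflow` are left untouched.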
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
include README.MD
include LICENSE
include requirements.txt
recursive-include requirements *.txt
recursive-include deeppavlov *.json
recursive-include deeppavlov *.md
recursive-include utils *.json
153 changes: 78 additions & 75 deletions README.md
@@ -1,36 +1,75 @@
[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/deepmipt/DeepPavlov/blob/master/LICENSE)
![Python 3.6](https://img.shields.io/badge/python-3.6-green.svg)

**We are in a really early Alpha release. You should be ready for hard adventures.
In version 0.0.5 we upgraded to TensorFlow 1.8, please re-download our pre-trained models.**
_We are still in a really early Alpha release._
__In version 0.0.6 everything from package `deeppavlov.skills` except `deeppavlov.skills.pattern_matching_skill` was moved to `deeppavlov.models` so your imports might break__
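
The rename described in the note above can be expressed as a small path-rewriting rule. This is only a sketch for updating import strings in your own code; the helper `migrate_import_path` is hypothetical and is not shipped with DeepPavlov:

```python
# Module that, per the release note, stays under deeppavlov.skills.
KEPT_IN_SKILLS = "deeppavlov.skills.pattern_matching_skill"

OLD_PREFIX = "deeppavlov.skills."
NEW_PREFIX = "deeppavlov.models."


def migrate_import_path(module_path: str) -> str:
    """Map a pre-0.0.6 dotted module path to its assumed 0.0.6 location."""
    # The pattern matching skill (and anything below it) did not move.
    if module_path == KEPT_IN_SKILLS or module_path.startswith(KEPT_IN_SKILLS + "."):
        return module_path
    # Every other submodule of deeppavlov.skills now lives in deeppavlov.models.
    if module_path.startswith(OLD_PREFIX):
        return NEW_PREFIX + module_path[len(OLD_PREFIX):]
    # Paths outside deeppavlov.skills are unaffected.
    return module_path


if __name__ == "__main__":
    print(migrate_import_path("deeppavlov.skills.go_bot.bot"))
```

The rule assumes submodule names were preserved during the move, which matches the README links changed in this commit (for example `deeppavlov/models/go_bot/README.md`) but is not guaranteed for every module.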


DeepPavlov is an open-source conversational AI library built on [TensorFlow](https://www.tensorflow.org/) and [Keras](https://keras.io/). It is designed for
* development of production ready chat-bots and complex conversational systems,
* NLP and dialog systems research.

Our goal is to enable AI-application developers and researchers with:
* set of pre-trained NLP models, pre-defined dialog system components (ML/DL/Rule-based) and pipeline templates;
* a framework for implementing and testing their own dialog models;
* tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.);
* benchmarking environment for conversational models and uniform access to relevant datasets.

# Hello Bot in DeepPavlov

Import key components to build HelloBot.
```python
from deeppavlov.core.agent import Agent, HighestConfidenceSelector
from deeppavlov.skills.pattern_matching_skill import PatternMatchingSkill
```

Create skills as pre-defined responses for a user's input containing specific keywords. Every skill returns response and confidence.
```python
hello = PatternMatchingSkill(responses=['Hello world! :)'], patterns=["hi", "hello", "good day"])
bye = PatternMatchingSkill(['Goodbye world! :(', 'See you around.'], ["bye", "chao", "see you"])
fallback = PatternMatchingSkill(["I don't understand, sorry :/", 'I can say "Hello world!" 8)'])
```

Agent executes skills and then takes response from the skill with the highest confidence.
```python
HelloBot = Agent([hello, bye, fallback], skills_selector=HighestConfidenceSelector())
```

Give the floor to the HelloBot!
```python
print(HelloBot(['Hello!', 'Boo...', 'Bye.']))
```

[Jupyter notebook with HelloBot example.](examples/hello_bot.ipynb)


# Installation

0. Currently we support only `Linux` platform and `Python 3.6` (**`Python 3.5` is not supported!**)

1. Create a virtual environment with `Python 3.6`
```
virtualenv env
```
2. Activate the environment.
```
source ./env/bin/activate
```
3. Clone the repo and `cd` to project root
```
git clone https://github.com/deepmipt/DeepPavlov.git
cd DeepPavlov
```
4. Install basic requirements:
```
python setup.py develop
```
# Demo
Demo of selected features is available at [demo.ipavlov.ai](https://demo.ipavlov.ai/)
# Conceptual overview
<!-- ### Principles
The library is designed according to the following principles:
* hybrid ML/DL/Rule-based architecture as a current approach
* support of modular dialog system design
* end-to-end deep learning architecture as a long-term goal
* component-based software engineering, maximization of reusability
* multiple alternative solutions for the same NLP task to enable flexible data-driven configuration
* easy extension and benchmarking -->

<!-- ### Target Architecture
Target architecture of our library: -->
Our goal is to enable AI-application developers and researchers with:
* set of pre-trained NLP models, pre-defined dialog system components (ML/DL/Rule-based) and pipeline templates;
* a framework for implementing and testing their own dialog models;
* tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.);
* benchmarking environment for conversational models and uniform access to relevant datasets.
<p align="left">
<img src="https://deeppavlov.ai/dp_agnt_diag.png"/>
@@ -56,34 +95,15 @@ DeepPavlov is built on top of machine learning frameworks [TensorFlow](https://w
---
# Installation
0. Currently we support only `Linux` platform and `Python 3.6` (**`Python 3.5` is not supported!**)

1. Create a virtual environment with `Python 3.6`
```
virtualenv env
```
2. Activate the environment.
```
source ./env/bin/activate
```
3. Clone the repo and `cd` to project root
```
git clone https://github.com/deepmipt/DeepPavlov.git
cd DeepPavlov
```
4. Install the requirements:
```
python setup.py develop
```
5. Install `spacy` dependencies:
```
python -m spacy download en
```
# Quick start
To use our pre-trained models, you should first download them:
To use our pre-trained models, you should first install their requirements:
```
python -m deeppavlov install <path_to_config>
```
Then download the models and data for them:
```
python -m deeppavlov download <path_to_config>
```
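The two steps above (install a config's requirements, then download its models and data) can be scripted. The sketch below only builds the two command lines shown in this section; the helper name `deeppavlov_setup_commands` is hypothetical, and no command-line flags beyond those shown are assumed:

```python
import sys
from typing import List


def deeppavlov_setup_commands(config_path: str, python: str = sys.executable) -> List[List[str]]:
    """Return the install and download commands, in the order they should run."""
    return [
        [python, "-m", "deeppavlov", "install", config_path],
        [python, "-m", "deeppavlov", "download", config_path],
    ]


if __name__ == "__main__":
    # To actually run them, pass each list to subprocess.run(cmd, check=True).
    for cmd in deeppavlov_setup_commands("deeppavlov/configs/go_bot/gobot_dstc2.json"):
        print(" ".join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems when a config path contains spaces.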
@@ -111,52 +131,29 @@ Every line of input text will be used as a pipeline input parameter, so one exam
as many input parameters your pipeline expects.
You can also specify batch size with `-b` or `--batch-size` parameter.
Available model configs are:
- ```deeppavlov/configs/go_bot/*.json```
- ```deeppavlov/configs/intents/*.json```
- ```deeppavlov/configs/morpho_tagger/*.json```
- ```deeppavlov/configs/ner/*.json```
- ```deeppavlov/configs/odqa/*.json```
- ```deeppavlov/configs/ranking/*.json```
- ```deeppavlov/configs/sentiment/*.json```
- ```deeppavlov/configs/seq2seq_go_bot/*.json```
- ```deeppavlov/configs/spelling_correction/*.json```
- ```deeppavlov/configs/squad/*.json```
# Features
| Component | Description |
| --------- | ----------- |
| [NER component](deeppavlov/models/ner/README.md) | Based on neural Named Entity Recognition network. The NER component reproduces architecture from the paper [Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition](https://arxiv.org/pdf/1709.09686.pdf) which is inspired by Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. |
| [Slot filling components](deeppavlov/models/slotfill/README.md) | Based on fuzzy Levenshtein search to extract normalized slot values from text. The components either rely on NER results or perform needle in haystack search.|
| [Classification component](deeppavlov/models/classifiers/intents/README.md) | Component for classification tasks (intents, sentiment, etc). Based on shallow-and-wide Convolutional Neural Network architecture from [Kim Y. Convolutional neural networks for sentence classification – 2014](https://arxiv.org/pdf/1408.5882) and others. The model allows multilabel classification of sentences. |
| [Goal-oriented bot](deeppavlov/models/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It predicts responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can be switched on and off on demand. |
| [Seq2seq goal-oriented bot](deeppavlov/models/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
| [Automatic spelling correction component](deeppavlov/models/spelling_correction/README.md) | Pipelines that use candidates search in a static dictionary and an ARPA language model to correct spelling errors. |
| [Ranking component](deeppavlov/models/ranking/README.md) | Based on [LSTM-based deep learning models for non-factoid answer selection](https://arxiv.org/abs/1511.04108). The model performs ranking of responses or contexts from some database by their relevance for the given context. |
| [Question Answering component](deeppavlov/models/squad/README.md) | Based on [R-NET: Machine Reading Comprehension with Self-matching Networks](https://www.microsoft.com/en-us/research/publication/mrc/). The model solves the task of finding the answer to a question in a given context ([SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) task format). |
| [Morphological tagging component](deeppavlov/models/morpho_tagger/README.md) | Based on character-based approach to morphological tagging [Heigold et al., 2017. An extensive empirical evaluation of character-based morphological tagging for 14 languages](http://www.aclweb.org/anthology/E17-1048). A state-of-the-art model for Russian and several other languages. Model assigns morphological tags in UD format to sequences of words.|
| **Skills** | |
| [Goal-oriented bot](deeppavlov/skills/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It predicts responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can be switched on and off on demand. |
| [Seq2seq goal-oriented bot](deeppavlov/skills/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
|[ODQA](deeppavlov/skills/odqa/README.md) | An open domain question answering skill. The skill accepts free-form questions about the world and outputs an answer based on its Wikipedia knowledge.|
| **Parameters Evolution** | |
| [Parameters evolution for models](deeppavlov/models/evolution/README.md) | Implementation of parameters evolution for DeepPavlov models that requires only some small changes in a config file. |
| **Embeddings** | |
| [Pre-trained embeddings for the Russian language](pretrained-vectors.md) | Word vectors for the Russian language trained on joint [Russian Wikipedia](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0) and [Lenta.ru](https://lenta.ru/) corpora. |
# Basic examples
View video demo of deployment of a goal-oriented bot and a slot-filling model with Telegram UI
[![Alt text for your video](https://img.youtube.com/vi/yzoiCa_sMuY/0.jpg)](https://youtu.be/yzoiCa_sMuY)
# Examples of some components
* Run goal-oriented bot with Telegram interface:
```
python -m deeppavlov interactbot deeppavlov/configs/go_bot/gobot_dstc2.json -d -t <TELEGRAM_TOKEN>
@@ -185,6 +182,12 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
```
python -m deeppavlov predict deeppavlov/configs/intents/intents_snips.json -d --batch-size 15 < /data/in.txt > /data/out.txt
```
View [video demo](https://youtu.be/yzoiCa_sMuY) of deployment of a goal-oriented bot and a slot-filling model with Telegram UI
# Tutorials
Jupyter notebooks and videos explaining how to use DeepPavlov for different tasks can be found in [/examples/tutorials/](examples/tutorials/)
---
@@ -239,7 +242,7 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
</tr>
</table>
## Config
## Config of component
An NLP pipeline config is a JSON file that contains one required element `chainer`:
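
The one requirement stated here, a JSON document with a top-level `chainer` element, can be checked before a config is handed to the library. This is a minimal sketch; `load_pipeline_config` is a hypothetical helper, and the contents of `chainer` are not validated because this excerpt does not describe them:

```python
import json


def load_pipeline_config(text: str) -> dict:
    """Parse a pipeline config and verify that the required element is present."""
    config = json.loads(text)
    if not isinstance(config, dict) or "chainer" not in config:
        raise ValueError("pipeline config must contain a 'chainer' element")
    return config


if __name__ == "__main__":
    # An empty chainer is used purely as a placeholder value.
    print(load_pipeline_config('{"chainer": {}}'))
```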
89 changes: 0 additions & 89 deletions deeppavlov/__init__.py
@@ -18,92 +18,3 @@
# check version
import sys
assert sys.hexversion >= 0x3060000, 'Does not work in python3.5 or lower'
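
The constant in the version check above packs the version into one integer: `0x3060000` is `0x03060000`, that is major 3, minor 6, micro 0. A short sketch of how the value decodes and of the equivalent `sys.version_info` comparison; the names `MIN_HEXVERSION` and `unpack_hexversion` are illustrative only:

```python
import sys

MIN_HEXVERSION = 0x3060000  # same constant as in the assert: Python 3.6.0


def unpack_hexversion(value: int):
    """Split a sys.hexversion-style integer into (major, minor, micro)."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF


if __name__ == "__main__":
    print(unpack_hexversion(MIN_HEXVERSION))
    # The tuple comparison below accepts exactly the same interpreters.
    print(sys.hexversion >= MIN_HEXVERSION, sys.version_info >= (3, 6))
```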

import deeppavlov.core.models.keras_model
import deeppavlov.core.data.vocab
import deeppavlov.core.data.simple_vocab
import deeppavlov.core.data.sqlite_database
import deeppavlov.dataset_readers.babi_reader
import deeppavlov.dataset_readers.dstc2_reader
import deeppavlov.dataset_readers.kvret_reader
import deeppavlov.dataset_readers.conll2003_reader
import deeppavlov.dataset_readers.typos_reader
import deeppavlov.dataset_readers.basic_classification_reader
import deeppavlov.dataset_readers.squad_dataset_reader
import deeppavlov.dataset_readers.morphotagging_dataset_reader

import deeppavlov.dataset_iterators.dialog_iterator
import deeppavlov.dataset_iterators.kvret_dialog_iterator
import deeppavlov.dataset_iterators.dstc2_ner_iterator
import deeppavlov.dataset_iterators.dstc2_intents_iterator
import deeppavlov.dataset_iterators.typos_iterator
import deeppavlov.dataset_iterators.basic_classification_iterator
import deeppavlov.dataset_iterators.squad_iterator
import deeppavlov.dataset_iterators.sqlite_iterator
import deeppavlov.dataset_iterators.morphotagger_iterator

import deeppavlov.models.classifiers.intents.intent_model
import deeppavlov.models.commutators.random_commutator
import deeppavlov.models.embedders.fasttext_embedder
import deeppavlov.models.embedders.dict_embedder
import deeppavlov.models.embedders.glove_embedder
import deeppavlov.models.embedders.bow_embedder
import deeppavlov.models.spelling_correction.brillmoore.error_model
import deeppavlov.models.spelling_correction.levenstein.searcher_component
import deeppavlov.models.spelling_correction.electors.kenlm_elector
import deeppavlov.models.spelling_correction.electors.top1_elector
import deeppavlov.models.trackers.hcn_at
import deeppavlov.models.trackers.hcn_et
import deeppavlov.models.preprocessors.str_lower
import deeppavlov.models.preprocessors.squad_preprocessor
import deeppavlov.models.preprocessors.capitalization
import deeppavlov.models.preprocessors.dirty_comments_preprocessor
import deeppavlov.models.tokenizers.nltk_tokenizer
import deeppavlov.models.tokenizers.nltk_moses_tokenizer
import deeppavlov.models.tokenizers.spacy_tokenizer
import deeppavlov.models.tokenizers.split_tokenizer
import deeppavlov.models.tokenizers.ru_tokenizer
import deeppavlov.models.squad.squad
import deeppavlov.models.morpho_tagger.tagger
import deeppavlov.models.morpho_tagger.common
import deeppavlov.models.api_requester

import deeppavlov.skills.go_bot.bot
import deeppavlov.skills.go_bot.network
import deeppavlov.skills.go_bot.tracker
import deeppavlov.skills.seq2seq_go_bot.bot
import deeppavlov.skills.seq2seq_go_bot.network
import deeppavlov.skills.seq2seq_go_bot.kb
import deeppavlov.skills.odqa.tfidf_ranker
import deeppavlov.vocabs.typos
import deeppavlov.vocabs.wiki_sqlite
import deeppavlov.dataset_readers.insurance_reader
import deeppavlov.dataset_iterators.ranking_iterator
import deeppavlov.models.ner.network
import deeppavlov.models.ranking.ranking_model
import deeppavlov.models.ranking.metrics
import deeppavlov.models.preprocessors.char_splitter
import deeppavlov.models.preprocessors.mask
import deeppavlov.models.preprocessors.assemble_embeddins_matrix
import deeppavlov.models.preprocessors.capitalization
import deeppavlov.models.preprocessors.field_getter
import deeppavlov.models.preprocessors.sanitizer
import deeppavlov.models.preprocessors.lazy_tokenizer
import deeppavlov.models.slotfill.slotfill_raw
import deeppavlov.models.slotfill.slotfill
import deeppavlov.models.preprocessors.one_hotter
import deeppavlov.dataset_readers.ontonotes_reader

import deeppavlov.models.classifiers.tokens_matcher.tokens_matcher


import deeppavlov.metrics.accuracy
import deeppavlov.metrics.fmeasure
import deeppavlov.metrics.bleu
import deeppavlov.metrics.squad_metrics
import deeppavlov.metrics.roc_auc_score
import deeppavlov.metrics.fmeasure_classification

import deeppavlov.core.common.log

import deeppavlov.download