Release 0.0.6

deeppavlov · Jul 10, 2018 · ea94139
2 parents 1da5fd3 + 8feebcd
commit ea94139

Showing 131 changed files with 8,472 additions and 467 deletions.
diff --git a/Jenkinsfile b/Jenkinsfile
@@ -11,7 +11,7 @@ node('gpu') {
             sh """
                 virtualenv --python=python3 ".venv-$BUILD_NUMBER"
                 . .venv-$BUILD_NUMBER/bin/activate
-                sed -ri 's/^ *tensorflow *(=|<|>|\$)/tensorflow-gpu\\1/g' requirements.txt
+                sed -ri 's/^\\s*tensorflow\\s*(=|<|>|;|\$)/tensorflow-gpu\\1/g' requirements.txt
                 sed -i "s/stream=True/stream=False/g" deeppavlov/core/data/utils.py
                 python setup.py develop
                 pip install http://lnsigo.mipt.ru/export/en_core_web_sm-2.0.0.tar.gz
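
Review note on the `sed` change above: as far as I can tell, the new pattern additionally accepts tabs around the package name and a `;` directly after it, which is how environment markers are introduced in requirements files. The sketch below mirrors both patterns with Python's `re` module to compare them. The sample lines are invented for illustration (they are not taken from the project's `requirements.txt`), and it assumes GNU `sed -r` handles `\s` the same way Python does for these inputs.

```python
import re

# Patterns as sed should receive them once Groovy has unescaped the Jenkinsfile string.
OLD_PATTERN = re.compile(r'^ *tensorflow *(=|<|>|$)')
NEW_PATTERN = re.compile(r'^\s*tensorflow\s*(=|<|>|;|$)')


def to_gpu(line, pattern):
    """Swap the tensorflow requirement for tensorflow-gpu, keeping what follows."""
    return pattern.sub(r'tensorflow-gpu\1', line)


# Hypothetical requirement lines, chosen only to exercise the differences.
samples = [
    "tensorflow==1.8.0",
    "tensorflow; python_version >= '3.6'",
    "\ttensorflow>=1.8.0",
    "tensorflow-hub==0.1.0",
]

for sample in samples:
    old_result = to_gpu(sample, OLD_PATTERN)
    new_result = to_gpu(sample, NEW_PATTERN)
    print(repr(sample), '-> old:', repr(old_result), '| new:', repr(new_result))
```

Both patterns leave other packages whose names merely start with `tensorflow` untouched, because the name must be followed by a version operator, a marker separator (new pattern only), or the end of the line.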

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,6 +1,7 @@
 include README.MD
 include LICENSE
 include requirements.txt
+recursive-include requirements *.txt
 recursive-include deeppavlov *.json
 recursive-include deeppavlov *.md
 recursive-include utils *.json
diff --git a/README.md b/README.md
@@ -1,36 +1,75 @@
 [![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/deepmipt/DeepPavlov/blob/master/LICENSE)
 ![Python 3.6](https://img.shields.io/badge/python-3.6-green.svg)
 
-**We are in a really early Alpha release. You should be ready for hard adventures. 
-In version 0.0.5 we updraded to TensorFlow 1.8, please re-download our pre-trained models.**
+_We are still in a really early Alpha release._  
+__In version 0.0.6 everything from package `deeppavlov.skills` except `deeppavlov.skills.pattern_matching_skill` was moved to `deeppavlov.models` so your imports might break__  
+
 
 DeepPavlov is an open-source conversational AI library built on [TensorFlow](https://www.tensorflow.org/) and [Keras](https://keras.io/). It is designed for
  * development of production ready chat-bots and complex conversational systems,
  * NLP and dialog systems research.
-
-Our goal is to enable AI-application developers and researchers with:
- * set of pre-trained NLP models, pre-defined dialog system components (ML/DL/Rule-based) and pipeline templates;
- * a framework for implementing and testing their own dialog models;
- * tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.);
- * benchmarking environment for conversational models and uniform access to relevant datasets.
+
+# Hello Bot in DeepPavlov
+
+Import key components to build HelloBot. 
+```python
+from deeppavlov.core.agent import Agent, HighestConfidenceSelector
+from deeppavlov.skills.pattern_matching_skill import PatternMatchingSkill
+```
+
+Create skills as pre-defined responses to user input containing specific keywords. Every skill returns a response and a confidence.
+```python
+hello = PatternMatchingSkill(responses=['Hello world! :)'], patterns=["hi", "hello", "good day"])
+bye = PatternMatchingSkill(['Goodbye world! :(', 'See you around.'], ["bye", "chao", "see you"])
+fallback = PatternMatchingSkill(["I don't understand, sorry :/", 'I can say "Hello world!" 8)'])
+```
+
+The agent executes the skills and then takes the response from the skill with the highest confidence.
+```python
+HelloBot = Agent([hello, bye, fallback], skills_selector=HighestConfidenceSelector())
+```
+
+Give the floor to the HelloBot!
+```python
+print(HelloBot(['Hello!', 'Boo...', 'Bye.']))
+```
+
+[Jupyter notebook with HelloBot example.](examples/hello_bot.ipynb)
+
+
+# Installation
+
+0. Currently we support only `Linux` platform and `Python 3.6` (**`Python 3.5` is not supported!**)
+
+1. Create a virtual environment with `Python 3.6`
+    ```
+    virtualenv env
+    ```
+2. Activate the environment.
+    ```
+    source ./env/bin/activate
+    ```
+3. Clone the repo and `cd` to project root
+   ```
+   git clone https://github.com/deepmipt/DeepPavlov.git
+   cd DeepPavlov
+   ```
+4. Install basic requirements:
+    ```
+    python setup.py develop
+    ```
 
 # Demo 
 
 Demo of selected features is available at [demo.ipavlov.ai](https://demo.ipavlov.ai/)
 
 # Conceptual overview
 
-<!-- ### Principles
-The library is designed according to the following principles:
- * hybrid ML/DL/Rule-based architecture as a current approach
- * support of modular dialog system design
- * end-to-end deep learning architecture as a long-term goal
- * component-based software engineering, maximization of reusability
- * multiple alternative solutions for the same NLP task to enable flexible data-driven configuration
- * easy extension and benchmarking -->
-
-<!-- ### Target Architecture
-Target architecture of our library: -->
+Our goal is to enable AI-application developers and researchers with:
+ * set of pre-trained NLP models, pre-defined dialog system components (ML/DL/Rule-based) and pipeline templates;
+ * a framework for implementing and testing their own dialog models;
+ * tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.);
+ * benchmarking environment for conversational models and uniform access to relevant datasets.
 
 <p align="left">
 <img src="https://deeppavlov.ai/dp_agnt_diag.png"/>
@@ -56,34 +95,15 @@ DeepPavlov is built on top of machine learning frameworks [TensorFlow](https://w
 
 ---
 
-# Installation
-0. Currently we support only `Linux` platform and `Python 3.6` (**`Python 3.5` is not supported!**)
-
-1. Create a virtual environment with `Python 3.6`
-    ```
-    virtualenv env
-    ```
-2. Activate the environment.
-    ```
-    source ./env/bin/activate
-    ```
-3. Clone the repo and `cd` to project root
-   ```
-   git clone https://github.com/deepmipt/DeepPavlov.git
-   cd DeepPavlov
-   ```
-4. Install the requirements:
-    ```
-    python setup.py develop
-    ```
-5. Install `spacy` dependencies:
-    ```
-    python -m spacy download en
-    ```
-
 # Quick start
 
-To use our pre-trained models, you should first download them:
+To use our pre-trained models, you should first install their requirements:
+```
+python -m deeppavlov install <path_to_config>
+```
+
+  
+Then download the models and data for them:
 ```
 python -m deeppavlov download <path_to_config>
 ```
@@ -111,52 +131,29 @@ Every line of input text will be used as a pipeline input parameter, so one exam
 as many input parameters your pipeline expects.  
 You can also specify batch size with `-b` or `--batch-size` parameter.
 
-Available model configs are:
-
-- ```deeppavlov/configs/go_bot/*.json```
-
-- ```deeppavlov/configs/intents/*.json```
-
-- ```deeppavlov/configs/morpho_tagger/*.json```
-
-- ```deeppavlov/configs/ner/*.json```
-
-- ```deeppavlov/configs/odqa/*.json```
-
-- ```deeppavlov/configs/ranking/*.json```
-
-- ```deeppavlov/configs/sentiment/*.json```
-
-- ```deeppavlov/configs/seq2seq_go_bot/*.json```
-
-- ```deeppavlov/configs/spelling_correction/*.json```
-
-- ```deeppavlov/configs/squad/*.json```
-
 # Features
 
 | Component | Description |
 | --------- | ----------- |
 | [NER component](deeppavlov/models/ner/README.md) | Based on neural Named Entity Recognition network. The NER component reproduces architecture from the paper [Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition](https://arxiv.org/pdf/1709.09686.pdf) which is inspired by Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. |
 | [Slot filling components](deeppavlov/models/slotfill/README.md) | Based on fuzzy Levenshtein search to extract normalized slot values from text. The components either rely on NER results or perform needle in haystack search.|
 | [Classification component](deeppavlov/models/classifiers/intents/README.md) | Component for classification tasks (intents, sentiment, etc). Based on shallow-and-wide Convolutional Neural Network architecture from [Kim Y. Convolutional neural networks for sentence classification – 2014](https://arxiv.org/pdf/1408.5882) and others. The model allows multilabel classification of sentences. |
+| [Goal-oriented bot](deeppavlov/models/go_bot/README.md) | Based on the Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It predicts responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can be switched on and off on demand.  |
+| [Seq2seq goal-oriented bot](deeppavlov/models/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
 | [Automatic spelling correction component](deeppavlov/models/spelling_correction/README.md) | Pipelines that use candidates search in a static dictionary and an ARPA language model to correct spelling errors. |
 | [Ranking component](deeppavlov/models/ranking/README.md) |  Based on [LSTM-based deep learning models for non-factoid answer selection](https://arxiv.org/abs/1511.04108). The model performs ranking of responses or contexts from some database by their relevance for the given context. |
 | [Question Answering component](deeppavlov/models/squad/README.md) | Based on [R-NET: Machine Reading Comprehension with Self-matching Networks](https://www.microsoft.com/en-us/research/publication/mrc/). The model solves the task of looking for an answer on a question in a given context ([SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) task format). |
 | [Morphological tagging component](deeppavlov/models/morpho_tagger/README.md) | Based on character-based approach to morphological tagging [Heigold et al., 2017. An extensive empirical evaluation of character-based morphological tagging for 14 languages](http://www.aclweb.org/anthology/E17-1048). A state-of-the-art model for Russian and several other languages. Model assigns morphological tags in UD format to sequences of words.|
 | **Skills** |  |
-| [Goal-oriented bot](deeppavlov/skills/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It allows to predict responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can switched on and off on demand.  |
-| [Seq2seq goal-oriented bot](deeppavlov/skills/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
 |[ODQA](deeppavlov/skills/odqa/README.md) | An open domain question answering skill. The skill accepts free-form questions about the world and outputs an answer based on its Wikipedia knowledge.|
+| **Parameters Evolution** |  |
+| [Parameters evolution for models](deeppavlov/models/evolution/README.md) | Implementation of parameters evolution for DeepPavlov models that requires only some small changes in a config file. |
 | **Embeddings** |  |
 | [Pre-trained embeddings for the Russian language](pretrained-vectors.md) | Word vectors for the Russian language trained on joint [Russian Wikipedia](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0) and [Lenta.ru](https://lenta.ru/) corpora. |
 
-# Basic examples
-
-View video demo of deployment of a goal-oriented bot and a slot-filling model with Telegram UI
 
-[![Alt text for your video](https://img.youtube.com/vi/yzoiCa_sMuY/0.jpg)](https://youtu.be/yzoiCa_sMuY)
-          
+# Examples of some components
+       
  * Run goal-oriented bot with Telegram interface:
  ```
  python -m deeppavlov interactbot deeppavlov/configs/go_bot/gobot_dstc2.json -d -t <TELEGRAM_TOKEN>
@@ -185,6 +182,12 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
  ```
  python -m deeppavlov predict deeppavlov/configs/intents/intents_snips.json -d --batch-size 15 < /data/in.txt > /data/out.txt
  ```
+ 
+ View [video demo](https://youtu.be/yzoiCa_sMuY) of deployment of a goal-oriented bot and a slot-filling model with Telegram UI
+
+#  Tutorials
+
+Jupyter notebooks and videos explaining how to use DeepPavlov for different tasks can be found in [/examples/tutorials/](examples/tutorials/)
 
 ---
 
@@ -239,7 +242,7 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
 </tr>
 </table>
 
-## Config
+## Config of component
 
 An NLP pipeline config is a JSON file that contains one required element `chainer`:
 

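Review note on the README's import warning: the new notice says that everything in `deeppavlov.skills` except `deeppavlov.skills.pattern_matching_skill` moved to `deeppavlov.models`. A minimal sketch of how downstream code might rewrite dotted module paths under that rule follows. `migrate_module_path` is a hypothetical helper, not part of DeepPavlov, and the rule is taken from the README text alone. The Features table in the same README still links ODQA under `deeppavlov/skills/odqa/`, so the result should be checked against the actual package tree.

```python
# Hypothetical helper, not part of DeepPavlov: applies the rename rule
# stated in the 0.0.6 README note to a dotted module path.
OLD_PREFIX = 'deeppavlov.skills'
NEW_PREFIX = 'deeppavlov.models'
UNMOVED = 'deeppavlov.skills.pattern_matching_skill'


def _is_under(path, package):
    """True if `path` is `package` itself or one of its submodules."""
    return path == package or path.startswith(package + '.')


def migrate_module_path(path):
    """Return the module path expected after the move; unrelated paths are unchanged."""
    if _is_under(path, UNMOVED):
        return path  # the README names this package as the one exception
    if path.startswith(OLD_PREFIX + '.'):
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    return path


for name in ('deeppavlov.skills.go_bot.bot',
             'deeppavlov.skills.pattern_matching_skill',
             'deeppavlov.core.agent'):
    print(name, '->', migrate_module_path(name))
```

Matching on `OLD_PREFIX + '.'` rather than the bare prefix avoids rewriting unrelated names that merely begin with the same characters.
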
diff --git a/deeppavlov/__init__.py b/deeppavlov/__init__.py
@@ -18,92 +18,3 @@
 # check version
 import sys
 assert sys.hexversion >= 0x3060000, 'Does not work in python3.5 or lower'
-
-import deeppavlov.core.models.keras_model
-import deeppavlov.core.data.vocab
-import deeppavlov.core.data.simple_vocab
-import deeppavlov.core.data.sqlite_database
-import deeppavlov.dataset_readers.babi_reader
-import deeppavlov.dataset_readers.dstc2_reader
-import deeppavlov.dataset_readers.kvret_reader
-import deeppavlov.dataset_readers.conll2003_reader
-import deeppavlov.dataset_readers.typos_reader
-import deeppavlov.dataset_readers.basic_classification_reader
-import deeppavlov.dataset_readers.squad_dataset_reader
-import deeppavlov.dataset_readers.morphotagging_dataset_reader
-
-import deeppavlov.dataset_iterators.dialog_iterator
-import deeppavlov.dataset_iterators.kvret_dialog_iterator
-import deeppavlov.dataset_iterators.dstc2_ner_iterator
-import deeppavlov.dataset_iterators.dstc2_intents_iterator
-import deeppavlov.dataset_iterators.typos_iterator
-import deeppavlov.dataset_iterators.basic_classification_iterator
-import deeppavlov.dataset_iterators.squad_iterator
-import deeppavlov.dataset_iterators.sqlite_iterator
-import deeppavlov.dataset_iterators.morphotagger_iterator
-
-import deeppavlov.models.classifiers.intents.intent_model
-import deeppavlov.models.commutators.random_commutator
-import deeppavlov.models.embedders.fasttext_embedder
-import deeppavlov.models.embedders.dict_embedder
-import deeppavlov.models.embedders.glove_embedder
-import deeppavlov.models.embedders.bow_embedder
-import deeppavlov.models.spelling_correction.brillmoore.error_model
-import deeppavlov.models.spelling_correction.levenstein.searcher_component
-import deeppavlov.models.spelling_correction.electors.kenlm_elector
-import deeppavlov.models.spelling_correction.electors.top1_elector
-import deeppavlov.models.trackers.hcn_at
-import deeppavlov.models.trackers.hcn_et
-import deeppavlov.models.preprocessors.str_lower
-import deeppavlov.models.preprocessors.squad_preprocessor
-import deeppavlov.models.preprocessors.capitalization
-import deeppavlov.models.preprocessors.dirty_comments_preprocessor
-import deeppavlov.models.tokenizers.nltk_tokenizer
-import deeppavlov.models.tokenizers.nltk_moses_tokenizer
-import deeppavlov.models.tokenizers.spacy_tokenizer
-import deeppavlov.models.tokenizers.split_tokenizer
-import deeppavlov.models.tokenizers.ru_tokenizer
-import deeppavlov.models.squad.squad
-import deeppavlov.models.morpho_tagger.tagger
-import deeppavlov.models.morpho_tagger.common
-import deeppavlov.models.api_requester
-
-import deeppavlov.skills.go_bot.bot
-import deeppavlov.skills.go_bot.network
-import deeppavlov.skills.go_bot.tracker
-import deeppavlov.skills.seq2seq_go_bot.bot
-import deeppavlov.skills.seq2seq_go_bot.network
-import deeppavlov.skills.seq2seq_go_bot.kb
-import deeppavlov.skills.odqa.tfidf_ranker
-import deeppavlov.vocabs.typos
-import deeppavlov.vocabs.wiki_sqlite
-import deeppavlov.dataset_readers.insurance_reader
-import deeppavlov.dataset_iterators.ranking_iterator
-import deeppavlov.models.ner.network
-import deeppavlov.models.ranking.ranking_model
-import deeppavlov.models.ranking.metrics
-import deeppavlov.models.preprocessors.char_splitter
-import deeppavlov.models.preprocessors.mask
-import deeppavlov.models.preprocessors.assemble_embeddins_matrix
-import deeppavlov.models.preprocessors.capitalization
-import deeppavlov.models.preprocessors.field_getter
-import deeppavlov.models.preprocessors.sanitizer
-import deeppavlov.models.preprocessors.lazy_tokenizer
-import deeppavlov.models.slotfill.slotfill_raw
-import deeppavlov.models.slotfill.slotfill
-import deeppavlov.models.preprocessors.one_hotter
-import deeppavlov.dataset_readers.ontonotes_reader
-
-import deeppavlov.models.classifiers.tokens_matcher.tokens_matcher
-
-
-import deeppavlov.metrics.accuracy
-import deeppavlov.metrics.fmeasure
-import deeppavlov.metrics.bleu
-import deeppavlov.metrics.squad_metrics
-import deeppavlov.metrics.roc_auc_score
-import deeppavlov.metrics.fmeasure_classification
-
-import deeppavlov.core.common.log
-
-import deeppavlov.download
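
Review note on the version check that remains in `deeppavlov/__init__.py`: `sys.hexversion` packs the interpreter version into one integer, so `0x3060000` corresponds to the start of the 3.6 series. The sketch below decodes that constant and compares the check with the more readable `sys.version_info` form. `decode_hexversion` and `MIN_HEXVERSION` are illustrative names of my own, not something the commit defines.

```python
import sys

MIN_HEXVERSION = 0x3060000  # the value asserted in deeppavlov/__init__.py


def decode_hexversion(hexversion):
    """Split a sys.hexversion-style integer into (major, minor, micro)."""
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    return major, minor, micro


print('minimum version:', decode_hexversion(MIN_HEXVERSION))
print('hexversion check passes:', sys.hexversion >= MIN_HEXVERSION)
print('version_info check passes:', sys.version_info >= (3, 6))
```

Both checks should agree on any interpreter; the tuple comparison is arguably easier to read, but replacing the existing assertion is a style choice rather than a fix.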