
Commit

Run end-of-file and trailing-whitespace fixer on all files
Lingepumpe committed Jun 28, 2022
1 parent ae1f07f commit 94f0da0
Showing 44 changed files with 410 additions and 442 deletions.
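For reference, a cleanup like this is what the standard `trailing-whitespace` and `end-of-file-fixer` hooks from the pre-commit-hooks project perform. A minimal `.pre-commit-config.yaml` sketch (the `rev` pin below is only illustrative, not necessarily what this repository used):

```yaml
# Sketch of a pre-commit config running the two fixers named in the commit title.
# Hook ids come from the pre-commit-hooks project; the rev pin is an example.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace   # strips trailing spaces from every line
      - id: end-of-file-fixer     # ensures each file ends with exactly one newline
```

With such a config, `pre-commit run --all-files` rewrites every tracked file, which matches the repository-wide scope of this commit.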
2 changes: 1 addition & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
@@ -10,4 +10,4 @@ before_script: cd tests
script:
- pip freeze
- 'if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then pytest --runintegration; fi'
-- 'if [ "$TRAVIS_PULL_REQUEST" = "false" ]; then pytest; fi'
+- 'if [ "$TRAVIS_PULL_REQUEST" = "false" ]; then pytest; fi'
18 changes: 9 additions & 9 deletions CONTRIBUTING.md
@@ -1,24 +1,24 @@
# Contributing to Flair

-We are happy to accept your contributions to make `flair` better and more awesome! To avoid unnecessary work on either
+We are happy to accept your contributions to make `flair` better and more awesome! To avoid unnecessary work on either
side, please stick to the following process:

1. Check if there is already [an issue](https://github.com/zalandoresearch/flair/issues) for your concern.
2. If there is not, open a new one to start a discussion. We hate to close finished PRs!
-3. If we decide your concern needs code changes, we would be happy to accept a pull request. Please consider the
+3. If we decide your concern needs code changes, we would be happy to accept a pull request. Please consider the
commit guidelines below.

-In case you just want to help out and don't know where to start,
-[issues with "help wanted" label](https://github.com/zalandoresearch/flair/labels/help%20wanted) are good for
-first-time contributors.
+In case you just want to help out and don't know where to start,
+[issues with "help wanted" label](https://github.com/zalandoresearch/flair/labels/help%20wanted) are good for
+first-time contributors.


## Git Commit Guidelines

-If there is already a ticket, use this number at the start of your commit message.
+If there is already a ticket, use this number at the start of your commit message.
Use meaningful commit messages that describe what you did.

-**Example:** `GH-42: Added new type of embeddings: DocumentEmbedding.`
+**Example:** `GH-42: Added new type of embeddings: DocumentEmbedding.`


## Developing locally
@@ -62,7 +62,7 @@ To run integration tests execute:
pytest --runintegration
```
The integration tests will train small models and therefore take more time.
-In general, it is recommended to ensure that all basic tests pass before running the integration tests.
+In general, it is recommended to ensure that all basic tests pass before running the integration tests.

### code formatting

@@ -75,4 +75,4 @@ You can automatically format the code via `black --config pyproject.toml flair/

If you want to automatically format your code on every commit, you can use [pre-commit](https://pre-commit.com/).
Just install it via `pip install pre-commit` and execute `pre-commit install` in the root folder.
-This will add a hook to the repository, which reformats files on every commit.
+This will add a hook to the repository, which reformats files on every commit.
2 changes: 1 addition & 1 deletion LICENSE
@@ -6,4 +6,4 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

-THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
2 changes: 1 addition & 1 deletion MAINTAINERS
@@ -1,2 +1,2 @@
Alan Akbik <[email protected]>
-Tanja Bergmann <[email protected]>
+Tanja Bergmann <[email protected]>
6 changes: 3 additions & 3 deletions README.md
@@ -12,7 +12,7 @@ A very simple framework for **state-of-the-art NLP**. Developed by [Humboldt Uni
Flair is:

* **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
-models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS),
+models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS),
special support for [biomedical data](/resources/docs/HUNFLAIR.md),
sense disambiguation and classification, with support for a rapidly growing number of languages.

@@ -27,7 +27,7 @@ Now at [version 0.11](https://github.com/flairNLP/flair/releases)!

## State-of-the-Art Models

-Flair ships with state-of-the-art models for a range of NLP tasks. For instance, check out our latest NER models:
+Flair ships with state-of-the-art models for a range of NLP tasks. For instance, check out our latest NER models:

| Language | Dataset | Flair | Best published | Model card & demo
| --- | ----------- | ---------------- | ------------- | ------------- |
@@ -37,7 +37,7 @@ Flair ships with state-of-the-art models for a range of NLP tasks. For instance,
| Dutch | Conll-03 (4-class) | **95.25** | *93.7 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Dutch 4-class NER demo](https://huggingface.co/flair/ner-dutch-large) |
| Spanish | Conll-03 (4-class) | **90.54** | *90.3 [(Yu et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.577.pdf)* | [Flair Spanish 4-class NER demo](https://huggingface.co/flair/ner-spanish-large) |

-**New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted
+**New:** Most Flair sequence tagging models (named entity recognition, part-of-speech tagging etc.) are now hosted
on the [__🤗 HuggingFace model hub__](https://huggingface.co/models?library=flair&sort=downloads)! You can browse models, check detailed information on how they were trained, and even try each model out online!


2 changes: 1 addition & 1 deletion SECURITY.md
@@ -1,5 +1,5 @@
We acknowledge that every line of code that we write may potentially contain security issues.
-We are trying to deal with it responsibly and provide patches as quickly as possible.
+We are trying to deal with it responsibly and provide patches as quickly as possible.

We host our bug bounty program on HackerOne. It is currently private, so if you would like to report a vulnerability and be rewarded for it, please ask to join our program by filling out this form:

38 changes: 19 additions & 19 deletions resources/docs/EXPERIMENTS.md
@@ -4,7 +4,7 @@ Here, we collect the best embedding configurations for each NLP task. If
you achieve better numbers, let us know which exact configuration of Flair
you used and we will add your experiment here!

-**Data.** For each experiment, you need to first get the evaluation dataset. Then execute the code as provided in this
+**Data.** For each experiment, you need to first get the evaluation dataset. Then execute the code as provided in this
documentation. Also check out the [tutorials](/resources/docs/TUTORIAL_1_BASICS.md) to get a better overview of
how Flair works.

@@ -17,7 +17,7 @@ how Flair works.

#### Data
The [CoNLL-03 data set for English](https://www.clips.uantwerpen.be/conll2003/ner/) is probably the most
-well-known dataset to evaluate NER on. It contains 4 entity classes. Follow the steps on the task website to
+well-known dataset to evaluate NER on. It contains 4 entity classes. Follow the steps on the task website to
get the dataset and place train, test and dev data in `/resources/tasks/conll_03/` as follows:

```
@@ -26,7 +26,7 @@ resources/tasks/conll_03/eng.testb
resources/tasks/conll_03/eng.train
```

-This allows the `CONLL_03()` corpus object to read the data into our data structures. Initialize the corpus as follows:
+This allows the `CONLL_03()` corpus object to read the data into our data structures. Initialize the corpus as follows:

```python
from flair.datasets import CONLL_03
@@ -37,7 +37,7 @@ This gives you a `Corpus` object that contains the data. Now, select `ner` as th

#### Best Known Configuration

-The full code to get a state-of-the-art model for English NER is as follows:
+The full code to get a state-of-the-art model for English NER is as follows:

```python
from flair.data import Corpus
@@ -83,7 +83,7 @@ from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-ner',
-train_with_dev=True,
+train_with_dev=True,
max_epochs=150)
```

@@ -146,7 +146,7 @@ from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-ner',
-train_with_dev=True,
+train_with_dev=True,
max_epochs=150)
```

@@ -248,7 +248,7 @@ from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-ner',
-train_with_dev=True,
+train_with_dev=True,
max_epochs=150)
```

@@ -262,8 +262,8 @@ trainer.train('resources/taggers/example-ner',
#### Data

The [Ontonotes corpus](https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf) is one of the best resources
-for different types of NLP and contains rich NER annotation. Get the corpus and split it into train, test and dev
-splits using the scripts provided by the [CoNLL-12 shared task](http://conll.cemantix.org/2012/data.html).
+for different types of NLP and contains rich NER annotation. Get the corpus and split it into train, test and dev
+splits using the scripts provided by the [CoNLL-12 shared task](http://conll.cemantix.org/2012/data.html).

Place train, test and dev data in CoNLL-03 format in `resources/tasks/onto-ner/` as follows:

@@ -275,8 +275,8 @@ resources/tasks/onto-ner/eng.train

#### Best Known Configuration

-Once you have the data, reproduce our experiments exactly like for CoNLL-03, just with a different dataset and with
-FastText embeddings (they work better on this dataset). You also need to provide a `column_format` for the `ColumnCorpus` object indicating which column in the training file is the 'ner' information. The full code then is as follows:
+Once you have the data, reproduce our experiments exactly like for CoNLL-03, just with a different dataset and with
+FastText embeddings (they work better on this dataset). You also need to provide a `column_format` for the `ColumnCorpus` object indicating which column in the training file is the 'ner' information. The full code then is as follows:

```python
from flair.data import Corpus
@@ -318,7 +318,7 @@ trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-ner',
learning_rate=0.1,
-train_with_dev=True,
+train_with_dev=True,
# it's a big dataset so maybe set embeddings_storage_mode to 'none' (embeddings are not kept in memory)
embeddings_storage_mode='none')
```
@@ -335,16 +335,16 @@ trainer.train('resources/taggers/example-ner',

Get the [Penn treebank](https://catalog.ldc.upenn.edu/ldc99t42) and follow the guidelines
in [Collins (2002)](http://www.cs.columbia.edu/~mcollins/papers/tagperc.pdf) to produce train, dev and test splits.
-Convert the splits into CoNLL-U format and place train, test and dev data in `/path/to/penn/` as follows:
+Convert the splits into CoNLL-U format and place train, test and dev data in `/path/to/penn/` as follows:

```
/path/to/penn/test.conll
/path/to/penn/train.conll
/path/to/penn/valid.conll
```

-Then, run the experiments with extvec embeddings and contextual string embeddings. Also, select 'pos' as `tag_type`,
-so the algorithm knows that POS tags and not NER are to be predicted from this data.
+Then, run the experiments with extvec embeddings and contextual string embeddings. Also, select 'pos' as `tag_type`,
+so the algorithm knows that POS tags and not NER are to be predicted from this data.

#### Best Known Configuration

@@ -385,7 +385,7 @@ from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-pos',
-train_with_dev=True,
+train_with_dev=True,
max_epochs=150)
```

@@ -400,8 +400,8 @@ Data is included in Flair and will get automatically downloaded when you run the


#### Best Known Configuration
-Run the code with extvec embeddings and our proposed contextual string embeddings. Use 'np' as `tag_type`,
-so the algorithm knows that chunking tags and not NER are to be predicted from this data.
+Run the code with extvec embeddings and our proposed contextual string embeddings. Use 'np' as `tag_type`,
+so the algorithm knows that chunking tags and not NER are to be predicted from this data.

```python
from flair.data import Corpus
@@ -441,6 +441,6 @@ from flair.trainers import ModelTrainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-chunk',
-train_with_dev=True,
+train_with_dev=True,
max_epochs=150)
```
42 changes: 21 additions & 21 deletions resources/docs/HUNFLAIR.md
@@ -1,37 +1,37 @@
# HunFlair

-*HunFlair* is a state-of-the-art NER tagger for biomedical texts. It comes with
-models for genes/proteins, chemicals, diseases, species and cell lines. *HunFlair*
-builds on pretrained domain-specific language models and outperforms other biomedical
-NER tools on unseen corpora. Furthermore, it contains harmonized versions of [31 biomedical
+*HunFlair* is a state-of-the-art NER tagger for biomedical texts. It comes with
+models for genes/proteins, chemicals, diseases, species and cell lines. *HunFlair*
+builds on pretrained domain-specific language models and outperforms other biomedical
+NER tools on unseen corpora. Furthermore, it contains harmonized versions of [31 biomedical
NER data sets](HUNFLAIR_CORPORA.md) and comes with a Flair language model ("pubmed-X") and
FastText embeddings ("pubmed") that were trained on roughly 3 million full texts and about
25 million abstracts from the biomedical domain.

-<b>Content:</b>
-[Quick Start](#quick-start) |
+<b>Content:</b>
+[Quick Start](#quick-start) |
[BioNER-Tool Comparison](#comparison-to-other-biomedical-ner-tools) |
-[Tutorials](#tutorials) |
-[Citing HunFlair](#citing-hunflair)
+[Tutorials](#tutorials) |
+[Citing HunFlair](#citing-hunflair)

## Quick Start

#### Requirements and Installation
-*HunFlair* is based on Flair 0.6+ and Python 3.6+.
+*HunFlair* is based on Flair 0.6+ and Python 3.6+.
If you do not have Python 3.6, install it first. [Here is how for Ubuntu 16.04](https://vsupalov.com/developing-with-python3-6-on-ubuntu-16-04/).
Then, in your favorite virtual environment, simply do:
```
pip install flair
```
-Furthermore, we recommend installing [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing
+Furthermore, we recommend installing [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing
and tokenization of scientific / biomedical texts:
```
pip install scispacy==0.2.5
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
```

#### Example Usage
-Let's run named entity recognition (NER) over an example sentence. All you need to do is
+Let's run named entity recognition (NER) over an example sentence. All you need to do is
make a Sentence, load a pre-trained model and use it to predict tags for the sentence:
```python
from flair.data import Sentence
@@ -63,15 +63,15 @@ Span[6:7]: "Mouse" → Species (0.9979)
~~~

## Comparison to other biomedical NER tools
-Tools for biomedical NER are typically trained and evaluated on rather small gold standard data sets.
-However, they are applied "in the wild" to a much larger collection of texts, often varying in
-topic, entity distribution, genre (e.g. patents vs. scientific articles) and text type (e.g. abstract
+Tools for biomedical NER are typically trained and evaluated on rather small gold standard data sets.
+However, they are applied "in the wild" to a much larger collection of texts, often varying in
+topic, entity distribution, genre (e.g. patents vs. scientific articles) and text type (e.g. abstract
vs. full text), which can lead to severe drops in performance.

*HunFlair* outperforms other biomedical NER tools on corpora that were used to train neither *HunFlair*
nor any of the competitor tools.

-| Corpus | Entity Type | Misc<sup><sub>[1](#f1)</sub></sup> | SciSpaCy | HUNER | HunFlair |
+| Corpus | Entity Type | Misc<sup><sub>[1](#f1)</sub></sup> | SciSpaCy | HUNER | HunFlair |
| --- | --- | --- | --- | --- | --- |
| [CRAFT v4.0](https://github.com/UCDenver-ccp/CRAFT) | Chemical | 42.88 | 35.73 | 42.99 | *__59.83__* |
| | Gene/Protein | 64.93 | 47.76 | 50.77 | *__73.51__* |
@@ -82,16 +82,16 @@ or any of the competitor tools.
| | Species | *__80.53__* | 57.11 | 67.84 | 76.41 |
| [Plant-Disease](http://gcancer.org/pdr/) | Species | 80.63 | 75.90 | 73.64 | *__83.44__* |

-<sub>All results are F1 scores using partial matching of predicted text offsets with the original char offsets
+<sub>All results are F1 scores using partial matching of predicted text offsets with the original char offsets
of the gold standard data. We allow a shift by max one character.</sub>

-<sub><a name="f1">1</a>: Misc displays the results of multiple taggers:
-[tmChem](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmchem/) for Chemical,
-[GNormPus](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/gnormplus/) for Gene and Species, and
+<sub><a name="f1">1</a>: Misc displays the results of multiple taggers:
+[tmChem](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmchem/) for Chemical,
+[GNormPus](https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/gnormplus/) for Gene and Species, and
[DNorm](https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/DNorm.html) for Disease
</sub>

-Here's how to [reproduce these numbers](HUNFLAIR_EXPERIMENTS.md) using Flair.
+Here's how to [reproduce these numbers](HUNFLAIR_EXPERIMENTS.md) using Flair.
You can find detailed evaluations and discussions in [our paper](https://arxiv.org/abs/2008.07347).

## Tutorials
