forked from huggingface/datasets
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Changing the name
* style + quality
* update doc and logo
* clean up
* circle-CI on the branch for now
* fix daily dialog dataset
* fix urls

Co-authored-by: Quentin Lhoest <[email protected]>
Showing 428 changed files with 5,147 additions and 4,898 deletions.
# How to contribute to Datasets?

1. Fork the [repository](https://github.com/huggingface/datasets) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

   ```bash
   git clone git@github.com:<your Github handle>/datasets.git
   cd datasets
   git remote add upstream https://github.com/huggingface/datasets.git
   ```

3. Create a new branch to hold your development changes:
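The checkout command for step 3 falls outside this diff hunk; a minimal sketch, assuming plain git. The branch name `my-new-dataset` is a placeholder, and the throwaway repo below only makes the snippet self-contained — in practice you would run the checkout inside the clone from step 2:

```bash
# Stand-in repository so the snippet runs on its own; in practice,
# cd into your datasets clone instead.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Create the development branch and switch to it
# ("my-new-dataset" is a placeholder name).
git checkout -b my-new-dataset

# Show the branch we are now on.
git symbolic-ref --short HEAD
```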
[...]
   ```bash
   pip install -e ".[dev]"
   ```

   (If datasets was already installed in the virtual environment, remove
   it with `pip uninstall datasets` before reinstalling it in editable
   mode with the `-e` flag.)

5. Develop the features on your branch. If you want to add a dataset, see more in-detail instructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/datasets/add_dataset.html) and [share a dataset](https://huggingface.co/datasets/share_dataset.html) in the documentation.

6. Format your code. Run black and isort so that your newly added files look nice with the following command:
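The formatting command itself is cut off by the diff hunk here; a sketch under the assumption that plain `black` and `isort` invocations are used (the project may instead expose a Makefile shortcut), demonstrated on a throwaway file:

```bash
# Throwaway file standing in for a new dataset script; in the repository
# you would target the real sources (e.g. the datasets/ directory).
tmp=$(mktemp -d)
printf 'import sys\nimport os\nx = ( 1 )\n' > "$tmp/sample.py"

# Run the formatters in place if they are installed; -q keeps output quiet.
command -v black >/dev/null 2>&1 && black -q "$tmp/sample.py"
command -v isort >/dev/null 2>&1 && isort -q "$tmp/sample.py"

cat "$tmp/sample.py"
```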
[...]
8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your contribution to the project maintainers for review.

## How to add a dataset

1. Make sure you followed steps 1-4 of the section [*How to contribute to Datasets?*](#how-to-contribute-to-datasets).

2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(datasets.GeneratorBasedBuilder)` for the dataset `book_corpus`).
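The naming rule above (snake case dataset name, camel case class name) can be sketched with a small shell helper; `to_class_name` is illustrative only, not part of the repository:

```bash
# Convert a snake_case dataset name to its CamelCase class name.
to_class_name() {
  printf '%s\n' "$1" |
    awk -F_ '{ for (i = 1; i <= NF; i++)
                 printf "%s%s", toupper(substr($i, 1, 1)), substr($i, 2)
               print "" }'
}

to_class_name "book_corpus"   # prints BookCorpus
```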
3. **Make sure you run all of the following commands from the root of your `datasets` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:

   ```bash
   python datasets-cli test datasets/<your-dataset-folder> --save_infos --all_configs
   ```
4. If the command was successful, you should now create some dummy data. Use the following command to get in-detail instructions on how to create the dummy data:

   ```bash
   python datasets-cli dummy_data datasets/<your-dataset-folder>
   ```

5. Now test that both the real data and the dummy data work correctly using the following commands:
[...]

   ```bash
   RUN_SLOW=1 pytest tests/test_dataset_common.py::LocalDatasetTest::test_load_dataset_all_configs_<your-dataset-name>
   ```
6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to Datasets?*](#how-to-contribute-to-datasets). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.

### Help for dummy data tests
[...]

Follow these steps in case the dummy data test keeps failing:
- Verify that all filenames are spelled correctly. Rerun the command

  ```bash
  python datasets-cli dummy_data datasets/<your-dataset-folder>
  ```

  and make sure you follow the exact instructions provided by the command of step 5).