Quick fix :) (huggingface#606)
* Changing the name

* style + quality

* update doc and logo

* clean up

* circle-CI on the branche for now

* fix daily dialog dataset

* fix urls

Co-authored-by: Quentin Lhoest <[email protected]>
thomwolf and lhoestq authored Sep 10, 2020
1 parent c53558f commit 5f4c6e8
Showing 428 changed files with 5,147 additions and 4,898 deletions.
14 changes: 7 additions & 7 deletions .circleci/config.yml
@@ -1,7 +1,7 @@
version: 2
jobs:
run_dataset_script_tests_pyarrow_0p17:
-    working_directory: ~/nlp
+    working_directory: ~/datasets
docker:
- image: circleci/python:3.6
resource_class: medium
@@ -11,10 +11,10 @@ jobs:
- run: source venv/bin/activate
- run: pip install .[tests]
- run: pip install pyarrow==0.17.1
-      - run: HF_SCRIPTS_VERSION=master python -m pytest -sv ./tests/
+      - run: HF_SCRIPTS_VERSION=datasets python -m pytest -sv ./tests/

run_dataset_script_tests_pyarrow_1:
-    working_directory: ~/nlp
+    working_directory: ~/datasets
docker:
- image: circleci/python:3.6
resource_class: medium
@@ -24,10 +24,10 @@ jobs:
- run: source venv/bin/activate
- run: pip install .[tests]
- run: pip install pyarrow==1.0.0
-      - run: HF_SCRIPTS_VERSION=master python -m pytest -sv ./tests/
+      - run: HF_SCRIPTS_VERSION=datasets python -m pytest -sv ./tests/

check_code_quality:
-    working_directory: ~/nlp
+    working_directory: ~/datasets
docker:
- image: circleci/python:3.6
resource_class: medium
@@ -39,7 +39,7 @@ jobs:
- run: isort --check-only tests src benchmarks datasets metrics
- run: flake8 tests src benchmarks datasets metrics
build_doc:
-    working_directory: ~/nlp
+    working_directory: ~/datasets
docker:
- image: circleci/python:3.6
steps:
@@ -49,7 +49,7 @@ jobs:
- store_artifacts:
path: ./docs/_build
deploy_doc:
-    working_directory: ~/nlp
+    working_directory: ~/datasets
docker:
- image: circleci/python:3.6
steps:
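The `HF_SCRIPTS_VERSION` switch in the jobs above tells the test suite which branch of the scripts repository to resolve dataset scripts from. A minimal sketch of how such a variable might be consumed, assuming a raw-GitHub URL layout purely for illustration (this is not the library's actual implementation):

```python
import os


def scripts_base_url() -> str:
    """Pick the branch of dataset scripts to test against.

    The env var name comes from the CI config above; the URL layout
    below is a hypothetical illustration, not the real resolver.
    """
    version = os.environ.get("HF_SCRIPTS_VERSION", "master")
    return f"https://raw.githubusercontent.com/huggingface/datasets/{version}/datasets/"
```

With `HF_SCRIPTS_VERSION=datasets`, as set in the CI jobs above, scripts would be resolved from the `datasets` branch instead of `master`.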
6 changes: 3 additions & 3 deletions .circleci/deploy.sh
@@ -28,12 +28,12 @@ function deploy_doc(){
fi
}

-# You can find the commit for each tag on https://github.com/huggingface/nlp/tags
-# Deploys the master documentation on huggingface.co/nlp/master
+# You can find the commit for each tag on https://github.com/huggingface/datasets/tags
+# Deploys the master documentation on huggingface.co/datasets/master
deploy_doc "master" master

# Example of how to deploy a doc on a certain commit (the commit doesn't have to be on the master branch).
-# The following commit would live on huggingface.co/nlp/v1.0.0
+# The following commit would live on huggingface.co/datasets/v1.0.0
#deploy_doc "b33a385" v1.0.0
deploy_doc "99e0ee6" v0.3.0
deploy_doc "21e8091" v0.4.0
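The body of `deploy_doc` is elided in the hunk above; only its call signature, `deploy_doc <commit> [dir]`, is visible. A hedged sketch of the pattern under that assumption (the function body here is invented for illustration and only computes the target location rather than building or uploading anything):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the deploy_doc pattern: first argument is a
# commit, optional second argument is the directory that the build
# would live under on huggingface.co/datasets/.
deploy_doc() {
    commit=$1
    dir=${2:-master}
    # A real implementation would check out $commit, build the docs,
    # and upload them; here we only echo the computed destination.
    echo "huggingface.co/datasets/$dir <- commit $commit"
}

deploy_doc "master" master
deploy_doc "99e0ee6" v0.3.0
```

Pinning each documentation version to a commit hash, as the calls above do, keeps old doc versions reproducible even as `master` moves on.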
2 changes: 1 addition & 1 deletion AUTHORS
@@ -1,4 +1,4 @@
-# This is the list of HuggingFace NLP authors for copyright purposes.
+# This is the list of HuggingFace Datasets authors for copyright purposes.
#
# This does not necessarily list everyone who has contributed code, since in
# some cases, their employer may be the copyright holder. To see the full list
30 changes: 15 additions & 15 deletions CONTRIBUTING.md
@@ -1,13 +1,13 @@
-# How to contribute to nlp?
+# How to contribute to Datasets?

-1. Fork the [repository](https://github.com/huggingface/nlp) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
+1. Fork the [repository](https://github.com/huggingface/datasets) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

```bash
-   git clone [email protected]:<your Github handle>/nlp.git
-   cd nlp
-   git remote add upstream https://github.com/huggingface/nlp.git
+   git clone [email protected]:<your Github handle>/datasets.git
+   cd datasets
+   git remote add upstream https://github.com/huggingface/datasets.git
```

3. Create a new branch to hold your development changes:
@@ -24,11 +24,11 @@
pip install -e ".[dev]"
```

-(If nlp was already installed in the virtual environment, remove
-it with `pip uninstall nlp` before reinstalling it in editable
+(If datasets was already installed in the virtual environment, remove
+it with `pip uninstall datasets` before reinstalling it in editable
mode with the `-e` flag.)

-5. Develop the features on your branch. If you want to add a dataset see more in-detail intsructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/nlp/add_dataset.html) and [share a dataset](https://huggingface.co/nlp/share_dataset.html) in the documentation.
+5. Develop the features on your branch. If you want to add a dataset, see more in-detail instructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/datasets/add_dataset.html) and [share a dataset](https://huggingface.co/datasets/share_dataset.html) in the documentation.

6. Format your code. Run black and isort so that your newly added files look nice with the following command:

@@ -60,20 +60,20 @@
8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.

## How-To-Add a dataset
-1. Make sure you followed steps 1-4 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp).
+1. Make sure you followed steps 1-4 of the section [*How to contribute to datasets?*](#how-to-contribute-to-datasets).

-2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(nlp.GeneratorBasedBuilder)` for the dataset `book_corpus`).
+2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(datasets.GeneratorBasedBuilder)` for the dataset `book_corpus`).

-3. **Make sure you run all of the following commands from the root of your `nlp` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:
+3. **Make sure you run all of the following commands from the root of your `datasets` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:

```bash
-   python nlp-cli test datasets/<your-dataset-folder> --save_infos --all_configs
+   python datasets-cli test datasets/<your-dataset-folder> --save_infos --all_configs
```

4. If the command was successful, you should now create some dummy data. Use the following command to get in-detail instructions on how to create the dummy data:

```bash
-   python nlp-cli dummy_data datasets/<your-dataset-folder>
+   python datasets-cli dummy_data datasets/<your-dataset-folder>
```

5. Now test that both the real data and the dummy data work correctly using the following commands:
@@ -89,7 +89,7 @@
RUN_SLOW=1 pytest tests/test_dataset_common.py::LocalDatasetTest::test_load_dataset_all_configs_<your-dataset-name>
```

-6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
+6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to Datasets?*](#how-to-contribute-to-datasets). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
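The naming rule from step 2 (camel-case builder class, snake-case dataset name) can be sketched as a small helper; `snake_to_camel` is a hypothetical name used only for illustration, not a function the library provides:

```python
def snake_to_camel(dataset_name: str) -> str:
    """Derive the builder class name from a snake_case dataset name.

    Follows the naming convention described in step 2 above,
    e.g. `book_corpus` -> `BookCorpus`.
    """
    return "".join(part.capitalize() for part in dataset_name.split("_"))
```

For example, a dataset folder named `daily_dialog` would hold a builder class named `DailyDialog`.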


### Help for dummy data tests
@@ -98,7 +98,7 @@ Follow these steps in case the dummy data test keeps failing:

- Verify that all filenames are spelled correctly. Rerun the command
```bash
-   python nlp-cli dummy_data datasets/<your-dataset-folder>
+   python datasets-cli dummy_data datasets/<your-dataset-folder>
```
and make sure you follow the exact instructions provided by the command of step 5).

