Commit
Showing 43 changed files with 8,910 additions and 5,393 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,36 +1,88 @@ | ||
::::{tab-set} | ||
<div class="start-page__intro" markdown="1"> | ||
|
||
:::{tab-item} Feedback datasets | ||
# Welcome to | ||
|
||
```python | ||
# install datasets library with pip install datasets | ||
import argilla as rg | ||
from datasets import load_dataset | ||
## Argilla is a platform to build high-quality AI datasets | ||
|
||
If you need support, join the [Argilla Slack community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) | ||
|
||
</div> | ||
|
||
<div class="start-page__content" markdown="1"> | ||
|
||
Get started by publishing your first dataset. | ||
|
||
# load an Argilla Feedback Dataset from the Hugging Face Hub | ||
# look for other datasets at https://huggingface.co/datasets?other=argilla | ||
dataset = rg.FeedbackDataset.from_huggingface("argilla/oasst_response_quality", split="train") | ||
### 1. Open an IDE, Jupyter or Colab | ||
|
||
# push the dataset to Argilla | ||
dataset.push_to_argilla("oasst_response_quality") | ||
If you're a Colab user, you can directly use our [introductory tutorial](https://colab.research.google.com/github/argilla-io/argilla/blob/develop/docs/_source/getting_started/quickstart_workflow_feedback.ipynb). | ||
|
||
### 2. Install the SDK with pip | ||
|
||
To work with Argilla datasets, you need to use the Argilla SDK. You can install the SDK with pip as follows: | ||
|
||
```sh | ||
pip install argilla -U | ||
``` | ||
::: | ||
|
||
:::{tab-item} Other datasets | ||
### 3. Connect to your Argilla server | ||
|
||
Get your `ARGILLA_API_URL`: | ||
|
||
- If you are using Docker, it is the URL shown in your browser (by default `http://localhost:6900`) | ||
- If you are using HF Spaces, it should be constructed as follows: `https://[your-owner-name]-[your_space_name].hf.space` | ||
|
||
Get your `ARGILLA_API_KEY`: open ["My settings"](/user-settings) and copy the API key. | ||
|
||
Make sure to replace `ARGILLA_API_URL` and `ARGILLA_API_KEY` in the code below. If you are using a private HF Space, you need to specify your `HF_TOKEN` which can be found [here](https://huggingface.co/settings/tokens). | ||
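
As a rough illustration of the HF Spaces URL pattern quoted above, the sketch below joins an owner name and a Space name into a candidate `ARGILLA_API_URL`. The helper `build_space_url` and the example names are hypothetical, not part of the Argilla SDK, and the real host name may differ if Hugging Face rewrites characters in the Space name, so compare the result with the URL your Space actually serves.

```python
def build_space_url(owner: str, space: str) -> str:
    """Build a candidate Space URL from the pattern
    https://[your-owner-name]-[your_space_name].hf.space.

    Hypothetical helper for illustration only; it does no name
    normalization beyond trimming whitespace.
    """
    owner, space = owner.strip(), space.strip()
    if not owner or not space:
        raise ValueError("owner and space must be non-empty")
    return f"https://{owner}-{space}.hf.space"


# Example values are placeholders, not a real Space.
ARGILLA_API_URL = build_space_url("my-org", "my-argilla-space")
print(ARGILLA_API_URL)
```

For a Docker deployment no helper is needed; the default mentioned above is `http://localhost:6900`.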
|
||
```python | ||
# install datasets library with pip install datasets | ||
import argilla as rg | ||
from datasets import load_dataset | ||
|
||
# load dataset from the hub | ||
dataset = load_dataset("argilla/gutenberg_spacy-ner", split="train") | ||
rg.init( | ||
api_url="ARGILLA_API_URL", | ||
api_key="ARGILLA_API_KEY", | ||
# extra_headers={"Authorization": f"Bearer {'HF_TOKEN'}"} | ||
) | ||
``` | ||
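
Instead of pasting credentials into the call, one option is to read them from environment variables and assemble the keyword arguments first. This is a minimal sketch under assumptions: the function `argilla_init_kwargs` is hypothetical, and reusing `ARGILLA_API_URL`, `ARGILLA_API_KEY` and `HF_TOKEN` as environment variable names is a convention chosen here, not something the page prescribes. The `extra_headers` entry is only added when a token is present, matching the private HF Space case described above.

```python
import os
from typing import Mapping, Optional


def argilla_init_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings for rg.init from a mapping (default: os.environ).

    Hypothetical helper; the variable names are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {
        # Fall back to the Docker default URL when none is configured.
        "api_url": env.get("ARGILLA_API_URL", "http://localhost:6900"),
        # Raises KeyError if the API key is missing, which is intentional.
        "api_key": env["ARGILLA_API_KEY"],
    }
    token = env.get("HF_TOKEN")
    if token:
        # Only needed for private HF Spaces.
        kwargs["extra_headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


# Example with an explicit mapping so the sketch runs without real credentials.
example = argilla_init_kwargs({"ARGILLA_API_KEY": "my-key", "HF_TOKEN": "hf_example"})
print(sorted(example))
```

With the SDK installed, the result can be passed on as `rg.init(**argilla_init_kwargs())`.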
|
||
### 4. Create your first dataset | ||
|
||
Specify a workspace where the dataset will be created. Check your workspaces in ["My settings"](/user_settings). To create a new workspace, check the [docs](https://docs.argilla.io/en/latest/getting_started/installation/configurations/workspace_management.html). | ||
|
||
Create a dataset with two labels ("sadness" and "joy"). Don't forget to replace "<your-workspace>". Here, we are using a task template; check the docs to [create a fully custom dataset](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/create_dataset.html). | ||
|
||
```python | ||
dataset = rg.FeedbackDataset.for_text_classification( | ||
labels=["sadness", "joy"], | ||
multi_label=False, | ||
use_markdown=True, | ||
guidelines=None, | ||
metadata_properties=None, | ||
vectors_settings=None, | ||
) | ||
dataset.push_to_argilla(name="my-first-dataset", workspace="<your-workspace>") | ||
``` | ||
|
||
### 5. Add records | ||
|
||
# read in dataset, assuming it's a dataset for token classification | ||
dataset_rg = rg.read_datasets(dataset, task="TokenClassification") | ||
Create a list with the records you want to add. Ensure that you match the fields with the ones specified in the previous step. | ||
|
||
# log the dataset | ||
rg.log(dataset_rg, "gutenberg_spacy-ner") | ||
You can also use `pandas` or `load_dataset` to [read an existing dataset and create records from it](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/records.html#add-records). | ||
|
||
```python | ||
records = [ | ||
rg.FeedbackRecord( | ||
fields={ | ||
"text": "I am so happy today", | ||
}, | ||
), | ||
rg.FeedbackRecord( | ||
fields={ | ||
"text": "I feel sad today", | ||
}, | ||
) | ||
] | ||
dataset.add_records(records) | ||
``` | ||
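
When the texts come from an existing table rather than being typed by hand, the field dictionaries can be prepared first and wrapped into records afterwards. The sketch below is one possible way to do that; `rows_to_fields` and the `text_column` parameter are hypothetical names, and it assumes each row is a plain mapping such as those produced by `DataFrame.to_dict(orient="records")` or by iterating over a `datasets` split. It skips empty texts so that every payload matches the single `text` field used above.

```python
from typing import Iterable, Mapping


def rows_to_fields(rows: Iterable[Mapping[str, object]], text_column: str = "text") -> list:
    """Turn row mappings into {"text": ...} dictionaries, dropping empty texts.

    Hypothetical helper for illustration; it only prepares the `fields`
    payloads and does not talk to Argilla.
    """
    fields = []
    for row in rows:
        text = str(row.get(text_column) or "").strip()
        if text:
            fields.append({"text": text})
    return fields


rows = [
    {"text": "I am so happy today"},
    {"text": "   "},  # whitespace only, skipped
    {"text": "I feel sad today"},
]
payloads = rows_to_fields(rows)
print(len(payloads))

# With the SDK installed and `dataset` created as above:
# records = [rg.FeedbackRecord(fields=f) for f in payloads]
# dataset.add_records(records)
```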
::: | ||
:::: | ||
|
||
</div> |
Binary file added (BIN, +99 KB): ...ource/_static/tutorials/add-text-descriptives-as-metadata/text-descriptives.PNG