Commit

Merge branch 'main' into develop
damianpumar committed Dec 21, 2023
2 parents e3f0992 + b1cb46c commit 3157945
Showing 43 changed files with 8,910 additions and 5,393 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -16,16 +16,22 @@ These are the section headers that we use:

## [Unreleased]()

## [1.21.0](https://github.com/argilla-io/argilla/compare/v1.20.0...v1.21.0)

### Added

- Added new draft queue for annotation view ([#4334](https://github.com/argilla-io/argilla/pull/4334))
- Added annotation metrics module for the `FeedbackDataset` (`argilla.client.feedback.metrics`). ([#4175](https://github.com/argilla-io/argilla/pull/4175)).
- Added strategy to handle and translate errors from the server for `401` HTTP status code ([#4362](https://github.com/argilla-io/argilla/pull/4362))
- Added integration for `textdescriptives` using `TextDescriptivesExtractor` to configure `metadata_properties` in `FeedbackDataset` and `FeedbackRecord`. ([#4400](https://github.com/argilla-io/argilla/pull/4400)). Contributed by @m-newhauser
- Added `POST /api/v1/me/responses/bulk` endpoint to create responses in bulk for current user. ([#4380](https://github.com/argilla-io/argilla/pull/4380))
- Added list support for term metadata properties. (Closes [#4359](https://github.com/argilla-io/argilla/issues/4359))
- Added new CLI task to reindex datasets and records into the search engine. ([#4404](https://github.com/argilla-io/argilla/pull/4404))
- Added `httpx_extra_kwargs` argument to `rg.init` and `Argilla` to allow passing extra arguments to `httpx.Client` used by `Argilla`. ([#4440](https://github.com/argilla-io/argilla/pull/4441))

### Changed

- More productive and simpler shortcuts system ([#4215](https://github.com/argilla-io/argilla/pull/4215))
- Move `ArgillaSingleton`, `init` and `active_client` to a new module `singleton`. ([#4347](https://github.com/argilla-io/argilla/pull/4347))
- Updated `argilla.load` functions to also work with `FeedbackDataset`s. ([#4347](https://github.com/argilla-io/argilla/pull/4347))
- [breaking] Updated `argilla.delete` functions to also work with `FeedbackDataset`s. It now raises an error if the dataset does not exist. ([#4347](https://github.com/argilla-io/argilla/pull/4347))
@@ -36,6 +42,10 @@ These are the section headers that we use:
- Fixed error in `TextClassificationSettings.from_dict` method in which the `label_schema` created was a list of `dict` instead of a list of `str`. ([#4347](https://github.com/argilla-io/argilla/pull/4347))
- Fixed total records on pagination component ([#4424](https://github.com/argilla-io/argilla/pull/4424))

### Removed

- Removed `draft` auto save for annotation view ([#4334](https://github.com/argilla-io/argilla/pull/4334))

## [1.20.0](https://github.com/argilla-io/argilla/compare/v1.19.0...v1.20.0)

### Added
98 changes: 75 additions & 23 deletions docs/_source/_common/snippets/start_page.md
@@ -1,36 +1,88 @@
-::::{tab-set}
+<div class="start-page__intro" markdown="1">

-:::{tab-item} Feedback datasets
+# Welcome to

-```python
-# install datasets library with pip install datasets
-import argilla as rg
-from datasets import load_dataset
+## Argilla is a platform to build high-quality AI datasets

+If you need support, join the [Argilla Slack community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g).
+
+</div>
+
+<div class="start-page__content" markdown="1">
+
+Get started by publishing your first dataset.
+
-# load an Argilla Feedback Dataset from the Hugging Face Hub
-# look for other datasets at https://huggingface.co/datasets?other=argilla
-dataset = rg.FeedbackDataset.from_huggingface("argilla/oasst_response_quality", split="train")
+### 1. Open an IDE, Jupyter or Colab

-# push the dataset to Argilla
-dataset.push_to_argilla("oasst_response_quality")
+If you're a Colab user, you can directly use our [introductory tutorial](https://colab.research.google.com/github/argilla-io/argilla/blob/develop/docs/_source/getting_started/quickstart_workflow_feedback.ipynb).
+
+### 2. Install the SDK with pip
+
+To work with Argilla datasets, you need to use the Argilla SDK. You can install the SDK with pip as follows:
+
+```sh
+pip install argilla -U
```
-:::

-:::{tab-item} Other datasets
+### 3. Connect to your Argilla server

+Get your `ARGILLA_API_URL`:
+
+- If you are using Docker, it is the URL shown in your browser (by default `http://localhost:6900`)
+- If you are using HF Spaces, it should be constructed as follows: `https://[your-owner-name]-[your_space_name].hf.space`
+
+Find your `ARGILLA_API_KEY` in ["My settings"](/user-settings) and copy it.
+
+Make sure to replace `ARGILLA_API_URL` and `ARGILLA_API_KEY` in the code below. If you are using a private HF Space, you need to specify your `HF_TOKEN`, which can be found [here](https://huggingface.co/settings/tokens).
+
```python
-# install datasets library with pip install datasets
import argilla as rg
-from datasets import load_dataset

-# load dataset from the hub
-dataset = load_dataset("argilla/gutenberg_spacy-ner", split="train")
+rg.init(
+    api_url="ARGILLA_API_URL",
+    api_key="ARGILLA_API_KEY",
+    # extra_headers={"Authorization": "Bearer HF_TOKEN"}
+)
+```
+
+### 4. Create your first dataset
+
+Specify a workspace where the dataset will be created. Check your workspaces in ["My settings"](/user-settings). To create a new workspace, check the [docs](https://docs.argilla.io/en/latest/getting_started/installation/configurations/workspace_management.html).
+
+Create a dataset with two labels ("sadness" and "joy"). Don't forget to replace "<your-workspace>". Here, we are using a task template; check the docs to [create a fully custom dataset](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/create_dataset.html).
+
+```python
+dataset = rg.FeedbackDataset.for_text_classification(
+    labels=["sadness", "joy"],
+    multi_label=False,
+    use_markdown=True,
+    guidelines=None,
+    metadata_properties=None,
+    vectors_settings=None,
+)
+dataset.push_to_argilla(name="my-first-dataset", workspace="<your-workspace>")
+```
+
+### 5. Add records

-# read in dataset, assuming its a dataset for token classification
-dataset_rg = rg.read_datasets(dataset, task="TokenClassification")
+Create a list with the records you want to add. Ensure that you match the fields with the ones specified in the previous step.

-# log the dataset
-rg.log(dataset_rg, "gutenberg_spacy-ner")
+You can also use `pandas` or `load_dataset` to [read an existing dataset and create records from it](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/records.html#add-records).
+
+```python
+records = [
+    rg.FeedbackRecord(
+        fields={
+            "text": "I am so happy today",
+        },
+    ),
+    rg.FeedbackRecord(
+        fields={
+            "text": "I feel sad today",
+        },
+    )
+]
+dataset.add_records(records)
```
-:::
-::::
+
+</div>
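
Step 3 of the new start page describes how the connection values fit together: a Space URL built from the owner and Space name, an API key, and an optional `Authorization` header for private Spaces. A minimal sketch of that assembly follows. The helper names `hf_space_url` and `build_init_kwargs` are hypothetical and not part of Argilla; the URL pattern is copied literally from the text, so any name normalization Hugging Face applies is not handled here.

```python
from typing import Dict, Optional


def hf_space_url(owner: str, space_name: str) -> str:
    """Build the Space URL using the pattern quoted in step 3 (hypothetical helper)."""
    return f"https://{owner}-{space_name}.hf.space"


def build_init_kwargs(api_url: str, api_key: str, hf_token: Optional[str] = None) -> Dict[str, object]:
    """Collect the keyword arguments shown in the step 3 snippet (hypothetical helper)."""
    kwargs: Dict[str, object] = {"api_url": api_url, "api_key": api_key}
    if hf_token:
        # Only private Spaces need the bearer token header.
        kwargs["extra_headers"] = {"Authorization": f"Bearer {hf_token}"}
    return kwargs


# Placeholder values; replace them with your own before connecting.
init_kwargs = build_init_kwargs(
    api_url=hf_space_url("your-owner-name", "your_space_name"),
    api_key="ARGILLA_API_KEY",
    hf_token="HF_TOKEN",
)
# With the SDK installed, these could then be passed on as:
#   import argilla as rg
#   rg.init(**init_kwargs)
```

Keeping the header optional mirrors the commented-out `extra_headers` line in the snippet: Docker and public Space setups simply omit the token.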
8 changes: 4 additions & 4 deletions docs/_source/_common/tabs/unfication_strategies.md
@@ -9,7 +9,7 @@ dataset = FeedbackDataset.from_huggingface(
repo_id="argilla/stackoverflow_feedback_demo"
)
strategy = LabelQuestionStrategy("majority") # "disagreement", "majority_weighted (WIP)"
-dataset.unify_responses(
+dataset.compute_unified_responses(
question=dataset.question_by_name("title_question_fit"),
strategy=strategy,
)
@@ -28,7 +28,7 @@ dataset = FeedbackDataset.from_huggingface(
repo_id="argilla/stackoverflow_feedback_demo"
)
strategy = MultiLabelQuestionStrategy("majority") # "disagreement", "majority_weighted (WIP)"
-dataset.unify_responses(
+dataset.compute_unified_responses(
question=dataset.question_by_name("tags"),
strategy=strategy,
)
@@ -46,7 +46,7 @@ dataset = FeedbackDataset.from_huggingface(
repo_id="argilla/stackoverflow_feedback_demo"
)
strategy = RankingQuestionStrategy("majority") # "mean", "max", "min"
-dataset.unify_responses(
+dataset.compute_unified_responses(
question=dataset.question_by_name("relevance_ranking"),
strategy=strategy,
)
@@ -64,7 +64,7 @@ dataset = FeedbackDataset.from_huggingface(
repo_id="argilla/stackoverflow_feedback_demo"
)
strategy = RatingQuestionStrategy("majority") # "mean", "max", "min"
-dataset.unify_responses(
+dataset.compute_unified_responses(
question=dataset.question_by_name("answer_quality"),
strategy=strategy,
)
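
The hunks above only swap `dataset.unify_responses(` for `dataset.compute_unified_responses(`; the strategies themselves are unchanged. For readers unfamiliar with the idea, here is a rough, library-free sketch of what "majority" and "mean" unification could look like for a single record. It is an illustration under assumptions, not Argilla's implementation: the function names are hypothetical, and resolving ties by first-seen order is an arbitrary choice made for this example.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Optional, Sequence


def majority_value(responses: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the most frequent response, or None if there are no responses.

    Ties go to the value seen first (an assumption made for this sketch).
    """
    if not responses:
        return None
    counts = Counter(responses)  # keeps first-seen order of keys
    top = max(counts.values())
    for value, count in counts.items():
        if count == top:
            return value
    return None


def mean_rating(ratings: Sequence[int]) -> Optional[float]:
    """Average the ratings, or return None if there are no ratings."""
    return float(mean(ratings)) if ratings else None


# Three annotators answered the same label question for one record.
unified_label = majority_value(["yes", "no", "yes"])
# Three annotators rated the same answer.
unified_rating = mean_rating([4, 5, 3])
```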