You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{"cells": [{"cell_type": "markdown", "id": "settled-lodging", "metadata": {}, "source": ["# Datasets"]}, {"cell_type": "markdown", "id": "demanding-charge", "metadata": {}, "source": ["* Datasets are collections of data rows (image, video, or text to be labeled)\n", "* Datasets are used to define units of work.\n", " * Attaching a dataset to a project will add all data rows in the dataset to the project (and add them to the queue)\n", "* Datasets are not required to be fixed in size (you can add data rows at any time). \n", " * However, if you add data rows to a dataset, all projects associated with this dataset will add the new data rows to its queue"]}, {"cell_type": "code", "execution_count": 1, "id": "attached-ticket", "metadata": {}, "outputs": [], "source": "!pip install labelbox"}, {"cell_type": "code", "execution_count": 2, "id": "educational-locking", "metadata": {}, "outputs": [], "source": "from labelbox import Client\nfrom getpass import getpass\nimport uuid\nimport os"}, {"cell_type": "code", "execution_count": 3, "id": "secret-shore", "metadata": {}, "outputs": [], "source": "# If you don't want to give google access to drive you can skip this cell\n# and manually set `API_KEY` below.\n\nCOLAB = \"google.colab\" in str(get_ipython())\nif COLAB:\n !pip install colab-env -qU\n from colab_env import envvar_handler\n envvar_handler.envload()\n\nAPI_KEY = os.environ.get(\"LABELBOX_API_KEY\")\nif not os.environ.get(\"LABELBOX_API_KEY\"):\n API_KEY = getpass(\"Please enter your labelbox api key\")\n if COLAB:\n envvar_handler.add_env(\"LABELBOX_API_KEY\", API_KEY)"}, {"cell_type": "markdown", "id": "geological-clear", "metadata": {}, "source": ["* Set the following cell with your data to run this notebook"]}, {"cell_type": "code", "execution_count": 4, "id": "looking-airport", "metadata": {}, "outputs": [], "source": "# Pick a dataset that has attached data_rows\nDATASET_ID = \"ckm4xyfua04cf0z7a3wz58kgj\"\n# Only update this if you have an on-prem deployment\nENDPOINT = \"https://api.labelbox.com/graphql\""}, {"cell_type": "code", "execution_count": 5, "id": "retained-illustration", "metadata": {}, "outputs": [], "source": "client = Client(api_key=API_KEY, endpoint=ENDPOINT)"}, {"cell_type": "markdown", "id": "explicit-thunder", "metadata": {}, "source": ["### Read"]}, {"cell_type": "code", "execution_count": 6, "id": "inclusive-herald", "metadata": {}, "outputs": [], "source": "# Can be fetched by name (using a query - see basics), or using an id directly\ndataset = client.get_dataset(DATASET_ID)"}, {"cell_type": "code", "execution_count": 7, "id": "increased-joshua", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<Dataset {'created_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'animal_demo_ds', 'uid': 'ckm4xyfua04cf0z7a3wz58kgj', 'updated_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc)}>\n"]}], "source": "print(dataset)"}, {"cell_type": "code", "execution_count": 8, "id": "thermal-making", "metadata": {}, "outputs": [{"data": {"text/plain": ["<DataRow ID: ckm4y6s531rnq0rb6bobqa6j7>"]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": "# We can see the data rows associated with a dataset\ndata_rows = dataset.data_rows()\nnext(data_rows) # Print first one"}, {"cell_type": "code", "execution_count": 9, "id": "cellular-rhythm", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Projects with this dataset attached : [<Project ID: ckm4xyfncfgja0760vpfdxoro>]\n", "Dataset name : animal_demo_ds\n"]}], "source": "# Attached projects\nprint(\"Projects with this dataset attached :\", list(dataset.projects()))\nprint(\"Dataset name :\", dataset.name)"}, {"cell_type": "code", "execution_count": 10, "id": "liquid-stocks", "metadata": {}, "outputs": [], "source": "# A dataset is the way to list all data rows\ndata_row = next(dataset.data_rows())"}, {"cell_type": "markdown", "id": "sonic-classic", "metadata": {}, "source": ["### Create\n", "* See data_rows notebook on how to add data_rows to a dataset."]}, {"cell_type": "code", "execution_count": 11, "id": "valuable-bench", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<Dataset {'created_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'my_new_dataset', 'uid': 'ckmdcg8lf04px0y9ge67bbxa5', 'updated_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc)}>\n"]}], "source": "new_dataset = client.create_dataset(name=\"my_new_dataset\")\nprint(new_dataset)"}, {"cell_type": "markdown", "id": "varying-louisville", "metadata": {}, "source": ["### Update"]}, {"cell_type": "code", "execution_count": 12, "id": "clinical-parks", "metadata": {}, "outputs": [], "source": "new_dataset.update(name=\"new_name\")"}, {"cell_type": "markdown", "id": "outdoor-projector", "metadata": {}, "source": ["* See the data rows notebook `Create` section on how to add data_rows to a dataset."]}, {"cell_type": "markdown", "id": "caroline-therapist", "metadata": {}, "source": ["### Delete"]}, {"cell_type": "code", "execution_count": 13, "id": "increased-grenada", "metadata": {}, "outputs": [], "source": "new_dataset.delete()"}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2"}}, "nbformat": 4, "nbformat_minor": 5}
1
+
{
2
+
"cells": [
3
+
{
4
+
"cell_type": "markdown",
5
+
"id": "settled-lodging",
6
+
"metadata": {},
7
+
"source": [
8
+
"# Datasets"
9
+
]
10
+
},
11
+
{
12
+
"cell_type": "markdown",
13
+
"id": "demanding-charge",
14
+
"metadata": {},
15
+
"source": [
16
+
"* Datasets are collections of data rows (image, video, or text to be labeled)\n",
17
+
"* Datasets are used to define units of work.\n",
18
+
" * Attaching a dataset to a project will add all data rows in the dataset to the project (and add them to the queue)\n",
19
+
"* Datasets are not required to be fixed in size (you can add data rows at any time). \n",
20
+
" * However, if you add data rows to a dataset, all projects associated with this dataset will add the new data rows to its queue"
"source": "# If you don't want to give google access to drive you can skip this cell\n# and manually set `API_KEY` below.\n\nCOLAB = \"google.colab\" in str(get_ipython())\nif COLAB:\n !pip install colab-env -qU\n from colab_env import envvar_handler\n envvar_handler.envload()\n\nAPI_KEY = os.environ.get(\"LABELBOX_API_KEY\")\nif not os.environ.get(\"LABELBOX_API_KEY\"):\n API_KEY = getpass(\"Please enter your labelbox api key\")\n if COLAB:\n envvar_handler.add_env(\"LABELBOX_API_KEY\", API_KEY)"
46
+
},
47
+
{
48
+
"cell_type": "markdown",
49
+
"id": "geological-clear",
50
+
"metadata": {},
51
+
"source": [
52
+
"* Set the following cell with your data to run this notebook"
53
+
]
54
+
},
55
+
{
56
+
"cell_type": "code",
57
+
"execution_count": 4,
58
+
"id": "looking-airport",
59
+
"metadata": {},
60
+
"outputs": [],
61
+
"source": "# Pick a dataset that has attached data_rows\nDATASET_ID = \"ckm4xyfua04cf0z7a3wz58kgj\"\n# Only update this if you have an on-prem deployment\nENDPOINT = \"https://api.labelbox.com/graphql\""
0 commit comments