Commit 287ddbe

Author: Matt Sokoloff
unflatten notebooks
1 parent b1c2457 commit 287ddbe

18 files changed: 4798 additions & 18 deletions

examples/basics/basics.ipynb

Lines changed: 359 additions & 1 deletion
Large diffs are not rendered by default.

examples/basics/data_rows.ipynb

Lines changed: 295 additions & 1 deletion
Large diffs are not rendered by default.

examples/basics/datasets.ipynb

Lines changed: 234 additions & 1 deletion
@@ -1 +1,234 @@
-{"cells": [{"cell_type": "markdown", "id": "settled-lodging", "metadata": {}, "source": ["# Datasets"]}, {"cell_type": "markdown", "id": "demanding-charge", "metadata": {}, "source": ["* Datasets are collections of data rows (image, video, or text to be labeled)\n", "* Datasets are used to define units of work.\n", "    * Attaching a dataset to a project will add all data rows in the dataset to the project (and add them to the queue)\n", "* Datasets are not required to be fixed in size (you can add data rows at any time). \n", "    * However, if you add data rows to a dataset, all projects associated with this dataset will add the new data rows to its queue"]}, {"cell_type": "code", "execution_count": 1, "id": "attached-ticket", "metadata": {}, "outputs": [], "source": "!pip install labelbox"}, {"cell_type": "code", "execution_count": 2, "id": "educational-locking", "metadata": {}, "outputs": [], "source": "from labelbox import Client\nfrom getpass import getpass\nimport uuid\nimport os"}, {"cell_type": "code", "execution_count": 3, "id": "secret-shore", "metadata": {}, "outputs": [], "source": "# If you don't want to give google access to drive you can skip this cell\n# and manually set `API_KEY` below.\n\nCOLAB = \"google.colab\" in str(get_ipython())\nif COLAB:\n    !pip install colab-env -qU\n    from colab_env import envvar_handler\n    envvar_handler.envload()\n\nAPI_KEY = os.environ.get(\"LABELBOX_API_KEY\")\nif not os.environ.get(\"LABELBOX_API_KEY\"):\n    API_KEY = getpass(\"Please enter your labelbox api key\")\n    if COLAB:\n        envvar_handler.add_env(\"LABELBOX_API_KEY\", API_KEY)"}, {"cell_type": "markdown", "id": "geological-clear", "metadata": {}, "source": ["* Set the following cell with your data to run this notebook"]}, {"cell_type": "code", "execution_count": 4, "id": "looking-airport", "metadata": {}, "outputs": [], "source": "# Pick a dataset that has attached data_rows\nDATASET_ID = \"ckm4xyfua04cf0z7a3wz58kgj\"\n# Only update this if you have an on-prem deployment\nENDPOINT = \"https://api.labelbox.com/graphql\""}, {"cell_type": "code", "execution_count": 5, "id": "retained-illustration", "metadata": {}, "outputs": [], "source": "client = Client(api_key=API_KEY, endpoint=ENDPOINT)"}, {"cell_type": "markdown", "id": "explicit-thunder", "metadata": {}, "source": ["### Read"]}, {"cell_type": "code", "execution_count": 6, "id": "inclusive-herald", "metadata": {}, "outputs": [], "source": "# Can be fetched by name (using a query - see basics), or using an id directly\ndataset = client.get_dataset(DATASET_ID)"}, {"cell_type": "code", "execution_count": 7, "id": "increased-joshua", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<Dataset {'created_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'animal_demo_ds', 'uid': 'ckm4xyfua04cf0z7a3wz58kgj', 'updated_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc)}>\n"]}], "source": "print(dataset)"}, {"cell_type": "code", "execution_count": 8, "id": "thermal-making", "metadata": {}, "outputs": [{"data": {"text/plain": ["<DataRow ID: ckm4y6s531rnq0rb6bobqa6j7>"]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": "# We can see the data rows associated with a dataset\ndata_rows = dataset.data_rows()\nnext(data_rows) # Print first one"}, {"cell_type": "code", "execution_count": 9, "id": "cellular-rhythm", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Projects with this dataset attached : [<Project ID: ckm4xyfncfgja0760vpfdxoro>]\n", "Dataset name : animal_demo_ds\n"]}], "source": "# Attached projects\nprint(\"Projects with this dataset attached :\", list(dataset.projects()))\nprint(\"Dataset name :\", dataset.name)"}, {"cell_type": "code", "execution_count": 10, "id": "liquid-stocks", "metadata": {}, "outputs": [], "source": "# A dataset is the way to list all data rows\ndata_row = next(dataset.data_rows())"}, {"cell_type": "markdown", "id": "sonic-classic", "metadata": {}, "source": ["### Create\n", "* See data_rows notebook on how to add data_rows to a dataset."]}, {"cell_type": "code", "execution_count": 11, "id": "valuable-bench", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<Dataset {'created_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'my_new_dataset', 'uid': 'ckmdcg8lf04px0y9ge67bbxa5', 'updated_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc)}>\n"]}], "source": "new_dataset = client.create_dataset(name=\"my_new_dataset\")\nprint(new_dataset)"}, {"cell_type": "markdown", "id": "varying-louisville", "metadata": {}, "source": ["### Update"]}, {"cell_type": "code", "execution_count": 12, "id": "clinical-parks", "metadata": {}, "outputs": [], "source": "new_dataset.update(name=\"new_name\")"}, {"cell_type": "markdown", "id": "outdoor-projector", "metadata": {}, "source": ["* See the data rows notebook `Create` section on how to add data_rows to a dataset."]}, {"cell_type": "markdown", "id": "caroline-therapist", "metadata": {}, "source": ["### Delete"]}, {"cell_type": "code", "execution_count": 13, "id": "increased-grenada", "metadata": {}, "outputs": [], "source": "new_dataset.delete()"}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2"}}, "nbformat": 4, "nbformat_minor": 5}
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "settled-lodging",
+   "metadata": {},
+   "source": [
+    "# Datasets"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "demanding-charge",
+   "metadata": {},
+   "source": [
+    "* Datasets are collections of data rows (image, video, or text to be labeled)\n",
+    "* Datasets are used to define units of work.\n",
+    "    * Attaching a dataset to a project will add all data rows in the dataset to the project (and add them to the queue)\n",
+    "* Datasets are not required to be fixed in size (you can add data rows at any time). \n",
+    "    * However, if you add data rows to a dataset, all projects associated with this dataset will add the new data rows to its queue"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "attached-ticket",
+   "metadata": {},
+   "outputs": [],
+   "source": "!pip install labelbox"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "educational-locking",
+   "metadata": {},
+   "outputs": [],
+   "source": "from labelbox import Client\nfrom getpass import getpass\nimport uuid\nimport os"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "secret-shore",
+   "metadata": {},
+   "outputs": [],
+   "source": "# If you don't want to give google access to drive you can skip this cell\n# and manually set `API_KEY` below.\n\nCOLAB = \"google.colab\" in str(get_ipython())\nif COLAB:\n    !pip install colab-env -qU\n    from colab_env import envvar_handler\n    envvar_handler.envload()\n\nAPI_KEY = os.environ.get(\"LABELBOX_API_KEY\")\nif not os.environ.get(\"LABELBOX_API_KEY\"):\n    API_KEY = getpass(\"Please enter your labelbox api key\")\n    if COLAB:\n        envvar_handler.add_env(\"LABELBOX_API_KEY\", API_KEY)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "geological-clear",
+   "metadata": {},
+   "source": [
+    "* Set the following cell with your data to run this notebook"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "looking-airport",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Pick a dataset that has attached data_rows\nDATASET_ID = \"ckm4xyfua04cf0z7a3wz58kgj\"\n# Only update this if you have an on-prem deployment\nENDPOINT = \"https://api.labelbox.com/graphql\""
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "retained-illustration",
+   "metadata": {},
+   "outputs": [],
+   "source": "client = Client(api_key=API_KEY, endpoint=ENDPOINT)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "explicit-thunder",
+   "metadata": {},
+   "source": [
+    "### Read"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "inclusive-herald",
+   "metadata": {},
+   "outputs": [],
+   "source": "# Can be fetched by name (using a query - see basics), or using an id directly\ndataset = client.get_dataset(DATASET_ID)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "increased-joshua",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<Dataset {'created_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'animal_demo_ds', 'uid': 'ckm4xyfua04cf0z7a3wz58kgj', 'updated_at': datetime.datetime(2021, 3, 11, 14, 3, 12, tzinfo=datetime.timezone.utc)}>\n"
+     ]
+    }
+   ],
+   "source": "print(dataset)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "thermal-making",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<DataRow ID: ckm4y6s531rnq0rb6bobqa6j7>"
+      ]
+     },
+     "execution_count": 27,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": "# We can see the data rows associated with a dataset\ndata_rows = dataset.data_rows()\nnext(data_rows) # Print first one"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "cellular-rhythm",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Projects with this dataset attached : [<Project ID: ckm4xyfncfgja0760vpfdxoro>]\n",
+      "Dataset name : animal_demo_ds\n"
+     ]
+    }
+   ],
+   "source": "# Attached projects\nprint(\"Projects with this dataset attached :\", list(dataset.projects()))\nprint(\"Dataset name :\", dataset.name)"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "liquid-stocks",
+   "metadata": {},
+   "outputs": [],
+   "source": "# A dataset is the way to list all data rows\ndata_row = next(dataset.data_rows())"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "sonic-classic",
+   "metadata": {},
+   "source": [
+    "### Create\n",
+    "* See data_rows notebook on how to add data_rows to a dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "valuable-bench",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "<Dataset {'created_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc), 'description': '', 'name': 'my_new_dataset', 'uid': 'ckmdcg8lf04px0y9ge67bbxa5', 'updated_at': datetime.datetime(2021, 3, 17, 11, 11, 7, tzinfo=datetime.timezone.utc)}>\n"
+     ]
+    }
+   ],
+   "source": "new_dataset = client.create_dataset(name=\"my_new_dataset\")\nprint(new_dataset)"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "varying-louisville",
+   "metadata": {},
+   "source": [
+    "### Update"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "clinical-parks",
+   "metadata": {},
+   "outputs": [],
+   "source": "new_dataset.update(name=\"new_name\")"
+  },
+  {
+   "cell_type": "markdown",
+   "id": "outdoor-projector",
+   "metadata": {},
+   "source": [
+    "* See the data rows notebook `Create` section on how to add data_rows to a dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "caroline-therapist",
+   "metadata": {},
+   "source": [
+    "### Delete"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "increased-grenada",
+   "metadata": {},
+   "outputs": [],
+   "source": "new_dataset.delete()"
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
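
Note on the change: the hunk above replaces a single-line notebook JSON document with a multi-line one that has the same content. The commit does not show how the files were rewritten, so the following is only a minimal sketch of one way to do it, using Python's standard `json` module. The helper name `unflatten_notebook_text`, the one-space indent, and the shortened sample document are assumptions made for illustration.

```python
import json


def unflatten_notebook_text(flat_text: str, indent: int = 1) -> str:
    """Re-serialize single-line notebook JSON as indented, multi-line JSON.

    Hypothetical helper: the name and the default indent are assumptions,
    not something taken from the commit.
    """
    notebook = json.loads(flat_text)  # parse the flattened document
    # Dump with one key or list item per line; key order is preserved.
    return json.dumps(notebook, indent=indent, ensure_ascii=False) + "\n"


# Shortened stand-in for a flattened notebook (illustrative only).
flat = (
    '{"cells": [{"cell_type": "markdown", "metadata": {}, '
    '"source": ["# Datasets"]}], "metadata": {}, '
    '"nbformat": 4, "nbformat_minor": 5}'
)

pretty = unflatten_notebook_text(flat)
print(pretty)
```

Parsing and re-dumping changes only whitespace, so the notebook loads to the same data before and after, while later edits produce per-line diffs instead of a one-line replacement.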
