Commit 3d41776

merge two episodes into one
1 parent 03a2c26 commit 3d41776

3 files changed, +200 -201 lines changed

content/customizing-plots.md

Lines changed: 200 additions & 0 deletions
@@ -1,3 +1,203 @@
(gapminder)=

# Plotting the Gapminder dataset


## Loading and plotting a dataset

In this lesson we will work with one of the
[Gapminder](https://www.gapminder.org/tools/) datasets.

Let us read and plot the data together, then explain what is happening and
improve the figure step by step. First we read and inspect the data:
```python
# import necessary libraries
import altair as alt
import pandas as pd

# read the data
url_prefix = "https://raw.githubusercontent.com/plotly/datasets/master/"
data = pd.read_csv(url_prefix + "gapminder_with_codes.csv")

# display an overview of the dataset (shown when it is the last line of a notebook cell)
data
```
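
Before plotting, it helps to check which columns exist and which years are covered. The sketch below shows a few pandas methods for this. So that it runs without a network connection, it builds a small stand-in DataFrame with the same column names as the downloaded file. The numbers are placeholders, not real Gapminder values. With the real `data` you would call the same methods.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded file: same column names,
# placeholder values (NOT real Gapminder numbers)
data = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "continent": ["Asia", "Asia", "Europe", "Europe"],
        "year": [2002, 2007, 2002, 2007],
        "lifeExp": [60.0, 62.0, 75.0, 77.0],
        "pop": [1_000_000, 1_100_000, 500_000, 520_000],
        "gdpPercap": [1500.0, 1800.0, 20000.0, 24000.0],
    }
)

print(data.shape)                              # (number of rows, number of columns)
print(list(data.columns))                      # column names we can encode later
print(sorted(data["year"].unique().tolist()))  # which years are covered
print(data.describe())                         # summary statistics of numeric columns
```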

With very few lines we can get the first plot:
```python
alt.Chart(data).mark_point().encode(
    x="gdpPercap",
    y="lifeExp",
)
```

```{figure} img/gapminder/all-data.svg
:alt: First raw plot with all countries and all years.

First raw plot with all countries and all years.
```

Observe how we connect (encode) **visual channels** to data columns:
- x-coordinate with "gdpPercap"
- y-coordinate with "lifeExp"

The following code has the same effect, but the version above, with one
channel per line, is easier to read and to extend:
```python
alt.Chart(data).mark_point().encode(x="gdpPercap", y="lifeExp")
```

```{discussion} Let us pause and explain the code
- `alt` is a shorthand for `altair`, which we imported at the top of the notebook
- `Chart()` creates a chart object from the data we pass as argument
- `mark_point()` is a method that draws each data row as a point, which gives a scatter plot
- `encode()` is a method that maps data columns to visual channels
```


## Filtering data directly in [Vega-Altair](https://altair-viz.github.io)

In [Vega-Altair](https://altair-viz.github.io) we can chain methods. Let us
add two more: `transform_filter()` keeps only the rows from the year 2007, and
`interactive()` lets us zoom and pan the chart with the mouse:
```{code-block} python
---
emphasize-lines: 4
---
alt.Chart(data).mark_point().encode(
    x="gdpPercap",
    y="lifeExp",
).transform_filter(alt.datum.year == 2007).interactive()
```

```{figure} img/gapminder/only-2007.svg
:alt: Now we only keep the year 2007.

Now we only keep the year 2007.
```
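
`transform_filter()` filters inside the chart specification, so the full table is still embedded in the chart. An alternative is to filter the DataFrame with pandas before handing it to `alt.Chart()`. The sketch below does this on a placeholder table with made-up values. For the year 2007 it keeps the same rows.

```python
import pandas as pd

# Placeholder table (made-up values) with Gapminder-style columns
data = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2002, 2007, 2002, 2007],
        "gdpPercap": [1500.0, 1800.0, 20000.0, 24000.0],
        "lifeExp": [60.0, 62.0, 75.0, 77.0],
    }
)

# Boolean mask: True for rows whose year is 2007
data_2007 = data[data["year"] == 2007]
print(data_2007)

# The filtered frame could then be plotted with
# alt.Chart(data_2007).mark_point().encode(x="gdpPercap", y="lifeExp")
```

Filtering in pandas keeps the embedded data smaller. Using `transform_filter()` keeps the whole pipeline inside the chart definition. Which one fits better is largely a matter of taste and data size.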


## Using color as an additional channel

A very neat feature of [Vega-Altair](https://altair-viz.github.io) is that it
is easy to add and modify visual channels. Let us add one more and map the
"continent" data column to color:
```{code-block} python
---
emphasize-lines: 4
---
alt.Chart(data).mark_point().encode(
    x="gdpPercap",
    y="lifeExp",
    color="continent",
).transform_filter(alt.datum.year == 2007).interactive()
```

```{figure} img/gapminder/color.svg
:alt: Using different colors for different continents.

Using different colors for different continents.
```


## Changing to log scale

GDP per capita spans several orders of magnitude, so for this dataset we get
better insight when we switch the x-axis from linear to log scale. We changed
two lines: the x channel is now an `alt.X(...)` object with `.scale(type="log")`
appended, and for consistency the y channel is written as `alt.Y("lifeExp")`,
which is equivalent to the plain string `"lifeExp"`:
```{code-block} python
---
emphasize-lines: 2-3
---
alt.Chart(data).mark_point().encode(
    x=alt.X("gdpPercap").scale(type="log"),
    y=alt.Y("lifeExp"),
    color="continent",
).transform_filter(alt.datum.year == 2007).interactive()
```

```{figure} img/gapminder/log-scale.svg
:alt: Changing the x axis to log scale.

Changing the x axis to log scale.
```
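
Why does the log scale help? On a logarithmic axis, equal distances correspond to equal *ratios* rather than equal differences. The small calculation below uses Python's `math` module with illustrative round numbers that are not taken from the dataset. The helper `log_distance` is hypothetical.

```python
import math


def log_distance(a, b):
    """Hypothetical helper: distance between a and b on a base-10 log axis."""
    return math.log10(b) - math.log10(a)


# On a linear axis, 1,000 -> 10,000 is ten times shorter than
# 10,000 -> 100,000, because the differences are 9,000 vs 90,000
print(10_000 - 1_000, 100_000 - 10_000)

# On a log axis, both steps are a factor of 10 and get the same length
print(log_distance(1_000, 10_000), log_distance(10_000, 100_000))
```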


## Improving axis titles

The axes still show the raw column names. Let us replace them with descriptive
titles that include units:
```{code-block} python
---
emphasize-lines: 2-3
---
alt.Chart(data).mark_point().encode(
    x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"),
    y=alt.Y("lifeExp").title("Life expectancy (years)"),
    color="continent",
).transform_filter(alt.datum.year == 2007).interactive()
```

```{figure} img/gapminder/axis-titles.svg
:alt: Improving the axis titles.

Improving the axis titles.
```


## Faceted charts

To see what faceted charts are and how easy they are to create, add the
following line:
```{code-block} python
---
emphasize-lines: 5
---
alt.Chart(data).mark_point().encode(
    x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"),
    y=alt.Y("lifeExp").title("Life expectancy (years)"),
    color="continent",
    row="continent",
).transform_filter(alt.datum.year == 2007).interactive()
```

What do you think happens when you change `row="continent"` to `column="continent"`? Try it!
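
Faceting splits the data by the values of a column and draws one small sub-plot per group. Conceptually, this resembles a pandas `groupby`. The sketch below uses placeholder data and lists the groups that would each get their own panel when faceting by continent.

```python
import pandas as pd

# Placeholder rows (made-up values) with Gapminder-style columns
data = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E"],
        "continent": ["Asia", "Asia", "Europe", "Africa", "Europe"],
        "lifeExp": [62.0, 70.0, 77.0, 55.0, 80.0],
    }
)

# One facet per distinct continent; each panel shows only its own rows
for continent, group in data.groupby("continent"):
    print(continent, group["country"].tolist())
```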


## Changing from points to circles

Let us add one more visual channel and map the circle size to the population
of each country. We also switch from `mark_point()` to `mark_circle()`, which
draws filled circles:
```{code-block} python
---
emphasize-lines: 1,5
---
alt.Chart(data).mark_circle().encode(
    x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"),
    y=alt.Y("lifeExp").title("Life expectancy (years)"),
    color="continent",
    size="pop",
).transform_filter(alt.datum.year == 2007).interactive()
```

```{figure} img/gapminder/population-size.svg
:alt: Circle sizes are proportional to population sizes.

Circle sizes are proportional to population sizes.
```
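
For circle marks, Vega-Lite interprets the size channel as the area of the mark in pixels, not its radius. To see what this means, suppose population maps linearly to area. This is a simplification, because the actual scale domain and range are chosen by Altair. Under that assumption, a country with four times the population gets a circle with only twice the radius. The calculation below uses hypothetical pixel areas, and the helper `radius_from_area` is also hypothetical.

```python
import math


def radius_from_area(area):
    """Hypothetical helper: radius of a circle with the given area."""
    return math.sqrt(area / math.pi)


# Hypothetical linear mapping: area proportional to population
area_small = 100.0             # pixel area for some population P
area_large = 4 * area_small    # pixel area for population 4 * P

ratio = radius_from_area(area_large) / radius_from_area(area_small)
print(ratio)  # about 2: the radius grows with the square root of the area ratio
```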

---

```{discussion} Where to go from here?
In a few steps and with a few lines of code we have achieved a lot!

These plots are perhaps not publication quality yet, but we will learn how to
customize and improve them in {ref}`customizing-plots`.
```

```{keypoints}
- Avoid manual post-processing; try to script all steps.
- Browse a number of example galleries to help you choose the library that best fits your work and style.
- Figures for presentation slides and figures for manuscripts have different
  requirements. More about that later.
```
(customizing-plots)=

# Customizing plots
