|
| 1 | +(gapminder)= |
| 2 | + |
| 3 | +# Plotting the Gapminder dataset |
| 4 | + |
| 5 | + |
| 6 | +## Loading and plotting a dataset |
| 7 | + |
| 8 | +In this lesson we will work with one of the |
| 9 | +[Gapminder](https://www.gapminder.org/tools/) datasets. |
| 10 | + |
| 11 | +Let us read and plot the data together; afterwards we will explain what is |
| 12 | +happening and improve the figure step by step. First we read and inspect the data: |
| 13 | +```python |
| 14 | +# import necessary libraries |
| 15 | +import altair as alt |
| 16 | +import pandas as pd |
| 17 | + |
| 18 | +# read the data |
| 19 | +url_prefix = "https://raw.githubusercontent.com/plotly/datasets/master/" |
| 20 | +data = pd.read_csv(url_prefix + "gapminder_with_codes.csv") |
| 21 | + |
| 22 | +# print overview of the dataset |
| 23 | +data |
| 24 | +``` |
| 25 | + |
| 26 | +With very few lines we can get the first plot: |
| 27 | +```python |
| 28 | +alt.Chart(data).mark_point().encode( |
| 29 | + x="gdpPercap", |
| 30 | + y="lifeExp", |
| 31 | +) |
| 32 | +``` |
| 33 | + |
| 34 | +```{figure} img/gapminder/all-data.svg |
| 35 | +:alt: First raw plot with all countries and all years. |
| 36 | + |
| 37 | +First raw plot with all countries and all years. |
| 38 | +``` |
| 39 | + |
| 40 | +Observe how we connect (encode) **visual channels** to data columns: |
| 41 | +- x-coordinate with "gdpPercap" |
| 42 | +- y-coordinate with "lifeExp" |
| 43 | + |
| 44 | +The following code would have the same effect but the above version might be |
| 45 | +easier to read: |
| 46 | +```python |
| 47 | +alt.Chart(data).mark_point().encode(x="gdpPercap", y="lifeExp") |
| 48 | +``` |
| 49 | + |
| 50 | +```{discussion} Let us pause and explain the code |
| 51 | +- `alt` is shorthand for `altair`, which we imported at the top of the notebook |
| 52 | +- `Chart()` is defined inside `altair` and takes the data as its argument |
| 53 | +- `mark_point()` is a method that draws each row as a point, producing a scatter plot |
| 54 | +- `encode()` is a method that maps data columns to visual channels |
| 55 | +``` |
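The chained call `alt.Chart(data).mark_point().encode(...)` works because each method hands back a chart object on which the next method can be called. The toy class below is a minimal sketch of that pattern only; `ToyChart` is a hypothetical stand-in and not part of Vega-Altair, and a real library may return a modified copy rather than the same object.

```python
class ToyChart:
    """Toy stand-in for a chart object (hypothetical, not Vega-Altair)."""

    def __init__(self, data):
        self.data = data
        self.mark = None
        self.channels = {}

    def mark_point(self):
        # Remember which mark to draw, then return the object itself
        # so that another method can be called on the result.
        self.mark = "point"
        return self

    def encode(self, **channels):
        # Store the mapping from visual channel name to data column name.
        self.channels.update(channels)
        return self


# Each call returns the chart, so the calls can be written as one chain.
chart = ToyChart(data=[]).mark_point().encode(x="gdpPercap", y="lifeExp")
print(chart.mark, chart.channels)
```

Splitting the chain over several lines, as in the first version of the plot, does not change what is built; it only changes how the code reads.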
| 56 | + |
| 57 | + |
| 58 | +## Filtering data directly in [Vega-Altair](https://altair-viz.github.io) |
| 59 | + |
| 60 | +In [Vega-Altair](https://altair-viz.github.io) we can chain method calls. Let us |
| 61 | +add two more, one that filters the data by year and one that makes the chart interactive: |
| 62 | +```{code-block} python |
| 63 | +--- |
| 64 | +emphasize-lines: 4 |
| 65 | +--- |
| 66 | +alt.Chart(data).mark_point().encode( |
| 67 | + x="gdpPercap", |
| 68 | + y="lifeExp", |
| 69 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 70 | +``` |
| 71 | + |
| 72 | +```{figure} img/gapminder/only-2007.svg |
| 73 | +:alt: Now we only keep the year 2007. |
| 74 | + |
| 75 | +Now we only keep the year 2007. |
| 76 | +``` |
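Conceptually, `transform_filter(alt.datum.year == 2007)` keeps only the rows whose `year` column equals 2007 before anything is drawn. The plain-Python sketch below mimics that step on a few made-up rows. The row values and the helper name `keep_year` are hypothetical and only serve as illustration; this is not how Vega-Altair applies the filter internally.

```python
# Hypothetical rows, shaped like records of the Gapminder table.
# The numbers are invented for illustration and are not real data.
rows = [
    {"country": "A", "year": 2002, "lifeExp": 70.1},
    {"country": "A", "year": 2007, "lifeExp": 71.3},
    {"country": "B", "year": 2002, "lifeExp": 55.4},
    {"country": "B", "year": 2007, "lifeExp": 57.0},
]


def keep_year(records, year):
    """Return only the records whose "year" field equals `year`."""
    return [record for record in records if record["year"] == year]


rows_2007 = keep_year(rows, 2007)
print(len(rows_2007))  # 2
```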
| 77 | + |
| 78 | + |
| 79 | +## Using color as additional channel |
| 80 | + |
| 81 | +A very neat feature of [Vega-Altair](https://altair-viz.github.io) is that it |
| 82 | +is easy to add and modify visual channels. Let us add one more and map the |
| 83 | +"continent" data column to color: |
| 84 | +```{code-block} python |
| 85 | +--- |
| 86 | +emphasize-lines: 4 |
| 87 | +--- |
| 88 | +alt.Chart(data).mark_point().encode( |
| 89 | + x="gdpPercap", |
| 90 | + y="lifeExp", |
| 91 | + color="continent", |
| 92 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 93 | +``` |
| 94 | + |
| 95 | +```{figure} img/gapminder/color.svg |
| 96 | +:alt: Using different colors for different continents. |
| 97 | + |
| 98 | +Using different colors for different continents. |
| 99 | +``` |
| 100 | + |
| 101 | + |
| 102 | +## Changing to log scale |
| 103 | + |
| 104 | +For this dataset we get better insight by switching the x-axis from |
| 105 | +linear to log scale (we changed two lines to show both the "method syntax" and |
| 106 | +the "attribute syntax"): |
| 107 | +```{code-block} python |
| 108 | +--- |
| 109 | +emphasize-lines: 2-3 |
| 110 | +--- |
| 111 | +alt.Chart(data).mark_point().encode( |
| 112 | + x=alt.X("gdpPercap").scale(type="log"), |
| 113 | + y=alt.Y("lifeExp"), |
| 114 | + color="continent", |
| 115 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 116 | +``` |
| 117 | + |
| 118 | +```{figure} img/gapminder/log-scale.svg |
| 119 | +:alt: Changing the x-axis to log scale. |
| 120 | + |
| 121 | +Changing the x-axis to log scale. |
| 122 | +``` |
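A log scale places values by their logarithm, so equal ratios take up equal distances along the axis. The sketch below uses three made-up GDP values (hypothetical, not taken from the dataset) and a base-10 logarithm, chosen here for illustration, to show that each tenfold increase moves a point by the same amount.

```python
import math

# Hypothetical GDP per capita values, each ten times the previous one.
gdp_values = [500, 5_000, 50_000]

# Position of each value on a base-10 logarithmic axis.
log_positions = [math.log10(value) for value in gdp_values]

# Distances between neighbouring values on the linear and on the log axis.
linear_steps = [b - a for a, b in zip(gdp_values, gdp_values[1:])]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_steps)                            # [4500, 45000]
print([round(step, 6) for step in log_steps])  # [1.0, 1.0]
```

On a linear axis the low values are squeezed together near the origin, while on the log axis they are spread out evenly.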
| 123 | + |
| 124 | + |
| 125 | +## Improving axis titles |
| 126 | + |
| 127 | +```{code-block} python |
| 128 | +--- |
| 129 | +emphasize-lines: 2-3 |
| 130 | +--- |
| 131 | +alt.Chart(data).mark_point().encode( |
| 132 | + x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"), |
| 133 | + y=alt.Y("lifeExp").title("Life expectancy (years)"), |
| 134 | + color="continent", |
| 135 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 136 | +``` |
| 137 | + |
| 138 | +```{figure} img/gapminder/axis-titles.svg |
| 139 | +:alt: Improving the axis titles. |
| 140 | + |
| 141 | +Improving the axis titles. |
| 142 | +``` |
| 143 | + |
| 144 | + |
| 145 | +## Faceted charts |
| 146 | + |
| 147 | +To see what faceted charts are and how easy they are to create, add the following |
| 148 | +line: |
| 149 | +```{code-block} python |
| 150 | +--- |
| 151 | +emphasize-lines: 5 |
| 152 | +--- |
| 153 | +alt.Chart(data).mark_point().encode( |
| 154 | + x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"), |
| 155 | + y=alt.Y("lifeExp").title("Life expectancy (years)"), |
| 156 | + color="continent", |
| 157 | + row="continent", |
| 158 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 159 | +``` |
| 160 | + |
| 161 | +Can you guess what happens when you change `row="continent"` to `column="continent"`? |
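Faceting can be thought of as splitting the rows into one group per distinct value of a column and drawing one panel per group. The following is a rough plain-Python sketch of that grouping step, with made-up rows and a hypothetical helper `split_by`. Vega-Altair does this for you, so the code is only meant to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical rows; country names are placeholders, not real data.
rows = [
    {"country": "A", "continent": "Africa"},
    {"country": "B", "continent": "Asia"},
    {"country": "C", "continent": "Africa"},
    {"country": "D", "continent": "Europe"},
]


def split_by(records, column):
    """Group records by the value they hold in `column`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[column]].append(record)
    return dict(groups)


# One entry per continent corresponds to one panel in a faceted chart.
panels = split_by(rows, "continent")
for continent, members in panels.items():
    print(continent, [member["country"] for member in members])
```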
| 162 | + |
| 163 | + |
| 164 | +## Changing from points to circles |
| 165 | + |
| 166 | +Let us add one more visual channel, mapping the size of each circle to the |
| 167 | +population of a country: |
| 168 | +```{code-block} python |
| 169 | +--- |
| 170 | +emphasize-lines: 1,5 |
| 171 | +--- |
| 172 | +alt.Chart(data).mark_circle().encode( |
| 173 | + x=alt.X("gdpPercap").scale(type="log").title("GDP per capita (PPP dollars)"), |
| 174 | + y=alt.Y("lifeExp").title("Life expectancy (years)"), |
| 175 | + color="continent", |
| 176 | + size="pop", |
| 177 | +).transform_filter(alt.datum.year == 2007).interactive() |
| 178 | +``` |
| 179 | + |
| 180 | +```{figure} img/gapminder/population-size.svg |
| 181 | +:alt: Circle sizes are proportional to population sizes. |
| 182 | + |
| 183 | +Circle sizes are proportional to population sizes. |
| 184 | +``` |
| 185 | + |
| 186 | +--- |
| 187 | + |
| 188 | +```{discussion} Where to go from here? |
| 189 | +In a few steps and a few lines of code we have achieved a lot! |
| 190 | + |
| 191 | +These plots are perhaps not publication quality yet, but we will learn how to |
| 192 | +customize and improve them in {ref}`customizing-plots`. |
| 193 | +``` |
| 194 | + |
| 195 | +```{keypoints} |
| 196 | +- Avoid manual post-processing, try to script all steps. |
| 197 | +- Browse a number of example galleries to help you choose the library that best fits your work and style. |
| 198 | +- Figures for presentation slides and figures for manuscripts have different |
| 199 | + requirements. More about that later. |
| 200 | +``` |
1 | 201 | (customizing-plots)=
|
2 | 202 |
|
3 | 203 | # Customizing plots
|