08-putting-it-all-together.md: rewrite to use random data and draw histograms

Pull Request: datacarpentry/python-ecology-lesson#320

+ minor fixes in 07-visualization-ggplot-python.md
wrightaprilm authored and maxim-belkin committed Oct 19, 2018
1 parent e38161e commit 4f44253
Showing 5 changed files with 54 additions and 26 deletions.
4 changes: 2 additions & 2 deletions _episodes/07-visualization-ggplot-python.md
@@ -293,7 +293,7 @@ group, a boxplot can be used:
![png](../fig/06_boxplot.png)
By adding points of he individual observations to the boxplot, we can have a
By adding points of the individual observations to the boxplot, we can have a
better idea of the number of measurements and of their distribution:
~~~
@@ -452,7 +452,7 @@ arranged via formula notation (`rows ~ columns`; a `.` can be used as a
placeholder that indicates only one row or column).
~~~
# only selecte the years of interest
# only select the years of interest
survey_2000 = surveys_complete[surveys_complete["year"].isin([2000, 2001])]

(p9.ggplot(data=survey_2000,
76 changes: 52 additions & 24 deletions _episodes/08-putting-it-all-together.md
@@ -146,25 +146,26 @@ styles and the source codes that create them.
### `plt` pyplot versus object-based matplotlib
Matplotlib integrates nicely with the Numpy package and can use Numpy arrays
as input of the available plot functions. Consider the following example data,
created with Numpy:
Matplotlib integrates nicely with the NumPy package and can use NumPy arrays
as input to the available plot functions. Consider the following example data,
created with NumPy by drawing 1000 samples from a normal distribution with a mean value of 0 and
a standard deviation of 0.1:
~~~
import numpy
x = numpy.linspace(0, 5, 10)
y = x ** 2
sample_data = numpy.random.normal(0, 0.1, 1000)

~~~
{: .language-python}
To make a scatter plot of `x` and `y`, we can use the `plot` command directly:
To plot a histogram of our draws from the normal distribution, we can use the `hist` function directly:
~~~
plt.plot(x, y, '-')
plt.hist(sample_data)
~~~
{: .language-python}
![Line plot of y versus x](../fig/08_line_plot.png)
![Histogram of 1000 samples from normal distribution](../fig/08-normal-distribution.png)
> ## Tip: Cross-Platform Visualization of Figures
> Jupyter Notebooks make many aspects of data analysis and visualization much simpler. This includes
@@ -175,36 +176,47 @@ plt.plot(x, y, '-')
> colleagues who aren't using a Jupyter notebook to reproduce your work on their platform.
{: .callout}
or create matplotlib `figure` and `axis` objects first and add the plot later on:
or create matplotlib `figure` and `axis` objects first and subsequently add a histogram with 30
data bins:
~~~
fig, ax = plt.subplots() # initiate an empty figure and axis matplotlib object
ax.plot(x, y, '-')
ax.hist(sample_data, 30)
~~~
{: .language-python}
![Simple line plot](../fig/08_line_plot.png)
Although the latter approach requires a little bit more code to create the same plot,
the advantage is that it gives us **full control** over the plot and we can add new items
such as labels, grid lines, title, etc.. For example, we can add additional axes to
the figure and customize their labels:
such as labels, grid lines, title, and other visual elements. For example, we can add
additional axes to the figure and customize their labels:
~~~
fig, ax1 = plt.subplots() # prepare a matplotlib figure
ax1.plot(x, y, '-')
ax1.hist(sample_data, 30)

# Add a plot of a Beta distribution
a = 5
b = 10
# draw 1000 samples; a single draw would not produce a meaningful histogram
beta_draws = numpy.random.beta(a, b, 1000)
# adapt the labels
ax1.set_ylabel('y')
ax1.set_xlabel('x')
ax1.set_ylabel('count')
ax1.set_xlabel('value')

# add additional axes to the figure
ax2 = fig.add_axes([0.2, 0.5, 0.4, 0.3])
ax2.plot(x, y*2, 'r-')
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
# ax2 = fig.add_axes([left, bottom, width, height])
ax2.hist(beta_draws)
~~~
{: .language-python}
![Plot with additional axes](../fig/08_line_plot_inset.png)
![Plot with additional axes](../fig/08-dualdistribution.png)
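The inset example above can be tried as a self-contained sketch. It rests on a few assumptions that are not part of the lesson: a seeded generator from `numpy.random.default_rng` (NumPy 1.17 or later), 1000 beta draws to match the normal sample, and the non-interactive `Agg` backend so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)         # seed chosen arbitrarily, for reproducibility
sample_data = rng.normal(0, 0.1, 1000)  # 1000 draws, mean 0, standard deviation 0.1
beta_draws = rng.beta(5, 10, 1000)      # 1000 draws from Beta(a=5, b=10)

fig, ax1 = plt.subplots()
# hist returns the per-bin counts and the bin edges along with the drawn patches
counts, bin_edges, _ = ax1.hist(sample_data, bins=30)
ax1.set_xlabel("value")
ax1.set_ylabel("count")

# add_axes takes [left, bottom, width, height] as fractions of the figure size
ax2 = fig.add_axes([0.125, 0.575, 0.3, 0.3])
ax2.hist(beta_draws)
```

With `bins=30`, `hist` returns 31 bin edges, and the counts sum to the number of samples because the bin range is derived from the data itself.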
> ## Challenge - Drawing from distributions
> Have a look at the NumPy
> random documentation <https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html>.
> Choose a distribution you have no familiarity with, and try to sample from and visualize it.
{: .challenge}
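One possible way to approach this challenge, shown here as a minimal sketch rather than an official answer: draw from a gamma distribution and plot a histogram. The choice of distribution, its `shape` and `scale` values, the seed, and the `Agg` backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; not needed in a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the draws are reproducible
shape, scale = 2.0, 1.5         # illustrative gamma parameters
gamma_draws = rng.gamma(shape, scale, 1000)

fig, ax = plt.subplots()
ax.hist(gamma_draws, bins=30)
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("1000 draws from a gamma distribution")
```

Swapping `rng.gamma` for another sampler, such as `rng.poisson` or `rng.exponential`, is enough to explore a different distribution with the same plotting code.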
### Link matplotlib, Pandas and plotnine
@@ -253,9 +265,9 @@ plt.show() # not necessary in Jupyter Notebooks
> ## Challenge - Pandas and matplotlib
> Load the streamgage data set with Pandas, subset the week of the 2013 Front Range flood
> (September 9 through 15) and create a hydrograph (line plot) of the discharge data using
> Pandas, linking it to an empty maptlotlib `ax` object. Adapt the title, x-axis and y-axis label
> using matplotlib.
> (September 11 through 15) and create a hydrograph (line plot) of the discharge data using
> Pandas, linking it to an empty matplotlib `ax` object. Create a second axis that displays the
> whole dataset. Adapt the title and axes' labels using matplotlib.
>
> > ## Answers
> >
@@ -273,6 +285,23 @@ plt.show() # not necessary in Jupyter Notebooks
> > ax.set_xlabel("") # no label
> > ax.set_ylabel("Discharge, cubic feet per second")
> > ax.set_title(" Front Range flood event 2013")
> > discharge = pd.read_csv("../data/bouldercreek_09_2013.txt",
> >                         skiprows=27, delimiter="\t",
> >                         names=["agency", "site_id", "datetime",
> >                                "timezone", "flow_rate", "height"])
> > fig, ax = plt.subplots()
> > flood = discharge[(discharge["datetime"] >= "2013-09-11") &
> >                   (discharge["datetime"] < "2013-09-15")]
> >
> > ax2 = fig.add_axes([0.65, 0.575, 0.25, 0.3])
> > flood.plot(x="datetime", y="flow_rate", ax=ax)
> > discharge.plot(x="datetime", y="flow_rate", ax=ax2)
> > ax2.legend().set_visible(False)
> > ax.set_xlabel("") # no label
> > ax.set_ylabel("Discharge, cubic feet per second")
> > ax.legend().set_visible(False)
> > ax.set_title(" Front Range flood event 2013")
> > ~~~
> > {: .language-python}
> >
@@ -311,7 +340,6 @@ Which will save the `fig` created using Pandas/matplotlib as a png file with the
> {: .solution}
{: .challenge}
## Make other types of plots:
Matplotlib can make many other types of plots in much the same way that it makes two-dimensional line plots. Look through the examples in
Binary file added fig/08-dualdistribution.png
Binary file added fig/08-normal-distribution.png
Binary file modified fig/08_flood_event.png