
Commit bc5ed17

update freeze files and programming with data

Author: Susan Vanderplas
1 parent a094874

13 files changed, +27 -23 lines changed

_freeze/part-gen-prog/07-prog-data/execute-results/html.json

+2 -2
Large diffs are not rendered by default.

part-gen-prog/07-prog-data.qmd

+25 -21
@@ -21,6 +21,15 @@ As you've probably guessed by now, this section will primarily be focused on ex

 - Create new variables and columns or reformat existing columns in provided data structures

+```{python}
+#| include: false
+#| echo: false
+#| message: false
+#| warning: false
+import warnings
+warnings.filterwarnings("ignore", category=FutureWarning)
+```
+
 ## Artwork Dimensions

 The Tate Art Museum assembled a collection of 70,000 artworks (last updated in 2014). They cataloged information including accession number, artwork dimensions, units, title, date, medium, inscription, and even URLs for images of the art.
@@ -146,7 +155,7 @@ We might be interested in the aspect ratio of the artwork - let's take a look at

 ::: panel-tabset

-####R {-}
+#### R {-}

 ```{r hist-dims-art, fig.height = 4, fig.width = 12}
 par(mfrow=c(1, 3)) # 3 plots on one row
@@ -267,7 +276,7 @@ The downside to this is that we have to write out `artwork$aspect_hw` or `artwor

 One mistake I see people make frequently is to calculate `height/width`, but then not assign that value to a variable.

-If you're not using `<-` in R^[(or `=` or `->` if you're a total heathen)] or `=` in Python, then you're not saving that information to be referenced later - you're just calculating values temporarily and possibly printing them as output.
+**If you're not using `<-` in R^[(or `=`, or `->` if you're a total heathen)] or `=` in Python, then you're not saving that information** to be referenced later - you're just calculating values temporarily and possibly printing them as output.

 :::

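The point emphasized in this hunk can be illustrated with a minimal Python sketch. It assumes a small stand-in `artwork` data frame with `height` and `width` columns; the values are made up for illustration, and only the column name `aspect_hw` comes from the chapter text.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's `artwork` data (made-up values)
artwork = pd.DataFrame({
    "height": [100.0, 200.0, 50.0],
    "width": [50.0, 100.0, 200.0],
})

# Computed but never assigned: the result is discarded after this line
artwork["height"] / artwork["width"]
print("aspect_hw" in artwork.columns)  # False

# Assigned with `=`: the result is stored as a new column for later use
artwork["aspect_hw"] = artwork["height"] / artwork["width"]
print(artwork["aspect_hw"].tolist())  # [2.0, 2.0, 0.25]
```

In an interactive session the first expression would print its values, which is what makes the mistake easy to miss: the output looks right, but nothing has been saved.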
@@ -393,7 +402,7 @@ You may need to run `pip install plotnine` in the terminal if you have not used
 from plotnine import *

 (
-ggplot(aes(x = 'LicenseIssuedDate'), data = dogs) +
+ggplot(mapping = aes(x = 'LicenseIssuedDate'), data = dogs) +
 geom_histogram() # Create a histogram
 )

@@ -443,7 +452,7 @@ dogs["License_length_yr"] = dogs.License_length.dt.days/365.25

 ```{python dog-license-length2-py}
 (
-ggplot(aes(x = "License_length_yr"), data = dogs) +
+ggplot(mapping = aes(x = "License_length_yr"), data = dogs) +
 geom_histogram(bins = 30)+
 scale_x_continuous(limits = (0,10))
 )
@@ -501,7 +510,6 @@ dogs.head()

 Now that we have borough, let's write a function that will take a dataset and spit out a list of the top 5 dog breeds registered in that area.

-### Custom Summary Function

 ::: panel-tabset

@@ -528,8 +536,6 @@ def top_5_breeds(data):
 :::


-### For Loop Summary
-
 Now, using that function, lets write a for loop that loops through the 5 boroughs and spits out the top 5 breeds in each borough:

 ::: panel-tabset
@@ -580,7 +586,6 @@ for i in boroughs:

 :::

-### Summary Data Frame

 If we wanted to save these results as a summary data frame, we could totally do that!

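The bodies of the chapter's summary function and loop are not shown in this diff, so the following is only a hedged sketch of the workflow the surrounding prose describes: a per-borough top-5 function, a `for` loop over boroughs, and the results collected into one summary data frame. The function name `top_5_breeds_sketch`, the column names `Borough` and `BreedName`, the count column `n`, and the data are all assumptions made for illustration.

```python
import pandas as pd

# Made-up stand-in for the `dogs` data; column names are assumptions
dogs = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"],
    "BreedName": ["Beagle", "Beagle", "Pug", "Pug", "Pug", "Boxer"],
})


def top_5_breeds_sketch(data):
    """Return up to 5 of the most common breeds in `data`, with counts."""
    counts = data["BreedName"].value_counts().head(5)
    # Turn the counts Series into a two-column data frame
    return counts.rename_axis("BreedName").reset_index(name="n")


boroughs = dogs["Borough"].unique()

pieces = []  # one small data frame per borough
for i in boroughs:
    subset = dogs[dogs["Borough"] == i]
    result = top_5_breeds_sketch(subset)
    result["Borough"] = i  # label the rows so they stay identifiable
    pieces.append(result)

# Stack the per-borough results into a single summary data frame
summary = pd.concat(pieces, ignore_index=True)
print(summary)
```

Collecting the pieces in a list and calling `pd.concat` once at the end is one common choice here; appending rows to a data frame inside the loop would also work but copies the data on every iteration.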
@@ -711,10 +716,11 @@ library(ggplot2)

 ggplot(
 data = tarantino,
-aes(x = minutes_in, color = movie)
+aes(x = minutes_in, color = type)
 ) +
 geom_density() +
-facet_wrap(~type)
+scale_color_manual(values = c("black", "grey")) +
+facet_wrap(~movie)
 ```

 #### Python {-}
@@ -724,12 +730,11 @@ You may need to run `pip install plotnine` in the terminal if you have not used
 ```{python tarantino-hist-py}
 from plotnine import *

-(
-ggplot(tarantino, aes(x = 'minutes_in', color = 'movie')) +
-geom_density() +
-facet_wrap("type")
-)
-
+plot = ggplot(data = tarantino, mapping = aes(x = 'minutes_in', color = "type"))
+plot = plot + geom_density()
+plot = plot + scale_color_manual(values = ["black", "grey"])
+plot = plot + facet_wrap("movie")
+plot.show()
 ```

 :::
@@ -786,12 +791,11 @@ tarantino_words = tarantino.query("type == 'word'")

 # Step 2 - 6 most common words

+plot = ggplot(tarantino, aes(x = 'minutes_in', color = 'movie'))
+plot = plot + geom_density()
+plot = plot + facet_wrap("type")

-(
-ggplot(tarantino, aes(x = 'minutes_in', color = 'movie')) +
-geom_density() +
-facet_wrap("type")
-)
+plot.show()


 ```
