The R4DS Online Learning Community is a community of learners at all skill levels working together to improve our data-science-related skills. We offer free data-related education through book clubs and free live question-answering on our Slack, and by curating a dataset every week here at TidyTuesday.
(NOTE: Unfortunately, since this post was written, the Open Collective Foundation has dissolved, so the fiscal sponsorship described in the next paragraph is no longer in effect.)
We are now a fiscally sponsored project of Open Collective Foundation (https://opencollective.foundation), a 501(c)(3) public charity. That means donations to the R4DS Online Learning Community are now tax-deductible in the US! It also means that we are now eligible for a number of grants, including some of the grants listed on Grants.gov.
We have exported all grants past and present from that site, and we are making them available here for you to explore and visualize. We also scraped details for all posted grants. Please let us know if you find anything interesting!
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2023-10-03')
## OR
tuesdata <- tidytuesdayR::tt_load(2023, week = 40)
grants <- tuesdata$grants
grant_opportunity_details <- tuesdata$grant_opportunity_details
# Option 2: Read directly from GitHub
grants <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-03/grants.csv')
grant_opportunity_details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-03/grant_opportunity_details.csv')
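
Both tables carry an opportunity_id column, so once they are loaded they can be combined. The following is a minimal sketch in base R, not part of the original post. The two small data frames are hypothetical stand-ins for the real tables, and it assumes details are missing for grants that were not scraped.

```r
# Hypothetical stand-ins for the two loaded tables; the real ones have many
# more rows and columns.
grants <- data.frame(
  opportunity_id = c(101L, 102L, 103L),
  opportunity_status = c("Posted", "Closed", "Posted")
)
grant_opportunity_details <- data.frame(
  opportunity_id = c(101L, 103L),
  award_ceiling = c(50000, 250000)
)

# Left join: keep every grant and attach details where they exist.
# Grants without scraped details get NA in the detail columns.
grants_with_details <- merge(
  grants,
  grant_opportunity_details,
  by = "opportunity_id",
  all.x = TRUE
)

grants_with_details
```

With the tidyverse loaded, dplyr::left_join(grants, grant_opportunity_details, by = "opportunity_id") should give an equivalent result.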
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
Data dictionary for grants.csv:

variable | class | description |
---|---|---|
opportunity_id | integer | Integer ID for this opportunity, which can be used to find details at https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id} |
opportunity_number | character | Funding opportunity ID number |
opportunity_title | character | Title of the opportunity |
agency_code | character | Abbreviated name for the funding agency |
agency_name | character | Full name of the funding agency |
estimated_funding | double | Estimated funding amount in dollars |
expected_number_of_awards | integer | Expected count of awards |
grantor_contact | character | Information about how to contact the grantor. Often includes email address |
agency_contact_phone | character | Phone number for the agency when available |
agency_contact_email | character | Contact email address for the agency (almost always available) |
estimated_post_date | date | When the opportunity is/was expected to be posted |
estimated_application_due_date | date | Date by which applications are/were expected to be received |
posted_date | date | When the opportunity was posted |
close_date | date | When the opportunity was closed or will close |
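last_updated_date_time | datetime | Date and time when the opportunity was last updated |
version | character | Version label of the opportunity, stored as text (the cleaning script strips a "Synopsis " prefix from it to get the integer version number) |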
opportunity_status | character | Whether the opportunity is Archived, Closed, Forecasted, or Posted |
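
The opportunity_id description above gives a URL pattern for looking up details. A small sketch of filling in that pattern follows. The helper name opportunity_url and the example IDs are hypothetical, and the pattern is copied from the table as-is, so it may not match the current Grants.gov site.

```r
# Build the details URL for one or more opportunity IDs, using the URL
# pattern given in the data dictionary. `opportunity_url` is a hypothetical
# helper name, not something provided with the dataset.
opportunity_url <- function(opportunity_id) {
  sprintf(
    "https://www.grants.gov/web/grants/view-opportunity.html?oppId=%d",
    as.integer(opportunity_id)
  )
}

# sprintf() is vectorized, so a whole column of IDs can be passed at once.
# These IDs are made up for illustration.
example_urls <- opportunity_url(c(12345L, 67890L))
example_urls
```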
Data dictionary for grant_opportunity_details.csv:

variable | class | description |
---|---|---|
opportunity_id | integer | Integer ID for this opportunity, which can be used to find these details at https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id} |
funding_opportunity_number | character | Funding opportunity ID number |
funding_opportunity_title | character | Title of the opportunity |
opportunity_category | character | "Continuation", "Discretionary", "Earmark", "Mandatory", or "Other" |
opportunity_category_explanation | character | More details about why the opportunity is in that category (mostly details about Other) |
expected_number_of_awards | integer | Expected count of awards |
cost_sharing_or_matching_requirement | logical | Whether the opportunity requires a cost-sharing or cost-matching agreement |
version | integer | Integer version number of the opportunity |
posted_date | date | When the opportunity was posted |
last_updated_date | date | When the opportunity was last updated |
original_closing_date_for_applications | date | When the opportunity was originally scheduled to close |
current_closing_date_for_applications | date | When the opportunity is currently scheduled to close |
archive_date | date | When the opportunity will be archived |
estimated_total_program_funding | double | Estimated funding amount in dollars |
award_ceiling | double | Maximum individual award amount in dollars |
award_floor | double | Minimum individual award amount in dollars |
agency_name | character | Full name of the granting agency |
description | character | Free text description of the opportunity. Sometimes includes tables or potentially other figures, which did not necessarily scrape accurately |
link_to_additional_information | character | The text of any links to additional information (unlikely to be useful in this format) |
grantor_contact_information | character | Information about who to contact about the grant; may have contained links, which are not included in the scraped data |
eligibility_individuals | logical | Are individuals eligible? |
eligibility_state_governments | logical | Are state governments eligible? |
eligibility_county_governments | logical | Are county governments eligible? |
eligibility_independent_school_districts | logical | Are independent school districts eligible? |
eligibility_city_or_township_governments | logical | Are city or township governments eligible? |
eligibility_special_district_governments | logical | Are special district governments eligible? |
eligibility_native_american_tribal_governments_federally_recognized | logical | Are Native American tribal governments (Federally recognized) eligible? |
eligibility_native_american_tribal_organizations_other | logical | Are Native American tribal organizations (other than Federally recognized tribal governments) eligible? |
eligibility_nonprofits_501c3 | logical | Are nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education, eligible? |
eligibility_nonprofits_non_501c3 | logical | Are nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher education, eligible? |
eligibility_for_profit | logical | Are for profit organizations other than small businesses eligible? |
eligibility_small_businesses | logical | Are small businesses eligible? |
eligibility_private_institutions_of_higher_education | logical | Are private institutions of higher education eligible? |
eligibility_public_institutions_of_higher_education | logical | Are public and State controlled institutions of higher education eligible? |
eligibility_public_indian_housing_authorities | logical | Are public housing authorities and Indian housing authorities eligible? |
eligibility_others | logical | Are other groups eligible? |
eligibility_unrestricted | logical | Is eligibility unrestricted? |
additional_information_on_eligibility | character | Additional details about eligibility |
funding_cooperative_agreement | logical | Is the opportunity funded via a cooperative agreement? |
funding_grant | logical | Is the opportunity funded via a grant? |
funding_procurement_contract | logical | Is the opportunity funded via a procurement contract? |
funding_other | logical | Is the opportunity funded via some other instrument? |
cfda_numbers | character | Catalog of Federal Domestic Assistance number(s) (see https://sam.gov/content/assistance-listings) |
category_agriculture | logical | Category: Agriculture |
category_arts | logical | Category: Arts (see "Cultural Affairs" in CFDA) |
category_business | logical | Category: Business and Commerce |
category_community_development | logical | Category: Community Development |
category_consumer_protection | logical | Category: Consumer Protection |
category_disaster | logical | Category: Disaster Prevention and Relief |
category_education | logical | Category: Education |
category_employment | logical | Category: Employment, Labor and Training |
category_energy | logical | Category: Energy |
category_environment | logical | Category: Environment |
category_food | logical | Category: Food and Nutrition |
category_health | logical | Category: Health |
category_housing | logical | Category: Housing |
category_humanities | logical | Category: Humanities (see "Cultural Affairs" in CFDA) |
category_iija | logical | Category: Infrastructure Investment and Jobs Act (IIJA) |
category_income_security | logical | Category: Income Security and Social Services |
category_info | logical | Category: Information and Statistics |
category_law | logical | Category: Law, Justice and Legal Services |
category_natural_resources | logical | Category: Natural Resources |
category_opportunity_zone | logical | Category: Opportunity Zone Benefits |
category_regional_development | logical | Category: Regional Development |
category_science | logical | Category: Science and Technology and other Research and Development |
category_transportation | logical | Category: Transportation |
category_other | logical | Category: Other (see category_explanation for clarification) |
category_explanation | character | More details about the funding category or categories |
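
Because the eligibility_* (and category_*, funding_*) columns are logical flags with a shared prefix, they can be summarized as a group. The sketch below uses a hypothetical four-row miniature of the table with three of the eligibility columns. The NA value is an assumption about how missing eligibility text might appear in the real data.

```r
# Hypothetical miniature of grant_opportunity_details; the real table has
# many more rows and all of the eligibility_* columns.
details <- data.frame(
  opportunity_id = c(1L, 2L, 3L, 4L),
  eligibility_individuals = c(TRUE, FALSE, FALSE, NA),
  eligibility_small_businesses = c(TRUE, TRUE, FALSE, TRUE),
  eligibility_unrestricted = c(FALSE, FALSE, TRUE, FALSE)
)

# Select every eligibility flag by its shared prefix.
eligibility_cols <- grep("^eligibility_", names(details), value = TRUE)

# The mean of a logical vector is the share of TRUE values; na.rm = TRUE
# skips opportunities where the flag is missing.
eligibility_share <- sort(
  vapply(details[eligibility_cols], mean, numeric(1), na.rm = TRUE),
  decreasing = TRUE
)

eligibility_share
```

Swapping the prefix for "^category_" or "^funding_" applies the same summary to the other flag groups.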
Cleaning script:

library(tidyverse)
library(janitor)
library(here)
library(fs)
# Requires dev rvest from this draft pull request:
# https://github.com/tidyverse/rvest/pull/362
#
# pak::pak("tidyverse/rvest#362")
library(rvest)
library(chromote)
working_dir <- here::here("data", "2023", "2023-10-03")
# I wanted to be able to download this CSV periodically, so I found a way to do
# it with {chromote}.
url <- "https://www.grants.gov/web/grants/search-grants.html"
# This probably SHOULD be done in chromote directly, but I'm using this
# rvest::read_html_live() function later and became familiar-enough with it.
live_page <- rvest::read_html_live(url)
# I used sleep()s to make sure the page was ready to continue.
Sys.sleep(10)
js_export_dataset <- readLines(
fs::path(working_dir, "export_dataset.js")
) |>
paste(collapse = "\n")
live_page$session$Browser$setDownloadBehavior(behavior = "allow", downloadPath = tempdir())
live_page$session$Runtime$evaluate(
js_export_dataset,
wait_ = TRUE,
awaitPromise = TRUE
)
# I can't figure out how to await promises with JS just yet, but I can make sure
# the file is there and isn't continuing to save.
grants_path <- fs::dir_info(tempdir(), glob = "*/grants-gov*.csv") |>
dplyr::arrange(desc(modification_time)) |>
head(1) |>
dplyr::pull(path)
while (!length(grants_path)) {
Sys.sleep(1)
grants_path <- fs::dir_info(tempdir(), glob = "*/grants-gov*.csv") |>
dplyr::arrange(desc(modification_time)) |>
head(1) |>
dplyr::pull(path)
}
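# Poll until the file size stops changing between two consecutive checks,
# i.e. the download has finished writing. The previous size must be updated
# on every pass; otherwise a file that is still growing never compares equal.
grants_size <- fs::file_size(grants_path)
grants_ready <- FALSE
while (!grants_ready) {
  Sys.sleep(1)
  new_size <- fs::file_size(grants_path)
  grants_ready <- new_size == grants_size
  grants_size <- new_size
}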
live_page$session$close()
if (grants_size < 20000000) {
cli::cli_abort("Grants csv did not download properly.")
}
# Many rows have extra commas at the end, which cause confusion but otherwise
# don't damage the data. You can probably safely ignore the warnings.
grants <-
grants_path |>
readr::read_csv(
col_types = cols(
`OPPORTUNITY NUMBER` = col_character(),
`OPPORTUNITY TITLE` = col_character(),
`AGENCY CODE` = col_character(),
`AGENCY NAME` = col_character(),
`ESTIMATED FUNDING` = col_character(),
`EXPECTED NUMBER OF AWARDS` = col_character(),
`GRANTOR CONTACT` = col_character(),
`AGENCY CONTACT PHONE` = col_character(),
`AGENCY CONTACT EMAIL` = col_character(),
`ESTIMATED POST DATE` = col_character(),
`ESTIMATED APPLICATION DUE DATE` = col_character(),
`POSTED DATE` = col_character(),
`CLOSE DATE` = col_character(),
`LAST UPDATED DATE/TIME` = col_character(),
VERSION = col_character(),
`OPPORTUNITY STATUS` = col_character()
)
) |>
janitor::clean_names() |>
dplyr::mutate(
estimated_funding = case_match(
estimated_funding,
"Not available" ~ NA,
.default = estimated_funding
) |>
stringr::str_remove_all(",") |>
as.double(),
last_updated_date_time = lubridate::mdy_hms(last_updated_date_time),
opportunity_status = stringr::str_remove_all(opportunity_status, ",")
) |>
dplyr::mutate(
dplyr::across(
dplyr::ends_with("_date"),
lubridate::mdy
)
) |>
tidyr::separate_wider_regex(
opportunity_number,
c(
".+oppId=",
opportunity_id = "\\d+",
"\",\"",
opportunity_number = "[^\"]+",
"\"\\)"
)
) |>
dplyr::mutate(opportunity_id = as.integer(opportunity_id))
# Create a couple helper functions to get the grant details.
extract_synopsis_table <- function(html_document, div_id) {
headings <-
html_document |>
rvest::html_elements(glue::glue("#{div_id} > table > tbody > tr > th")) |>
rvest::html_text2()
bodies <-
html_document |>
rvest::html_elements(glue::glue("#{div_id} > table > tbody > tr > td")) |>
rvest::html_text2()
bodies <- as.list(bodies)
names(bodies) <- headings
tibble::as_tibble(bodies)
}
get_grant_details <- function(opportunity_id, sleep = 0.5) {
# Let's put an escape hatch in, in case something just won't load.
if (sleep > 10) {
return(
tibble::tibble(
opportunity_id = opportunity_id,
synopsis_failed = TRUE
)
)
}
url <- glue::glue(
"https://www.grants.gov/web/grants/view-opportunity.html?oppId={opportunity_id}"
)
live_page <- rvest::read_html_live(url)
# If anybody can help me figure out how to make this wait for promise
# evaluation more correctly, please let me know!
Sys.sleep(sleep)
iframe_html <-
live_page$session$Runtime$evaluate(
"document.querySelector('iframe').contentDocument.documentElement.innerHTML",
wait_ = TRUE,
awaitPromise = TRUE,
returnByValue = TRUE
)
html_document <-
iframe_html$result$value |>
rvest::read_html()
general_info_left <-
html_document |>
extract_synopsis_table("synopsisDetailsGeneralInfoTableLeft")
general_info_right <-
html_document |>
extract_synopsis_table("synopsisDetailsGeneralInfoTableRight")
eligibility <-
html_document |>
extract_synopsis_table("synopsisDetailsEligibilityTable")
additional_info <-
html_document |>
extract_synopsis_table("synopsisDetailsAdditionalInfoTable")
synopsis <-
dplyr::bind_cols(
general_info_left,
general_info_right,
eligibility,
additional_info
) |>
janitor::clean_names()
live_page$session$close()
if(nrow(synopsis)) {
synopsis <- dplyr::bind_cols(
tibble::tibble(opportunity_id = opportunity_id),
synopsis
)
return(synopsis)
}
return(
get_grant_details(opportunity_id, sleep = sleep + 0.5)
)
}
known_details <- tibble::tibble(opportunity_id = integer(), version = integer())
to_parse <-
grants |>
dplyr::filter(opportunity_status == "Posted") |>
dplyr::pull(opportunity_id)
if (fs::file_exists(fs::path(working_dir, "grant_opportunity_details.csv"))) {
known_details <-
readr::read_csv(
fs::path(working_dir, "grant_opportunity_details.csv"),
show_col_types = FALSE
) |>
dplyr::mutate(
opportunity_id = as.integer(opportunity_id)
)
to_parse <-
grants |>
dplyr::filter(opportunity_status == "Posted") |>
dplyr::mutate(
version = stringr::str_remove(version, "Synopsis ") |>
readr::parse_number()
) |>
dplyr::anti_join(known_details, by = c("opportunity_id", "version")) |>
dplyr::pull(opportunity_id)
}
grant_opportunity_details <-
to_parse |>
purrr::map(get_grant_details) |>
purrr::list_rbind()
if (nrow(grant_opportunity_details)) {
grant_opportunity_details <-
grant_opportunity_details |>
dplyr::mutate(
category_explanation = stringr::str_squish(category_explanation),
expected_number_of_awards = readr::parse_integer(expected_number_of_awards),
cost_sharing_or_matching_requirement = cost_sharing_or_matching_requirement == "Yes",
version = stringr::str_remove(version, "Synopsis ") |>
readr::parse_integer(),
dplyr::across(
dplyr::contains("date"),
\(x) {
x |>
stringr::str_extract("\\w+ \\d+, \\d+") |>
stringr::str_squish() |>
lubridate::mdy()
}
),
dplyr::across(
c(
"estimated_total_program_funding",
"award_ceiling",
"award_floor"
),
readr::parse_number
)
) |>
# Extract information about the various eligibility groups for easier filtering.
dplyr::mutate(
eligibility_individuals = stringr::str_detect(eligible_applicants, "Individuals"),
eligibility_state_governments = stringr::str_detect(eligible_applicants, "State governments"),
eligibility_county_governments = stringr::str_detect(eligible_applicants, "County governments"),
eligibility_independent_school_districts = stringr::str_detect(eligible_applicants, "Independent school districts"),
eligibility_city_or_township_governments = stringr::str_detect(eligible_applicants, "City or township governments"),
eligibility_special_district_governments = stringr::str_detect(eligible_applicants, "Special district governments"),
eligibility_native_american_tribal_governments_federally_recognized = stringr::str_detect(eligible_applicants, stringr::fixed("Native American tribal governments (Federally recognized)")),
eligibility_native_american_tribal_organizations_other = stringr::str_detect(eligible_applicants, stringr::fixed("Native American tribal organizations (other than Federally recognized tribal governments)")),
eligibility_nonprofits_501c3 = stringr::str_detect(eligible_applicants, stringr::fixed("Nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education")),
eligibility_nonprofits_non_501c3 = stringr::str_detect(eligible_applicants, stringr::fixed("Nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher education")),
eligibility_for_profit = stringr::str_detect(eligible_applicants, "For profit organizations other than small businesses"),
eligibility_small_businesses = stringr::str_detect(eligible_applicants, "Small businesses"),
eligibility_private_institutions_of_higher_education = stringr::str_detect(eligible_applicants, "Private institutions of higher education"),
eligibility_public_institutions_of_higher_education = stringr::str_detect(eligible_applicants, "Public and State controlled institutions of higher education"),
eligibility_public_indian_housing_authorities = stringr::str_detect(eligible_applicants, stringr::fixed("Public housing authorities/Indian housing authorities")),
eligibility_others = stringr::str_detect(eligible_applicants, stringr::fixed("Others (see text field entitled \"Additional Information on Eligibility\" for clarification)")),
eligibility_unrestricted = stringr::str_detect(eligible_applicants, stringr::fixed("Unrestricted (i.e., open to any type of entity above), subject to any clarification in text field entitled \"Additional Information on Eligibility\""))
) |>
dplyr::relocate(additional_information_on_eligibility, .after = eligibility_unrestricted) |>
dplyr::select(-eligible_applicants) |>
# Extract information about the various funding_instrument_types for easier filtering.
dplyr::mutate(
funding_cooperative_agreement = stringr::str_detect(funding_instrument_type, "Cooperative Agreement"),
funding_grant = stringr::str_detect(funding_instrument_type, "Grant"),
funding_procurement_contract = stringr::str_detect(funding_instrument_type, "Procurement Contract"),
funding_other = stringr::str_detect(funding_instrument_type, "Other")
) |>
dplyr::select(-funding_instrument_type) |>
# Clean up the CFDA numbers, at least somewhat.
dplyr::mutate(
cfda_numbers = stringr::str_extract_all(cfda_number_s, "\\d{2}\\.\\d{3} -- \\D+") |>
purrr::map_chr(paste, collapse = " | ")
) |>
dplyr::select(-cfda_number_s) |>
# Clean up the category_of_funding_activity, at least somewhat.
dplyr::mutate(
category_agriculture = stringr::str_detect(category_of_funding_activity, "Agriculture"),
category_arts = stringr::str_detect(category_of_funding_activity, stringr::fixed("Arts (see \"Cultural Affairs\" in CFDA)")),
category_business = stringr::str_detect(category_of_funding_activity, "Business and Commerce"),
category_community_development = stringr::str_detect(category_of_funding_activity, "Community Development"),
category_consumer_protection = stringr::str_detect(category_of_funding_activity, "Consumer Protection"),
category_disaster = stringr::str_detect(category_of_funding_activity, "Disaster Prevention and Relief"),
category_education = stringr::str_detect(category_of_funding_activity, "Education"),
category_employment = stringr::str_detect(category_of_funding_activity, "Employment, Labor and Training"),
category_energy = stringr::str_detect(category_of_funding_activity, "Energy"),
category_environment = stringr::str_detect(category_of_funding_activity, "Environment"),
category_food = stringr::str_detect(category_of_funding_activity, "Food and Nutrition"),
category_health = stringr::str_detect(category_of_funding_activity, "Health"),
category_housing = stringr::str_detect(category_of_funding_activity, "Housing"),
category_humanities = stringr::str_detect(category_of_funding_activity, stringr::fixed("Humanities (see \"Cultural Affairs\" in CFDA)")),
category_iija = stringr::str_detect(category_of_funding_activity, stringr::fixed("Infrastructure Investment and Jobs Act (IIJA)")),
category_income_security = stringr::str_detect(category_of_funding_activity, "Income Security and Social Services"),
category_info = stringr::str_detect(category_of_funding_activity, "Information and Statistics"),
category_law = stringr::str_detect(category_of_funding_activity, "Law, Justice and Legal Services"),
category_natural_resources = stringr::str_detect(category_of_funding_activity, "Natural Resources"),
category_opportunity_zone = stringr::str_detect(category_of_funding_activity, "Opportunity Zone Benefits"),
category_regional_development = stringr::str_detect(category_of_funding_activity, "Regional Development"),
category_science = stringr::str_detect(category_of_funding_activity, "Science and Technology and other Research and Development"),
category_transportation = stringr::str_detect(category_of_funding_activity, "Transportation"),
category_other = stringr::str_detect(category_of_funding_activity, stringr::fixed("Other (see text field entitled \"Explanation of Other Category of Funding Activity\" for clarification)"))
) |>
dplyr::relocate(category_explanation, .after = category_other) |>
dplyr::select(-category_of_funding_activity) |>
dplyr::select(-document_type)
}
if (nrow(known_details)) {
grant_opportunity_details <-
known_details |>
dplyr::bind_rows(grant_opportunity_details) |>
dplyr::arrange(opportunity_id, desc(version)) |>
dplyr::distinct(opportunity_id, .keep_all = TRUE)
}
readr::write_csv(
grants,
fs::path(working_dir, "grants.csv")
)
readr::write_csv(
grant_opportunity_details,
fs::path(working_dir, "grant_opportunity_details.csv")
)
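
The script's comments mention not having a clean way to wait for the download to finish, so it polls the file size instead. One way that polling idea could be packaged with a timeout is sketched below in base R. The function name wait_for_stable_file and its interval and timeout parameters are hypothetical, and this is a heuristic rather than a true "download complete" signal: a stalled download looks the same as a finished one, which is why the script also checks for a minimum file size.

```r
# Wait until `path` exists and its size is unchanged between two consecutive
# checks, or give up after `timeout` seconds. Returns the final size in bytes.
# Hypothetical helper; not part of the original script.
wait_for_stable_file <- function(path, interval = 1, timeout = 60) {
  deadline <- Sys.time() + timeout
  last_size <- -1
  repeat {
    if (Sys.time() > deadline) {
      stop("Timed out waiting for ", path, " to finish writing.")
    }
    size <- if (file.exists(path)) file.size(path) else NA_real_
    if (!is.na(size) && size == last_size) {
      return(size)
    }
    # Remember this check's size (or reset if the file is not there yet).
    last_size <- if (is.na(size)) -1 else size
    Sys.sleep(interval)
  }
}

# Usage with a throwaway file standing in for the downloaded CSV.
demo_path <- tempfile(fileext = ".csv")
writeLines("opportunity_id,version", demo_path)
demo_size <- wait_for_stable_file(demo_path, interval = 0.1, timeout = 5)
```

The timeout guards against looping forever when the file never appears, which the plain while loops in the script do not.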