Allow wininet method to be used via new R_READABS_DL_METHOD env variable (#247)

* look for method env var

* update DESC/NEWS; use method env var in check_abs_connection()

* refresh internal data

* use \describe rather than \itemize in docs for some reason

* update docs

* iterate version
MattCowgill authored May 27, 2024
1 parent 9217a92 commit 47b4f9d
Showing 14 changed files with 95 additions and 96 deletions.
3 changes: 1 addition & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: readabs
Type: Package
Title: Download and Tidy Time Series Data from the Australian Bureau of Statistics
Version: 0.4.14.903
Version: 0.4.15
Authors@R: c(
person("Matt", "Cowgill", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0003-0422-3300")),
person("Zoe", "Meers", role = "aut", email = "[email protected]"),
@@ -12,7 +12,6 @@ Authors@R: c(
Maintainer: Matt Cowgill <[email protected]>
Description: Downloads, imports, and tidies time series data from the
Australian Bureau of Statistics <https://www.abs.gov.au/>.
Date: 2023-08-03
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (>= 3.5)
3 changes: 2 additions & 1 deletion NEWS.md
@@ -1,5 +1,6 @@
# readabs 0.4.14.90x
# readabs 0.4.15
* read_lfs_datacube() convenience function added
* New environment variable "R_READABS_DL_METHOD" can be set. When set, this is passed to the `method` argument of `download.file()`. Useful on networks where a method such as "wininet" must be used.

# readabs 0.4.14
* Fixes made to read_payrolls() to reflect changes by the ABS
3 changes: 2 additions & 1 deletion R/check_abs_connection.R
@@ -72,7 +72,8 @@ url_exists <- function(url = "https://www.abs.gov.au") {
#' 200 range; `FALSE` otherwise.
#' @noRd
url_exists_nocurl <- function(url = "https://www.abs.gov.au") {
con <- url(url)
# Pass the method by name: the second positional argument of url() is `open`
con <- url(url,
method = Sys.getenv("R_READABS_DL_METHOD", unset = "default"))
out <- suppressWarnings(tryCatch(readLines(con), error = function(e) e))
abs_url_works <- all(class(out) != "error")
close(con)
8 changes: 6 additions & 2 deletions R/download_abs.R
@@ -20,15 +20,19 @@ download_abs <- function(urls,
return(TRUE)
}

dl_file <- function(url, destfile, quiet = TRUE) {
dl_file <- function(url,
destfile,
quiet = TRUE,
method = Sys.getenv("R_READABS_DL_METHOD", unset = "auto")) {
suppressWarnings(
utils::download.file(
url = url,
destfile = destfile,
mode = "wb",
quiet = quiet,
headers = readabs_header,
cacheOK = FALSE
cacheOK = FALSE,
method = method
)
)
}
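
R evaluates default arguments lazily, when the function is called, so a default of `Sys.getenv("R_READABS_DL_METHOD", unset = "auto")` should pick up a change to the variable made in the middle of a session. A minimal sketch of that behaviour follows; `pick_method()` is a hypothetical stand-in used only for illustration, not the package's internal `dl_file()`:

```r
# Hypothetical stand-in for the method lookup; not part of readabs.
pick_method <- function(method = Sys.getenv("R_READABS_DL_METHOD", unset = "auto")) {
  method
}

# Remember any existing value so it can be restored afterwards.
old_value <- Sys.getenv("R_READABS_DL_METHOD", unset = NA)

Sys.unsetenv("R_READABS_DL_METHOD")
method_when_unset <- pick_method()  # falls back to the `unset` value

Sys.setenv(R_READABS_DL_METHOD = "wininet")
method_when_set <- pick_method()    # the default is re-evaluated on this call

# Restore the previous state of the environment variable.
if (is.na(old_value)) {
  Sys.unsetenv("R_READABS_DL_METHOD")
} else {
  Sys.setenv(R_READABS_DL_METHOD = old_value)
}

print(c(unset = method_when_unset, set = method_when_set))
```

Because the lookup happens on each call, there is no need to reload the package after calling `Sys.setenv()`.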
9 changes: 9 additions & 0 deletions R/read_abs.R
@@ -68,6 +68,15 @@
#' your `.Renviron` file and add \code{R_READABS_PATH = <path>} line.
#' The easiest way to edit this file is using \code{usethis::edit_r_environ()}.
#'
#' Certain corporate networks restrict your ability to download files in an R
#' session. On some of these networks, the `"wininet"` method must be used when
#' downloading files. Users can now specify the method that will be used to
#' download files by setting the `"R_READABS_DL_METHOD"` environment variable.
#'
#' For example, the following code sets the environment variable for your
#' current session: `Sys.setenv("R_READABS_DL_METHOD" = "wininet")`
#' You can add this `Sys.setenv()` call to your .Rprofile to have it persist across sessions.
#'
#' The `release_date` argument allows you to download table(s) other than the
#' latest release. This is useful for examining revisions to time series, or
#' for obtaining the version of series that were available on a given date.
2 changes: 1 addition & 1 deletion R/read_awe.R
@@ -3,7 +3,7 @@
#' 6302.0, Average Weekly Earnings, Australia.
#' @title read_awe
#' @param wage_measure Character of length 1. Must be one of:
#' \itemize{
#' \describe{
#' \item{`awote`}{ Average weekly ordinary time earnings; also known as Full-time adult ordinary time earnings}
#' \item{`ftawe`}{ Full-time adult total earnings}
#' \item{`awe`}{ Average weekly total earnings of all employees}
10 changes: 1 addition & 9 deletions R/read_payrolls.R
@@ -13,7 +13,7 @@
#' import the payrolls data, and then tidies it up.
#'
#' @param series Character. Must be one of:
#' \itemize{
#' \describe{
#' \item{"industry_jobs"}{ Payroll jobs by industry division, state, sex, and age
#' group (Table 4)}
#' \item{"subindustry_jobs"}{ Payroll jobs by industry sub-division and
@@ -59,14 +59,6 @@ read_payrolls <- function(series = c(
)) {
check_abs_connection()

if (series == "industry_wages") {
stop("The ABS removed wages totals from the Weekly Payrolls Jobs release.")
}

if (series %in% c("sa4_jobs", "sa3_jobs", "gccsa_jobs")) {
stop("The ABS removed the payroll jobs by SA3/SA4/capital city series from the Weekly Payroll Jobs release.")
}

series <- match.arg(series)

cube_name <- switch(series,
Binary file modified R/sysdata.rda
Binary file not shown.
22 changes: 18 additions & 4 deletions README.Rmd
@@ -155,7 +155,7 @@ payrolls_t4_path

The `download_abs_data_cube()` function downloads the file and returns the full file path to the saved file. You can then pipe that in to another function:

```{r}
```{r read-payrolls-manual, eval = FALSE}
payrolls_t4_path %>%
readxl::read_excel(
sheet = "Payroll jobs index",
@@ -168,13 +168,13 @@ payrolls_t4_path %>%

As it happens, if you want the ABS Weekly Payrolls data, you don't need to use `download_abs_data_cube()` directly. Instead, there is a convenience function available that downloads, imports, and tidies the data for you:

```{r}
```{r read-payrolls-fn, eval = FALSE}
read_payrolls()
```

There is also a convenience function available for data cube GM1 from the monthly Labour Force data, which contains labour force gross flows:

```{r}
```{r read-lfs-grossflows, eval = FALSE}
read_lfs_grossflows()
```

@@ -190,7 +190,7 @@ The {readabs} package includes functions to query the ABS.Stat API. Thank you to
* `read_api()` downloads data from the ABS.Stat API.

Let's list available dataflows:
```{r}
```{r api-flows}
flows <- read_api_dataflows()
```

@@ -218,6 +218,20 @@ read_api("ABORIGINAL_POP_PROJ", datakey = list(sex_abs = 1))

Note that in some cases, querying the API without filtering the data will return an error, as the table will be too big. In this case, you will need to supply a datakey that reduces the size of the data.

## Resolving network issues by manually setting the download method

Certain corporate networks restrict your ability to download files in an R session. On some of these networks, the `"wininet"` method must be used when downloading files. Users can now specify the method that will be used to download files by setting the `"R_READABS_DL_METHOD"` environment variable.

For example, the following code sets the environment variable for your current session:

```{r, eval = FALSE}
Sys.setenv("R_READABS_DL_METHOD" = "wininet")
```

You can add this `Sys.setenv()` call to your .Rprofile to have it persist across sessions.
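
Another way to persist the setting, sketched here as an assumption rather than the package's documented recommendation, is a `R_READABS_DL_METHOD=wininet` line in your `.Renviron` file, which R reads at startup. The example below writes that line to a temporary file (so your real startup files are untouched) and loads it with `readRenviron()`:

```r
# Assumption: a temporary file stands in for ~/.Renviron so this sketch
# does not modify your real startup files.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines("R_READABS_DL_METHOD=wininet", renviron_path)

# readRenviron() loads NAME=value pairs from a file into the session,
# which is what R does with ~/.Renviron at startup.
readRenviron(renviron_path)

dl_method <- Sys.getenv("R_READABS_DL_METHOD", unset = "auto")
print(dl_method)
```

In practice you would put the same line in your real `.Renviron`, for example via `usethis::edit_r_environ()`, and restart R.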

If you have other issues using `{readabs}` in your corporate environment, I would appreciate you opening an issue on GitHub.

## Bug reports and feedback
GitHub issues containing error reports or feature requests are welcome. Please try to make a [reprex](https://reprex.tidyverse.org) (a minimal, reproducible example) if possible.

118 changes: 44 additions & 74 deletions README.md
@@ -83,19 +83,19 @@ This is what it looks like:

``` r
str(all_wpi)
#> tibble [65,579 × 12] (S3: tbl_df/tbl/data.frame)
#> $ table_no : chr [1:65579] "634501" "634501" "634501" "634501" ...
#> $ sheet_no : chr [1:65579] "Data1" "Data1" "Data1" "Data1" ...
#> $ table_title : chr [1:65579] "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" ...
#> $ date : Date[1:65579], format: "1997-09-01" "1997-09-01" ...
#> $ series : chr [1:65579] "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Public ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private and Public ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private ; All industries ;" ...
#> $ value : num [1:65579] 67.4 64.7 66.7 67.3 64.8 66.6 67.3 64.8 66.7 NA ...
#> $ series_type : chr [1:65579] "Original" "Original" "Original" "Seasonally Adjusted" ...
#> $ data_type : chr [1:65579] "INDEX" "INDEX" "INDEX" "INDEX" ...
#> $ collection_month: chr [1:65579] "3" "3" "3" "3" ...
#> $ frequency : chr [1:65579] "Quarter" "Quarter" "Quarter" "Quarter" ...
#> $ series_id : chr [1:65579] "A2603039T" "A2603989W" "A2603609J" "A2713846W" ...
#> $ unit : chr [1:65579] "Index Numbers" "Index Numbers" "Index Numbers" "Index Numbers" ...
#> tibble [68,137 × 12] (S3: tbl_df/tbl/data.frame)
#> $ table_no : chr [1:68137] "634501" "634501" "634501" "634501" ...
#> $ sheet_no : chr [1:68137] "Data1" "Data1" "Data1" "Data1" ...
#> $ table_title : chr [1:68137] "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" "Table 1. Total Hourly Rates of Pay Excluding Bonuses: Sector, Original, Seasonally Adjusted and Trend" ...
#> $ date : Date[1:68137], format: "1997-09-01" "1997-09-01" ...
#> $ series : chr [1:68137] "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Public ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private and Public ; All industries ;" "Quarterly Index ; Total hourly rates of pay excluding bonuses ; Australia ; Private ; All industries ;" ...
#> $ value : num [1:68137] 67.4 64.7 66.7 67.3 64.8 66.6 67.3 64.8 66.7 NA ...
#> $ series_type : chr [1:68137] "Original" "Original" "Original" "Seasonally Adjusted" ...
#> $ data_type : chr [1:68137] "INDEX" "INDEX" "INDEX" "INDEX" ...
#> $ collection_month: chr [1:68137] "3" "3" "3" "3" ...
#> $ frequency : chr [1:68137] "Quarter" "Quarter" "Quarter" "Quarter" ...
#> $ series_id : chr [1:68137] "A2603039T" "A2603989W" "A2603609J" "A2713846W" ...
#> $ unit : chr [1:68137] "Index Numbers" "Index Numbers" "Index Numbers" "Index Numbers" ...
```

It only takes you a few lines of code to make a graph from your data:
@@ -187,7 +187,7 @@ search_catalogues("payroll")
#> # A tibble: 2 × 4
#> heading sub_heading catalogue url
#> <chr> <chr> <chr> <chr>
#> 1 Jobs Weekly Payroll Jobs and Wages in Australia weekly-payr… http…
#> 1 Jobs Weekly Payroll Jobs weekly-payr… http…
#> 2 Jobs Weekly Payroll Jobs and Wages in Australia, Interim weekly-payr… http…
```

@@ -197,17 +197,16 @@ available to download from this catalogue:

``` r
show_available_files("weekly-payroll-jobs")
#> # A tibble: 8 × 3
#> # A tibble: 7 × 3
#> label file url
#> <chr> <chr> <chr>
#> 1 Table 20: Payroll jobs - characteristics distributionsContains se… 6160… http…
#> 2 Table 4: Payroll jobs and wages indexes 6160… http…
#> 3 Table 5: Sub-state - Payroll jobs indexes 6160… http…
#> 4 Table 6: Industry subdivision - Payroll jobs indexes 6160… http…
#> 5 Table 7: Employer characteristics - Payroll jobs index 6160… http…
#> 6 Table 8: Jobholder characteristics - Payroll jobs index 6160… http…
#> 7 Table 9: Sector - Payroll jobs index 6160… http…
#> 8 All data cubes 6160… http…
#> 2 Table 4: Payroll jobs indexes 6160… http…
#> 3 Table 6: Industry subdivision - Payroll jobs indexes 6160… http…
#> 4 Table 7: Employer characteristics - Payroll jobs index 6160… http…
#> 5 Table 8: Jobholder characteristics - Payroll jobs index 6160… http…
#> 6 Table 9: Sector - Payroll jobs index 6160… http…
#> 7 All data cubes 6160… http…
```

We want Table 4, which has the filename `6160055001_DO004.xlsx`.
@@ -216,10 +215,10 @@ We can download the file as follows:

``` r
payrolls_t4_path <- download_abs_data_cube("weekly-payroll-jobs", "004")
#> File downloaded in /var/folders/bc/6jy526c12dq5zpkxf8fj7f7h0000gq/T//RtmpcSzlYa/6160055001_DO004.xlsx
#> File downloaded in /tmp/Rtmpgdh0CC/6160055001_DO004.xlsx

payrolls_t4_path
#> [1] "/var/folders/bc/6jy526c12dq5zpkxf8fj7f7h0000gq/T//RtmpcSzlYa/6160055001_DO004.xlsx"
#> [1] "/tmp/Rtmpgdh0CC/6160055001_DO004.xlsx"
```

The `download_abs_data_cube()` function downloads the file and returns
@@ -232,26 +231,6 @@ payrolls_t4_path %>%
sheet = "Payroll jobs index",
skip = 5
)
#> # A tibble: 4,322 × 184
#> `State or Territory` `Industry division` Sex `Age group` `43834` `43841`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 0. Australia 0. All industries 0. Pers… 0. All ages 92.72 95.17
#> 2 0. Australia 0. All industries 0. Pers… 1. 15-19 92.3 94.95
#> 3 0. Australia 0. All industries 0. Pers… 2. 20-29 92.46 95.28
#> 4 0. Australia 0. All industries 0. Pers… 3. 30-39 93.28 95.66
#> 5 0. Australia 0. All industries 0. Pers… 4. 40-49 93.03 95.27
#> 6 0. Australia 0. All industries 0. Pers… 5. 50-59 92.94 95.27
#> 7 0. Australia 0. All industries 0. Pers… 6. 60-69 91.27 93.52
#> 8 0. Australia 0. All industries 0. Pers… 7. 70 and … 87.71 90.09
#> 9 0. Australia 0. All industries 1. Males 0. All ages 92.59 95.63
#> 10 0. Australia 0. All industries 1. Males 1. 15-19 NA NA
#> # ℹ 4,312 more rows
#> # ℹ 178 more variables: `43848` <chr>, `43855` <chr>, `43862` <chr>,
#> # `43869` <chr>, `43876` <chr>, `43883` <chr>, `43890` <chr>, `43897` <chr>,
#> # `43904` <chr>, `43911` <chr>, `43918` <chr>, `43925` <chr>, `43932` <chr>,
#> # `43939` <chr>, `43946` <chr>, `43953` <chr>, `43960` <chr>, `43967` <chr>,
#> # `43974` <chr>, `43981` <chr>, `43988` <chr>, `43995` <chr>, `44002` <chr>,
#> # `44009` <chr>, `44016` <chr>, `44023` <chr>, `44030` <chr>, …
```

#### Convenience functions for data cubes
@@ -263,43 +242,13 @@ data for you:

``` r
read_payrolls()
#> File downloaded in /var/folders/bc/6jy526c12dq5zpkxf8fj7f7h0000gq/T//RtmpcSzlYa/6160055001_DO004.xlsx
#> # A tibble: 151,920 × 7
#> state industry sex age date value series
#> <chr> <chr> <chr> <chr> <date> <dbl> <chr>
#> 1 Australia All industries Persons All ages 2020-01-04 92.7 jobs
#> 2 Australia All industries Persons All ages 2020-01-11 95.2 jobs
#> 3 Australia All industries Persons All ages 2020-01-18 96.7 jobs
#> 4 Australia All industries Persons All ages 2020-01-25 97.5 jobs
#> 5 Australia All industries Persons All ages 2020-02-01 98.1 jobs
#> 6 Australia All industries Persons All ages 2020-02-08 98.7 jobs
#> 7 Australia All industries Persons All ages 2020-02-15 99.2 jobs
#> 8 Australia All industries Persons All ages 2020-02-22 99.5 jobs
#> 9 Australia All industries Persons All ages 2020-02-29 99.5 jobs
#> 10 Australia All industries Persons All ages 2020-03-07 99.9 jobs
#> # ℹ 151,910 more rows
```

There is also a convenience function available for data cube GM1 from
the monthly Labour Force data, which contains labour force gross flows:

``` r
read_lfs_grossflows()
#> File downloaded in /var/folders/bc/6jy526c12dq5zpkxf8fj7f7h0000gq/T//RtmpcSzlYa/GM1.xlsx
#> # A tibble: 1,030,712 × 9
#> date sex age state lfs_current lfs_previous persons unit weights
#> <date> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 2003-07-01 Males 15-19 … New … Employed f… Employed fu… 30.6 000s curren…
#> 2 2003-07-01 Males 15-19 … New … Employed f… Employed pa… 2.87 000s curren…
#> 3 2003-07-01 Males 15-19 … New … Employed f… Not in the … 2.03 000s curren…
#> 4 2003-07-01 Males 15-19 … New … Employed f… Unmatched i… 4.44 000s curren…
#> 5 2003-07-01 Males 15-19 … New … Employed f… Incoming ro… 4.67 000s curren…
#> 6 2003-07-01 Males 15-19 … New … Employed p… Employed fu… 2.34 000s curren…
#> 7 2003-07-01 Males 15-19 … New … Employed p… Employed pa… 32.6 000s curren…
#> 8 2003-07-01 Males 15-19 … New … Employed p… Unemployed 3.13 000s curren…
#> 9 2003-07-01 Males 15-19 … New … Employed p… Not in the … 3.63 000s curren…
#> 10 2003-07-01 Males 15-19 … New … Employed p… Unmatched i… 2.05 000s curren…
#> # ℹ 1,030,702 more rows
```

### Finding and loading data from the ABS.Stat API
@@ -403,6 +352,27 @@ Note that in some cases, querying the API without filtering the data
will return an error, as the table will be too big. In this case, you
will need to supply a datakey that reduces the size of the data.

## Resolving network issues by manually setting the download method

Certain corporate networks restrict your ability to download files in an
R session. On some of these networks, the `"wininet"` method must be
used when downloading files. Users can now specify the method that will
be used to download files by setting the `"R_READABS_DL_METHOD"`
environment variable.

For example, the following code sets the environment variable for your
current session:

``` r
Sys.setenv("R_READABS_DL_METHOD" = "wininet")
```

You can add this `Sys.setenv()` call to your .Rprofile to have it
persist across sessions.

If you have other issues using `{readabs}` in your corporate
environment, I would appreciate you opening an issue on GitHub.
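
The `"wininet"` download method is only available on Windows, so a startup file shared across operating systems may want to guard the setting. The following is a minimal sketch under that assumption; `use_wininet()` is a hypothetical helper, not a `{readabs}` function:

```r
# Hypothetical helper (not part of readabs): TRUE only on Windows,
# where the "wininet" download method is available.
use_wininet <- function(os_type = .Platform$OS.type) {
  identical(os_type, "windows")
}

# Set the variable only where it can work; elsewhere leave it unset so
# the package falls back to its default method.
if (use_wininet()) {
  Sys.setenv(R_READABS_DL_METHOD = "wininet")
}
```

The `os_type` argument exists only so the check can be exercised on any platform, for example `use_wininet("unix")`.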

## Bug reports and feedback

GitHub issues containing error reports or feature requests are welcome.
Binary file modified man/figures/README-all-in-one-example-1.png
9 changes: 9 additions & 0 deletions man/read_abs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/read_awe.Rd


2 changes: 1 addition & 1 deletion man/read_payrolls.Rd

