Improve DESCRIPTION #47

Merged
merged 14 commits into from
Jul 10, 2015
29 changes: 17 additions & 12 deletions DESCRIPTION
@@ -1,4 +1,5 @@
Package: RSocrata
Type: Package
Title: Download 'Socrata' Data Sets as R Data Frames
Description: Provides easier interaction with
Socrata open data portals http://dev.socrata.com.
@@ -8,18 +9,22 @@ Description: Provides easier interaction with
returns an R data frame.
Converts dates to 'POSIX' format.
Manages throttling by 'Socrata'.
Version: 1.6.1-2
Date: 2015-6-5
URL: https://github.com/Chicago/RSocrata
BugReports: https://github.com/Chicago/RSocrata/issues
Imports:
httr (>= 0.3),
jsonlite (>= 0.9.14),
mime (>= 0.2),
Version: 1.6.2
Date: 2015-6-8
Authors@R: c(
Contributor

We'll actually change this back to the old Author field before pushing to CRAN. The CRAN maintainers are really picky that the email address in Authors@R matches the Maintainer's email address. However, I like to keep the address for general contact separate from the one used for package maintenance (CRAN sends e-mail blasts, and I also use that address for other projects I maintain). I prefer Authors@R, but I was getting frustrated with the CRAN submission process, so I used the old style.
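To make the CRAN requirement concrete: R derives the maintainer from the Authors@R entry that carries the "cre" role, so a quick consistency check compares that entry's email with the address in the Maintainer field. This is a minimal base-R sketch; the helpers creEmail() and fieldEmail() are hypothetical, and the email address is a placeholder, not the one used in the PR.

```r
# Authors@R as in the PR; the email is a placeholder, not the real address.
authors <- c(
  person("Hugh", "Devlin, Ph. D.", role = "aut"),
  person("Tom", "Schenk", role = "cre", email = "maintainer@example.org")
)
maintainerField <- "Tom Schenk <maintainer@example.org>"

# Email of the person(s) holding the "cre" (maintainer) role.
creEmail <- function(authors) {
  isCre <- vapply(seq_along(authors),
                  function(i) "cre" %in% authors[i]$role,
                  logical(1))
  unlist(authors[isCre]$email)
}

# Email parsed from a "Name <email>" Maintainer field.
fieldEmail <- function(field) as.person(field)$email

identical(creEmail(authors), fieldEmail(maintainerField))
```

If the two addresses differ, this is the kind of mismatch the comment above describes CRAN rejecting.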

person("Hugh", "Devlin, Ph. D.", role = c("aut")),
Member

Shouldn't we keep the URL and BugReports?

Contributor

It's a spurious removal. The lines are "added" (i.e., kept) on lines 29 and 30 in the new file.
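One way to confirm that URL and BugReports survive is to read those fields back out of the new DESCRIPTION with base R's read.dcf(). The excerpt below is an abridged, hand-copied subset of the new file (an assumption for illustration, not the full DESCRIPTION), and descFields() is a hypothetical helper.

```r
# Abridged excerpt of the new DESCRIPTION (hand-copied; most fields omitted).
newDescription <- c(
  "Package: RSocrata",
  "Version: 1.6.2",
  "Date: 2015-6-8",
  "License: MIT + file LICENSE",
  "URL: https://github.com/Chicago/RSocrata",
  "BugReports: https://github.com/Chicago/RSocrata/issues"
)

# Read selected fields from DCF-formatted lines; missing fields come back NA.
descFields <- function(lines, fields) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.dcf(con, fields = fields)
}

kept <- descFields(newDescription, c("URL", "BugReports"))
kept
```

On the real package, read.dcf("DESCRIPTION", fields = c("URL", "BugReports")) would do the same against the file itself.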

person("Tom", "Schenk", role = c("cre"), email = "[email protected]")
)
Maintainer: Tom Schenk <[email protected]>
Depends:
curl (>= 0.5)
R (>= 3.0.0)
Imports:
httr (>= 1.0.0),
jsonlite (>= 0.9.16),
mime (>= 0.3)
Contributor Author

to check
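For the "to check" note on the raised Imports floors, base R can compare version strings directly. A sketch, assuming the floors from the new Imports field; the installed versions are invented for illustration (in a live session utils::packageVersion("httr") would supply the real one), and meetsFloors() is a hypothetical helper.

```r
# Minimum versions from the new Imports field.
floors <- c(httr = "1.0.0", jsonlite = "0.9.16", mime = "0.3")

# Hypothetical installed versions, for illustration only.
installed <- c(httr = "0.6.1", jsonlite = "0.9.16", mime = "0.3")

# TRUE where the installed version satisfies the ">=" floor.
meetsFloors <- function(installed, floors) {
  vapply(names(floors),
         function(pkg) package_version(installed[[pkg]]) >= package_version(floors[[pkg]]),
         logical(1))
}

meetsFloors(installed, floors)
```

package_version() compares component by component, so "0.9.16" correctly ranks above "0.9.14" even though a plain string comparison would not be reliable for such versions.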

Suggests:
RUnit
Author: Hugh Devlin, Ph. D. and Tom Schenk, Jr.
Maintainer: Tom Schenk Jr <[email protected]>
RUnit,
roxygen2 (>= 4.1.0)
License: MIT + file LICENSE
URL: https://github.com/Chicago/RSocrata
BugReports: https://github.com/Chicago/RSocrata/issues
13 changes: 8 additions & 5 deletions NAMESPACE
@@ -1,8 +1,11 @@
# Generated by roxygen2 (4.1.1): do not edit by hand

export(fieldName)
export(ls.socrata)
export(posixify)
export(read.socrata)
export(ls.socrata)
importFrom("httr", "parse_url", "build_url", "http_status", "stop_for_status", "GET", "content")
importFrom("mime", "guess_type")
importFrom("jsonlite", "fromJSON")
import("curl")
import(httr)
import(jsonlite)
import(mime)
importFrom(httr,build_url)
importFrom(httr,parse_url)
90 changes: 50 additions & 40 deletions R/RSocrata.R
@@ -43,14 +43,15 @@ isFourByFour <- function(fourByFour) {
#' URL. Will accept queries with optional API token as a separate
#' argument or will also accept API token in the URL query. Will
#' resolve conflicting API token by deferring to original URL.
#' @param url a string; character vector of length one
#' @param url a string; character vector of length one
#' @param app_token a string; SODA API token used to query the data
#' portal \url{http://dev.socrata.com/consumers/getting-started.html}
#' @return a valid Url
#' @import httr
#' @author Tom Schenk Jr \email{tom.schenk@@cityofchicago.org}
validateUrl <- function(url, app_token) {
url <- as.character(url)
parsedUrl <- httr::parse_url(url)
parsedUrl <- parse_url(url)
if(is.null(parsedUrl$scheme) | is.null(parsedUrl$hostname) | is.null(parsedUrl$path))
stop(url, " does not appear to be a valid URL.")
if(!is.null(app_token)) { # Handles the addition of API token and resolves invalid uses
@@ -67,14 +68,14 @@ validateUrl <- function(url, app_token) {
})
}
if(substr(parsedUrl$path, 1, 9) == 'resource/') {
return(httr::build_url(parsedUrl)) # resource url already
return(build_url(parsedUrl)) # resource url already
}
fourByFour <- basename(parsedUrl$path)
if(!isFourByFour(fourByFour))
stop(fourByFour, " is not a valid Socrata dataset unique identifier.")
else {
parsedUrl$path <- paste('resource/', fourByFour, '.csv', sep="")
httr::build_url(parsedUrl)
build_url(parsedUrl)
}
}
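validateUrl() accepts either a resource URL or a human-readable dataset URL whose last path segment is the four-by-four identifier, and rewrites the latter to resource/<id>.csv. The base-R sketch below mirrors only that path logic. Assumptions: isFourByFour() is not shown in this diff, so the hypothetical looksLikeFourByFour() guesses the format as two groups of four alphanumerics joined by a hyphen; app tokens, ports and query strings are not handled; the dataset URL in the usage line is illustrative.

```r
# Hypothetical check: two groups of four alphanumerics joined by a hyphen.
looksLikeFourByFour <- function(id) grepl("^[[:alnum:]]{4}-[[:alnum:]]{4}$", id)

# Rewrite a human-readable dataset URL into its CSV resource URL.
toResourceUrl <- function(url) {
  # Capture scheme, host and path; anything after "?" or "#" is dropped.
  m <- regmatches(url, regexec("^(https?)://([^/?#]+)/([^?#]*)", url))[[1]]
  if (length(m) == 0) stop(url, " does not appear to be a valid URL.")
  path <- m[4]
  if (startsWith(path, "resource/")) return(url)   # already a resource URL
  id <- basename(path)
  if (!looksLikeFourByFour(id))
    stop(id, " is not a valid Socrata dataset unique identifier.")
  paste0(m[2], "://", m[3], "/resource/", id, ".csv")
}

toResourceUrl("http://soda.demo.socrata.com/dataset/USGS-Earthquakes/4334-bgaj")
```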

@@ -89,7 +90,7 @@ validateUrl <- function(url, app_token) {
#' @export
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @examples
#' #fieldName("Number.of.Stations") # number_of_stations
#' fieldName("Number.of.Stations") # number_of_stations
fieldName <- function(humanName) {
tolower(gsub('\\.', '_', as.character(humanName)))
}
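The example above maps "Number.of.Stations" to "number_of_stations". One plausible reason for that input shape: read.csv() runs headers through make.names() by default (check.names = TRUE), which turns spaces into dots. The sketch shows that round trip; the portal header text and the assumption that Socrata field names follow this lowercase-underscore convention are illustrative, not confirmed by this diff.

```r
# fieldName() as it appears in the diff.
fieldName <- function(humanName) {
  tolower(gsub('\\.', '_', as.character(humanName)))
}

header  <- "Number of Stations"   # column header as shown on the portal (assumed)
rName   <- make.names(header)     # what read.csv() yields with check.names = TRUE
apiName <- fieldName(rName)       # candidate SODA field name
c(rName, apiName)
```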
@@ -110,35 +111,38 @@ posixify <- function(x) {
strptime(x, format="%m/%d/%Y %I:%M:%S %p") # long date-time format
}
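Only the long date-time format is visible in this hunk, while the README changelog mentions handling for long and short dates. A hedged sketch of such a parser, assuming the short form is month/day/year with no time part; parseSocrataDate() is hypothetical and returns POSIXct in UTC rather than the POSIXlt that strptime() yields. Note that %p relies on the locale's AM/PM strings.

```r
# Hypothetical parser for the two Socrata date layouts, element by element.
parseSocrataDate <- function(x, tz = "UTC") {
  x <- as.character(x)
  # Long form first, e.g. "06/08/2015 01:30:00 PM"; short strings become NA here.
  out <- as.POSIXct(strptime(x, format = "%m/%d/%Y %I:%M:%S %p", tz = tz))
  # Short form, e.g. "6/8/2015", overwrites those NAs.
  short <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)
  out[short] <- as.POSIXct(strptime(x[short], format = "%m/%d/%Y", tz = tz))
  out
}

parseSocrataDate(c("06/08/2015 01:30:00 PM", "6/8/2015"))
```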

# Wrap httr GET in some diagnostics
#
# In case of failure, report error details from Socrata
#
# @param url Socrata Open Data Application Program Interface (SODA) query
# @return httr response object
# @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' Wrap httr GET in some diagnostics
#'
#' In case of failure, report error details from Socrata
#'
#' @param url Socrata Open Data Application Program Interface (SODA) query
#' @return httr response object
#' @import httr
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
getResponse <- function(url) {
response <- httr::GET(url)
status <- httr::http_status(response)
response <- GET(url)
status <- http_status(response)
if(response$status_code != 200) {
msg <- paste("Error in httr GET:", response$status_code, response$headers$statusmessage, url)
if(!is.null(response$headers$`content-length`) && (response$headers$`content-length` > 0)) {
details <- httr::content(response)
details <- content(response)
msg <- paste(msg, details$code[1], details$message[1])
}
logMsg(msg)
}
httr::stop_for_status(response)
stop_for_status(response)
response
}
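getResponse() builds a log message from the status code, the status text and, when the body is non-empty, Socrata's error code and message. Header values are typically character strings, so the diff's `content-length > 0` test ends up as a string comparison; the sketch below converts explicitly instead. errorMessage() is a hypothetical helper taking plain values in place of an httr response, and the URL and error details in the usage line are made up.

```r
# Hypothetical helper mirroring how getResponse() assembles its log message.
errorMessage <- function(status_code, status_message, url, headers, details = NULL) {
  msg <- paste("Error in httr GET:", status_code, status_message, url)
  # Convert content-length to a number; missing or malformed values are skipped.
  len <- suppressWarnings(as.numeric(headers[["content-length"]]))
  if (length(len) == 1 && !is.na(len) && len > 0 && !is.null(details))
    msg <- paste(msg, details$code[1], details$message[1])
  msg
}

errorMessage(404, "Not Found", "https://example.org/resource/abcd-1234.csv",
             headers = list(`content-length` = "57"),
             details = list(code = "not_found", message = "No such dataset"))
```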

# Content parsers
#
# Return a data frame for csv
#
# @author Hugh J. Devlin \email{Hugh.Devlin@@cityofchicago.org}
# @param an httr response object
# @return data frame, possibly empty
#' Content parsers
#'
#' Return a data frame for csv
#'
#' @author Hugh J. Devlin \email{Hugh.Devlin@@cityofchicago.org}
#' @import httr
#' @param an httr response object
#' @return data frame, possibly empty
#' @noRd
getContentAsDataFrame <- function(response) { UseMethod('response') }
getContentAsDataFrame <- function(response) {
mimeType <- response$header$'content-type'
@@ -147,25 +151,27 @@ getContentAsDataFrame <- function(response) {
if(sep != -1) mimeType <- substr(mimeType, 0, sep[1] - 1)
switch(mimeType,
'text/csv' =
httr::content(response), # automatic parsing
content(response), # automatic parsing
'application/json' =
if(httr::content(response, as='text') == "[ ]") # empty json?
if(content(response, as='text') == "[ ]") # empty json?
data.frame() # empty data frame
else
data.frame(t(sapply(httr::content(response), unlist)), stringsAsFactors=FALSE)
data.frame(t(sapply(content(response), unlist)), stringsAsFactors=FALSE)
) # end switch
}
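getContentAsDataFrame() first strips any parameters from the Content-Type header, then either lets httr parse CSV or flattens JSON records with t(sapply(..., unlist)). The base-R sketch isolates those two steps on already-parsed data; mediaType() and recordsToDataFrame() are hypothetical names, and the records are invented. The sapply() flattening assumes every record has the same fields in the same order; otherwise sapply() returns a list rather than a matrix.

```r
# Drop media-type parameters such as "; charset=utf-8".
mediaType <- function(contentType) trimws(sub(";.*$", "", contentType))

# Turn a list of flat, already-parsed JSON records into a character data frame.
recordsToDataFrame <- function(records) {
  if (length(records) == 0) return(data.frame())   # empty JSON array
  data.frame(t(sapply(records, unlist)), stringsAsFactors = FALSE)
}

records <- list(list(id = "1", name = "Station A"),
                list(id = "2", name = "Station B"))
mediaType("application/json; charset=utf-8")
recordsToDataFrame(records)
```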

# Get the SoDA 2 data types
#
# Get the Socrata Open Data Application Program Interface data types from the http response header
# @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
# @param responseHeaders headers attribute from an httr response object
# @return a named vector mapping field names to data types
#' Get the SoDA 2 data types
#'
#' Get the Socrata Open Data Application Program Interface data types from the http response header
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @param responseHeaders headers attribute from an httr response object
#' @return a named vector mapping field names to data types
#' @import jsonlite
#' @noRd
getSodaTypes <- function(response) { UseMethod('response') }
getSodaTypes <- function(response) {
result <- jsonlite::fromJSON(response$headers[['x-soda2-types']])
names(result) <- jsonlite::fromJSON(response$headers[['x-soda2-fields']])
result <- fromJSON(response$headers[['x-soda2-types']])
names(result) <- fromJSON(response$headers[['x-soda2-fields']])
result
}
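getSodaTypes() reads two response headers, x-soda2-types and x-soda2-fields, each a JSON array, and uses the second to name the first. The package does this with jsonlite::fromJSON(); to stay dependency-free, the sketch swaps in a deliberately naive regex parser that only handles arrays of plain strings without escaped quotes. The header values and the helper names are illustrative assumptions.

```r
# Naive parser for a JSON array of plain strings, e.g. '["text","number"]'.
# Stand-in for jsonlite::fromJSON(); does not handle escapes or non-strings.
parseStringArray <- function(json) {
  tokens <- regmatches(json, gregexpr('"[^"]*"', json))[[1]]
  gsub('^"|"$', "", tokens)
}

# Map field names to SODA data types, as getSodaTypes() does.
sodaTypesFromHeaders <- function(headers) {
  types <- parseStringArray(headers[["x-soda2-types"]])
  names(types) <- parseStringArray(headers[["x-soda2-fields"]])
  types
}

headers <- list(`x-soda2-types`  = '["text","calendar_date"]',
                `x-soda2-fields` = '["name","issued"]')
sodaTypesFromHeaders(headers)
```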

@@ -182,14 +188,16 @@ getSodaTypes <- function(response) {
#' @param app_token a string; SODA API token used to query the data
#' portal \url{http://dev.socrata.com/consumers/getting-started.html}
#' @return an R data frame with POSIX dates
#' @export
#' @author Hugh J. Devlin, Ph. D. \email{Hugh.Devlin@@cityofchicago.org}
#' @examples
#' df <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv")
#' @importFrom httr parse_url build_url
#' @import mime
#' @export
read.socrata <- function(url, app_token = NULL) {
validUrl <- validateUrl(url, app_token) # check url syntax, allow human-readable Socrata url
parsedUrl <- httr::parse_url(validUrl)
mimeType <- mime::guess_type(parsedUrl$path)
parsedUrl <- parse_url(validUrl)
mimeType <- guess_type(parsedUrl$path)
if(!(mimeType %in% c('text/csv','application/json')))
stop("Error in read.socrata: ", mimeType, " not a supported data format.")
response <- getResponse(validUrl)
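read.socrata() guesses the MIME type from the path's extension and only proceeds for CSV or JSON. The sketch replaces mime::guess_type() with a two-entry lookup built on tools::file_ext(); formatFromPath() and checkSupported() are hypothetical names, and the error text is adapted to report the path because the lookup yields NA for unknown types.

```r
# Hypothetical stand-in for mime::guess_type(), limited to the two formats
# read.socrata() accepts.
formatFromPath <- function(path) {
  known <- c(csv = "text/csv", json = "application/json")
  unname(known[tolower(tools::file_ext(path))])   # NA for anything else
}

checkSupported <- function(path) {
  mimeType <- formatFromPath(path)
  if (is.na(mimeType))
    stop("Error in read.socrata: ", path, " not a supported data format.")
  mimeType
}

checkSupported("resource/4334-bgaj.json")
```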
@@ -214,17 +222,19 @@ read.socrata <- function(url, app_token = NULL) {
#' @param url A Socrata URL. This simply points to the site root.
#' @return an R data frame containing a listing of datasets along with
#' various metadata.
#' @export
#' @author Peter Schmiedeskamp \email{pschmied@@uw.edu}
#' @examples
#' df <- ls.socrata("http://soda.demo.socrata.com")
#' @import jsonlite
#' @import httr
#' @export
ls.socrata <- function(url) {
url <- as.character(url)
parsedUrl <- httr::parse_url(url)
parsedUrl <- parse_url(url)
if(is.null(parsedUrl$scheme) | is.null(parsedUrl$hostname))
stop(url, " does not appear to be a valid URL.")
parsedUrl$path <- "data.json"
df <- jsonlite::fromJSON(httr::build_url(parsedUrl))
df <- fromJSON(build_url(parsedUrl))
df <- as.data.frame(df$dataset)
df$issued <- as.POSIXct(df$issued)
df$modified <- as.POSIXct(df$modified)
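ls.socrata() keeps only the scheme and host of the given URL, points the path at data.json, and converts the issued and modified columns with as.POSIXct(). The sketch reproduces the URL step with a regex instead of httr::parse_url()/build_url() and applies the date conversion to a made-up two-row catalog; catalogUrl() is a hypothetical helper and the catalog contents are invented.

```r
# Keep scheme and host only, then point the path at the data.json catalog.
catalogUrl <- function(url) {
  m <- regmatches(url, regexec("^(https?)://([^/?#]+)", url))[[1]]
  if (length(m) == 0) stop(url, " does not appear to be a valid URL.")
  paste0(m[2], "://", m[3], "/data.json")
}

# Invented catalog rows standing in for df$dataset.
catalog <- data.frame(title    = c("Dataset A", "Dataset B"),
                      issued   = c("2015-06-08", "2015-07-10"),
                      modified = c("2015-06-09", "2015-07-11"),
                      stringsAsFactors = FALSE)
catalog$issued   <- as.POSIXct(catalog$issued, tz = "UTC")
catalog$modified <- as.POSIXct(catalog$modified, tz = "UTC")

catalogUrl("http://soda.demo.socrata.com")
```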
2 changes: 1 addition & 1 deletion README.md
@@ -72,7 +72,7 @@ If you would like to contribute to this project, please see the [contributing do
1.4 Add json file format for Socrata downloads. Switch to RJSONIO from rjson.
Member

We should correct this too. We went from RJSONIO to rjson.


1.5 Several changes:
* Swapped ```jsonlite``` to ```RJSONIO```
* Swapped ```jsonlite``` from ```RJSONIO```
* Added handling for long and short dates
* Added unit test for reading private datasets

1 change: 1 addition & 0 deletions RSocrata.Rproj
@@ -13,6 +13,7 @@ RnwWeave: Sweave
LaTeX: pdfLaTeX

BuildType: Package
PackageUseDevtools: Yes
Contributor Author

To investigate.

PackageInstallArgs: --no-multiarch --with-keep.source
PackageCheckArgs: --as-cran
PackageRoxygenize: rd,collate,namespace