Specify default field type for csv data details API pulls #22

tpo5 · 2024-12-06T18:19:27Z

I recently discovered that, during an API pull of csv data details, R will occasionally pick a field type that is not compatible with all values in the dataset. In my case, R was setting Visit_ID and Medical_Record_Number to double, but a handful of values in those fields were character, and they were converted to NA in the resulting R dataframe. Character values are allowed in both of these fields; they just aren't that common. Because I had automated the API pull, I never saw any warnings from R indicating that data had been replaced with NAs.
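A minimal sketch of this failure mode, assuming the CSV is parsed by readr (the readr-style col_types argument used elsewhere in this thread suggests so, but that is an assumption). The sample text is made up; only the two column names come from the description above. Forcing Visit_ID to double stands in for a wrong type guess, and problems() gives an automated job something to check instead of console warnings.

```r
library(readr)

# Made-up sample standing in for a data details pull.
csv_text <- "Visit_ID,Medical_Record_Number\n1001,555\n1002,556\nA17,557\n"

# Parsing Visit_ID as double mimics a wrong type guess: "A17" becomes NA.
typed <- suppressWarnings(
  read_csv(
    I(csv_text),
    col_types = cols(Visit_ID = col_double(), .default = col_character())
  )
)

# Parsing every column as character keeps every value.
safe <- read_csv(I(csv_text), col_types = cols(.default = col_character()))

# problems() lists the values that failed to parse, so a scheduled script
# can test for them explicitly rather than rely on someone seeing a warning.
parse_issues <- problems(typed)
if (nrow(parse_issues) > 0) {
  message(nrow(parse_issues), " value(s) could not be parsed and were set to NA")
}
```

In an unattended pull, the message() call could be replaced with stop() or a log entry, depending on how strict the job should be.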

Once I discovered the issue, the solution was simply to specify the field type for these 2 columns as character in the API call function. I'm now nervous that the same thing may be happening in other fields without my knowing. However, specifying the field type for all 213 fields is a lot of work, and I don't want to have to do it every time I pull data details.
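One way to check for other affected fields without typing all 213 of them is to read every column as character once and count the values that would be lost in a numeric conversion. The sketch below uses base R only; count_lossy_numeric() is a hypothetical helper, not part of Rnssp, and the data frame is made up.

```r
# Hypothetical helper: for each column, count the non-missing, non-empty
# values that would turn into NA if the column were converted to numeric.
count_lossy_numeric <- function(df) {
  vapply(df, function(col) {
    col <- as.character(col)
    present <- !is.na(col) & nzchar(trimws(col))
    converted <- suppressWarnings(as.numeric(col))
    sum(present & is.na(converted))
  }, integer(1))
}

# Made-up data standing in for a pull with every column read as character.
pull <- data.frame(
  Visit_ID = c("1001", "1002", "A17"),
  Medical_Record_Number = c("555", "", "557"),
  stringsAsFactors = FALSE
)

lossy <- count_lossy_numeric(pull)

# Columns with at least one value that a numeric type would drop.
lossy[lossy > 0]
```

A column where nearly every value is counted is simply a text field. The columns worth a closer look are those with a small nonzero count, which is the pattern described above for Visit_ID and Medical_Record_Number.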

It'd be ideal if users didn't have to stumble upon this issue or specify the field type for every field "just in case" this occurs. Would it be possible to update the get_essence_data() function to set the default field type for each column in data details so that it's already taken care of every time?

library(Rnssp)

# Dates
startdate <- "2015-01-01"
enddate <- format(Sys.Date(), "%Y-%m-%d")
updatedate <- format(Sys.Date(), "%d%h%y")

# Pull data from RI NSSP
myProfile <- create_profile()

# pull data from all facilities
url <- paste0("https://essence.syndromicsurveillance.org/nssp_essence/api/dataDetails/csv?datasource=va_er&startDate=7May2024&medicalGroupingSystem=essencesyndromes&userId=6861&endDate=7May2024&percentParam=noPercent&lastUpdatedDateTimeOperator=gte&site=922&aqtTarget=DataDetails&geographySystem=region&detector=probrepswitch&timeResolution=daily&lastUpdatedDateTime=",updatedate)

# R auto-detects the field type for each column. Occasionally it assigns
# a type that is not compatible with all values in the dataset, so the
# incompatible values are written to the dataframe as NA.
nssporig <- get_essence_data(url, startdate, enddate)

# Column types can be set by the user. This ensures R uses a field type that
# is compatible with all values in the dataset and prevents NAs from replacing actual values.
nssporig <- get_essence_data(url, startdate, enddate, col_types=list(Visit_ID="c", Medical_Record_Number="c"))

I'd like to request that all columns in data details have a pre-specified field type that matches the field type in ESSENCE, so that NA values are not errantly brought into R.

rosericazondekon · 2024-12-10T16:18:01Z

Dear @tpo5 ,

Thank you for submitting this issue and sharing your thoughts. You're absolutely right: specifying the field type for "all 213 fields" is not a practical solution.
To address this, I recommend setting the default type for all fields to character in the get_essence_data() function. This can be achieved as follows:

nssporig <- get_essence_data(url, startdate, enddate, col_types = readr::cols(.default = "c"))

Once the data is loaded, you can use dplyr::mutate() or similar functions to specify the field types of particular columns based on your requirements.
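As a rough sketch of that second step, the example below converts a chosen set of columns after an all-character load and leaves the identifier fields as text. The data is made up, and Age is a hypothetical column name used only for illustration.

```r
library(dplyr)

# Made-up result of a pull with every column read as character.
pull <- tibble(
  Visit_ID = c("1001", "A17"),
  Medical_Record_Number = c("555", "557"),
  Age = c("34", "61")
)

# Columns known to hold only integers (hypothetical choice for this sketch).
integer_cols <- c("Age")

# Convert only the listed columns; identifiers stay as character.
typed <- pull %>%
  mutate(across(all_of(integer_cols), as.integer))
```

Listing the columns to convert, rather than the columns to protect, means any field not named stays as character, so an unexpected text value is never dropped.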
Additionally, we value your suggestion and will incorporate it by introducing a default argument for the col_types parameter in future updates to the function (as of Rnssp v0.3.1). This should make the process more user-friendly and flexible.

Thank you again for your valuable input, and please feel free to share any additional thoughts or questions.
