Specify default field type for csv data details API pulls #22

Open
tpo5 opened this issue Dec 6, 2024 · 1 comment

tpo5 commented Dec 6, 2024

I recently discovered that, during an API pull of csv data details, R will occasionally pick a field type that is not compatible with all values in the dataset. In my case, R was setting Visit_ID and Medical_Record_Number to double, but a handful of character values in the dataset were converted to NA in the resulting R dataframe. Character values are allowed in both of these fields; they just aren't that common. Because I had automated the API pull, I never saw any warnings from R indicating that data had been replaced with NAs.

Once I discovered the issue, the solution was simply to specify the field type for these 2 columns as character in the API call function. I'm now nervous that the same thing may be occurring in other fields without my being aware of it, but specifying the field type of all 213 fields is a lot of work, and I don't want to have to do it every time I pull data details.

It'd be ideal if users didn't have to stumble upon this issue or specify the field type for every field "just in case" this occurs. Would it be possible to update the get_essence_data() function to set the default field type for each column in data details so that it's already taken care of every time?

library(Rnssp)

# Dates
startdate <- "2015-01-01"
enddate <- format(Sys.Date(), "%Y-%m-%d")
updatedate <- format(Sys.Date(), "%d%h%y")

# Pull data from RI NSSP
myProfile <- create_profile()

# pull data from all facilities
url <- paste0("https://essence.syndromicsurveillance.org/nssp_essence/api/dataDetails/csv?datasource=va_er&startDate=7May2024&medicalGroupingSystem=essencesyndromes&userId=6861&endDate=7May2024&percentParam=noPercent&lastUpdatedDateTimeOperator=gte&site=922&aqtTarget=DataDetails&geographySystem=region&detector=probrepswitch&timeResolution=daily&lastUpdatedDateTime=",updatedate)

# R will automatically determine the field type for each column. Occasionally it will assign
# a field a type that is not compatible with all values in the dataset, resulting in
# NAs being written to the dataframe.
nssporig <- get_essence_data(url, startdate, enddate)

# Column types can be set by the user. This ensures R will use a field type that
# is compatible with all values in the dataset and prevents NAs from replacing actual values.
nssporig <- get_essence_data(url, startdate, enddate, col_types=list(Visit_ID="c", Medical_Record_Number="c"))

I'd like to request that all columns in data details have a pre-specified field type that matches the field type in ESSENCE, to prevent NA values from being erroneously introduced into R.
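
For anyone wanting to audit which fields might be affected, one possible approach is to read everything as character and then look for columns that are mostly numeric but contain a few non-numeric values. The following is only a minimal sketch under stated assumptions: the inline CSV text and the helper `count_non_numeric()` are hypothetical stand-ins and not part of Rnssp, and a real check would run on the data frame returned by the API pull.

```r
library(readr)

# Hypothetical stand-in for a data details pull: Visit_ID is numeric in
# most rows but contains one character value.
csv_text <- paste(
  c("Visit_ID,Age,Sex",
    "1001,34,F",
    "1002,51,M",
    "A1003,27,F"),
  collapse = "\n"
)

# Read every column as character so that no value can be replaced by NA.
raw <- read_csv(I(csv_text), col_types = cols(.default = col_character()))

# Hypothetical helper: count non-missing values that fail to parse as numbers.
count_non_numeric <- function(x) {
  present <- !is.na(x)
  parsed <- suppressWarnings(parse_double(x))
  sum(present & is.na(parsed))
}

non_numeric <- vapply(raw, count_non_numeric, integer(1))
n_present <- vapply(raw, function(x) sum(!is.na(x)), integer(1))

# A column is "at risk" when some, but not all, of its values are non-numeric:
# a numeric type would silently turn the exceptions into NA.
at_risk <- names(raw)[non_numeric > 0 & non_numeric < n_present]
at_risk
```

In this toy data only Visit_ID is flagged: Age parses cleanly as numeric, and Sex is character throughout, so neither is a mixed column.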

@rosericazondekon
Collaborator

Dear @tpo5 ,

Thank you for submitting this issue and sharing your thoughts. You're absolutely right: specifying the field type for "all 213 fields" is not a practical solution.
To address this, I recommend setting the default type for all fields to character in the get_essence_data() function. This can be achieved as follows:

nssporig <- get_essence_data(url, startdate, enddate, col_types = readr::cols(.default = "c"))

Once the data is loaded, you can use dplyr::mutate() or similar functions to specify the field types of particular columns based on your requirements.
Additionally, we value your suggestion and will incorporate it by introducing a default argument for the col_types parameter in future updates to the function (as of Rnssp v0.3.1). This should make the process more user-friendly and flexible.
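
As a rough illustration of the "load as character, then convert" workflow, the sketch below converts selected columns with dplyr::mutate() and stops with an error if any non-missing value fails to convert, so an automated pull cannot silently introduce NAs. The helper `strict()` and the example data (including the `Age` and `C_Visit_Date` columns) are assumptions made for illustration and are not part of Rnssp.

```r
library(dplyr)
library(readr)

# Hypothetical data, standing in for a pull made with cols(.default = "c").
df <- tibble(
  Visit_ID = c("1001", "A1003"),
  Age = c("34", "27"),
  C_Visit_Date = c("2024-05-07", "2024-05-08")
)

# Hypothetical helper: apply a readr parser and stop if any non-missing
# value would become NA, instead of only emitting a warning.
strict <- function(x, parser, ...) {
  out <- suppressWarnings(parser(x, ...))
  bad <- !is.na(x) & is.na(out)
  if (any(bad)) {
    stop(sum(bad), " value(s) could not be converted, e.g. '", x[bad][1], "'")
  }
  out
}

# Convert only the columns known to be safe; Visit_ID stays character.
typed <- df %>%
  mutate(
    Age = strict(Age, parse_integer),
    C_Visit_Date = strict(C_Visit_Date, parse_date)
  )
```

Stopping on a failed conversion is a deliberate choice for unattended jobs; for interactive work, a warning that lists the offending values may be more convenient.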

Thank you again for your valuable input, and please feel free to share any additional thoughts or questions.
