-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read.socrata() fails on some datasets due to missing response headers from server #118
Comments
Thanks @pschmied, I can replicate locally and appears to be a valid bug. @chrismetcalf - do you think |
@Chicago/rsocrata-devs - I've created a new branch Looping-in @levyj to help navigate the NBE lingo. @nicklucius - can you look at this code? |
@tomschenkjr My hunch is that pvug-y23y WILL change. Maybe we could upload a classic, open, very unlikely to change dataset such as Before modifying the RSocrata package, we might see if @chrismetcalf can verify that indeed the correct set of headers are being returned from the Socrata side. If NBE-backed datasets are supposed to be returned with type information, then that would suggest a fix on our end. |
@pschmied - first, congrats on the new job! 🍾 I agree on having some test data sets that we can use for unit testing and suggested it in #45. We've been using the Earthquake example--and it's been fine--but having one for different versions of SoDA would be helpful. We'll hold off on work for now (unassigning @levyj and @nicklucius for now). |
@tomschenkjr - I just saw your last message, but was about to post this: I agree this code below causes an error because the response header
As a fix, for any URL where
|
Thank you, @tomschenkjr ! I'm very excited--have been beneficiary of Socrata-backed open-data portals for some time now. As for @nicklucius 's approach, that seems very reasonable. Sadly, I'm too new on the job to know what is the better long-term solution. That's where I'll rely on Chris, who'll have a much better sense of where the API is, and where it's likely to go ;-) |
Catching up...
I wouldn't do that, since the dataset is likely to change time. However, you could always create a unit test that passes a known @tomschenkjr @levyj In general, would it be good to have a couple of datasets that I maintain in a "known state" for creating test suites against? I could easily set up a couple of datasets somewhere that you could use to test different conditions. For |
@chrismetcalf - 10-4 on not relying row counts. Its be very helpful to have static data sets for testing with known conditions. For the underlying issues, is the SoDA header in NBE returning the correct headers in this case? _ On Fri, Nov 18, 2016 at 6:34 PM -0600, "Chris Metcalf" <[email protected]mailto:[email protected]> wrote: Catching up... @chrismetcalfhttps://github.com/chrismetcalf - do you think pvug-y23y will always have the same number of rows? Want to create a unit test for SoDA 2 support and curious if I can always check this data set has 68,087 rows. I wouldn't do that, since the dataset is likely to change time. However, you could always create a unit test that passes a known $limit and set an assertion on that. @tomschenkjrhttps://github.com/tomschenkjr @levyjhttps://github.com/levyj In general, would it be good to have a couple of datasets that I maintain in a "known state" for creating test suites against? I could easily set up a couple of datasets somewhere that you could use to test different conditions. For soda-ruby, I've stubbed out the actual HTTP calls, but it might be good to have some integration tests too. You are receiving this because you were mentioned. This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail (or the person responsible for delivering this document to the intended recipient), you are hereby notified that any dissemination, distribution, printing or copying of this e-mail, and any attachment thereto, is strictly prohibited. If you have received this e-mail in error, please respond to the individual sending the message, and permanently delete the original and any copy of any e-mail and printout thereof. |
It does appear that the |
Hey @chrismetcalf - just wanted to follow-up on this to see if this was an error in the return headers or if we shouldn't be expecting those |
@tomschenkjr I just bumped the JIRA ticket with our engineering team to see if I can get an update. Internal ticket (you won't be able to open that link): https://socrata.atlassian.net/browse/EN-12140 |
Thanks!
_
Tom Schenk Jr.
Chief Data Officer
Department of Innovation and Technology
City of Chicago
(312) 744-2770
[email protected] | @ChicagoCDO
data.cityofchicago.org | opengrid.io | digital.cityofchicago.org | chicago.github.io | dev.cityofchicago.org
On Tue, Dec 6, 2016 at 2:11 AM -0300, "Chris Metcalf" <[email protected]<mailto:[email protected]>> wrote:
@tomschenkjr<https://github.com/tomschenkjr> I just bumped the JIRA ticket with our engineering team to see if I can get an update.
Internal ticket (you won't be able to open that link): https://socrata.atlassian.net/browse/EN-12140
-
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#118 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABkC0RQWjbe4RTq-JhjUXGRLYfgNHT8uks5rFO5ggaJpZM4K0cmO>.
…________________________________
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail (or the person responsible for delivering this document to the intended recipient), you are hereby notified that any dissemination, distribution, printing or copying of this e-mail, and any attachment thereto, is strictly prohibited. If you have received this e-mail in error, please respond to the individual sending the message, and permanently delete the original and any copy of any e-mail and printout thereof.
|
@chrismetcalf - just following-up on this. |
@tomschenkjr I just checked in on the ticket and it's still in the queue. I bumped the priority, but if you file a ticket with Support and ask for it to be linked to |
Will do.
From: Chris Metcalf [mailto:[email protected]]
Sent: Tuesday, January 10, 2017 2:36 PM
To: Chicago/RSocrata <[email protected]>
Cc: Schenk, Tom <[email protected]>; Mention <[email protected]>
Subject: Re: [Chicago/RSocrata] read.socrata() fails on some datasets due to missing response headers from server (#118)
@tomschenkjr<https://github.com/tomschenkjr> I just checked in on the ticket and it's still in the queue. I bumped the priority, but if you file a ticket with Support and ask for it to be linked to EN-12140, that'll probably get it more attention.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#118 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABkC0UmgMdj54lSLeqSBudGEOA19wN97ks5rQ-vBgaJpZM4K0cmO>.
…________________________________
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail (or the person responsible for delivering this document to the intended recipient), you are hereby notified that any dissemination, distribution, printing or copying of this e-mail, and any attachment thereto, is strictly prohibited. If you have received this e-mail in error, please respond to the individual sending the message, and permanently delete the original and any copy of any e-mail and printout thereof.
|
Thanks @tomschenkjr |
Received feedback from @socrata support team and appears the missing Thanks for reporting! |
@tomschenkjr FYI, just got the notification that the header fix for those datasets just hit staging, so a fix for the bug on our end should be out soon! Great seeing you guys yesterday! |
This has been resolved by the engineering team. Just let me know if you see it again. |
Hi, I'm getting this same error on two datasets, https://data.healthcare.gov/dataset/2016-QHP-Landscape-Individual-Market-Medical/v7sn-c66v/data and https://data.healthcare.gov/dataset/2017-QHP-Landscape-Individual-Market-Medical/enpz-m4q6
I also tried reading from the csv and json resource urls for these datasets with no change in error message. |
Thanks, @hrecht. This seems to be a @socrata server-side issue. I will ping them to let them know the issue still exists in other data sets--I am not sure of the underlying cause. If it persists or this is being encountered at a high-rate, we may need to release a bug-fix of sorts that does not do this validation; at least, throws a warning and not a fatal error. |
Thanks @tomschenkjr! Understood that it's a @socrata issue rather than RSocrata issue, hopefully they'll be able to fix. |
@tomschenkjr @hrecht I'm following up with the team internally and we'll get back to you once we know more. @tomschenkjr How reliant are you on the |
@chrismetcalf - we use it to retrieve the data types being used. The most important is using it to identify dates and then converting them into a proper data column in R. Less important, is we also use it to identify money fields so R handles it as a float/numeric field instead of a string. If this was deprecated, we could rely on guessing the field types. |
Reopening for now since we are revisiting a workaround for this. |
Following up on my comment above, we are in fact conditionally including the Now we need another, more reliable way to get you the column types. There are of course unsupported ways of getting this, but I'd really rather have you on something real, or pursue the guessing option. How often are people hitting this issue? It should only be happening for very wide datasets or datasets with very long column names? Another workaround would be to specify a |
Our development team sketched out how RSocrata will handle this: If I'll get started on the development for this fix. |
@tomschenkjr - Just noting this to help explain my changes: I rebased the |
@nicklucius - sounds good. I'm handing the issue over to you for development. Good night, good luck 🖖 |
The test for this started failing recently. The section of code:
The actual failing test message:
I thought this might be useful, but it didn't help me:
It looks like the URL columns are different in name only, but the other six columns are missing in the JSON. Not sure if this is related to this issue, or if this is something else? |
@geneorama - changing the value to 146, which then lines up the test expectation with the incoming data, removes the error. If there was a problem related to this issue #118, there would still be an error thrown. Therefore, I think the problem is that the number of columns returned by Socrata has changed and that is what caused the issue to fail. As such, nothing I'm seeing shows that there's anything related to #118. |
Per @chrismetcalf 's request, I'm posting this here and tagging him to take a look.
I've found that some datasets are returned with a set of response headers that cause
read.socrata()
to error out. I was led to believe the issue has to do with the type of backend (new vs. old) on the Socrata side.See below for full details, but the issue seems to be that the response header
x-soda2-types
isn't getting returned. RSocrata tries to extract the column types from the missing response header, and fails.The text was updated successfully, but these errors were encountered: