How to reproduce the data if jason_read() is not working for everyone #9

Open
sunnymh opened this issue Oct 19, 2013 · 8 comments

Comments

@sunnymh

sunnymh commented Oct 19, 2013

It seems like read_json() only works for a few people. If that is the case, how do we reproduce the data in class on Tuesday? People who can't use read_json() won't be able to run the code from groups that use that function.

@kqdtran
Member

kqdtran commented Oct 19, 2013

Idk of any elegant method, but there's a workaround in #3

@sunnymh
Author

sunnymh commented Oct 19, 2013

That's a workaround for people who can't use read_json(), but I was wondering how we will check the code of people who do use read_json() on Tuesday.

@teresita

@sunnymh if you're getting errors, it could be a misspelling of json? (there's no 'a')

@aculich
Member

aculich commented Oct 21, 2013

@teresita Thanks for picking up on the spelling error here. Definitely no 'a' in JSON, so that could be the problem. @sunnymh is this working for you now?

@sunnymh
Author

sunnymh commented Oct 21, 2013

@aculich That's not actually my question. As in #3, people are getting the error ValueError: arrays must all be same length when using read_json(), and I got the same error. So I used json.load() as suggested in #3, and I think a lot of people are doing the same. My question is whether it will be a problem that people like me may not be able to run other people's code that uses read_json(). Sorry about the misspelling.
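
For reference, here is a minimal sketch of the json.load() route mentioned above, using only the standard library. The sample document and its field names (features, properties, mag, place) are assumptions standing in for the real data file, not its confirmed schema, and load_rows is a hypothetical helper name.

```python
import io
import json

# Hypothetical sample standing in for the downloaded data file;
# the real file's layout and field names may differ.
SAMPLE = """
{
  "metadata": {"title": "sample feed", "count": 2},
  "features": [
    {"id": "a1", "properties": {"mag": 1.5, "place": "Somewhere"}},
    {"id": "b2", "properties": {"mag": 2.7, "place": "Elsewhere"}}
  ]
}
"""


def load_rows(fileobj):
    """Parse JSON with json.load() and flatten each feature into one dict."""
    doc = json.load(fileobj)
    rows = []
    for feature in doc["features"]:
        row = {"id": feature["id"]}
        # Copy the nested properties up to the top level of the row.
        row.update(feature["properties"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["id"], row["mag"], row["place"])
```

A list of flat dicts like this does not depend on read_json(), so it should be usable regardless of which pandas version a reviewer has installed.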

@aculich
Member

aculich commented Oct 21, 2013

The Steps to Curate Data: Issue #8 contains most of the answer to this problem. An alternative acceptable method would be to use the CSV version of the new data which is available here:

http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php
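
As a rough illustration of the CSV route, the sketch below reads CSV text with the standard library only. The inline sample and its column names (time, latitude, longitude, mag) are assumptions for demonstration and may not match the actual feed; read_quakes is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample rows; the real CSV feed may use different columns.
SAMPLE_CSV = """time,latitude,longitude,mag
2013-10-21T00:00:00Z,37.5,-122.1,1.5
2013-10-21T01:00:00Z,36.9,-121.4,2.7
"""


def read_quakes(fileobj):
    """Read CSV rows into dicts, converting the magnitude to a float."""
    rows = []
    for record in csv.DictReader(fileobj):
        record["mag"] = float(record["mag"])
        rows.append(record)
    return rows


quakes = read_quakes(io.StringIO(SAMPLE_CSV))
strongest = max(quakes, key=lambda r: r["mag"])
print(strongest["time"], strongest["mag"])
```

Because this uses only the csv module, it should run the same way for everyone, whatever version of pandas they have.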

@sunnymh
Author

sunnymh commented Oct 21, 2013

@aculich #8 uses json.load() as well, which doesn't require us to install the latest version of Pandas to run read_json(). So on Tuesday, people who use json.load() might not be able to run read_json() if some other groups happen to use it. Is that going to be a problem?

@aculich
Member

aculich commented Oct 21, 2013

What you describe is an interesting conundrum, and it illustrates why we are using a virtual machine. Code that uses read_json() needs pandas upgraded to version 0.12, but upgrading might affect other code on that person's machine if it relies on an earlier version of pandas. In general, code written for an earlier version of pandas will probably still run on the newer one, but you can't be sure. As long as you provide instructions in your version of the README.md file, you should be able to get other people to upgrade their version of pandas to run your code. If we were not using a virtual machine, this could cause real problems that lead to dependency hell, in which multiple conflicting versions of packages need to be installed. We will discuss this in class, along with how we can use VMs as a strategy to handle the problem. Whichever strategy you've chosen will likely work okay for Tuesday's code review, but in practice it is important to be mindful of the implications.
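
One possible way to make the version requirement explicit is a small check that a script or README snippet could run first. This is only a sketch: the 0.12 threshold comes from the comment above, and version_tuple and has_read_json are hypothetical helper names, not part of pandas.

```python
import re

# Minimum pandas version for read_json(), per the comment above.
MIN_VERSION = (0, 12)


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_read_json(version):
    """True if the given pandas version is at least MIN_VERSION."""
    return version_tuple(version) >= MIN_VERSION


try:
    import pandas
    installed = pandas.__version__
except ImportError:
    installed = None

if installed is None:
    print("pandas is not installed")
elif has_read_json(installed):
    print("pandas", installed, "should provide read_json()")
else:
    print("pandas", installed, "is older than 0.12; upgrade or use json.load()")
```

A check like this does not resolve the conflict between versions; it only tells the reviewer up front which path their environment can run.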
