Skip to content

Add function to import hotspot dataset #1386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Aug 26, 2021
Merged

Add function to import hotspot dataset #1386

merged 26 commits into from
Aug 26, 2021

Conversation

willschlitzer
Copy link
Contributor

This pull request adds a function to import the hotspots.txt dataset used in some of the GMT code examples. Unfortunately, due to inconsistent spacing between the columns in the data, I was not able to use pd.read_csv() to read the data into a DataFrame. It would be great to get any feedback to shorten this code, as it currently appears messy.

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

@willschlitzer willschlitzer added the feature Brand new feature label Jul 16, 2021
@willschlitzer willschlitzer added this to the 0.5.0 milestone Jul 16, 2021
@willschlitzer willschlitzer self-assigned this Jul 16, 2021
@willschlitzer
Copy link
Contributor Author

@GenericMappingTools/pygmt-maintainers Any ideas on how to make this csv import look less hacked together?

@weiji14
Copy link
Member

weiji14 commented Jul 17, 2021

@GenericMappingTools/pygmt-maintainers Any ideas on how to make this csv import look less hacked together?

Have you tried using sep='\s+' in pd.read_csv()?

@willschlitzer
Copy link
Contributor Author

willschlitzer commented Jul 19, 2021

@GenericMappingTools/pygmt-maintainers Any ideas on how to make this csv import look less hacked together?

Have you tried using sep='\s+' in pd.read_csv()?

I have. The problem is that I want to include the entire hotspot name in one field/string, but they're inconsistent in spacing/number of words. For example, the Bermuda line reads BERMUDA NaN NaN while three word ones read like LAKE VICTORIA, EAST when I set sep='\s+.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I see what you mean by inconsistent spacing, the hotspots.txt file is space-separated for the first three columns and tab-separated for the last column. I've figured out one alternative way by reading in the first 3 and last 1 column separately, and then concatenating together (see below).

P.S. There are some commits in this PR on sphdistance. You might want to cherry-pick the 'hotspot' only commits to a clean 'add-hospot-dataset' branch (and do a force-push). Or we could merge in the sphdistance PR at #1383 before this #1386 PR to save you some hassle 🙂

@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Aug 12, 2021
@willschlitzer
Copy link
Contributor Author

@meghanrjones It looks like your PR has been merged; am I cleared to update the pd.read_csv() call?

There are some technical problems on the GMT server, so the hotspots.txt is still not updated on the server.

Sounds good; I'll wait until the remote file is updated and then push an update to this PR. I'll convert to draft for the time being.

@willschlitzer willschlitzer marked this pull request as draft August 12, 2021 08:43
@seisman
Copy link
Member

seisman commented Aug 12, 2021

@willschlitzer The hotspots.txt file is now updated.

@willschlitzer
Copy link
Contributor Author

@willschlitzer The hotspots.txt file is now updated.

Sounds good, I'll update this branch and make sure everything checks out.

@willschlitzer willschlitzer marked this pull request as ready for review August 22, 2021 06:34
@willschlitzer
Copy link
Contributor Author

@GenericMappingTools/pygmt-maintainers I have updated load_hotspots() for the new format of hotspots.txt.

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just one final recommendation to add the reference for the original data.

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed final review call This PR requires final review and approval from a second reviewer labels Aug 25, 2021
@seisman seisman merged commit 8a6db70 into main Aug 26, 2021
@seisman seisman deleted the add-hotspot-dataset branch August 26, 2021 18:38
sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Brand new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants