Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pick_count, station_count, etc. should really be int instead of float #270

Open
shawnboltz opened this issue Mar 1, 2024 · 3 comments
Open
Labels
bug Something isn't working

Comments

@shawnboltz
Copy link
Collaborator

Description
This is a minor annoyance, but the data type for some of the columns in events_to_df (and possibly some of the other DFExtractors), such as pick_count, station_count, etc., should really be an int instead of a float.

To Reproduce

import obsplus
obsplus.load_dataset("crandall_test").event_client.to_df().dtypes

Expected behavior
Here's the output of the above where I've noted which columns should be changed

time                      datetime64[ns]
latitude                         float64
longitude                        float64
depth                            float64
magnitude                        float64
event_description                 object
associated_phase_count           float64  # int64
azimuthal_gap                    float64
event_id                          object
horizontal_uncertainty           float64
local_magnitude                  float64
moment_magnitude                 float64
duration_magnitude               float64
magnitude_type                    object
p_phase_count                    float64  # int64
s_phase_count                    float64  # int64
p_pick_count                     float64  # int64
s_pick_count                     float64  # int64
standard_error                   float64
used_phase_count                 float64  # int64
station_count                    float64  # int64
vertical_uncertainty             float64
updated                   datetime64[ns]
author                            object
agency_id                         object
creation_time             datetime64[ns]
version                           object
stations                          object
dtype: object

Because this would potentially force re-indexing of everyone's event bank's, I recommend waiting to address this until we address #195 and/or other index issues.

Versions (please complete the following information):

  • ObsPlus Version '0.2.6.dev1+g17d7abd'
@shawnboltz shawnboltz added the bug Something isn't working label Mar 1, 2024
@shawnboltz
Copy link
Collaborator Author

Looking a little more at what would be required to actually change this, obsplus.utils.pd._int_to_time_columns makes this more complicated than simply changing the dtype definition.

@d-chambers
Copy link
Member

I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps its because there isn't a good way to represent NaN with default numpy ints?

@shawnboltz
Copy link
Collaborator Author

I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps its because there isn't a good way to represent NaN with default numpy ints?

Yeah, that could be why. I know pandas has a nullable int now, but I don't think it's doable with numpy itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants