Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Check full seed #391

Closed
wants to merge 72 commits into from
Closed

WIP: Check full seed #391

wants to merge 72 commits into from

Conversation

cjhopp
Copy link
Member

@cjhopp cjhopp commented Apr 28, 2020

What does this PR do?

Uses full SEED information (adds network and location) when comparing e.g. pick.waveform_id and tr.stats in a number of functions. Mostly importantly template_gen, but may also include mag_calc and others.

Why was it initiated? Any relevant Issues?

#388

In situations where the desired template or trace contains unique picks with the same Station and Channel, but different location and/or Network codes (e.g. separate sensors in one borehole), current behavior does not differentiate between them.

Assorted to-dos:

  • utils.correlate funcs all return cccs, no_chans, and chans. chans only includes station.channel, which then gets passed to Detection objects. Allow for full seed chans in Detection
  • Track down all the possible effects of using a full seed id chans attribute for Detection objects...could touch quite a lot...
  • Lag calc UsedChannel only uses station.channel. Allow for full seed option.

PR Checklist

  • develop base branch selected?
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGES.md.
  • First time contributors have added your name to CONTRIBUTORS.md.

@cjhopp cjhopp self-assigned this Apr 28, 2020
@calum-chamberlain
Copy link
Member

From your messages the other day this is still a work in progress? If so, can you add "WIP: " to the start of this PR title until you are ready for review?

@cjhopp cjhopp changed the title Check full seed WIP: Check full seed Apr 29, 2020
@calum-chamberlain
Copy link
Member

When you are ready you can update this - I fixed some test issues on master and merged that into develop - it should help with CI things!

@cjhopp
Copy link
Member Author

cjhopp commented Aug 14, 2020

@calum-chamberlain

Changing _make_event_pair() to force identical seed id's, same as the correlations. I can see this being a problem if we want to allow differential times for S-picks even if on different channels at the same sensor. Thoughts?

As it is, its producing problems with my silly borehole sensors (same station, different location codes) so that my dt.ct contains bogus times whereas my dt.cc is fine. But that can also fall on me to rename them as @flixha had mentioned previously.

@calum-chamberlain
Copy link
Member

Changing to full seed-id checking for _make_event_pair could be a little annoying for me - but it is probably the right way to do it! Can you allow an option to just check the station as well though? I imagine there would be some instances where people (including me) only care about the station and not the full seed id.

@calum-chamberlain calum-chamberlain added this to the 0.5.0 milestone Aug 15, 2020
@cjhopp
Copy link
Member Author

cjhopp commented Aug 17, 2020

Yeah, mine is obviously the edge case!

The default, for now, will be checking station only, as discussed in #388

I guess another question I should have investigated already is if HypoDD can even cope with different location codes at the same station. If not, this is a useless exercise...

@calum-chamberlain
Copy link
Member

Looks like the history of this one got pretty ugly!

@cjhopp
Copy link
Member Author

cjhopp commented Apr 1, 2021

Just how I like my PR's.

I'm okay blowing this up, if you'd like to clean house. I obviously haven't had a chance to work on this in forever and a day...

It'd be useful to have, but the more it touches, the more I think @flixha was right and that maybe renaming sensors with different locations at the same station is the better approach?

@calum-chamberlain
Copy link
Member

Changing the names would be simpler, but it is definitely not the "correct" way to do it - the only reason I didn't have full seed ID checking in from the start was because SEISAN didn't keep the full seed ID... I'm keen to get this "corrected" and suck up the issues I create for myself because it will be "better" - as long as well actually produce useful errors when things go wrong. I need to patch some catalog-to-dd things and include more useful errors there. I have spent too much time working out what I have done wrong.

No rush, I think this one should be a "major" release change, so I'm not in a hurry to get it out for 0.4.3 with will contain a lot of bug-fixes from @flixha.

@flixha
Copy link
Collaborator

flixha commented Apr 8, 2021

Just how I like my PR's.

I'm okay blowing this up, if you'd like to clean house. I obviously haven't had a chance to work on this in forever and a day...

It'd be useful to have, but the more it touches, the more I think @flixha was right and that maybe renaming sensors with different locations at the same station is the better approach?

The more networks and data I see the more I understand the challenges and that user needs some freedom in the approach. Maybe it would be nice to have two supported approaches, one which completely ignores network and location codes, and one that takes them fully into account. I can't fully see yet how hard that would be and in how many places in the code that would need to be considered, but it could help to solve a good set of common problems that come up related to the issue:

  • as you mentioned, multiple sensors at different levels in a borehole (with different location codes, but same station codes --> need to take all codes into account)
  • data populated from databases that only contain information the station and channel (old Nordic format; lots of ISC data --> mostly better to ignore location & network codes)
  • data from networks that have been operating for a long time since when the default handling has changed. That applies to a good set of data of the GSN net, where they often used no location code until the 90s and then changed the very same sensor over to a new location code (--> better ignore location code)

And then there's of course detection problems where all of these challenges arise, and in those cases the user of course has to make up their own strategy.

@calum-chamberlain
Copy link
Member

Hey @cjhopp and @flixha - do either of you remember where we are with this one - do you want me to cherry pick changes from this for a clean PR, and add in the option of not checking the full SEED ID?

I was reminded of this by a user who has the same station code for multiple stations across different networks, and the templates that were being made were not useful!

@flixha
Copy link
Collaborator

flixha commented Dec 8, 2021

hey @calum-chamberlain , I'm not 100 % sure what the state was as I don't remember testing it properly, and I may have changed my opinion a bit since what I wrote above - e.g., for my own cases I now make sure that I "normalize" all network and location codes explicitly for traces and associated picks that I want to correlate. That leaves a bit more work to the user but is more "correct", along the lines of what @cjhopp implemented here.

To avoid frustrating users who are updating their EQcorrscan with this stricter handling here, it would probably be nice to allow keep the strict SEED-ID handling as an option. However, considering version numbers EQcorrscan is still in its 0.x-stage where such changes can come without early depreciation-warnings etc. So: the option to not check the full SEED ID would be nice, but personally I don't see is as important any more as I did a while ago.

If you're able to cherry-pick this PR that would be great - I can try to look out for any edge cases in some tests soon.

@cjhopp
Copy link
Member Author

cjhopp commented Dec 8, 2021

Hi chaps, 100% agree with @flixha here. My recollection was that I had added a check_full_seed arg to most of the necessary places, but never got around to the tests or cleaning it up. As @flixha says above, the consensus was to leave strict SEED checking as an "opt-in" feature so as not to break things for users. @calum-chamberlain , please feel free to cherry-pick whatever you'd like and destroy this thing.

@calum-chamberlain
Copy link
Member

Awesome, thanks @cjhopp and @flixha - I will try to clean this up today and think about some testing. :) Agree on the optional opt-in and warn that this will change implementation, and the flexibility of 0.x - I don't know if I will ever get this to 1.x! I'm still learning so much about good ways to do things that I always look back on code and want to change it!

@calum-chamberlain
Copy link
Member

Closing this in favour of #481

@calum-chamberlain calum-chamberlain deleted the check_full_seed_id branch March 19, 2023 20:09
@calum-chamberlain calum-chamberlain restored the check_full_seed_id branch March 19, 2023 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants