Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix discrepancies with legacy portal in station / history counts #83

Merged
merged 44 commits into from
Jan 5, 2022

Conversation

rod-glover
Copy link
Contributor

@rod-glover rod-glover commented Dec 23, 2021

Resolves #81
Resolves #43

This PR depends on pacificclimate/station-data-portal-backend#25, which adds reliable min_obs_time and max_obs_time attributes to station history responses. This enables us to compare times on the same basis as the legacy portal.

This PR includes the following changes:

  • Modify date filtering to match PDP.
  • Fix a bug in observation frequency filtering.
  • Place markers for every unique location in a station's history (at present, there are never more than 3), and link them with a polygon.
  • Add a tooltip that shows a station's name(s) when hovering over marker or polygon.
  • Update station popup to show unique locations, time periods, observation frequencies, and variables defined in all histories of a station.
  • Update Station Metadata tab to show the above. Add sorting on selected columns.
  • Try to improve performance. This includes adding a timing module that works like a Python decorator -- it wraps a function and times it. Too bad we don't have the equivalent of the Python with statement!

Notes:

  • I optimized the station filtering function and reduced the time spent in them by about 50%. This doesn't make much difference because the rendering time dominates, not the filtering time.
  • Part of sluggishness in rendering the map, meaning the station markers, is that the popups and tooltips are created in advance, for all markers. If they are commented out, re-renders are about twice as fast. A possible solution is to create these only on hover and click events. Each one seems to contribute roughly the same amount of overhead, though this will be worth checking more thoroughly.
  • I can't believe I'm writing this on Boxing Day.

Copy link
Contributor

@jameshiebert jameshiebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor questions and comments. Looking great!

const formatDate = d => d ? d.toISOString().substr(0,10) : 'unknown';

const lexCompare = (a, b) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for sorting in the metadata table?

Copy link
Contributor Author

@rod-glover rod-glover Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is for sorting lists of things, e.g., lists of names, obs freqs. These arise because a station can have many histories, hence many names, obs freqs, etc. (These lists are reduced to their unique elements.) lexCompare implements standard lexicographic ordering for lists.

I plan to document all this in the code / README before I merge.

{
minWidth: 80,
maxWidth: 100,
id: 'Uniq Obs Freqs',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly does the "Uniq" in this label refer to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stations can have multiple histories, hence multiple freqs; these are reduced to their unique elements (see also comment above).

src/utils/portals-common/portals-common.js Outdated Show resolved Hide resolved
// If we don't do this, and we use strict date matching, then a station with
// several histories fully covering an interval will not be selected, even
// though it should be. The question of what "adjacent" means is a bit tricky
// ... would depend in part on history.freq to distinguish too-large gaps,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about gaps. The users of the data necessarily has to assume that there will be gaps... that's just the nature of weather data collection.

Copy link
Contributor Author

@rod-glover rod-glover Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. First, let's eliminate what I'm calling "strict" date matching, which is the condition
min_obs_time < uiStartDate < uiEndDate < max_obs_time, plus allowance for nils
meaning complete containment in the observation interval. This is currently not used.

Currently, in this app, legacy PDP matching is used, which is the looser condition
min_obs_time < uiEndDate && uiStartDate < max_obs_time, plus allowance for nils
meaning overlap with the observation interval

Currently, for both legacy PDP and this app, date matching is done separately for each history in a station. If this isn't right, it can be adjusted in this app.

Copy link
Contributor Author

@rod-glover rod-glover Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant about gaps is this: Consider a station with multiple, say 2, histories spanning dates A to B (hx1) and C to D (hx2).

Suppose the user specifies start and end dates s, e such that A < s < B < C < e < D. The legacy matching rule will operate as follows:

  • hx1: A < e && s < B === true: match
  • hx2 C < e && s < D === true: match

That's as desired.

But when we have, say, A < B < s < e < C < D then:

  • hx1: A < e && s < B === false: no match
  • hx2 C < e && s < D === false: no match

When start or end date fall into the gap between the histories' intervals, the matching fails. Even one of them in the gap means that one of the two histories will not match.

If we decide that the two histories really form one contiguous period, then this matching rule is incorrect. It will bite us if there are large gaps (C - B) into which a start or end date could easily fall. In that case, we'd want a test that used only A to D as the interval, and matched both histories on that basis. That's not hard, but it's different than we have now.

Also, it gets more complicated if we decide that a "large" gap (C - B) means something different than a small gap.

Also, maybe there's a complication with multiple histories that overlap: A < C < B < D or the like.

Questions:

  1. Does the gap problem matter?
  2. Legacy PDP's effective definition of "station" is "history", and effectively ignores meta_station records. SDP treats history records linked by a common station as related.

Copy link
Contributor Author

@rod-glover rod-glover Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One further note: Because they are drawn from station_obs_stats_mv, which is updated directly from observations, min_obs_time and max_obs_time are never null (oops: except if there are no observations), and therefore the checking for nulls in the matching rules can (maybe) be simplified accordingly. This is true both for legacy PDP and for SDP.

But they are not like edate, in which null means "ongoing". Ongoing or not, these values are non-null except when there are no observations at all for that history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved this discussion to a separate issue.

@rod-glover
Copy link
Contributor Author

I still have some multiple history-per-station updates to push. Just a couple of oversights.

I'm also looking at why the app is so sluggish. My timing results suggest that it is rendering, not internal computation (e.g., filtering stations) that is so slow. Got a couple of ideas of how to improve it.

@rod-glover
Copy link
Contributor Author

rod-glover commented Jan 4, 2022

The sluggishness is definitively in rendering the station markers. If StationMarkers (plural) returns null, the delay in rendering the map, and in the update of other items, e.g., the selected station count, is reduced to under 1 s.

If StationMarker (singular) returns null, the delay is about 2 s. This shows that generating and processing individual markers, even if they are trivial, is surprisingly time consuming.

If StationMarker returns only a CircleMarker, without the tooltip or popup, the delay is about 3 s.

With full marker content (tooltip + popup), the delay is about 6 s.

@rod-glover
Copy link
Contributor Author

Further investigation shows that StationMap is rendering more often than expected. Specifically, when a polygon is drawn on the map, it renders immediately, then renders again after a delay of 2 s, then again after 2 s. When the polygon is deleted, it renders after 2 s, then again after another 2 s. I'm going to take this problem over to another issue, since it is not directly related to the core issue of this PR.

@rod-glover rod-glover force-pushed the i81-station-history branch from 1643ace to 579adc9 Compare January 5, 2022 21:42
@rod-glover
Copy link
Contributor Author

Demo

@rod-glover
Copy link
Contributor Author

@jameshiebert

I addressed your questions above.
I tested the downloads and both the URLs and the results look good.
Any further comments or questions?

@jameshiebert
Copy link
Contributor

If you think that it's in good shape, I trust you. A cursory review of the range filtering shows it to be working like I expect it to.

This is probably a separate issue, but I noticed that the delete feature on the map doesn't work? One can hit the garbage can, get a cursor that says "Click on a feature to remove." But clicking has no effect. "Clear All" works OK.

@rod-glover
Copy link
Contributor Author

I'll put in an issue for the delete problem. Not sure why -- it works in CE, but not here. However, we are using different versions of React Leaflet and Leaflet in this app.

@rod-glover rod-glover merged commit 3249365 into master Jan 5, 2022
@rod-glover rod-glover deleted the i81-station-history branch January 5, 2022 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show stations outside polygon on map Station counts differ markedly from legacy portal
2 participants