Is your feature request related to a problem? Please describe.
Currently we generate statistics on an approximately daily basis by manually issuing a SQL query to the PostgreSQL database on the host server, and then chmod and mv the resulting .csv file to a directory served by the cluster-front nginx webserver. The configuration for this can be found here: infrastructure/etc/haproxy/haproxy.cfg (lines 75 to 76 in 0c2f41e).
Before each update, we currently -- but hopefully temporarily -- apply some manual SQL UPDATE statements to filter out traffic that may not exactly be bot-initiated, but that appears to be related to problems that users on some (as-yet-undetermined) devices experience when using the search engine, causing them to initiate duplicate empty search queries.
Generally the statistics are updated with a delay of at most three or four days, and frequently they are updated the next day, sometimes soon after midnight UTC, depending on when our system operators (me) are available.
To allow holiday/vacation time without the statistics becoming stale, it would be nice to automate their generation.
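As a rough illustration of the chmod-and-mv step described above, here is a minimal Python sketch of how an automated job might publish the .csv file. The function name `publish_csv`, the column names and the destination path are hypothetical and not taken from the current setup. It writes to a temporary file in the destination directory, sets the permissions, and then renames it into place, so that the webserver never serves a half-written file.

```python
import csv
import os
import tempfile


def publish_csv(rows, header, dest_path, mode=0o644):
    """Write rows to dest_path so that readers never see a partial file.

    All names here are hypothetical; this is a sketch, not the current process.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file in the destination directory, so that the
    # final rename stays on one filesystem (where os.replace is atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # mkstemp creates the file readable only by its owner, so widen the
        # permissions before publishing (the equivalent of the manual chmod).
        os.chmod(tmp_path, mode)
        # The equivalent of the manual mv.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A scheduled job could call this with the rows returned by the existing SQL query, for example `publish_csv(rows, ("day", "searches"), "/path/to/served/stats.csv")`, where both the header and the path are placeholders.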
Describe the solution you'd like
Some general requirements here are:
Historical data should -- with very few exceptions -- be treated as immutable. That is: if we said that we had X number of searches on day Y, that statistic should not change when subsequent statistics are generated.
Ideally we should re-use the existing SQL query that is used today to generate the stats.
Statistics updates -- whatever the mechanism used to generate and deploy them -- should not interfere with the production application, and should be designed to minimize that risk.
A not-always-obvious implication of this is that query load on the production database should either be minimized or, if possible, removed entirely by querying a secondary database instance. The problem with querying a secondary database, however, is that it increases the chance of stale or inconsistent results, meaning that statistics re-generated at a later date could differ from those already published (conflicting with the immutability requirement).
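One way to reconcile the immutability and low-query-load requirements above could look like the following Python sketch. It assumes, hypothetically, that the published statistics can be represented as a mapping from day to search count; the function names `merge_immutable` and `first_day_to_query` are illustrative only. Days that have already been published are never overwritten, any disagreement is reported for an operator to review, and the next query only needs to cover days after the last published one.

```python
from datetime import date, timedelta


def merge_immutable(published, fresh):
    """Add newly generated days without changing already-published ones.

    Returns the merged mapping plus any days where the fresh value differs
    from the published value, as (published_value, fresh_value) pairs.
    """
    merged = dict(published)
    conflicts = {}
    for day, value in fresh.items():
        if day not in merged:
            merged[day] = value
        elif merged[day] != value:
            # Keep the published value and report the difference instead.
            conflicts[day] = (merged[day], value)
    return dict(sorted(merged.items())), conflicts


def first_day_to_query(published, default_start):
    """Return the earliest day that still needs to be queried."""
    if not published:
        return default_start
    return max(published) + timedelta(days=1)
```

Restricting the query to days on or after `first_day_to_query(...)` keeps the load on the production database small, and a non-empty `conflicts` result would flag the kind of inconsistency that querying a secondary instance might introduce.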
Describe alternatives you've considered
Continuing to generate and update the statistics manually remains an option, at least in the short term. It has the benefit that our operators (me) stay somewhat familiar with trends, spikes and oddities in the daily statistics.
Additional context
N/A