Streaming exports #1826
Merged
Switch to the non-native Postgres client, and add a "streaming" API for database queries that streams results from the database to Node as Postgres generates them. This lets Node process the rows one by one (and garbage collect in between), which is much easier on the VM when we run big queries that summarize data (or just format it and incrementally write it to an HTTP response).
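A rough sketch of the row-by-row idea, not the code in this PR. It assumes the query result can be consumed as an async iterable; `VoteRow`, `fakeVoteStream`, `streamRowsAsCsv`, and the CSV column names are all hypothetical, and the generator only stands in for a real streaming Postgres query.

```typescript
// Hypothetical row shape; the real column names are not given in this PR text.
interface VoteRow {
  tid: number; // comment id (assumed)
  pid: number; // participant id (assumed)
  vote: number;
}

// Stand-in for a streaming database query: yields rows one at a time
// instead of building the whole result set in memory first.
async function* fakeVoteStream(count: number): AsyncGenerator<VoteRow> {
  for (let i = 0; i < count; i++) {
    yield { tid: i % 3, pid: i, vote: i % 2 === 0 ? 1 : -1 };
  }
}

// Consume rows as they arrive and hand each formatted line to `write`.
// Only one row is held at a time, so earlier rows can be garbage collected.
async function streamRowsAsCsv(
  rows: AsyncIterable<VoteRow>,
  write: (line: string) => void
): Promise<number> {
  write("comment-id,voter-id,vote\n"); // header names are an assumption
  let rowCount = 0;
  for await (const row of rows) {
    write(`${row.tid},${row.pid},${row.vote}\n`);
    rowCount++;
  }
  return rowCount;
}
```

In a request handler, `write` could be a thin wrapper around the HTTP response's write method, so the CSV goes out incrementally rather than being assembled as one large string.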
Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well.
Count up comment votes in a single pass over the votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" approach was actually quite fast. But the single pass is way cheaper/faster.
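To make the single-pass idea concrete, here is a minimal in-memory sketch (the PR itself does this in SQL, which is not shown here). `RawVote`, `CommentTally`, and `tallyCommentVotes` are hypothetical names; the sketch uses the raw-table convention that -1 means agree and 1 means disagree, and treating any other value as a pass is an assumption.

```typescript
// Hypothetical shapes; zid = conversation id, tid = comment id (assumed names).
interface RawVote {
  zid: number;
  tid: number;
  vote: number;
}

interface CommentTally {
  agrees: number;
  disagrees: number;
  passes: number;
}

// One pass over the votes: skip rows from other conversations (the filter
// the old SQL was missing), then bump the counter for the row's comment.
function tallyCommentVotes(
  votes: Iterable<RawVote>,
  zid: number
): Map<number, CommentTally> {
  const tallies = new Map<number, CommentTally>();
  for (const v of votes) {
    if (v.zid !== zid) continue;
    let tally = tallies.get(v.tid);
    if (!tally) {
      tally = { agrees: 0, disagrees: 0, passes: 0 };
      tallies.set(v.tid, tally);
    }
    if (v.vote === -1) tally.agrees++; // raw table: -1 means agree
    else if (v.vote === 1) tally.disagrees++; // raw table: 1 means disagree
    else tally.passes++; // assumption: anything else counts as a pass
  }
  return tallies;
}
```

Each vote row is visited once, so the cost grows with the number of votes rather than with comments times votes, which is what the per-comment subquery approach costs.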
Flip vote polarity. In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in the participant votes export, we flip the sign so that 1 means agree and -1 means disagree.
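A small sketch of the sign flip on export, assuming a simple row shape. `ParticipantVote`, `toExportVote`, and `toExportRows` are hypothetical names and not the functions in this PR.

```typescript
// Hypothetical row shape for a participant's vote on one comment.
interface ParticipantVote {
  pid: number;
  tid: number;
  vote: number;
}

// Raw table: -1 = agree, 1 = disagree. Export: 1 = agree, -1 = disagree.
// Zero is returned unchanged so the result is never negative zero.
function toExportVote(rawVote: number): number {
  return rawVote === 0 ? 0 : -rawVote;
}

// Return copies of the rows with the vote sign flipped for export.
function toExportRows(rows: ParticipantVote[]): ParticipantVote[] {
  return rows.map((row) => ({ ...row, vote: toExportVote(row.vote) }));
}
```

Keeping the flip in one helper means the counting code can stay on the raw convention while only the export path changes sign.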
colinmegill added the ⚒️ infrastructure (Re: automation, continuous integration.) and 🔩 p:client-report labels on Oct 22, 2024
ballPointPenguin added a commit that referenced this pull request on Oct 24, 2024
This reverts commit 61d2940.
ballPointPenguin added a commit that referenced this pull request on Jan 15, 2025
* Update server.ts * update gtag usage (#1795) * update gtag usage * ignore gtag if GA value is blank * update gtag user prop * Update README.md * v1.0 release (#1774) * fix Constants capitalization bug in participationview (#1802) * Export api endpoints (#1804) * docker local postgres * fix Constants capitalization bug in participationview * add perspective to example.env * add google apis to package.json * add perspective api key to config * config prettier * import google * prettier server.ts * fix typescript bluebird promise to enable async await * refactor post_comment route async, remove paths * perspective call * jigsaw toxicity threshold under flag. * text flag for toxic * add perspective to privacy policy * jigsaw TOS * Better type for getPca. Also removed promotion of error to the return value. Let the error propagate as an error. * Tell TypeScript to use a less ancient library target. * Trim trailing whitespace. * Add data export endpoint. This is a simple first pass which just reads the data from the database and delivers it. No caching, no fancy business. The endpoints are based on the report identifier and provide three separate .csv exports. For example: /api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/summary.csv /api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/comments.csv /api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/votes.csv The format is made to match that of the old command line exporter as much as possible. * smaller font size * Use the correct column name here. * Set text/csv content type on responses. * export fonts from globals * data export info in report view * prettier app.js * move overview down * shorten toxic text --------- Co-authored-by: Colin Megill <[email protected]> * Type errors (#1809) * prettier server, ts error 25 * ts errors 21 * ts errors 18 * type pid * ts errors 11 * modernize create xid function * ignore ts error on jigsaw * type uid * expand pca cache item * 8 errors. 
* modernize switchToUser function * void * PcaCacheItem types * missing args, null * ignore request.get error * tsignore .get on headers * null checks, headers, type fix (#1810) * update github actions (#1811) * update github actions * typo * use docker compose in place of docker-compose * lint fix * Enable non-docker postgres (#1817) * utilize docker compose --profile postgres, and POSTGRES_DOCKER var, to enable/disable using dockerized postgres * update configuration.md * add comment in file-server/Dockerfile (#1816) * Update GitHub actions (#1819) * add comment in file-server/Dockerfile * switch back to url-health-check-action * Fixed issue with Makefile prefixing extra whitespace to envvars and failing Docker build in some environments. * ignore .DS_Store (#1823) * Import lodash on component file that uses it. (#1808) * Streaming exports (#1826) * Switch to non-native Postgres client. And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response). * Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well. * Split each report into separate function. * Count up comment votes in single pass over votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. 
With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster. * Add participant-votes.csv export. * Switch to non-native Postgres client. And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response). * Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well. * Split each report into separate function. * Count up comment votes in single pass over votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster. * Add participant-votes.csv export. * Flip vote polarity. In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in participant votes, we flip the sign so that 1 means agree and -1 means disagree. * Properly escape comment text. * add votes matrix, show data license preprod, logging. --------- Co-authored-by: Michael Bayne <[email protected]> * Revert "Streaming exports (#1826)" This reverts commit 61d2940. 
* Rebase and fix for Streaming Exports changes (#1829) * Switch to non-native Postgres client. And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response). * Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well. * Split each report into separate function. * Count up comment votes in single pass over votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster. * Add participant-votes.csv export. * Switch to non-native Postgres client. And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out an HTTP response). * Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. 
This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well. * Split each report into separate function. * Count up comment votes in single pass over votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster. * Add participant-votes.csv export. * Flip vote polarity. In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in participant votes, we flip the sign so that 1 means agree and -1 means disagree. * Properly escape comment text. * add votes matrix, show data license preprod, logging. * cleaned up pg-query; re-establish ssl db connection --------- Co-authored-by: Michael Bayne <[email protected]> Co-authored-by: Colin Megill <[email protected]> * no more pg-native; new config flag for DATABASE_SSL (#1831) * ensure the correct http/s protocol is used in report overview (#1832) * add port collision instructions * add special chars and seed comments and vis settings tests to conversation suite * DRAFT: Better handling and/or removal of unused db fields for geolocation (#1835) * remove maxmind and its references * remove unused geolocation_cache * remove/drop unused table and fields from db * clean up typings in config.ts * use APPLICATION_NAME as a flag for nonstandard db fields * handle non-jigsaw config gracefully (#1833) * only use jigsaw API if key is provided; replace `console` with `logger` * npm run format * add comment mod checks * fix reports not showing on refresh bug and add reports test * add basic reports test * fix lint and move reports to separate folder * enable 
testing votes and comment ability without connected account * add monitor check * disable non functional social auth * adjust tests * remove auth from views * begin exports test and ci test setup * fix time issue * add remaining tests * action attempt 1 * change name * lint err * docker debug * copy cmd from cypress * build not watch, run tess * swap out docker command * try changing env * move env step * set up env in server and change docker command again * run in detached mode * try nohup * [DRAFT] Automated Preprod Deploy workflow (#1845) * deploy-preprod backend workflow * python mini project for static assets deploy * don't write acl headers; we ignore them anyway * include static assets deploy in github workflow * update docker syntax; add vars to deploy-preprod workflow * remove depcheck workflow * removed unused social code * update aws region * Bump black from 24.2.0 to 24.3.0 in /deploy (#1851) Bumps [black](https://github.com/psf/black) from 24.2.0 to 24.3.0. - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](psf/black@24.2.0...24.3.0) --- updated-dependencies: - dependency-name: black dependency-type: direct:development ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Enable importance checkbox for admin and participant (#1682) * incorporate changes from https://github.com/chena11356/polis/tree/implement-comment-prioritization-checkbox credit to https://github.com/chena11356 addresses #217 * include high_priority in vote posts to server * rename "priority_type" to "importance_enabled" * use actual quotes since HTML escapes are not being respected * Editing the importance/significance label and help text, moving it up above the vote buttons * lint fix * update migration filename --------- Co-authored-by: Hadjar Homaei <[email protected]> * Colinmegill/report experimental (#1855) * begin style dep fixes * readmes, schemas, uncertainty subtask * remove "hesitation" phrase * convert beeswarm to functional component * add gic xml * rendering sji uncertainty * test data piped through server * fix dev server * fix render bug in beeswarm * change to narrative * convert boxplot, add jest testing and action * add test for boxplot * change working directory * update package lock * update babel core * modify comments and add test * Added comment-groups.csv export. Colin will be using this data (or some filtered version of it) to pass to an LLM when it wants to summarize things. The code uses the summarized data from the PCA json blob instead of computing things from the raw comments and votes tables. The latter approach results in numbers that don't match up exactly with the data that appears on the HTML version of the report (our numbers are a little higher, so the Clojure backend is filtering out some votes/voters that we are not). We want the LLM to see the exact same data that's on the HTML page because it might refer to specific numbers and we want those numbers to be exactly the same as the numbers the user sees. 
* a test uncertainty section * add comment groups endpoint * prompts for group informed consensus section of report_experimental * two new prompts in "NARATIVE SKILLS" section * upgrade typescript for anthropic support * make script run dynamically with report ID arg * move to server * refactor + enable on report in web * add filter func to csv gen * move filter fn * final filter function improvements * separate section for narrative, list ALL citations * pull in consensus narrative changes & refactor * remove console logs * add commentlist below consensus * remove console logs and split narrative into separate url * filter on group aware consensus * sub groups prompt * improvements to group informed consensus prompt * increase length of gic section * typescript appease gic * tldr for consensu * swap uncertianty narrative. * uncertinaty title * break out raw data into component * consensus style * add gemini * toggle gemini & claude --------- Co-authored-by: Colin Megill <[email protected]> Co-authored-by: Darshana Narayanan <[email protected]> Co-authored-by: Michael Bayne <[email protected]> * update model * add narrative route * export missing lines * make report narrative stream * improve streaming and UX * include read/write permissions (#1856) * roll back express upgrade (#1862) * report race condition check * refactoring + tests * finish framework folder * refactor commentsModeratedIn + jest test * refactor commentList * consensusNarrative + test * groups & test * uncertaintynarrative and test * being participantsGraph * more test writing * update deploy-preprod workflow * update deploy-preprod workflow * app.js conversion to functional * free of bugs * test fixing * finish conversion from class to functional, underscore and jquery removed * fix setstate bug * fix tests * init * indentation fix * indentation fix again * syntax err fix * indentation fix * name change * change trigger * indentation fix * change trigger again * change trigger AGAIN * changedir 
* setup node path * change path * remove input count * update path * add working dir * indent fix * debug * add google auth * change url * change name * modify csv * update file location * full path of csv * debug ls * still debug * rearrange command * store in gist * add gh token * change file read strategy * move working dir * use diff action * split into multiple jobs * disable checkout and debug * faster testing haclk * remove debug statement * add l * update token * debug html file * should work * use local fork * add sign up link to sign in page * prod deployment workflow (#1870) --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Armand Fardeau <[email protected]> Co-authored-by: Colin Megill <[email protected]> Co-authored-by: Christopher Small <[email protected]> Co-authored-by: Michael Bayne <[email protected]> Co-authored-by: chalkghost <[email protected]> Co-authored-by: tevko <[email protected]> Co-authored-by: Tim <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Hadjar Homaei <[email protected]> Co-authored-by: Darshana Narayanan <[email protected]>
Add streaming exports for participant votes matrix direct download.
Add logging to callback handler to evaluate older report missing rid error on preproduction.