Rebase and fix for Streaming Exports changes #1829

Merged
ballPointPenguin merged 16 commits into edge from br/streaming-exports
Oct 29, 2024

ballPointPenguin
Contributor

Includes all previous changes from #1826, plus some cleanup to pg-query and
re-establishing the Postgres SSL connection for Heroku.
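
For readers unfamiliar with the SSL piece, a Postgres client on a host such as Heroku is often configured along the lines below. This is a minimal sketch, not the code from this PR: the `DbEnv` shape and the `buildPgConfig` helper are hypothetical, and the way `DATABASE_SSL` is parsed here is an assumption. The `connectionString` and `ssl: { rejectUnauthorized }` fields follow the option shape that node-postgres accepts.

```typescript
// Hypothetical environment shape; the parsing rules below are assumptions.
interface DbEnv {
  DATABASE_URL: string;
  DATABASE_SSL?: string; // assumed: the string "true" turns SSL on
}

// Subset of the options accepted by a node-postgres Pool or Client.
interface PgConfig {
  connectionString: string;
  ssl: false | { rejectUnauthorized: boolean };
}

// Build the connection options from the environment.
function buildPgConfig(env: DbEnv): PgConfig {
  const useSsl = (env.DATABASE_SSL ?? "").toLowerCase() === "true";
  return {
    connectionString: env.DATABASE_URL,
    // Hosted databases commonly present certificates that do not chain to a
    // public CA, so clients often disable verification. That is a trade-off,
    // not a recommendation.
    ssl: useSsl ? { rejectUnauthorized: false } : false,
  };
}
```

The resulting object could then be passed to a pool constructor; keeping the decision in one small function makes it easy to switch SSL off for local development.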

samskivert and others added 15 commits September 19, 2024 13:27
* Switch to non-native Postgres client.

And add a "streaming" API for making database queries, which streams the
results from the database to Node as they are generated by Postgres.

This allows Node to process the rows one by one (and garbage collect in
between), which is much easier on the VM when we need to do big queries that
summarize data (or just format it and incrementally spit it out an HTTP
response).
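
The row-at-a-time idea described above can be sketched as follows. This is an illustration under stated assumptions, not the API added in this PR: `VoteRow`, `RowSink`, `csvEscape`, `streamCsv`, and `fakeRows` are hypothetical names, and the async generator stands in for a real database cursor.

```typescript
// Hypothetical row shape; a real query would define its own columns.
interface VoteRow {
  pid: number;
  tid: number;
  vote: number;
}

// Minimal stand-in for an HTTP response. Awaiting write() lets a real sink
// apply backpressure before the next row is pulled.
interface RowSink {
  write(chunk: string): void | Promise<void>;
}

// Quote a CSV field when it contains a comma, a double quote, or a line break.
function csvEscape(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Consume rows one at a time so only the current row has to stay in memory.
async function streamCsv<T>(
  rows: AsyncIterable<T>,
  header: string[],
  toFields: (row: T) => Array<string | number>,
  sink: RowSink
): Promise<number> {
  await sink.write(header.map(csvEscape).join(",") + "\n");
  let count = 0;
  for await (const row of rows) {
    await sink.write(toFields(row).map(csvEscape).join(",") + "\n");
    count++;
  }
  return count;
}

// Stand-in for a database cursor: yields rows lazily instead of building an array.
async function* fakeRows(): AsyncGenerator<VoteRow> {
  yield { pid: 0, tid: 1, vote: -1 };
  yield { pid: 1, tid: 1, vote: 1 };
}
```

Because `streamCsv` never collects the rows into an array, each row becomes eligible for garbage collection as soon as it has been written.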

* Mostly refactoring.

This moves the handle_GET_reportExport route into its own file, which
necessitated refactoring some other things (zinvite and pca) out of server.ts
as well. Chipping away at the monolith.

This also converts the votes.csv report to use the streaming query from
Postgres, which is mostly a smoke test. It seems to work, so next I'll convert
it to stream the results incrementally to the HTTP response as well.

* Count up comment votes in single pass over votes table.

There was actually a bug in the old SQL that aggregated votes from _all_
conversations instead of just the conversation in question, which is why it
took 30 seconds to run. With that bug fixed, even the super slow "do a full
subquery for each comment row" was actually quite fast. But this is way
cheaper/faster.
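
To make the single-pass approach concrete, here is a sketch that tallies per-comment counts in one scan over the vote rows instead of running a subquery per comment. It is not the code from this PR: `RawVote`, `Tally`, and `countVotes` are hypothetical names, the sign convention follows the raw votes table as described in this PR (-1 agree, 1 disagree), and treating any other value as a pass is an assumption.

```typescript
// Hypothetical shapes for illustration only.
interface RawVote {
  zid: number; // conversation id
  tid: number; // comment id
  vote: number; // raw value: -1 agree, 1 disagree, anything else counted as pass
}

interface Tally {
  agrees: number;
  disagrees: number;
  passes: number;
}

// One pass over the votes, accumulating a tally per comment.
function countVotes(votes: Iterable<RawVote>, zid: number): Map<number, Tally> {
  const tallies = new Map<number, Tally>();
  for (const v of votes) {
    // Restrict to the conversation in question; dropping this filter would
    // aggregate votes from every conversation.
    if (v.zid !== zid) continue;
    let t = tallies.get(v.tid);
    if (!t) {
      t = { agrees: 0, disagrees: 0, passes: 0 };
      tallies.set(v.tid, t);
    }
    if (v.vote === -1) t.agrees++;
    else if (v.vote === 1) t.disagrees++;
    else t.passes++;
  }
  return tallies;
}
```

The work is proportional to the number of vote rows, regardless of how many comments the conversation has.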

* Flip vote polarity.

In the raw votes table, -1 means agree and 1 means disagree, so we need to
count things correctly. And when exporting votes in participant votes, we flip
the sign so that 1 means agree and -1 means disagree.
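
The sign flip is small enough to show directly. A sketch, assuming a hypothetical `toExportVote` helper; only the convention itself (raw -1 is agree and raw 1 is disagree, exported 1 is agree and exported -1 is disagree) comes from the commit message.

```typescript
// Convert a raw votes-table value to the exported convention by negating it.
function toExportVote(rawVote: number): number {
  // Return a plain 0 for a raw 0 rather than the -0 that negation would give.
  return rawVote === 0 ? 0 : -rawVote;
}
```

Doing the conversion in one place keeps the counting code and the export code from each applying, or forgetting, their own flip.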
@ballPointPenguin ballPointPenguin merged commit de31114 into edge Oct 29, 2024
4 checks passed
@ballPointPenguin ballPointPenguin deleted the br/streaming-exports branch November 15, 2024 07:00
ballPointPenguin added a commit that referenced this pull request Jan 15, 2025
* Update server.ts

* update gtag usage (#1795)

* update gtag usage

* ignore gtag if GA value is blank

* update gtag user prop

* Update README.md

* v1.0 release (#1774)

* fix Constants capitalization bug in participationview (#1802)

* Export api endpoints (#1804)

* docker local postgres

* fix Constants capitalization bug in participationview

* add perspective to example.env

* add google apis to package.json

* add perspective api key to config

* config prettier

* import google

* prettier server.ts

* fix typescript bluebird promise to enable async await

* refactor post_comment route async, remove paths

* perspective call

* jigsaw toxicity threshold under flag.

* text flag for toxic

* add perspective to privacy policy

* jigsaw TOS

* Better type for getPca.

Also removed promotion of error to the return value. Let the error propagate as
an error.

* Tell TypeScript to use a less ancient library target.

* Trim trailing whitespace.

* Add data export endpoint.

This is a simple first pass which just reads the data from the database and
delivers it. No caching, no fancy business.

The endpoints are based on the report identifier and provide three separate
.csv exports. For example:

/api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/summary.csv
/api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/comments.csv
/api/v3/reportExport/r6ke7cdzte2jrsxctfyt9/votes.csv

The format is made to match that of the old command line exporter as much as
possible.

* smaller font size

* Use the correct column name here.

* Set text/csv content type on responses.

* export fonts from globals

* data export info in report view

* prettier app.js

* move overview down

* shorten toxic text

---------

Co-authored-by: Colin Megill

* Type errors (#1809)

* prettier server, ts error 25

* ts errors 21

* ts errors 18

* type pid

* ts errors 11

* modernize create xid function

* ignore ts error on jigsaw

* type uid

* expand pca cache item

* 8 errors.

* modernize switchToUser function

* void

* PcaCacheItem types

* missing args, null

* ignore request.get error

* tsignore .get on headers

* null checks, headers, type fix (#1810)

* update github actions (#1811)

* update github actions

* typo

* use docker compose in place of docker-compose

* lint fix

* Enable non-docker postgres (#1817)

* utilize docker compose --profile postgres, and POSTGRES_DOCKER var, to enable/disable using dockerized postgres

* update configuration.md

* add comment in file-server/Dockerfile (#1816)

* Update GitHub actions (#1819)

* add comment in file-server/Dockerfile

* switch back to url-health-check-action

* Fixed issue with Makefile prefixing extra whitespace to envvars and failing Docker build in some environments.

* ignore .DS_Store (#1823)

* Import lodash on component file that uses it. (#1808)

* Streaming exports (#1826)

* Switch to non-native Postgres client.

And add a "streaming" API for making database queries, which streams the
results from the database to Node as they are generated by Postgres.

This allows Node to process the rows one by one (and garbage collect in
between), which is much easier on the VM when we need to do big queries that
summarize data (or just format it and incrementally spit it out an HTTP
response).

* Mostly refactoring.

This moves the handle_GET_reportExport route into its own file, which
necessitated refactoring some other things (zinvite and pca) out of server.ts
as well. Chipping away at the monolith.

This also converts the votes.csv report to use the streaming query from
Postgres, which is mostly a smoke test. It seems to work, so next I'll convert
it to stream the results incrementally to the HTTP response as well.

* Split each report into separate function.

* Count up comment votes in single pass over votes table.

There was actually a bug in the old SQL that aggregated votes from _all_
conversations instead of just the conversation in question, which is why it
took 30 seconds to run. With that bug fixed, even the super slow "do a full
subquery for each comment row" was actually quite fast. But this is way
cheaper/faster.

* Add participant-votes.csv export.

* Flip vote polarity.

In the raw votes table, -1 means agree and 1 means disagree, so we need to
count things correctly. And when exporting votes in participant votes, we flip
the sign so that 1 means agree and -1 means disagree.

* Properly escape comment text.

* add votes matrix, show data license preprod, logging.

---------

Co-authored-by: Michael Bayne

* Revert "Streaming exports (#1826)"

This reverts commit 61d2940.

* Rebase and fix for Streaming Exports changes (#1829)

* Switch to non-native Postgres client.

And add a "streaming" API for making database queries, which streams the
results from the database to Node as they are generated by Postgres.

This allows Node to process the rows one by one (and garbage collect in
between), which is much easier on the VM when we need to do big queries that
summarize data (or just format it and incrementally spit it out an HTTP
response).

* Mostly refactoring.

This moves the handle_GET_reportExport route into its own file, which
necessitated refactoring some other things (zinvite and pca) out of server.ts
as well. Chipping away at the monolith.

This also converts the votes.csv report to use the streaming query from
Postgres, which is mostly a smoke test. It seems to work, so next I'll convert
it to stream the results incrementally to the HTTP response as well.

* Split each report into separate function.

* Count up comment votes in single pass over votes table.

There was actually a bug in the old SQL that aggregated votes from _all_
conversations instead of just the conversation in question, which is why it
took 30 seconds to run. With that bug fixed, even the super slow "do a full
subquery for each comment row" was actually quite fast. But this is way
cheaper/faster.

* Add participant-votes.csv export.

* Flip vote polarity.

In the raw votes table, -1 means agree and 1 means disagree, so we need to
count things correctly. And when exporting votes in participant votes, we flip
the sign so that 1 means agree and -1 means disagree.

* Properly escape comment text.

* add votes matrix, show data license preprod, logging.

* cleaned up pg-query; re-establish ssl db connection

---------

Co-authored-by: Michael Bayne
Co-authored-by: Colin Megill

* no more pg-native; new config flag for DATABASE_SSL (#1831)

* ensure the correct http/s protocol is used in report overview (#1832)

* add port collision instructions

* add special chars and seed comments and vis settings tests to conversation suite

* DRAFT: Better handling and/or removal of unused db fields for geolocation (#1835)

* remove maxmind and its references

* remove unused geolocation_cache

* remove/drop unused table and fields from db

* clean up typings in config.ts

* use APPLICATION_NAME as a flag for nonstandard db fields

* handle non-jigsaw config gracefully (#1833)

* only use jigsaw API if key is provided;
replace `console` with `logger`

* npm run format

* add comment mod checks

* fix reports not showing on refresh bug and add reports test

* add basic reports test

* fix lint and move reports to separate folder

* enable testing votes and comment ability without connected account

* add monitor check

* disable non functional social auth

* adjust tests

* remove auth from views

* begin exports test and ci test setup

* fix time issue

* add remaining tests

* action attempt 1

* change name

* lint err

* docker debug

* copy cmd from cypress

* build not watch, run tests

* swap out docker command

* try changing env

* move env step

* set up env in server and change docker command again

* run in detached mode

* try nohup

* [DRAFT] Automated Preprod Deploy workflow (#1845)

* deploy-preprod backend workflow

* python mini project for static assets deploy

* don't write acl headers; we ignore them anyway

* include static assets deploy in github workflow

* update docker syntax; add vars to deploy-preprod workflow

* remove depcheck workflow

* removed unused social code

* update aws region

* Bump black from 24.2.0 to 24.3.0 in /deploy (#1851)

Bumps [black](https://github.com/psf/black) from 24.2.0 to 24.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](psf/black@24.2.0...24.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Enable importance checkbox for admin and participant (#1682)

* incorporate changes from https://github.com/chena11356/polis/tree/implement-comment-prioritization-checkbox

credit to https://github.com/chena11356
addresses #217

* include high_priority in vote posts to server

* rename "priority_type" to "importance_enabled"

* use actual quotes since HTML escapes are not being respected

* Editing the importance/significance label and help text, moving it up above the vote buttons

* lint fix

* update migration filename

---------

Co-authored-by: Hadjar Homaei

* Colinmegill/report experimental (#1855)

* begin style dep fixes

* readmes, schemas, uncertainty subtask

* remove "hesitation" phrase

* convert beeswarm to functional component

* add gic xml

* rendering sji uncertainty

* test data piped through server

* fix dev server

* fix render bug in beeswarm

* change to narrative

* convert boxplot, add jest testing and action

* add test for boxplot

* change working directory

* update package lock

* update babel core

* modify comments and add test

* Added comment-groups.csv export.

Colin will be using this data (or some filtered version of it) to pass to an
LLM when it wants to summarize things.

The code uses the summarized data from the PCA json blob instead of computing
things from the raw comments and votes tables. The latter approach results in
numbers that don't match up exactly with the data that appears on the HTML
version of the report (our numbers are a little higher, so the Clojure backend
is filtering out some votes/voters that we are not).

We want the LLM to see the exact same data that's on the HTML page because it
might refer to specific numbers and we want those numbers to be exactly the
same as the numbers the user sees.

* a test uncertainty section

* add comment groups endpoint

* prompts for group informed consensus section of report_experimental

* two new prompts in "NARATIVE SKILLS" section

* upgrade typescript for anthropic support

* make script run dynamically with report ID arg

* move to server

* refactor + enable on report in web

* add filter func to csv gen

* move filter fn

* final filter function improvements

* separate section for narrative, list ALL citations

* pull in consensus narrative changes & refactor

* remove console logs

* add commentlist below consensus

* remove console logs and split narrative into separate url

* filter on group aware consensus

* sub groups prompt

* improvements to group informed consensus prompt

* increase length of gic section

* typescript appease gic

* tldr for consensus

* swap uncertainty narrative.

* uncertainty title

* break out raw data into component

* consensus style

* add gemini

* toggle gemini & claude

---------

Co-authored-by: Colin Megill
Co-authored-by: Darshana Narayanan
Co-authored-by: Michael Bayne

* update model

* add narrative route

* export missing lines

* make report narrative stream

* improve streaming and UX

* include read/write permissions (#1856)

* roll back express upgrade (#1862)

* report race condition check

* refactoring + tests

* finish framework folder

* refactor commentsModeratedIn + jest test

* refactor commentList

* consensusNarrative + test

* groups & test

* uncertaintynarrative and test

* begin participantsGraph

* more test writing

* update deploy-preprod workflow

* update deploy-preprod workflow

* app.js conversion to functional

* free of bugs

* test fixing

* finish conversion from class to functional, underscore and jquery removed

* fix setstate bug

* fix tests

* init

* indentation fix

* indentation fix again

* syntax err fix

* indentation fix

* name change

* change trigger

* indentation fix

* change trigger again

* change trigger AGAIN

* changedir

* setup node path

* change path

* remove input count

* update path

* add working dir

* indent fix

* debug

* add google auth

* change url

* change name

* modify csv

* update file location

* full path of csv

* debug ls

* still debug

* rearrange command

* store in gist

* add gh token

* change file read strategy

* move working dir

* use diff action

* split into multiple jobs

* disable checkout and debug

* faster testing hack

* remove debug statement

* add l

* update token

* debug html file

* should work

* use local fork

* add sign up link to sign in page

* prod deployment workflow (#1870)

---------

Signed-off-by: dependabot[bot]
Co-authored-by: Armand Fardeau
Co-authored-by: Colin Megill
Co-authored-by: Christopher Small
Co-authored-by: Michael Bayne
Co-authored-by: chalkghost
Co-authored-by: tevko
Co-authored-by: Tim
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hadjar Homaei
Co-authored-by: Darshana Narayanan
3 participants