
[bug] OFA unable to export data to csv by record type and fiscal period #3137

Closed
ADPennington opened this issue Aug 9, 2024 · 2 comments · Fixed by #3162
Labels: bug · dev · QASP Review · Refined (ticket has been refined at the backlog refinement)

Comments

ADPennington commented Aug 9, 2024


Description

OFA typically extracts data for all STTs by record type and fiscal period. For some record types (e.g. T2, T3), this can include upwards of 500K records (reference). On 8/8/2024, we attempted to export the latest TANF T2 records for FY2023 Q1, which include approx. 500K records, and the process failed.

Action Taken

  • Navigate to the search indices table for TANF T2
  • Select fiscal period FY2023 Q1
  • Select "apply filters"
  • Select all records at the top of the table
  • Confirm the intention to select all 400K+ records
  • Select "csv export" and "go"

What I expected to see

A prompt showing the exported csv file, such as in the example below:
[Screenshot (25)]

What I did see

interface
[Screenshot 2024-08-09 103922]

logs

10:36:18.398: [APP/PROC/WEB.0] [2024-08-09 14:36:18 +0000] [7] [WARNING] Worker with pid 1242 was terminated due to signal 9
10:36:18.405: [APP/PROC/WEB.0] [2024-08-09 14:36:18 +0000] [1419] [INFO] Booting worker with pid: 1419

Other Helpful Information

  • Demo of bug for ACF users only 🔏
  • URL of the page I was on:
  • Browser and version: Chrome
  • Operating System: Windows
  • Is the issue repeatable?: yes
  • Has the issue occurred more than once?: yes
andrew-jameson commented Aug 9, 2024

Potential solution(s) coming out of office hours:

  1. Upon hitting "go", have the queryset use an iterator/paginator over the data
  2. Write a flat csv file to /tmp/, then upload it to an S3 location
  3. Potentially batch the writing of the CSV
  4. After "go", redirect to auto-download the file from the S3 link

https://nextlinklabs.com/resources/insights/django-big-data-iteration
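The iterator/streaming idea above can be sketched roughly as follows. This is a minimal, framework-free illustration, not the TDP implementation: `row_iter` stands in for something like a Django queryset's `.values(...).iterator(chunk_size=...)`, and `out` could be a file in `/tmp/` destined for S3. The function and field names here are hypothetical.

```python
import csv
import io

def stream_rows_to_csv(row_iter, fieldnames, out):
    """Write rows one at a time so memory use stays flat
    regardless of how many records the export contains."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    written = 0
    for row in row_iter:
        writer.writerow(row)
        written += 1
    return written

if __name__ == "__main__":
    # A generator stands in for the large queryset iterator;
    # only one row dict exists in memory at a time.
    rows = ({"id": i, "record_type": "T2"} for i in range(500_000))
    buf = io.StringIO()
    n = stream_rows_to_csv(rows, ["id", "record_type"], buf)
    print(n)  # 500000
```

The key point is that nothing ever calls `list()` on the full result set; the worker's memory footprint stays bounded by a single row (or a single `chunk_size` batch), which should avoid the OOM kill (signal 9) seen in the logs above.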

elipe17 commented Aug 26, 2024

Can we implement a class that applies the suggested solutions so it can be reused in other areas of the code base? This large-queryset issue also presented itself while testing #3064 in the qasp environment: python manage.py clean_and_reparse -y 2023 -q Q1. This tries to bring a queryset of ~860K records into memory, which kills the process. See below:

vcap@9eda436c-8a8f-4409-442c-5963:~$ python manage.py clean_and_reparse -y 2023 -q Q1

You have selected to reparse datafiles for FY 2023 and Q1. The reparsed files will NOT be stored in new indices and the old indices 
These options will delete and reparse (87) datafiles.
Continue [y/n]? y
Previous reparse has exceeded the timeout. Allowing execution of the command.
2024-08-26 20:44:54,995 INFO clean_and_reparse.py::__backup:L47 :  Beginning reparse DB Backup.
Beginning reparse DB Backup.
2024-08-26 20:44:55,000 INFO db_backup.py::get_system_values:L51 :  Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
2024-08-26 20:44:55,002 INFO db_backup.py::backup_database:L86 :  Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://[redacted]:[redacted]@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://[redacted]:[redacted]@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
2024-08-26 20:45:39,331 INFO db_backup.py::backup_database:L91 :  Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
2024-08-26 20:45:39,344 INFO db_backup.py::backup_database:L101 :  Pg dumpfile size in bytes: 280313953.
Pg dumpfile size in bytes: 280313953.
2024-08-26 20:45:39,344 INFO db_backup.py::upload_file:L173 :  Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
2024-08-26 20:45:41,768 INFO db_backup.py::upload_file:L186 :  Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
2024-08-26 20:45:41,771 INFO db_backup.py::main:L326 :  Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
2024-08-26 20:45:41,796 INFO backup_db.py::handle:L36 :  Cloud backup/restore job complete.
Cloud backup/restore job complete.
2024-08-26 20:45:41,796 INFO clean_and_reparse.py::__backup:L49 :  Backup complete! Commencing clean and reparse.
Backup complete! Commencing clean and reparse.
2024-08-26 20:45:42,429 INFO clean_and_reparse.py::__delete_associated_models:L156 :  Before summary delete
Before summary delete
2024-08-26 20:45:42,437 INFO clean_and_reparse.py::__delete_associated_models:L158 :  Before delete errors
Before delete errors
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_associated_models:L160 :  Before delete records
Before delete records
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_records:L105 :  Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
2024-08-26 20:45:51,440 INFO clean_and_reparse.py::__delete_records:L108 :  total deleted: 863642
total deleted: 863642
2024-08-26 20:45:51,441 INFO clean_and_reparse.py::__delete_records:L111 :  Deleteing from elastic
Deleteing from elastic
Killed
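For the deletion side of a reusable helper, one common pattern is to delete in fixed-size primary-key batches so the full ~860K-record queryset is never materialized. A minimal sketch, where `fetch_pks` and `delete_by_pks` are hypothetical callables standing in for something like `Model.objects.values_list('pk', flat=True)[:batch_size]` and `Model.objects.filter(pk__in=pks).delete()`:

```python
def delete_in_batches(fetch_pks, delete_by_pks, batch_size=10_000):
    """Delete rows batch by batch so at most batch_size primary
    keys are held in memory at any one time."""
    total = 0
    while True:
        pks = fetch_pks(batch_size)   # e.g. values_list('pk')[:batch_size]
        if not pks:
            break
        total += delete_by_pks(pks)   # e.g. filter(pk__in=pks).delete()
    return total

if __name__ == "__main__":
    # An in-memory list stands in for the table being cleaned.
    store = list(range(863_642))

    def fetch_pks(n):
        return store[:n]

    def delete_by_pks(pks):
        del store[: len(pks)]
        return len(pks)

    print(delete_in_batches(fetch_pks, delete_by_pks))  # 863642
```

Because each loop iteration re-queries for the next batch of keys, progress survives even if an individual batch is retried, and the process memory stays bounded by `batch_size` rather than the table size.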

@robgendron robgendron added the raft review This issue is ready for raft review label Sep 11, 2024
@reitermb reitermb added QASP Review and removed raft review This issue is ready for raft review labels Sep 18, 2024
jtimpe added a commit that referenced this issue Sep 20, 2024