-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #2029] Merge Nava fork to HHS #2173
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signing off in so much as my changes from the fork are faithfully represented here. Maybe try smaller PRs next time ducks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - skimmed through the API folder and didn't see anything off
This will be "Rebase and merge" not "Squash and merge" right? ("Create a merge commit" might not be bad either, as it'd be a clear indicator of when this happened.) |
@andycochran , correct |
31b9ab6
to
ae72ed6
Compare
ae72ed6
to
780e450
Compare
Update API packages Mostly to address a vulnerability in the certifi library: https://github.com/navapbc/simpler-grants-gov/actions/runs/9844240120/job/27177446181?pr=127
Fixes HHS#2046 Setup S3 localstack for having a local version of S3 to use (for future work) Script / utils for interacting with S3 Localstack is a tool that creates a mock version of AWS locally. While the ability to mock out certain features varies, S3 being just a file storage system is pretty simple and fully featured even when mocked. Note that localstack has a paid version as well that adds more features, but all of S3's features are [supported in the free community tier](https://docs.localstack.cloud/references/coverage/coverage_s3/). We've used localstack for s3 and a few other AWS services on other projects. The script creates the S3 bucket in localstack. You can actually interact with the localstack instance of s3 with the AWS cli like so: ```sh aws --endpoint-url http://localhost:4566 s3 ls > 2024-07-12 13:10:24 local-opportunities ``` I created a tmp file in it succesfully: ```sh aws --endpoint-url http://localhost:4566 s3 cp tmp.txt s3://local-opportunities/path/to/tmp.txt ``` I can see the tmp file: ```sh aws --endpoint-url http://localhost:4566 s3 ls s3://local-opportunities/path/to/ > 2024-07-12 13:23:22 15 tmp.txt ``` And I can download it: ```sh aws --endpoint-url http://localhost:4566 s3 cp s3://local-opportunities/path/to/tmp.txt local_tmp.txt ```
Fixes HHS#2064 Modified the search endpoint to be able to return its response as a CSV file My understanding is that the CSV download in the current experience is a frequently used feature - so adding it is worthwhile. An important detail is that all it takes to switch from getting the response as a normal JSON response body is to change the new "format" field in the request. So if the frontend added a new download button, they would just make an identical request adding this format field (they'd likely want to also adjust the page size to return more than 25 items). The actual logic is pretty simple, instead of return the normal JSON body, we instead construct a CSV file object and return that. There is some level of formatting/parsing that we need to do with this, but its pretty minor. Note that it is explicit with which fields it returns that way the CSV won't keep changing on users if we make adjustments to the schemas elsewhere. As for returning the file, it just relies on Flask itself. I'm not as familiar with file operations in an endpoint like this, so if there are scaling concerns (ie. very large output files), let me know. I know there are a few tools in Flask for streaming file responses and other complexities. If we wanted to add support for more file types like a JSON file or XML, we'd just need to add converters for those and the file logic should all work the same. I originally implemented this as JSON but realized it was just the exact response body shoved in a file - if a user wants that they might just create the file themselves from the API response. You can see what the file looks like that this produced either by running the API yourself, or looking at this one I generated. Note that for the list fields, I used `;` to separate the values within a single cell. [opportunity_search_results_20240617-152953.csv](https://github.com/user-attachments/files/15873437/opportunity_search_results_20240617-152953.csv) --------- Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2054 Restructured the transformation code - Split each chunk of the transformation logic into separate "Subtask" classes - Made a constants file - A few duplicated pieces of implementation were pulled into functions that the subtasks are derived from Some additional logging Created a Subtask class for breaking up a task into multiple steps for organizational reasons Added configuration to the transformation step for enabling/disabling different parts of the process (not used yet - but lets us build things out and not worry about breaking non-local environments). This looks far larger than it actually is, most of the actual changes are very small, I made all of the changes without adjusting the tests (outside of a few small bits of cleanup) and then refactored the tests as well. This does not aim to change the meaningful behavior of the transformation logic, but instead make it a lot easier to parse. Now when we add new transformations, that's conceptually simpler as its adding another one of the subtasks rather than adding to the massive mess of functions it was before. There are a few small logging / metric related changes from the Subtask just so we can have very granular metrics of how long each part of the task takes. I ran locally with a full snapshot of the production data and didn't see anything of note different from prior runs. Still takes ~10 minutes. --------- Co-authored-by: nava-platform-bot <[email protected]>
…nt schema (navapbc/simpler-grants-govnavapbc/simpler-grants-gov#168) Fixes HHS#2056 - Added .with_start_date to search_schema builder to allow building a date field with key of "start_date" - Added .with_end_date to search_schema builder to allow building a date field with key of "end_date" - Added post_date and close_date properties to OpportunitySearchFilterV1Schema class, which utilize the above to build schema filters for post_date and close_date which can utilize start_date and/or end_date fields. - Added two unit tests in test_opportunity_route_search that will test the data validation of these new filters. One test is for 200 response cases and the other test is for 422 (invalid) response cases. Note: As noted in the AC of Issue #163, this PR does NOT include implementation of the filters. Currently, these filters do nothing as they haven't been tied to any sort of query. This PR is just to lay the ground work. --------- Co-authored-by: nava-platform-bot <[email protected]>
Fixes #59 This makes the search page static and adds a suspense boundary for the data being fetched by the server. The data comes from the API and is called from 3 components: * [`<SearchPaginationFetch />`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-9dbdda5096b97ad049cccea24c5a046581d26c151a6f94fcc32c05cb33ee9dee) * [`<SearchResultsHeaderFetch />`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-14a084f66c050414cc2bbd0875256511630438971022073301bbfe91c4aa8cd1) * [`<SearchResultsListFetch />`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-aabe6a7d19434a9b26199430bbcde5d31a0790aebc4cd844b922ac2fa1348dce) This also simplifies the state model by pushing state changes directly to the browser query params and rerendering the changed items. This makes things a lot simpler and thus a lot of state management code is removed and there results list is no longer wrapped in a form and passing form refs between components. This is the recommended approach by next: https://nextjs.org/learn/dashboard-app/adding-search-and-pagination There are several items that needed to be shared among the client components: the query, total results count, and total pages. These are wrapped in a `<QueryProvider />` that updates the state of these items. This was added so that if someone enters a query in the text box and the clicks a filter their query is not lost, so that the "N Opportunities" text doesn't need to be rerendered when paging or sorting, and so that the pager stays the same length when paging or sorting. The data is fetched a couple of times in a duplicative fashion, however this follows [NextJS best practice](https://nextjs.org/docs/app/building-your-application/rendering/composition-patterns#sharing-data-between-components) since the requests are cached. The pager has been updated to reload only when there is a change in the page length. Because of an issue with the way the pager renders, it is unavailable while data is being updated: <img width="1229" alt="image" src="https://github.com/navapbc/simpler-grants-gov/assets/512243/a097b0e2-f646-43b5-bc5a-664db02780a2"> This is because the Truss React component [switches between a link and a button as it renders](https://github.com/trussworks/react-uswds/blob/main/src/components/Pagination/Pagination.tsx#L42) and there isn't an option to supply query arguments, so if a user where to click it they would lose the query params. Overall this puts us on nice footing for the upcoming work using NextJS best practice.
Fixes HHS#2042 Updated most search strings on Search page to use correct next-intl translation components Added strings to Messages file Updated the unit tests as well because they were not inheriting context due to improper import path from non-global context source <img width="1583" alt="Screenshot 2024-08-07 at 1 36 18 PM" src="https://github.com/user-attachments/assets/1b613d26-cbd0-4d0e-b831-0380c6f72af6">
…son (#176) Fixes #166 - Adds export_opportunity_data task - Changes opportunity_to_csv function to opportunities_to_csv to be more flexible by including output as a parameter - Adds unit test for export_opportunity_data task. - The test runs the export_opportunity_data task, uploading a csv and json file to mock_s3_bucket. Then it reads the files and verifies contents. --------- Co-authored-by: Michael Chouinard <[email protected]>
…ent (#173) Fixes HHS#2041 > Removed inline styling on component > Ended up having to create a custom loading style for pagination due to no support from USWDS on this. Honestly though IDK why we don't just hide pagination during loading... > Screenshots, GIF demos, code examples or output to help show the changes working as expected. ![2024-08-08 13 59 02](https://github.com/user-attachments/assets/04994a3f-a48e-4511-839d-aa686a7f1414)
) Fixes HHS#2040 > Updated pagination component to hide if there are no results. Previously it would show 7 pages that you could navigate between, all with no results. Updated all unit tests for pagination component. They were broken and commented out previously. ![2024-08-08 15 45 10](https://github.com/user-attachments/assets/27e8dc6a-dddd-4c52-8086-4cd4579d73fe)
Fixes HHS#2035 Pin the Python version and document details on how to upgrade Upgraded packages as well while I was tinkering with this This is mirroring changes from the template repo: navapbc/template-application-flask#235 Python releases new minor versions (3.12, 3.13, etc.) every year in October. It usually takes a few weeks for all of our dependencies and tooling to also be upgraded to the latest version, causing our builds to break. There isn't much we can do except wait a few weeks and then do the upgrade (assuming no breaking changes in features we use). However, we had some of our dependencies pinned to the major version (Python 3) so it has broken the past few years until it started working again when the dependencies got fixed. This is just getting ahead of that and making sure the upgrade to Python 3.13 doesn't cause any problems. This change is largely documentation as the Python version used in the dockerfile + pyproject.toml already would have resolved to Python 3.12, this just makes it so it won't auto-upgrade to 3.13 when that releases in October. The package updates are all very minor - we updated them not too long ago, mostly just cleaning up a few things like the types-requests issue that is no longer present.
Fixes HHS#2033 Added a builder for the search schema for integer and boolean fields. Updated the string builder to fix a small bug + allow you to specify a pattern for the string. This is just adding additional filters that we'll be able to search by (once the rest of the implementation is built out). These are all fields that fall into a "yeah, I think someone might want to narrow down by that" group, and definitely doesn't encompass every filter we might want to add. Mostly just wanted to get integer and boolean fields implemented so the search logic could all get built out with actual use cases. Added examples to the OpenAPI docs to verify these work, at least at the request parsing phase. --------- Co-authored-by: nava-platform-bot <[email protected]>
Updated API packages Mostly to upgrade flask-cors which had a vulnerability in releases before 5.0
Fixes HHS#2032 Adjusted a few makefile commands / added some additional docs Renamed `db-recreate` to `volume-recreate` as the command also recreates the OpenSearch volume. Added a command for populating your OpenSearch index locally for opportunities rather than needing to build the full command yourself. As we've added more commands to the Makefile, we haven't kept ontop of how they all interact. This should add some level of clarity to the commands. The new `make populate-search-opportunities` command is a much easier way of writing out the full command of `docker compose run --rm grants-api poetry run flask load-search-data load-opportunity-data`. Note that we have a `make help` command which prints out any command with a `## help text` on the target definition in the makefile: <img width="1353" alt="Screenshot 2024-09-06 at 4 38 02 PM" src="https://github.com/user-attachments/assets/d21f25c4-031f-4028-bf6b-d73908215b01">
Fixes #164 Adds support to our search query builder layer to handle building queries for filtering on ints and dates between specific date ranges. Added ints and dates in the same ticket as the queries are essentially the same, just ints vs dates. While it throws an exception if both start and end values are None, I think the API request schema should also do that so that the error is more relevant/accurate to the API, but I can add that later (likely a lot more niche edge cases to handle for requests). Nothing too exciting with these queries, they work as expected and are just simple ranges. The test dataset is roughly accurate (turns out books didn't always have exact release dates until the last ~20 years). I also have tested these queries manually with the opportunity endpoints fields, the following two queries would work (can be tested locally at http://localhost:5601/app/dev_tools#/console ): ``` GET opportunity-index-alias/_search { "size": 5, "query": { "bool": { "filter": [ { "range": { "summary.post_date": { "gte": "2020-01-01", "lte": "2025-01-01" } } } ] } } } GET opportunity-index-alias/_search { "size": 12, "query": { "bool": { "filter": [ { "range": { "summary.award_floor": { "gte": 1234, "lte": 100000 } } } ] } } } ```
Fixes HHS#2034 > Cleaning up docs around getting onboarded to the project/repo > I walked through the READMEs and linked documentation to try to get things up and running on my new laptop without having to go to any outside sources of information (including you all) without adding that to the docs.
Fixes HHS#2050 Added agency tables Added tgroups tables to the staging & foreign tables Added factories / data setup for these tables https://docs.google.com/document/d/1EPZJyqTQruq-BkQoojtrkLqgVl8GrpfsZAIY9T1dEg8/edit provides a lot of rough details on how this data works in the existing system The next ticket will be to transform the data stored in tgroups into the new agency table(s). The agency table I created contains most of the fields from the legacy system (in a normalized structure) including a few that I don't think we quite need, but just in case they have a separate purpose than what I can understand, I'm preferring to keep them. --------- Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2038 Updated the load search data task to partially support incrementally loading + deleting records in the search index rather than just fully remaking it. Various changes to the search utilities to support this work Technically this doesn't fully support a true incremental load as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket, and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.
…ueries (#197) Fixes HHS#2039 Adjusted the logic that connects the API requests to the builder in the search layer to now connect all of the new fields. A few minor validation adjustments to the API to prevent a few common mistakes a user could make The length of the search tests are getting pretty long, I think a good follow-up would be to split the test file into validation and response testing. I adjusted some validation/setup of the API schemas because I don't see a scenario where min/max OR start/end dates would not ever be needed together. This also let me add a quick validation rule that a user would provide at least one of the values. I adjusted some of the way the search_opportunities file was structured as we only supported filtering by strings before, and it used the name of the variables to determine the type. I made it a bit more explicit, as before random variables could be passed through to the search layer which seems potentially problematic if not filtered out somewhere.
Fixes HHS#2051 Add transformations for agency data Agency data is structured oddly in the existing system, instead of being in ordinary tables, its in a `tgroups` table that has values stored as key-value pairs. We want to normalize that into something more workable, so the transformation needs to work a bit differently than the transformations of other tables. For simplicity, I load all of the data for every agency (and later filter to just what changed) as this removes a lot of weird edge cases that we would have otherwise needed to consider. Only modified rows actually get used, but we know we have the full set of data now. I have a snapshot of the prod tgroups table and loaded it into my DB locally and ran the transform script. In total, it takes ~2 seconds to run and didn't hit any issues. A set of the relevant metrics: ``` total_records_processed=1152 total_records_deleted=0 total_records_inserted=1152 total_records_updated=0 total_error_count=0 agency.total_records_processed=1152 agency.total_records_inserted=1152 TransformAgency_subtask_duration_sec=2.14 task_duration_sec=2.14 ``` As a sanity test, I also loaded in the tgroups data from dev and tried running it through. While it generally worked, there were 12 agencies that failed because they were missing the ldapGp and AgencyContactCity fields. I'm not certain if we want to do anything about that as they all seemed to be test agencies based on the names. --------- Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2036 > What was added, updated, or removed in this PR. > Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers. Explain how the changes were verified. > Screenshots, GIF demos, code examples or output to help show the changes working as expected. --------- Co-authored-by: Aaron Couch <[email protected]> Co-authored-by: Aaron Couch <[email protected]>
816504c
to
92f266f
Compare
Summary
Fixes #229
Time to review: 3,190 mins
Changes proposed
Merge the Nava work during the hiatus over to HHS.
Additional information
The issue numbers should line up correctly