
[Issue #2029] Merge Nava fork to HHS #2173

Merged — 59 commits merged into HHS:main from acouch/merge-nava-to-hhs on Sep 18, 2024

Conversation

@acouch (Collaborator) commented Sep 18, 2024

Summary

Fixes #229

Time to review: 3,190 mins

Changes proposed

Merge the Nava work during the hiatus over to HHS.

Additional information

The issue numbers should line up correctly

@mdragon (Collaborator) previously approved these changes Sep 18, 2024 and left a comment:

Signing off insofar as my changes from the fork are faithfully represented here. Maybe try smaller PRs next time *ducks*

@chouinar (Collaborator) previously approved these changes Sep 18, 2024 and left a comment:

LGTM - skimmed through the API folder and didn't see anything off

@andycochran (Collaborator) commented:

This will be "Rebase and merge" not "Squash and merge", right? ("Create a merge commit" might not be bad either, as it'd be a clear indicator of when this happened.)

@acouch acouch dismissed stale reviews from chouinar and mdragon via 01bf11a September 18, 2024 16:36
@acouch (Collaborator, Author) commented Sep 18, 2024:

> This will be "Rebase and merge"

@andycochran, correct.

acouch added a commit to navapbc/simpler-grants-gov that referenced this pull request Sep 18, 2024
chouinar previously approved these changes Sep 18, 2024
chouinar and others added 26 commits September 18, 2024 15:32
Fixes HHS#2046

Setup S3 localstack for having a local version of S3 to use (for future
work)

Script / utils for interacting with S3

Localstack is a tool that creates a mock version of AWS locally. While
the ability to mock out certain features varies, S3, being just a file
storage system, is pretty simple and fully featured even when mocked.

Note that localstack has a paid version as well that adds more features,
but all of S3's features are [supported in the free community
tier](https://docs.localstack.cloud/references/coverage/coverage_s3/).
We've used localstack for S3 and a few other AWS services on other
projects.

The script creates the S3 bucket in localstack. You can then
interact with the localstack instance of S3 with the AWS CLI like so:

```sh
aws --endpoint-url http://localhost:4566 s3 ls
> 2024-07-12 13:10:24 local-opportunities
```

I created a tmp file in it successfully:
```sh
aws --endpoint-url http://localhost:4566 s3 cp tmp.txt s3://local-opportunities/path/to/tmp.txt
```

I can see the tmp file:
```sh
aws --endpoint-url http://localhost:4566 s3 ls s3://local-opportunities/path/to/
> 2024-07-12 13:23:22         15 tmp.txt
```

And I can download it:
```sh
aws --endpoint-url http://localhost:4566 s3 cp s3://local-opportunities/path/to/tmp.txt local_tmp.txt
```
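For reference, a rough sketch of how the S3 utils might point boto3 at localstack (the helper name and details here are illustrative, not the actual utils):

```python
import boto3

# Hypothetical helper: locally we point the client at localstack's edge
# port (4566); in AWS we'd let boto3 fall back to its defaults.
def get_s3_client(endpoint_url: str | None = "http://localhost:4566"):
    return boto3.client("s3", endpoint_url=endpoint_url)

client = get_s3_client()
client.create_bucket(Bucket="local-opportunities")
client.upload_file("tmp.txt", "local-opportunities", "path/to/tmp.txt")
```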
Fixes HHS#2064

Modified the search endpoint to be able to return its response as a CSV
file

My understanding is that the CSV download in the current experience is a
frequently used feature - so adding it is worthwhile.

An important detail is that all it takes to switch from a normal JSON
response body to a CSV download is setting the new "format" field in
the request. So if the frontend added a download button, it would just
make an identical request with this format field added (they'd likely
also want to adjust the page size to return more than 25 items).

The actual logic is pretty simple: instead of returning the normal JSON
body, we construct a CSV file object and return that. There is some
formatting/parsing we need to do along the way, but it's pretty minor.
Note that the code is explicit about which fields it returns, so the
CSV won't keep changing on users if we adjust the schemas elsewhere.

As for returning the file, it just relies on Flask itself. I'm not as
familiar with file operations in an endpoint like this, so if there are
scaling concerns (i.e. very large output files), let me know. I know
there are a few tools in Flask for streaming file responses and other
complexities.

If we wanted to add support for more file types like a JSON file or XML,
we'd just need to add converters for those and the file logic should all
work the same. I originally implemented this as JSON but realized it was
just the exact response body shoved in a file - if a user wants that
they might just create the file themselves from the API response.

You can see what the produced file looks like either by running the API
yourself or by looking at this one I generated.

Note that for the list fields, I used `;` to separate the values within
a single cell.

[opportunity_search_results_20240617-152953.csv](https://github.com/user-attachments/files/15873437/opportunity_search_results_20240617-152953.csv)
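For illustration, a minimal sketch of the CSV conversion and Flask response described above (the column names and helper are assumptions, not the actual implementation):

```python
import csv
import io

from flask import Response

# Assumed subset of the real columns; being explicit keeps the CSV
# stable even if the response schemas change elsewhere.
CSV_FIELDS = ["opportunity_id", "opportunity_title", "funding_instruments"]

def to_csv_response(records: list[dict]) -> Response:
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = {k: record.get(k) for k in CSV_FIELDS}
        # List fields are joined with ";" within a single cell.
        row["funding_instruments"] = ";".join(record.get("funding_instruments") or [])
        writer.writerow(row)
    return Response(
        output.getvalue(),
        mimetype="text/csv",
        headers={"Content-Disposition": "attachment; filename=opportunities.csv"},
    )
```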

---------

Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2054

Restructured the transformation code
- Split each chunk of the transformation logic into separate "Subtask"
classes
- Made a constants file
- A few duplicated pieces of implementation were pulled into functions
that the subtasks are derived from

Some additional logging

Created a Subtask class for breaking up a task into multiple steps for
organizational reasons

Added configuration to the transformation step for enabling/disabling
different parts of the process (not used yet - but lets us build things
out and not worry about breaking non-local environments).

This looks far larger than it actually is; most of the changes are very
small. I made all of the changes without adjusting the tests (outside
of a few small bits of cleanup) and then refactored the tests as well.
This does not aim to change the meaningful behavior of the
transformation logic, but instead to make it a lot easier to parse.
Adding a new transformation is now conceptually simpler, as it means
adding another subtask rather than adding to the massive mess of
functions it was before.

There are a few small logging / metric related changes from the Subtask
just so we can have very granular metrics of how long each part of the
task takes.
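A minimal sketch of the Subtask pattern (the parent-task metric hook is an assumption; the real class surely differs in the details):

```python
import abc
import time

class Subtask(abc.ABC):
    """One step of a larger task, timed so each part gets its own metric."""

    def __init__(self, task):
        self.task = task  # parent task, assumed to own shared metrics/logging

    def run(self) -> None:
        start = time.monotonic()
        self.run_subtask()
        duration = round(time.monotonic() - start, 2)
        # Produces metrics like TransformAgency_subtask_duration_sec=2.14
        self.task.set_metric(f"{type(self).__name__}_subtask_duration_sec", duration)

    @abc.abstractmethod
    def run_subtask(self) -> None:
        ...
```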

I ran locally with a full snapshot of the production data and didn't see
anything of note different from prior runs. Still takes ~10 minutes.

---------

Co-authored-by: nava-platform-bot <[email protected]>
…nt schema (navapbc/simpler-grants-gov#168)

Fixes HHS#2056

- Added .with_start_date to search_schema builder to allow building a
date field with key of "start_date"
- Added .with_end_date to search_schema builder to allow building a date
field with key of "end_date"
- Added post_date and close_date properties to
OpportunitySearchFilterV1Schema class, which utilize the above to build
schema filters for post_date and close_date which can utilize start_date
and/or end_date fields.
- Added two unit tests in test_opportunity_route_search that will test
the data validation of these new filters. One test is for 200 response
cases and the other test is for 422 (invalid) response cases.

Note: As noted in the AC of Issue #163, this PR does NOT include
implementation of the filters. Currently, these filters do nothing as
they haven't been tied to any sort of query. This PR is just to lay the
groundwork.
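As a sketch of the builder side described in the list above (marshmallow-style; the real class likely differs):

```python
from marshmallow import Schema, fields

class DateSearchSchemaBuilder:
    """Sketch of the builder; real names/signatures are assumptions."""

    def __init__(self, schema_name: str):
        self.schema_name = schema_name
        self.schema_fields: dict[str, fields.Field] = {}

    def with_start_date(self) -> "DateSearchSchemaBuilder":
        self.schema_fields["start_date"] = fields.Date(allow_none=True)
        return self

    def with_end_date(self) -> "DateSearchSchemaBuilder":
        self.schema_fields["end_date"] = fields.Date(allow_none=True)
        return self

    def build(self) -> type[Schema]:
        return Schema.from_dict(self.schema_fields, name=self.schema_name)

# Roughly how a post_date/close_date filter schema might be built:
PostDateFilterV1Schema = (
    DateSearchSchemaBuilder("PostDateFilterV1Schema").with_start_date().with_end_date().build()
)
```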

---------

Co-authored-by: nava-platform-bot <[email protected]>
Fixes #59

This makes the search page static and adds a suspense boundary for the
data being fetched by the server. The data comes from the API and is
called from 3 components:

* [`<SearchPaginationFetch
/>`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-9dbdda5096b97ad049cccea24c5a046581d26c151a6f94fcc32c05cb33ee9dee)
* [`<SearchResultsHeaderFetch
/>`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-14a084f66c050414cc2bbd0875256511630438971022073301bbfe91c4aa8cd1)
* [`<SearchResultsListFetch
/>`](https://github.com/navapbc/simpler-grants-gov/pull/101/files#diff-aabe6a7d19434a9b26199430bbcde5d31a0790aebc4cd844b922ac2fa1348dce)

This also simplifies the state model by pushing state changes directly
to the browser query params and rerendering the changed items. This
makes things a lot simpler: a lot of state management code is removed,
the results list is no longer wrapped in a form, and form refs are no
longer passed between components. This is the approach recommended by
Next.js:
https://nextjs.org/learn/dashboard-app/adding-search-and-pagination

There are several items that needed to be shared among the client
components: the query, total results count, and total pages. These are
wrapped in a `<QueryProvider />` that updates the state of these items.
This was added so that if someone enters a query in the text box and
then clicks a filter, their query is not lost; so that the "N
Opportunities" text doesn't need to be rerendered when paging or
sorting; and so that the pager stays the same length when paging or
sorting.

The data is fetched a couple of times in a duplicative fashion, however
this follows [NextJS best
practice](https://nextjs.org/docs/app/building-your-application/rendering/composition-patterns#sharing-data-between-components)
since the requests are cached.

The pager has been updated to reload only when there is a change in the
page length. Because of an issue with the way the pager renders, it is
unavailable while data is being updated:

<img width="1229" alt="image"
src="https://github.com/navapbc/simpler-grants-gov/assets/512243/a097b0e2-f646-43b5-bc5a-664db02780a2">

This is because the Truss React component [switches between a link and a
button as it
renders](https://github.com/trussworks/react-uswds/blob/main/src/components/Pagination/Pagination.tsx#L42)
and there isn't an option to supply query arguments, so if a user were
to click it they would lose the query params.

Overall this puts us on a nice footing for the upcoming work using
Next.js best practices.
Fixes HHS#2042

Updated most search strings on the Search page to use the correct
next-intl translation components.
Added strings to the Messages file.

Updated the unit tests as well because they were not inheriting context
due to improper import path from non-global context source

<img width="1583" alt="Screenshot 2024-08-07 at 1 36 18 PM"
src="https://github.com/user-attachments/assets/1b613d26-cbd0-4d0e-b831-0380c6f72af6">
…son (#176)

Fixes #166

- Adds export_opportunity_data task
- Changes the opportunity_to_csv function to opportunities_to_csv to be more
flexible by taking the output as a parameter (see the sketch below)
- Adds unit test for export_opportunity_data task.

- The test runs the export_opportunity_data task, uploading a CSV and
JSON file to mock_s3_bucket. Then it reads the files and verifies their
contents.
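A sketch of the signature change (assumed shape; the real function covers many more fields):

```python
import csv
from typing import Sequence, TextIO

def opportunities_to_csv(opportunities: Sequence[dict], output: TextIO) -> None:
    # Taking the output stream as a parameter lets the same code back both
    # the CSV endpoint (an io.StringIO) and this export task (a file object
    # opened against the S3 bucket).
    writer = csv.DictWriter(
        output,
        fieldnames=["opportunity_id", "opportunity_title"],  # assumed subset
        extrasaction="ignore",
    )
    writer.writeheader()
    for opportunity in opportunities:
        writer.writerow(opportunity)
```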

---------

Co-authored-by: Michael Chouinard <[email protected]>
…ent (#173)

Fixes HHS#2041

> Removed inline styling on component

> Ended up having to create a custom loading style for pagination due to
no support from USWDS on this. Honestly though IDK why we don't just
hide pagination during loading...

> Screenshots, GIF demos, code examples or output to help show the
changes working as expected.
![2024-08-08 13 59 02](https://github.com/user-attachments/assets/04994a3f-a48e-4511-839d-aa686a7f1414)

Fixes HHS#2040

> Updated pagination component to hide if there are no results.
Previously it would show 7 pages that you could navigate between, all
with no results.

Updated all unit tests for pagination component. They were broken and
commented out previously.

![2024-08-08 15 45 10](https://github.com/user-attachments/assets/27e8dc6a-dddd-4c52-8086-4cd4579d73fe)
Fixes HHS#2037

> Added new rule to eslint to prevent leaving behind console log
statements

> a normal, expected rule in eslint configurations
Fixes HHS#2035

Pin the Python version and document details on how to upgrade

Upgraded packages as well while I was tinkering with this

This is mirroring changes from the template repo:
navapbc/template-application-flask#235

Python releases new minor versions (3.12, 3.13, etc.) every year in
October. It usually takes a few weeks for all of our dependencies and
tooling to also be upgraded to the latest version, causing our builds to
break. There isn't much we can do except wait a few weeks and then do
the upgrade (assuming no breaking changes in features we use).

However, we had some of our dependencies pinned to the major version
(Python 3), so builds have broken in past years until the dependencies
got fixed. This is just getting ahead of that and making sure the
upgrade to Python 3.13 doesn't cause any problems.

This change is largely documentation: the Python version used in the
dockerfile + pyproject.toml already would have resolved to Python 3.12;
this just makes sure it won't auto-upgrade to 3.13 when that releases
in October.

The package updates are all very minor - we updated them not too long
ago, mostly just cleaning up a few things like the types-requests issue
that is no longer present.
Fixes HHS#2033

Added a builder for the search schema for integer and boolean fields.

Updated the string builder to fix a small bug + allow you to specify a
pattern for the string.

This is just adding additional filters that we'll be able to search by
(once the rest of the implementation is built out). These are all fields
that fall into a "yeah, I think someone might want to narrow down by
that" group, and they definitely don't encompass every filter we might
want to add. Mostly I just wanted to get integer and boolean fields
implemented so the search logic could all get built out with actual use
cases.

Added examples to the OpenAPI docs to verify these work, at least at the
request parsing phase.
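For instance, a request using the new filters might look like this (the filter names here are assumptions for illustration, not necessarily the real fields):

```python
# Assumed request body shape for the new integer/boolean filters:
example_request = {
    "filters": {
        "expected_number_of_awards": {"min": 1, "max": 10},  # integer range
        "is_cost_sharing": {"one_of": [True]},               # boolean
    },
    "pagination": {"page_size": 25, "page_offset": 1},
}
```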

---------

Co-authored-by: nava-platform-bot <[email protected]>
Updated API packages

Mostly to upgrade flask-cors which had a vulnerability in releases
before 5.0
Fixes HHS#2032

Adjusted a few makefile commands / added some additional docs

Renamed `db-recreate` to `volume-recreate` as the command also recreates
the OpenSearch volume.

Added a command for populating your OpenSearch index locally for
opportunities rather than needing to build the full command yourself.

As we've added more commands to the Makefile, we haven't kept on top of
how they all interact. This should add some level of clarity to the
commands.

The new `make populate-search-opportunities` command is a much easier
way of writing out the full command of `docker compose run --rm
grants-api poetry run flask load-search-data load-opportunity-data`.

Note that we have a `make help` command which prints out any command
with a `## help text` on the target definition in the makefile:
<img width="1353" alt="Screenshot 2024-09-06 at 4 38 02 PM"
src="https://github.com/user-attachments/assets/d21f25c4-031f-4028-bf6b-d73908215b01">
Fixes #164

Adds support to our search query builder layer to handle building
queries for filtering on ints and dates between specific date ranges.

Added ints and dates in the same ticket as the queries are essentially
the same, just ints vs dates. While it throws an exception if both start
and end values are None, I think the API request schema should also do
that so that the error is more relevant/accurate to the API, but I can
add that later (likely a lot more niche edge cases to handle for
requests).

Nothing too exciting with these queries, they work as expected and are
just simple ranges.

The test dataset is roughly accurate (turns out books didn't always have
exact release dates until the last ~20 years).

I also tested these queries manually against the opportunity endpoint's
fields; the following two queries work (can be tested locally at
http://localhost:5601/app/dev_tools#/console ):
```
GET opportunity-index-alias/_search
{
  "size": 5,
  "query": {
    "bool": {
      "filter": [
          {
            "range": {
              "summary.post_date": {
                "gte": "2020-01-01",
                "lte": "2025-01-01"
              }
            }
          }
        ]
    }
  }
}

GET opportunity-index-alias/_search
{
  "size": 12,
  "query": {
    "bool": {
      "filter": [
          {
            "range": {
              "summary.award_floor": {
                "gte": 1234,
                "lte": 100000
              }
            }
          }
        ]
    }
  }
}
```
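A minimal sketch of the builder method that could assemble range filters like the ones above (the name and surrounding class are assumptions):

```python
from typing import Any

def filter_int_range(field: str, min_value: int | None, max_value: int | None) -> dict[str, Any]:
    # Mirrors the exception mentioned above: at least one bound is required.
    if min_value is None and max_value is None:
        raise ValueError("At least one of min_value/max_value must be set")
    bounds: dict[str, Any] = {}
    if min_value is not None:
        bounds["gte"] = min_value
    if max_value is not None:
        bounds["lte"] = max_value
    return {"range": {field: bounds}}

# filter_int_range("summary.award_floor", 1234, 100000) yields the filter
# clause used in the second query above.
```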
Fixes HHS#2034

> Cleaning up docs around getting onboarded to the project/repo

> I walked through the READMEs and linked documentation to try to get
things up and running on my new laptop; anywhere I had to go to an
outside source of information (including you all), I added that to the
docs.
Fixes HHS#2050

Added agency tables

Added tgroups tables to the staging & foreign tables

Added factories / data setup for these tables

https://docs.google.com/document/d/1EPZJyqTQruq-BkQoojtrkLqgVl8GrpfsZAIY9T1dEg8/edit
provides a lot of rough details on how this data works in the existing
system

The next ticket will be to transform the data stored in tgroups into the
new agency table(s).

The agency table I created contains most of the fields from the legacy
system (in a normalized structure), including a few that I don't think
we quite need; just in case they have a purpose beyond what I can
understand, I'm keeping them.

---------

Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2038

Updated the load search data task to partially support incrementally
loading + deleting records in the search index rather than just fully
remaking it.

Various changes to the search utilities to support this work

Technically this doesn't fully support a true incremental load as it
updates every record rather than just the ones with changes. I think the
logic necessary to detect changes both deserves its own ticket, and may
evolve when we later support indexing files to OpenSearch, so I think it
makes sense to hold off on that for now.
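In rough terms, the incremental pass might look like this (all helper names are assumptions):

```python
def incrementally_load(search_client, index_alias: str, opportunities) -> None:
    # No change detection yet: "incremental" currently means upsert every
    # current record and delete the indexed ones that no longer exist.
    current_ids = {opp.opportunity_id for opp in opportunities}
    indexed_ids = set(search_client.get_all_document_ids(index_alias))  # assumed helper

    search_client.bulk_upsert(index_alias, [opp.to_document() for opp in opportunities])  # assumed helper
    search_client.bulk_delete(index_alias, indexed_ids - current_ids)  # assumed helper
```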
…ueries (#197)

Fixes HHS#2039

Adjusted the logic that connects the API requests to the builder in the
search layer to now connect all of the new fields.

A few minor validation adjustments to the API to prevent a few common
mistakes a user could make

The search tests are getting pretty long; I think a good follow-up
would be to split the test file into validation and response testing.

I adjusted some validation/setup of the API schemas because I don't see
a scenario where min/max or start/end dates wouldn't be wanted
together. This also let me add a quick validation rule that a user must
provide at least one of the values.

I adjusted how the search_opportunities file was structured, as we only
supported filtering by strings before and it used the names of the
variables to determine the type. I made it a bit more explicit; before,
arbitrary variables could be passed through to the search layer, which
seems potentially problematic if not filtered out somewhere.
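The "more explicit" structure might look something like a direct mapping from request filter names to builder calls (a sketch, not the actual code; the field names are assumed):

```python
# Explicitly enumerate which filters exist and how each maps onto the
# query builder, instead of inferring the type from the variable name.
STR_FILTERS = {"agency", "opportunity_status"}
INT_FILTERS = {"award_floor", "award_ceiling"}
DATE_FILTERS = {"post_date", "close_date"}

def apply_filters(builder, filters: dict) -> None:
    for name, value in filters.items():
        if name in STR_FILTERS:
            builder.filter_terms(f"summary.{name}", value["one_of"])
        elif name in INT_FILTERS:
            builder.filter_int_range(f"summary.{name}", value.get("min"), value.get("max"))
        elif name in DATE_FILTERS:
            builder.filter_date_range(f"summary.{name}", value.get("start_date"), value.get("end_date"))
        else:
            # Anything unexpected fails loudly rather than silently passing through.
            raise ValueError(f"Unknown filter: {name}")
```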
Fixes HHS#2051

Add transformations for agency data

Agency data is structured oddly in the existing system: instead of
being in ordinary tables, it's in a `tgroups` table that has values
stored as key-value pairs. We want to normalize that into something
more workable, so the transformation needs to work a bit differently
than the transformations of other tables.

For simplicity, I load all of the data for every agency (and later
filter to just what changed) as this removes a lot of weird edge cases
that we would have otherwise needed to consider. Only modified rows
actually get used, but we know we have the full set of data now.
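Conceptually, the key-value pivot looks something like this (the key format is an assumption based on the description above):

```python
from collections import defaultdict

def pivot_tgroups(rows) -> dict[str, dict[str, str]]:
    # tgroups stores one row per (agency, field) key-value pair; group the
    # pairs back into one dict of fields per agency.
    agencies: dict[str, dict[str, str]] = defaultdict(dict)
    for row in rows:
        agency_code, field_name = row.keyfield.rsplit("|", 1)  # assumed "agency|field" keys
        agencies[agency_code][field_name] = row.value
    return agencies
```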

I have a snapshot of the prod tgroups table and loaded it into my DB
locally and ran the transform script. In total, it takes ~2 seconds to
run and didn't hit any issues.

A set of the relevant metrics:
```
total_records_processed=1152
total_records_deleted=0
total_records_inserted=1152
total_records_updated=0
total_error_count=0
agency.total_records_processed=1152
agency.total_records_inserted=1152
TransformAgency_subtask_duration_sec=2.14
task_duration_sec=2.14
```

As a sanity test, I also loaded in the tgroups data from dev and tried
running it through. While it generally worked, there were 12 agencies
that failed because they were missing the ldapGp and AgencyContactCity
fields. I'm not certain if we want to do anything about that as they all
seemed to be test agencies based on the names.

---------

Co-authored-by: nava-platform-bot <[email protected]>
Fixes HHS#2036


---------

Co-authored-by: Aaron Couch <[email protected]>
Co-authored-by: Aaron Couch <[email protected]>
@acouch acouch merged commit 22562ce into HHS:main Sep 18, 2024
29 of 45 checks passed
@acouch acouch deleted the acouch/merge-nava-to-hhs branch September 18, 2024 20:02