
Commit

Merge pull request #331 from IATI/develop
deployment: make SOLR_PARALLEL_PROCESSES come from Github vars
simon-20 authored May 17, 2024
2 parents 87c4196 + 1015a65 commit 9ba6f69
Showing 4 changed files with 1,755 additions and 18 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/develop.yml
@@ -28,7 +28,7 @@ env:
SOLR_API_URL: ${{ secrets.DEV_SOLR_API_URL }}
SOLR_USER: ${{ secrets.DEV_SOLR_USER }}
SOLR_PASSWORD: ${{ secrets.DEV_SOLR_PASSWORD }}
- SOLR_PARALLEL_PROCESSES: 10
+ SOLR_PARALLEL_PROCESSES: ${{ vars.DEV_SOLR_PARALLEL_PROCESSES }}
DB_USER: ${{ secrets.DB_USER }}
DB_PASS: ${{ secrets.DB_PASS }}
DB_HOST: ${{ secrets.DB_HOST }}
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -23,7 +23,7 @@ env:
SOLR_API_URL: ${{ secrets.PROD_SOLR_API_URL }}
SOLR_USER: ${{ secrets.PROD_SOLR_USER }}
SOLR_PASSWORD: ${{ secrets.PROD_SOLR_PASSWORD }}
- SOLR_PARALLEL_PROCESSES: 5
+ SOLR_PARALLEL_PROCESSES: ${{ vars.PROD_SOLR_PARALLEL_PROCESSES }}
DB_USER: ${{ secrets.PROD_DB_USER }}
DB_PASS: ${{ secrets.PROD_DB_PASS }}
DB_HOST: ${{ secrets.PROD_DB_HOST }}
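
With both workflows now reading `SOLR_PARALLEL_PROCESSES` from repository variables, the value reaches the application as an environment string, and GitHub Actions substitutes an empty string when a referenced variable is not defined. A minimal sketch of defensive parsing, assuming the application reads the value from the environment in Python; the helper name `get_parallel_processes` and the fallback of 1 are hypothetical and do not come from the repository:

```python
import os


def get_parallel_processes(env=None, default=1):
    """Parse SOLR_PARALLEL_PROCESSES, falling back when unset or invalid.

    `default` is an assumed fallback, not a value taken from the repository.
    """
    env = os.environ if env is None else env
    raw = env.get("SOLR_PARALLEL_PROCESSES", "").strip()
    if not raw:
        # An undefined Actions variable arrives as an empty string.
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # Reject zero or negative process counts.
    return value if value > 0 else default
```

Passing a plain dict as `env` makes the helper easy to check without touching the real environment, for example `get_parallel_processes({"SOLR_PARALLEL_PROCESSES": "5"})`.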
1,748 changes: 1,744 additions & 4 deletions IATI_Data_Flow.drawio.svg
21 changes: 9 additions & 12 deletions README.md
@@ -134,7 +134,8 @@ Service Loop (when container starts)
- Checks for `stale_datasets` - `document.last_seen` is from a previous run (so no longer in registry)
- `clean_datasets()`
- Removes `stale_datasets` from Activity lake, decided it wasn't worth updating `changed_datasets` from activity lake because filenames are hash of `iati_identifier` so less likely to change.
- - Removes `changed_datasets` and `stale_datasets` from source xml blob container and Solr.
+ - Removes `stale_datasets` from source and clean xml blob container and Solr.
+ - Removes `changed_datasets` from source and clean xml blob container. Not from Solr, as this will be removed later and we want the older data to be available to data store users during processing.
- Removes `stale_datasets` from DB documents table
- `reload(retry_errors)`
- `retry_errors` is True after RETRY_ERRORS_AFTER_LOOP refreshes.
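
The revised `clean_datasets()` rules in this hunk can be sketched as a small planning function: stale datasets are removed everywhere, while changed datasets are removed only from the two blob containers so that Solr keeps serving the older data during processing. This is an illustration under stated assumptions, not the repository's code; the store identifiers and the function name `plan_cleanup` are hypothetical.

```python
# Hypothetical store names; the README names the stores but not these identifiers.
STALE_STORES = (
    "activity_lake",
    "source_xml_blob",
    "clean_xml_blob",
    "solr",
    "db_documents",
)
# Changed datasets stay in Solr (and the activity lake) until reprocessed.
CHANGED_STORES = ("source_xml_blob", "clean_xml_blob")


def plan_cleanup(stale_ids, changed_ids):
    """Return (dataset_id, store) pairs to delete under the rules above."""
    plan = [(d, s) for d in stale_ids for s in STALE_STORES]
    plan += [(d, s) for d in changed_ids for s in CHANGED_STORES]
    return plan
```

For example, `plan_cleanup(["old"], ["updated"])` includes `("old", "solr")` but no Solr entry for `"updated"`.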
@@ -222,27 +223,23 @@ Service Loop (when container starts)

## Functions

- - `main()` - Sends XML to the [iati-flattener](https://github.com/IATI/iati-flattener) which transforms it into a flat JSON document, then stores it in the database (`document.flattened_activities`) in JSONB format.
+ - `main()` - Flattens XML into a flat JSON document, then stores it in the database (`document.flattened_activities`) in JSONB format.
+
+ Used to use the [iati-flattener service](https://github.com/IATI/iati-flattener), but now it does it using a Python class in the same process.

## Logic

- `main()`
- - Reset unfinished flattens
+ - Reset unfinished and errored flattens
- Get unflattened (`db.getUnflattenedDatasets`)
- process_hash_list()
- If prior_error = 422, 400, 413, break out of loop for this file
- Start flatten in db (db.startFlatten)
- Download source XML from Azure blobs - If charset error, breaks out of loop for file
- - POST's to flattener API
- - Update solrize_start column
- - If status code != 200
- - `404` - update DB `document.flatten_api_error`, pause 1min, continue loop
- - `400 - 499` - update DB `document.flatten_api_error`, break out of loop
- - `500 +` - update DB `document.flatten_api_error`, break out of loop
- - else - log warning, continue
+ - Uses Python class `Flattener` to flatten.
- Mark done and store results in DB (db.completeFlatten)
- If exception
- Can't download BLOB, then `"UPDATE document SET downloaded = null WHERE id = %(id)s"`, to force re-download
- - Other Exception, log message, no change to DB
+ - Other Exception, log message, `UPDATE document SET flatten_api_error = %(error)s WHERE id = %(doc_id)s`
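
The per-document flow described in the updated Logic list can be sketched roughly as follows. This is a simplified illustration, not the repository's implementation: `BlobDownloadError`, `process_document`, the injected `download` and `flatten` callables, and the list-based `db` recorder are all hypothetical stand-ins, and the charset-error branch is omitted.

```python
class BlobDownloadError(Exception):
    """Hypothetical stand-in for a failed source XML blob download."""


# Prior errors that cause the document to be skipped, per the Logic list.
SKIP_PRIOR_ERRORS = {400, 413, 422}


def process_document(doc, download, flatten, db):
    """Flatten one document, recording DB actions as tuples appended to `db`."""
    if doc.get("prior_error") in SKIP_PRIOR_ERRORS:
        return "skipped"
    db.append(("startFlatten", doc["id"]))
    try:
        xml = download(doc)
        flattened = flatten(xml)  # in-process flattening, no HTTP call
    except BlobDownloadError:
        # Corresponds to: UPDATE document SET downloaded = null WHERE id = ...
        db.append(("reset_downloaded", doc["id"]))
        return "redownload"
    except Exception as exc:
        # Corresponds to: UPDATE document SET flatten_api_error = ... WHERE id = ...
        db.append(("set_flatten_api_error", doc["id"], str(exc)))
        return "error"
    db.append(("completeFlatten", doc["id"], flattened))
    return "done"
```

Recording the error in `flatten_api_error` for any other exception, rather than leaving the row unchanged, is what allows the "Reset unfinished and errored flattens" step to pick those documents up again on a later run.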

# Lakify
