chore: setup scripts to update primary language and publication_date #15

Merged: thepsalmist merged 3 commits into mediacloud:main on Jan 31, 2025

Conversation

thepsalmist
Contributor

@thepsalmist thepsalmist commented Jan 22, 2025

This PR introduces scripts that update the first_story and language columns of the sources table, based on queries against the ES cluster.

Addresses #12 and #11.

The scripts currently export the results to CSV files named as follows:
language_results_batch_1.csv
publication_date_results_batch_1.csv
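
For readers following along, the batched export described above could be sketched roughly as below. This is a minimal illustration and not the code in this PR: the function names (`batched`, `export_batches`), the `batch_size` default, and the column names in the usage note are assumptions; only the `<prefix>_batch_<n>.csv` naming comes from the file names listed above.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `size` rows."""
    batch: List[Dict[str, str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def export_batches(
    rows: Iterable[Dict[str, str]],
    prefix: str,
    fieldnames: List[str],
    out_dir: str = ".",
    batch_size: int = 1000,  # hypothetical default, not taken from the PR
) -> List[Path]:
    """Write rows to <prefix>_batch_<n>.csv files and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for number, batch in enumerate(batched(rows, batch_size), start=1):
        path = out / f"{prefix}_batch_{number}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(batch)
        written.append(path)
    return written
```

Called as `export_batches(rows, "language_results", ["source_id", "language"])`, with `source_id` and `language` as placeholder column names, this would produce language_results_batch_1.csv, language_results_batch_2.csv, and so on. Keeping each batch in its own file means a failed upload can be retried per file.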

@thepsalmist thepsalmist self-assigned this Jan 22, 2025
@thepsalmist thepsalmist requested a review from m453h January 22, 2025 14:43
@thepsalmist
Contributor Author

Note: When switching this to a Django management command within the web_app repo, we'll actually refactor it to fetch the sources directly from the database rather than making the API call.

Member

@pgulley pgulley left a comment

Nice! I like the modularity of this approach, especially as we think about making other kinds of batch updates to data in the directory. This just exports the changes to a CSV, right? I like that approach too, but have you validated the upload loop for it? I've never touched the batch upload paths myself, so I'd love to see a validation on a test instance of the web_search.

Contributor

@m453h m453h left a comment

The structure looks good. Here is my early feedback on a few things I observed.

@thepsalmist thepsalmist marked this pull request as ready for review January 27, 2025 14:18
@pgulley
Member

pgulley commented Jan 28, 2025

This looks good to me! Next step is moving this over to the web_search.

@thepsalmist
Contributor Author

> This looks good to me! Next step is moving this over to the web_search.

Sure, I'm currently working on moving this to Websearch.

Contributor

@m453h m453h left a comment

Looks good 🚀

@pgulley pgulley self-requested a review January 30, 2025 19:24
@thepsalmist thepsalmist merged commit e913a0a into mediacloud:main Jan 31, 2025
3 participants