chore: setup scripts to update primary language and publication_date #15
Note: When switching this to a Django management command within the web_app repo, we'll refactor it to fetch the sources directly from the database rather than making the API call.
Nice, I like the modularity of this approach, especially as we think about making other kinds of batch updates to data in the directory. This just exports the changes to a CSV, right? I like that approach too, but have you validated the upload loop for it? I've never touched the batch upload paths myself, so I'd love to see this validated on a test instance of web_search.
The structure looks good. Here is my early feedback on a few things I observed.
…es and analyze first_publication_date and language
This looks good to me! Next step is moving this over to web_search.
Sure, currently working on moving this to web_search.
Looks good 🚀
This PR introduces scripts to update the sources table columns first_story and language based on queries from the ES cluster.
Addresses #12 #11
This currently exports the results to CSV files named as follows:
language_results_batch_1.csv
publication_date_results_batch_1.csv
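For reviewers who want a concrete picture of the per-batch export, here is a minimal sketch of how results might be written to files following the naming pattern above. It is not the code in this PR: the helper names (`chunked`, `export_batches`), the column names, and the batch size are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def export_batches(rows, prefix, fieldnames, out_dir=".", batch_size=1000):
    """Write rows to <prefix>_batch_<n>.csv files and return the paths written.

    `prefix`, `fieldnames`, and `batch_size` are hypothetical parameters,
    not names taken from the PR.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, batch in enumerate(chunked(rows, batch_size), start=1):
        path = out / f"{prefix}_batch_{i}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(batch)
        written.append(path)
    return written
```

Calling `export_batches(rows, "language_results", ["source_id", "language"])` would produce `language_results_batch_1.csv`, `language_results_batch_2.csv`, and so on, with one file per batch. Here `source_id` and `language` are placeholder column names.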