This repo was originally started for the Peace Portal corpora, among others. The Peace Portal side has requested that we regularly check resources that publish funerary inscriptions for newly added inscriptions. I was thinking of setting this up as follows:
- add an Elasticsearch dependency to this repository
- let the scraper retrieve the ids of all (newly) available documents and compare them against the ids found in existing indices
- scrape the documents whose ids are not yet in the indices
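The id comparison in the steps above could be sketched roughly as follows. This is only a minimal illustration, not a proposal for the final implementation: the function names `find_new_ids` and `iter_indexed_ids` are made up for this example, and it assumes that the ids produced by the scraper are the same values used as `_id` in the Elasticsearch indices. The Elasticsearch part relies on `elasticsearch.helpers.scan` from the official Python client.

```python
from typing import Iterable, Iterator, List


def find_new_ids(available_ids: Iterable[str], indexed_ids: Iterable[str]) -> List[str]:
    """Return the available ids that are not yet indexed, keeping source order."""
    seen = set(indexed_ids)
    new_ids = []
    for doc_id in available_ids:
        if doc_id not in seen:
            seen.add(doc_id)  # also skips duplicates within the source listing
            new_ids.append(doc_id)
    return new_ids


def iter_indexed_ids(client, index: str) -> Iterator[str]:
    """Yield the _id of every document in an index.

    Assumption: `client` is an `elasticsearch.Elasticsearch` instance and the
    scraper's document ids are stored as `_id` in the index.
    """
    # Imported lazily so that find_new_ids works without the client installed.
    from elasticsearch.helpers import scan

    query = {"query": {"match_all": {}}}
    # _source=False: only the ids are needed, not the document bodies.
    for hit in scan(client, index=index, query=query, _source=False):
        yield hit["_id"]
```

With something like this, the scraper would only need to fetch the documents returned by `find_new_ids(source_ids, iter_indexed_ids(client, index_name))`, where `source_ids` comes from the resource's listing and `index_name` is whichever index holds that corpus.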
Preferably, this would run on a dedicated server; alternatively, we could make use of Kubernetes. I'd like to work on this in the first half of 2025, so I'm not sure whether the latter option will be available.
This would need to run at regular intervals, e.g., every few months. What would be the best way to achieve this? We could add a cron job on the server itself, or we could use a self-hosted GitHub Actions runner and trigger the harvest via GitHub Actions. Any thoughts, @gdamaskos @tymees @bartbouter @falconburrow?