Driver script #3

Open
Klortho opened this issue May 20, 2012 · 4 comments

Comments

@Klortho
Collaborator

Klortho commented May 20, 2012

Right now this is fetch-samples, but it needs to morph into a real driver script with these features:

  • Maintain a lightweight sqlite database that describes:
    • which articles have been converted and uploaded, and the status of each
    • when the last batch (from oa-service) was retrieved
  • Takes an argument articles, which specifies the list of articles to process:
    either an explicit list, a reference to an XML file that contains
    a list, or (by default) all the articles that have been updated since the last run,
    according to the oa-service.
  • Takes an argument steps, which specifies which steps in the pipeline to execute (default is all):
    • Download from PMC
    • Unzip
    • Reorganize directory
    • Convert XML
    • Import into MediaWiki
    • Upload media files into MediaWiki
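
A rough sketch of how the pieces above could fit together, in Python rather than the current shell script. Everything named here is an assumption for illustration only: the table layout, the option names (`--articles`, `--article-list-file`, `--steps`), the step identifiers, and the `handlers` mapping are hypothetical, and the actual work of each step is left to placeholder callables.

```python
import argparse
import sqlite3

# Pipeline steps from the list above; these identifiers are hypothetical.
STEPS = ["download", "unzip", "reorganize", "convert", "import", "upload-media"]

# Assumed schema: one row per article, plus a log of oa-service batch retrievals.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    article_id TEXT PRIMARY KEY,
    last_step  TEXT,
    status     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS batches (
    retrieved_at TEXT NOT NULL
);
"""


def open_db(path):
    """Open (or create) the state database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_status(conn, article_id, step, status):
    """Remember the last step attempted for an article and how it went."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (article_id, last_step, status) "
        "VALUES (?, ?, ?)",
        (article_id, step, status),
    )
    conn.commit()


def record_batch(conn, timestamp):
    """Note when a batch was retrieved (ISO 8601 string, so it sorts as text)."""
    conn.execute("INSERT INTO batches (retrieved_at) VALUES (?)", (timestamp,))
    conn.commit()


def last_batch(conn):
    """Return the most recent retrieval time, or None if there is none yet."""
    return conn.execute("SELECT MAX(retrieved_at) FROM batches").fetchone()[0]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Driver script sketch")
    parser.add_argument("--articles", nargs="*", default=None,
                        help="explicit list of article IDs")
    parser.add_argument("--article-list-file", default=None,
                        help="XML file containing a list of articles")
    parser.add_argument("--steps", nargs="+", choices=STEPS, default=STEPS,
                        help="pipeline steps to run (default: all)")
    return parser.parse_args(argv)


def run(conn, article_ids, steps, handlers):
    """Run the selected steps, in pipeline order, for each article."""
    for article_id in article_ids:
        for step in STEPS:
            if step not in steps:
                continue
            try:
                handlers[step](article_id)
            except Exception:
                # Stop processing this article at the first failing step.
                record_status(conn, article_id, step, "failed")
                break
            record_status(conn, article_id, step, "done")
```

If neither `--articles` nor `--article-list-file` is given, the script would fall back to asking the oa-service for everything updated since `last_batch(conn)`; that lookup is not sketched here.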
@Daniel-Mietchen
Collaborator

Sounds very similar to the oa-get routine in
https://github.com/erlehmann/open-access-media-importer .

@Klortho
Collaborator Author

Klortho commented May 20, 2012

Yes, I saw that. But I didn't think oa-get is ready for prime time, and I just want something simple. But you're right that these should be tied together at some point. I might have to learn Python ...

@Daniel-Mietchen
Collaborator

The OA Media Importer as a whole is not ready yet, but the crawling part mostly is, and using it does not require coding anything in Python.

One use case is the "Wikipedia" circle in http://malaria.bibsoup.net/ .

@konrad
Owner

konrad commented May 21, 2012

I removed download_examples.sh now as fetch-samples.sh does the job.

open-access-media-importer has some dependencies which might be a hurdle for some people. I think for our purpose it is fine to fetch the selected examples with wget. But we should refer to oa-get as a tool for downloading articles other than those used for our testing.
