Loose ends on the python script #39

Open
Klortho opened this issue Sep 7, 2014 · 1 comment
Klortho (Member) commented Sep 7, 2014

The Python script seems not to be quite finished. I don't think the tmpdir, infile, or outfile command-line options are implemented.

The infile option would be really nice, because it would mean that I could make changes to the XSLT, and then reconvert the same article again without having to download it. Or, I could make changes to the input XML to test things, and reconvert it directly.

I'm not sure about tmpdir. It might be better to have the script always download and extract into a directory called articles.

I do not know Python very well, but it seems pretty easy to hack, so maybe I could do this as a learning exercise.

wrought (Member) commented Sep 14, 2014

The output option is there as a placeholder for running the script in a streaming mode, writing converted text to stdout (the default) or to a file. Indeed it is not implemented; currently the output is saved to a .mw.xml file for simplicity. It doesn't seem like we need to change this, but we could comment the option out or remove it to be clearer.
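
To illustrate the intent, a rough sketch only (the function name is hypothetical, not the script's actual code):

import sys

def write_output(converted_text, outfile=None):
    # Default behavior: stream the converted wikitext to stdout
    if outfile is None:
        sys.stdout.write(converted_text)
    else:
        # Otherwise write it to the named file
        with open(outfile, "w") as f:
            f.write(converted_text)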

The infile option is implemented and it works! However, the script expects DOIs or PMIDs as inputs, so an infile is a list of DOIs or PMIDs. If you want to reconvert the same article again without having to download it, or to make changes to the XML and reconvert it directly, you should instead simply call xsltproc as usual:

xsltproc jats-to-mediawiki.xsl $FILENAME.nxml > $FILENAME.mw.xml
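
(For reference, an infile in the sense above is just a plain-text list of identifiers, one per line. The identifiers below are made-up placeholders:)

10.1371/journal.pone.0012345
24000001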

On the other hand, we should definitely change the script to check if the file has already been downloaded, and if so, skip downloading it. That should save network time generally.
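
Roughly along these lines (a sketch under assumptions, not the actual code; the function name and file layout are made up):

import os

def fetch_article(article_id, articles_dir="articles"):
    # Reuse a previously downloaded copy if one already exists locally
    local_path = os.path.join(articles_dir, article_id + ".nxml")
    if os.path.exists(local_path):
        return local_path
    # ... existing download-and-extract logic would run here ...
    return local_path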

As for tmpdir, you're right: this was an oversight and caused a bit of a mess. I just fixed it and put everything into an articles directory, as you suggest.
