File extension missing #68

Closed
Daniel-Mietchen opened this issue May 17, 2014 · 7 comments

@Daniel-Mietchen
Member

The media files to be embedded should have the correct file name suffix by default, so that edits like
https://en.wikisource.org/w/index.php?title=Biodiversity_Assessment_of_the_Fishes_of_Saba_Bank_Atoll%2C_Netherlands_Antilles&diff=4736919&oldid=4736916
are not necessary any more.

That would be a precursor to #8.

@notconfusing
Member

@Klortho @wrought Is it possible to do this in the XSL transform? We currently do not modify the output of the transform, so in order to add file extensions we would have to start post-processing the XSL output in Python. That's fine if that's the only way, but it means the XSL conversion is no longer self-contained.

It is as simple as appending .jpg or .png, except that one has to check which file actually exists, since images are not guaranteed to come in any specific format.

@wrought changed the title from "File name suffix missing" to "File extension missing" on May 18, 2014
@wrought
Member

wrought commented May 18, 2014

Here's a sample figure in the .nxml file:

<fig id="F1" orientation="portrait" position="float"><label>Figure 1.</label><caption><p>Phylogeny of the genus <italic><named-content content-type="taxon-name">Bassaricyon</named-content></italic>. Phylogeny generated from the concatenated <italic>CHRNA1</italic> and cytochrome <italic>b</italic> sequences. All analyses consistently recovered the same relationships with high support. Divergence dating was generated in BEAST; bars show the 95% confidence interval at each node. Branches without support are collapsed and outgroup clades have been collapsed, leaving monophyletic groupings with 100% support. Data for <italic>CHRNA1</italic> are missing for <italic><named-content content-type="taxon-name">Bassaricyon gabbii</named-content></italic>, for which DNA was extracted from a museum skull. All nodes in <italic><named-content content-type="taxon-name">Bassaricyon</named-content></italic> have 1.00 Bayesian posterior probability, except the split between <italic><named-content content-type="taxon-name">Bassaricyon gabbii</named-content></italic> and <italic><named-content content-type="taxon-name">Bassaricyon alleni</named-content></italic>/<italic><named-content content-type="taxon-name">Bassaricyon medius</named-content></italic> (0.97 Bayesian posterior probability). Non-focal and outgroup taxa are shaded in gray, <italic><named-content content-type="taxon-name">Bassaricyon</named-content></italic> species and subspecies are color coded, samples of <italic><named-content content-type="taxon-name">Bassaricyon medius medius</named-content></italic> and <italic><named-content content-type="taxon-name">Bassaricyon neblina neblina</named-content></italic> that were collected within 5 km of each other in Ecuador are shaded.</p></caption><graphic xlink:href="ZooKeys-324-001-g001"/></fig>

Most importantly, the <graphic> element:

<graphic xlink:href="ZooKeys-324-001-g001"/>

As you can see, there is no extension stored in this element for this version of the article. However, image files are provided with the rest of the tarball where we find the .nxml file. So, we have some options:

  1. Find out whether there is a standard convention, and have the jats-to-mediawiki XSLT append .jpg or whatever that standard turns out to be.
  2. Post-process the article XML, or the MediaWiki markup generated from it by the XSLT, using Python to check the tarball for the file with the matching name and take its extension (e.g. .jpg, .jpeg or .png).
    • Currently we don't do any post-processing of the text, so this would be a start, which already breaks convention.
  3. Use whatever OAMI does to guess/assign file extensions, or override the filenames and file extensions that come with the article XML with whatever OAMI uses.
  4. Update manually.
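
For illustration, option 2 could look roughly like the following. This is only a minimal sketch, not the project's actual code: the function names `graphic_hrefs` and `resolve_extension` are hypothetical, the candidate extensions are just the ones mentioned above, and it assumes the tarball has already been unpacked so that its file names are available as a plain list.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute used by <graphic>.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Assumption: only these extensions are tried, in this order.
CANDIDATE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def graphic_hrefs(nxml_text):
    """Return the xlink:href value of every <graphic> element."""
    root = ET.fromstring(nxml_text)
    return [g.get(XLINK_HREF) for g in root.iter("graphic") if g.get(XLINK_HREF)]


def resolve_extension(href, filenames):
    """Return the file name from `filenames` that matches `href` plus one
    of the candidate extensions (case-insensitively), or None."""
    by_lower = {name.lower(): name for name in filenames}
    for ext in CANDIDATE_EXTENSIONS:
        match = by_lower.get((href + ext).lower())
        if match is not None:
            return match
    return None
```

Given the file names found next to the .nxml file, `resolve_extension("ZooKeys-324-001-g001", names)` would return the matching image file name if one exists, and `None` otherwise, in which case the reference would still need manual attention.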

@Daniel-Mietchen
Member Author

I think we should go with OAMI here.

@Klortho
Member

Klortho commented May 28, 2014

I agree with that.

@erlehmann

Can someone elaborate on why file extensions are needed on MediaWiki?

@erlehmann

From OAMI source code:

        #TODO: file extension should be adapted for other file formats
        url_path = urlparse.urlsplit(material.url).path
        source_filename = url_path.split('/')[-1]
        assert(mimetype in ('audio', 'video'))
        if mimetype == 'audio':
            extension = 'oga'
        elif mimetype == 'video':
            extension = 'ogv'
        wiki_filename = path.splitext(source_filename)[0] + '.' + extension

@notconfusing
Member

Ok, I've updated our code to search only for .jpg and .png files, per the advice above.
