Enable programmatic full-text import from PMC into Wikisource #7

Open
Daniel-Mietchen opened this issue Feb 18, 2014 · 13 comments

Comments

@Daniel-Mietchen
Member

So that the presence of an article on Wikisource can be signaled in its citation on Wikipedia.

@Daniel-Mietchen
Member Author

See also konrad/JATS-to-Mediawiki#3 .

@Klortho
Member

Klortho commented Feb 20, 2014

Daniel, do you have anybody working on this? I could try to spend some time on it, if not.

@Daniel-Mietchen
Member Author

The plan is that @wrought will start working on this, but I guess there will be occasions where you could be of help; we'll get back to you then. Thanks.

@notconfusing
Member

I plan to start working on it too. @Klortho, I am curious: do you have ideas or an outline of the best way to proceed? That is, what strategy should we use, both technical and non-technical?

@Klortho
Member

Klortho commented Feb 27, 2014

Hi, Max,
Daniel linked to JATS-to-Mediawiki above, and that's where I'd start. I think that XSLT is 90% of the way there. It needs a driver script, which could be written in anything. It should use the PMC OA web service (https://www.ncbi.nlm.nih.gov/pmc/tools/oa-service/) to discover new and changed articles in the OA subset.
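A minimal sketch of the discovery half of such a driver script, in Python. It assumes the OA web service answers queries at `https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi`, accepts `from` and `format` parameters, and returns `<record id="PMCnnnn">` elements containing a `<link href=...>`, plus a `<resumption><link href=...>` element when results are paged; verify these details against the service documentation. The function names are hypothetical, and the step that hands each package to the JATS-to-Mediawiki XSLT is left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

# Assumed endpoint of the PMC OA web service; check it against the service docs.
OA_SERVICE = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def oa_query_url(since: str, fmt: str = "tgz") -> str:
    """Build a query for OA-subset records added or updated since `since` (YYYY-MM-DD)."""
    return OA_SERVICE + "?" + urllib.parse.urlencode({"from": since, "format": fmt})


def parse_oa_response(xml_text: str) -> Tuple[List[Tuple[str, Optional[str]]], Optional[str]]:
    """Return ([(pmcid, package_href), ...], next_page_url or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("record"):
        link = record.find("link")  # first <link>: the package in the requested format
        href = link.get("href") if link is not None else None
        records.append((record.get("id"), href))
    # Paged results are assumed to carry the next-page URL in <resumption><link href=...>.
    resumption = root.find("resumption/link")
    next_url = resumption.get("href") if resumption is not None else None
    return records, next_url


def changed_articles(since: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (pmcid, href) for every changed article, following resumption links."""
    url: Optional[str] = oa_query_url(since)
    while url:
        with urllib.request.urlopen(url) as resp:
            records, url = parse_oa_response(resp.read().decode("utf-8"))
        yield from records
```

Parsing is kept separate from fetching so it can be checked against a saved response without touching the network.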

@Daniel-Mietchen
Member Author

We have such a driver script at https://github.com/erlehmann/open-access-media-importer/blob/master/oa-pmc-ids. However, the import to Wikisource is to be triggered by citations on Wikipedia, so the focus will be less on discovering new articles.

http://www.naturkundemuseum-berlin.de/en/institution/mitarbeiter/mietchen-daniel/
https://en.wikipedia.org/wiki/User:Daniel_Mietchen/Publications
http://okfn.org
http://wikimedia.org

@notconfusing
Member

A simple way to detect new citations, and thus trigger the driver script, is to watch (i.e., poll at an interval) the "What links here" transclusions of the citation template. Is there a better way than constantly polling for transclusions?

@Daniel-Mietchen
Member Author

Alternative options include dumps or Recent changes feeds; both would seem to me better than constant polling.

Plus, we probably want to wait a week or so for a citation to consolidate, so as not to become a toy for spammers.
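One way to implement the waiting period is to remember when each cited PMCID was first seen and release it for import only once it has stayed cited that long. This sketch assumes the set of currently cited PMCIDs is refreshed periodically (for example from a dump or the recent changes feed), uses seven days as a stand-in for "a week or so", and resets the clock when a citation disappears; all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# "A week or so": the exact delay is a policy choice, not a fixed rule.
CONSOLIDATION = timedelta(days=7)


def update_first_seen(first_seen: Dict[str, datetime],
                      cited_now: Iterable[str], now: datetime) -> None:
    """Record when each PMCID was first seen; drop PMCIDs no longer cited."""
    cited = set(cited_now)
    for pmcid in list(first_seen):
        if pmcid not in cited:
            # Citation removed (e.g. spam cleanup): forget it, so a re-add restarts the clock.
            del first_seen[pmcid]
    for pmcid in cited:
        first_seen.setdefault(pmcid, now)


def ready_for_import(first_seen: Dict[str, datetime], now: datetime,
                     min_age: timedelta = CONSOLIDATION) -> List[str]:
    """PMCIDs that have been cited continuously for at least `min_age`."""
    return sorted(p for p, t in first_seen.items() if now - t >= min_age)
```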

@Klortho
Member

Klortho commented Mar 10, 2014

> Plus, we probably want to wait a week or so for a citation to consolidate, so as not to become a toy for spammers.

I don't see the problem here, since we're only talking about selecting which PMC articles to import to Wikisource, right? Wouldn't it be reasonable to assume that once an article is in PMC, it's at least eligible to be imported into Wikisource? How bad a problem would "false positives" be?

@Daniel-Mietchen
Member Author

The thing is that there is no clear policy around that, so any automated tool would have to err on the side of caution. In any case, I think we should start with the most cited articles: an article-level version of https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Popular1.
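A rough sketch of an article-level ranking: pull the `pmc` parameter out of citation templates in each page's wikitext and count how many pages cite each PMCID. It assumes the wikitext is already available (for example from a dump) as a title-to-text mapping, and the regular expression only covers the common `|pmc=` form; the names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Matches the |pmc= parameter of citation templates such as {{cite journal}};
# values are normally bare digits, but an optional "PMC" prefix is tolerated.
PMC_PARAM = re.compile(r"\|\s*pmc\s*=\s*(?:PMC)?(\d+)", re.IGNORECASE)


def pmcids_in(wikitext: str) -> Set[str]:
    """Distinct PMCIDs cited in one page's wikitext, normalized to 'PMCnnnn'."""
    return {"PMC" + digits for digits in PMC_PARAM.findall(wikitext)}


def most_cited(pages: Dict[str, str], n: int = 10) -> List[Tuple[str, int]]:
    """Top-n PMCIDs by number of distinct citing pages."""
    counts: Counter = Counter()
    for text in pages.values():
        counts.update(pmcids_in(text))  # each article counts once per citing page
    return counts.most_common(n)
```

Counting citing pages rather than raw template occurrences keeps one page that cites the same article repeatedly from inflating its rank.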

@Daniel-Mietchen
Member Author

Related issues: #9 and #37.

@notconfusing notconfusing added this to the Phase 1 - Wikisource & Selected Articles milestone May 5, 2014
@wrought wrought changed the title Automate full-text import from PMC into Wikisource Enable programmatic full-text import from PMC into Wikisource May 8, 2014
@wrought
Member

wrought commented May 8, 2014

Updated title to reflect that the goal is to do this programmatically for quality, with community control at heart, not just automatically. ;)

@notconfusing
Member

All the technology is now in place to do this; we are only waiting on the Wikisource community.
