Show information about the journal #10015

Merged: 44 commits, Aug 16, 2023

@aqurilla aqurilla commented Jun 16, 2023

This fixes #6189 by adding a fetcher for journal information. Info buttons are added next to the Journal and ISSN fields in the entry editor, and show the information as a popover. An EnablementStatus enum is also added for generally maintaining state of online services in Preferences.

image

image

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@aqurilla
Contributor Author

Following the suggestion here (#6189 (comment)), as a first step I've written a script that combines Scimagojr data across all the available years (1999-2022). The idea is to build a consolidated json and to include it with JabRef. The data source contains info on around 38k journals.

An issue here is the size of the consolidated JSON, which is ~115MB (compressed size of 21MB), when all the fields are included for all years. These fields include -

'Rank', 'Sourceid', 'Title', 'Type', 'Issn', 'SJR', 'H index', 
'Total Docs. (2020)', 'Total Docs. (3years)', 'Total Refs.', 'Total Cites (3years)', 
'Citable Docs. (3years)', 'Cites / Doc. (2years)', 'Ref. / Doc.', 'Country', 'Region', 
'Publisher', 'Coverage', 'Categories', and 'Areas'

Is this size acceptable or should json size be reduced further by
a) limiting the fields included and/or
b) limiting the years included?
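
As a rough illustration of option (a), the consolidation step could look like the sketch below. It is not the script from this PR: the function name, the choice of which fields count as static or yearly, the semicolon delimiter, and the two sample rows are all assumptions made for the example.

```python
import csv
import io
import json
from collections import defaultdict

# Assumption: these fields are stored once per journal, the rest once per year.
STATIC_FIELDS = ("Title", "Issn", "Publisher", "Country")
YEARLY_FIELDS = ("SJR",)


def merge_years(csv_by_year, delimiter=";"):
    """Merge {year: csv_text} into {sourceid: {static fields, "years": {year: {...}}}}."""
    journals = defaultdict(lambda: {"years": {}})
    for year, text in sorted(csv_by_year.items()):
        for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
            entry = journals[row["Sourceid"]]
            # Years are processed in ascending order, so the newest static value wins.
            for field in STATIC_FIELDS:
                entry[field] = row[field]
            # Only the selected yearly fields are kept, which limits the output size.
            entry["years"][year] = {field: row[field] for field in YEARLY_FIELDS}
    return dict(journals)


# Hypothetical two-year sample in place of the real per-year exports.
HEADER = "Sourceid;Title;Issn;Publisher;Country;SJR\n"
ROW = "27514;Antioxidants & Redox Signaling;15230864, 15577716;Mary Ann Liebert Inc.;United States;"
SAMPLE = {2021: HEADER + ROW + "1.832\n", 2022: HEADER + ROW + "1.706\n"}

consolidated = merge_years(SAMPLE)
print(json.dumps(consolidated, indent=2))
```

Storing the static fields once per journal rather than once per year is one way to shrink the file before dropping any years.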

@Siedlerchr
Member

Hmm, a possible solution could be to use an approach similar to the journal abbreviations and put them in an MVStore database.

@aqurilla
Contributor Author

Thanks, I'll take a look at that

@Siedlerchr
Member

@aqurilla some more context (I was only on mobile earlier): we regularly update the journal abbreviations, either manually or partially automated, then merge them together and store them in an MVStore database.

https://www.h2database.com/html/mvstore.html
https://github.com/JabRef/abbrv.jabref.org

Another option, as a first step, could be to download the file from a GitHub repo.

@calixtus
Member

calixtus commented Jun 17, 2023

Is there no way to receive the data about the journals just in time, maybe caching it as soon as it has been received once? A progress indicator could be shown while a background process loads the journal information...

@aqurilla
Contributor Author

aqurilla commented Jun 17, 2023

@Siedlerchr Thanks for the additional context. Using an MV database would be a good option if we are going with the consolidated JSON approach. I expect it would need similar components, i.e. creating something like the refresh-journal-lists.yml and the generateJournalAbbreviationList Gradle task, and storing the JSON in a GitHub repo?

@calixtus thanks for bringing this up. I tried the Elsevier API again and it looks like one of their endpoints works well for our use case (https://dev.elsevier.com/documentation/SerialTitleAPI.wadl#d1e534). It allows search by ISSN and returns the latest data for several journal metrics. We could just call the endpoint when the user clicks the journal info button. The drawback in this approach is the API quota/limits.

What would be the best approach to follow?

@calixtus
Member

I'm not happy with another repository distributed with every release of jabref. Adds another 30 mb to our installer. Lightweight is something different.

Maybe we can cache the journals as soon as they are loaded from the API and store the cache when closing JabRef, so the API quota limit is not exceeded too soon.

@calixtus
Member

Also please avoid adding the future repository to a preferences object.

@tobiasdiez
Member

tobiasdiez commented Jun 18, 2023

I could host the journal information on jabref online. 100mb is nothing for a real DB. Has the disadvantage that the user needs to be online.

There is now also openalex, eg https://docs.openalex.org/api-entities/sources/get-a-single-source

@aqurilla
Contributor Author

openalex looks great w.r.t having a very high daily API quota, so there is low chance of quota exhaustion. It does seem to only have yearly counts for number of works and number of citations though

@calixtus
Member

Devcall

We had a short discussion tonight in our maintainers' devcall. We really like this little enhancement, but we also see some problems that may arise. If we distribute all the journal data with the JabRef package on release, it will only be updated approx. twice a year. Also, this feature may only be useful for a very small number of users, but we would add a database of 20 MB (?) to every release. This is something we are not very happy about.
I think we should follow Tobias's suggestion to host the journal data on a JabRef server, or fetch the journal data from another source and cache it for about a week on the user's PC.
If the journal data is only updated once a year, wouldn't some valuable data be ignored? The issue #6189 suggests to show the price range and the range? This would require downloading the data on the fly.
We have to keep in mind that if we make this a JIT data fetcher, it has to be included in our privacy rules and made opt-in for GDPR compliance.
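
The "fetch, then cache for about a week" idea could be sketched roughly as follows. This is an illustration only, not JabRef code: the class name, the JSON file layout, and the injected `fetch` callable are hypothetical, and the opt-in check required for GDPR is left out.

```python
import json
import time
from pathlib import Path

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


class JournalInfoCache:
    """Hypothetical on-disk cache keyed by ISSN, with a time-to-live per entry."""

    def __init__(self, path, ttl=ONE_WEEK_SECONDS, clock=time.time):
        self.path = Path(path)
        self.ttl = ttl
        self.clock = clock
        # Load entries saved by a previous session, if there are any.
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, issn, fetch):
        """Return cached data for issn; call fetch(issn) only if it is missing or stale."""
        entry = self.entries.get(issn)
        if entry is not None and self.clock() - entry["stored_at"] < self.ttl:
            return entry["data"]
        data = fetch(issn)
        self.entries[issn] = {"stored_at": self.clock(), "data": data}
        return data

    def save(self):
        """Write the cache to disk, e.g. when the application closes."""
        self.path.write_text(json.dumps(self.entries))
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting a week; repeated lookups within the TTL never reach the remote API, which is what keeps quota use low.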

@calixtus
Member

@aqurilla thanks for your efforts already. Would be great if you could keep the concerns above in mind continuing your PR! ❤️

@k3KAW8Pnf7mkmdSMPHz27
Member

k3KAW8Pnf7mkmdSMPHz27 commented Jun 19, 2023

> It does seem to only have yearly counts for number of works and number of citations though

@aqurilla just a quick question about this, do you have a link? I'd expect at least monthly based on

https://docs.openalex.org/api-entities/publishers/publisher-object

  • counts_by_year
  • cited_by_count
  • works_count

(perhaps you would need to combine the information to get the current years count, I haven't tried the API yet)


EDIT: Oh wait, you are talking about yearly vs monthly? Nvm.

@aqurilla
Contributor Author

@calixtus thanks for sharing the devcall discussion details! Keeping all these points in mind, I think the best way would be to go ahead with a jit data fetcher using the Scopus API, and include caching on the user system. It looks like this would address all of the issues.

@k3KAW8Pnf7mkmdSMPHz27 sure! I am referring to the link that tobiasdiez shared (https://docs.openalex.org/api-entities/sources/get-a-single-source). For showing year-wise variation charts we only have the counts_by_year.works_count and counts_by_year.cited_by_count data
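
To make the limitation concrete, this is roughly all that can be derived for a year-wise chart from an OpenAlex source record. The helper name and the numbers in the sample record are made up for illustration; only the `counts_by_year`, `works_count`, and `cited_by_count` field names come from the discussion above.

```python
def yearly_series(source):
    """Return (years, works, citations) as parallel lists sorted by ascending year."""
    rows = sorted(source.get("counts_by_year", []), key=lambda row: row["year"])
    years = [row["year"] for row in rows]
    works = [row["works_count"] for row in rows]
    citations = [row["cited_by_count"] for row in rows]
    return years, works, citations


# Made-up sample record; real values would come from the API response.
sample_source = {
    "counts_by_year": [
        {"year": 2022, "works_count": 10, "cited_by_count": 50},
        {"year": 2021, "works_count": 8, "cited_by_count": 40},
    ]
}

years, works, citations = yearly_series(sample_source)
print(years, works, citations)
```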

@tobiasdiez
Member

I'll try to implement the API at JabRef Online later this week. This approach also has the advantage that we can easily extend it in the future and enrich it with data from other sources (e.g. OpenAlex). For the moment, I would say concentrate on the display of the data in JabRef.

@aqurilla
Contributor Author

@tobiasdiez sounds good!

@tobiasdiez
Member

tobiasdiez commented Jun 26, 2023

I now have a first draft for the API. You can try it out by issuing a POST request to https://mango-pebble-0224c3803-2067.westeurope.1.azurestaticapps.net/api with

{
"query":"query GetJournalByIssn($issn: Int) {\n  journal(issn: $issn) {\n    id\n    name\n    issn\n    scimagoId\n    country\n    publisher\n    areas\n    categories\n    citationInfo {\n      year\n      docsThisYear\n      docsPrevious3Years\n      citableDocsPrevious3Years\n      citesOutgoing\n      citesOutgoingPerDoc\n      citesIncomingByRecentlyPublished\n      citesIncomingPerDocByRecentlyPublished\n      sjrIndex\n    }\n    hIndex\n  }\n}\n",
"variables":{"issn":15230864},
"operationName":"GetJournalByIssn"
}

This should give you a result of the type

{
  "data": {
    "journal": {
      "id": "ckslj3f10000f09jvc1xifgi9",
      "name": "Antioxidants & Redox Signaling",
      "issn": [
        15230864,
        15577716
      ],
      "scimagoId": 27514,
      "country": "United States",
      "publisher": "Mary Ann Liebert Inc.",
      "areas": [
        "Biochemistry, Genetics and Molecular Biology",
        "Medicine"
      ],
      "categories": [
        "Biochemistry (Q1)",
        "Cell Biology (Q1)",
        "Clinical Biochemistry (Q1)",
        "Medicine (miscellaneous) (Q1)",
        "Molecular Biology (Q1)",
        "Physiology (Q1)"
      ],
      "citationInfo": [
        {
          "year": 2022,
          "docsThisYear": 217,
          "docsPrevious3Years": 488,
          "citableDocsPrevious3Years": 487,
          "citesOutgoing": 19202,
          "citesOutgoingPerDoc": 130.63,
          "citesIncomingByRecentlyPublished": 3692,
          "citesIncomingPerDocByRecentlyPublished": 7.21,
          "sjrIndex": 1.706
        },
        {
          "year": 2021,
          "docsThisYear": 158,
          "docsPrevious3Years": 530,
          "citableDocsPrevious3Years": 530,
          "citesOutgoing": 24155,
          "citesOutgoingPerDoc": 152.88,
          "citesIncomingByRecentlyPublished": 4724,
          "citesIncomingPerDocByRecentlyPublished": 7.59,
          "sjrIndex": 1.832
        }
      ],
      "hIndex": 217
    }
  }
}

Do you think this data format is convenient or would you like to see some changes?

Currently, it only contains this particular test data. I'll later add all of the data from scimago. Let me know if you encounter any issues or questions.
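
For anyone who wants to try the draft API from a script, a client could be sketched as below. The helper names and the trimmed-down query are illustrative, and the handling of a `null` journal for unknown ISSNs is an assumption about the API rather than documented behavior. The request is only built here, not sent.

```python
import json
import urllib.request

# Trimmed version of the query above; only the fields used in this sketch.
QUERY = """query GetJournalByIssn($issn: Int) {
  journal(issn: $issn) {
    name
    hIndex
    citationInfo { year sjrIndex }
  }
}"""


def build_request(url, issn):
    """Build (but do not send) the GraphQL POST request for one ISSN."""
    body = json.dumps({
        "query": QUERY,
        "variables": {"issn": issn},
        "operationName": "GetJournalByIssn",
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def sjr_by_year(response):
    """Map year to sjrIndex from a decoded response of the shape shown above."""
    journal = response["data"]["journal"]
    if journal is None:  # assumption: unknown ISSNs come back as null
        return {}
    return {info["year"]: info["sjrIndex"] for info in journal["citationInfo"]}
```

Sending would then be `urllib.request.urlopen(build_request(url, 15230864))` followed by `json.load` on the response.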

EDIT: Something went wrong with the deployment of the database. Will try to have a look at this tomorrow.

@aqurilla
Contributor Author

@tobiasdiez thanks I will check it out. The data format looks convenient to me 👍

@aqurilla aqurilla changed the title from "[WIP] Show information about the journal" to "Show information about the journal" on Jul 18, 2023
@calixtus
Member

Sorry I wasn't able to read your PR in detail earlier, this week was unexpectedly very busy and next week isn't much better too. Please hang on...

@aqurilla
Contributor Author

@calixtus no worries, thanks for the update!

Member

@koppor koppor left a comment

Tried it again and looked through the code. One nitpick. Nevertheless, I would vote for merging it to be able to gather more user feedback.

The only thing that worries me is that the popup shows up on another screen and is not resizable/movable here. I would have expected the popup to show up on the same screen as JabRef.

image

@koppor
Member

koppor commented Jul 26, 2023

Why is the script removed? Is it available at another repository? I would like to keep it somewhere "near" to enable switching to other endpoints.

@koppor
Member

koppor commented Jul 26, 2023

Would it be possible to output only the "INFO" message if no information could be found for a journal name?

WARN: Error while fetching journal information: ISSN and/or journal name not found in catalog: org.jabref.logic.importer.FetcherException: ISSN and/or journal name not found in catalog
        at [email protected]/org.jabref.logic.journals.JournalInformationFetcher.parseResponse(JournalInformationFetcher.java:129)
        at [email protected]/org.jabref.logic.journals.JournalInformationFetcher.getJournalInformation(JournalInformationFetcher.java:62)
        at [email protected]/org.jabref.gui.fieldeditors.journalinfo.JournalInfoViewModel.populateJournalInformation(JournalInfoViewModel.java:36)
        at [email protected]/org.jabref.gui.fieldeditors.journalinfo.JournalInfoView.populateJournalInformation(JournalInfoView.java:47)
        at [email protected]/org.jabref.gui.fieldeditors.PopOverUtil.lambda$showJournalInfo$0(PopOverUtil.java:40)
        at [email protected]/org.jabref.gui.util.BackgroundTask$1.call(BackgroundTask.java:60)
        at [email protected]/org.jabref.gui.util.DefaultTaskExecutor$1.call(DefaultTaskExecutor.java:161)
        at javafx.graphics@20/javafx.concurrent.Task$TaskCallable.call(Task.java:1426)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1623)
2023-07-27 00:18:22 [JavaFX Application Thread] org.jabref.gui.JabRefDialogService.notify()
INFO: Error while fetching journal information: ISSN and/or journal name not found in catalog

@tobiasdiez
Member

I've removed the Python script here and will add it to JabRef/JabRefOnline#2067. I also updated the API URL to the production server. This needs JabRef/JabRefOnline#2067 to be merged, which I will hopefully do tomorrow or on the weekend at the latest.

tobiasdiez added a commit to JabRef/JabRefOnline that referenced this pull request Jul 28, 2023
For JabRef/jabref#10015.

Sample query:
```
query GetJournalByIssn($issn: String) {
  journal(issn: $issn) {
    id
    name
    issn
    scimagoId
    country
    publisher
    areas
    categories
    citationInfo {
      year
      docsThisYear
      docsPrevious3Years
      citableDocsPrevious3Years
      citesOutgoing
      citesOutgoingPerDoc
      citesIncomingByRecentlyPublished
      citesIncomingPerDocByRecentlyPublished
      sjrIndex
    }
    hIndex
  }
}
```
with `issn: 15230864`.
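
Since the query variable is now a `String`, a client may want to normalize user-entered ISSNs before sending them. The helper below is a sketch with a hypothetical name; it uses the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with `X` standing for 10), and the hyphen-free output form is an assumption based on the sample value above.

```python
import re


def normalize_issn(raw):
    """Return the ISSN as 8 characters without a hyphen, or None if it is invalid."""
    candidate = raw.replace("-", "").strip().upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", candidate):
        return None
    # Weighted sum of the first seven digits, with weights 8, 7, ..., 2.
    total = sum(int(digit) * weight for digit, weight in zip(candidate[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return candidate if candidate[7] == expected else None


print(normalize_issn("1523-0864"))
```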

References:
- https://www.scimagojr.com/help.php
- Example:
https://www.scimagojr.com/journalsearch.php?q=27514&tip=sid&clean=0
- https://docs.openalex.org/api-entities/sources/source-object

---------

Co-authored-by: Nitin Suresh <[email protected]>
@tobiasdiez
Member

I've now merged the PR in the jabref online repo, but the data ingestion is still ongoing (might take a few hours). You can already test it with issn 15454509.

@aqurilla aqurilla requested a review from koppor August 1, 2023 15:23
@aqurilla aqurilla requested a review from koppor August 10, 2023 03:15
Member

@tobiasdiez tobiasdiez left a comment

Can we get this in as soon as possible? It's a very nice new feature and it would be nice to get some feedback from dev build users before the next release (mainly because after the release I need to be careful with api changes to not break backwards compatibility).

@aqurilla
Contributor Author

Hi, is there any further work required for this PR? It looks like the failing test is unrelated

@tobiasdiez tobiasdiez merged commit 17a215d into JabRef:main Aug 16, 2023
@tobiasdiez
Member

I've merged this now as there were no further requests for changes. Thanks a lot @aqurilla for your nice work on this, and your patience!

@aqurilla
Contributor Author

Thank you! I appreciate the extensive discussions.
