Better OpenAlex support #1316

Merged
merged 4 commits into from
Oct 14, 2024

ewan-escience
Collaborator

Better OpenAlex support

Changes proposed in this pull request

  • Allow for finding mentions by OpenAlex ID
  • Also search in the OpenAlex database when searching for mentions by title
  • Expand the mentions scraper to also include mentions with an OpenAlex ID (and no DOI)
  • Updated the respective docs
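
Searching several sources by title can return the same publication more than once. The following is a minimal sketch of how such results could be merged, not the code from this PR: the `MentionItem` shape, the `source` field and the `mergeSearchResults` helper are all hypothetical names chosen for illustration.

```typescript
// Hypothetical, simplified mention shape; the real type in the RSD may differ.
type MentionItem = {
  doi: string | null
  openalex_id: string | null
  title: string
  source: string
}

// Merge result lists from several sources, keeping the first occurrence of
// each publication. The DOI is the preferred key; items without a DOI fall
// back to the OpenAlex ID, and finally to the title.
function mergeSearchResults(...resultSets: MentionItem[][]): MentionItem[] {
  const seen = new Set<string>()
  const merged: MentionItem[] = []
  for (const results of resultSets) {
    for (const item of results) {
      let key: string
      if (item.doi !== null) {
        key = `doi:${item.doi.toLowerCase()}`
      } else if (item.openalex_id !== null) {
        key = `openalex:${item.openalex_id.toLowerCase()}`
      } else {
        key = `title:${item.title.toLowerCase()}`
      }
      if (seen.has(key)) continue
      seen.add(key)
      merged.push(item)
    }
  }
  return merged
}
```

Because earlier lists win, the order of the arguments decides which source is shown for a publication found in more than one database.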

How to test

  • docker compose down --volumes && docker compose build --parallel && docker compose up --scale data-generation=0
  • Add mentions with a DOI
  • Add mentions with an OpenAlex ID but no DOI (see e.g. https://research-software-directory.org/api/v1/mention?external_id=not.is.null&doi=is.null&select=title,doi,external_id&limit=100)
  • Add mentions by searching for a title; OpenAlex results should also show up
  • Add these mentions in various places (reference papers for software, other mentions for software, project impact, project output)
  • Afterwards, run the mentions scraper; no errors should be reported:
  • docker compose exec scrapers java -cp /usr/myjava/scrapers.jar nl.esciencecenter.rsd.scraper.doi.MainMentions

Closes #1312

PR Checklist:

  • Increase version numbers in docker-compose.yml
  • Link to a GitHub issue
  • Update documentation
  • Tests

Contributor

@dmijatovic dmijatovic left a comment

Nice work!

I left a few suggestions.

@@ -49,10 +49,14 @@ export default function MentionsOverview() {

 const searchTypeTerm: SearchTermInfo = extractSearchTerm(sanitisedSearch)
 const termEscaped = encodeURIComponent(sanitisedSearch)
-if (searchTypeTerm.type === 'doi') {
-return `doi=eq.${termEscaped}`
+switch (searchTypeTerm.type) {
Contributor

Maybe use default instead of title?
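
The suggestion could look roughly like the sketch below, where the title search becomes the `default` branch. Only the `doi=eq.` filter appears in the diff; the `buildMentionFilter` name, the `openalex_id` column in the filter and the `title=ilike` filter are assumptions for illustration, and the real `SearchTermInfo` type may differ.

```typescript
// Hypothetical type mirroring the name used in the diff.
type SearchTermInfo = {
  type: 'doi' | 'openalex' | 'title'
  term: string
}

// Build a PostgREST filter for the mentions overview.
// With `default` handling the title search, any type without a dedicated
// branch falls back to a title search instead of returning nothing.
function buildMentionFilter(search: SearchTermInfo): string {
  const termEscaped = encodeURIComponent(search.term)
  switch (search.type) {
    case 'doi':
      return `doi=eq.${termEscaped}`
    case 'openalex':
      // assumed column name, taken from the form field added in this PR
      return `openalex_id=eq.${termEscaped}`
    default:
      // assumed filter: case-insensitive partial match on the title
      return `title=ilike.*${termEscaped}*`
  }
}
```

The trade-off is that a `default` branch also swallows types added later, so a new search type silently becomes a title search until it gets its own `case`.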

Comment on lines 133 to 160
 {isAdmin &&
-  <>
-    <ControlledTextField
-      control={control}
-      options={{
-        name: 'doi',
-        label: config.doi.label,
-        useNull: true,
-        defaultValue: formData?.doi,
-        helperTextMessage: config.doi.help,
-        helperTextCnt: `${formData?.doi?.length || 0}/${config.doi.validation.maxLength.value}`,
-      }}
-      rules={config.doi.validation}
-    />
-    <div className="py-2"></div>
-  </>
+  <>
+    <ControlledTextField
+      control={control}
+      options={{
+        name: 'doi',
+        label: config.doi.label,
+        useNull: true,
+        defaultValue: formData?.doi,
+        helperTextMessage: config.doi.help,
+        helperTextCnt: `${formData?.doi?.length || 0}/${config.doi.validation.maxLength.value}`,
+      }}
+      rules={config.doi.validation}
+    />
+    <div className="py-2"></div>
+    <ControlledTextField
+      control={control}
+      options={{
+        name: 'openalex_id',
+        label: config.openalex_id.label,
+        useNull: true,
+        defaultValue: formData?.openalex_id,
+        helperTextMessage: config.openalex_id.help,
+      }}
+      rules={config.openalex_id.validation}
+    />
+    <div className="py-2"></div>
+  </>
Contributor

@dmijatovic dmijatovic Oct 11, 2024

Two points:

  1. The DOI and OpenAlex inputs look different (the DOI input does not have help text and has a character count). Can we use the same approach for the OpenAlex ID input?
  2. I think it is better to have these IDs at the bottom. This modal is used for adding mentions manually. The main assumption is that these do not have a DOI or OpenAlex ID (otherwise you can add them automatically using search).

Comment on lines -15 to +17
-We search in <strong> <a href="https://crossref.org" target="_blank">Crossref</a>, <a href="https://datacite.org" target="_blank">DataCite</a></strong> and the RSD.
+We search in <strong> <a href="https://www.crossref.org/" target="_blank">Crossref</a>, <a href="https://datacite.org" target="_blank">DataCite</a>, <a href="https://openalex.org/" target="_blank">OpenAlex</a></strong> and the RSD.
Contributor

Both work with or without www.

@@ -39,35 +40,57 @@ export default function FindMentionSection({id,config,findPublicationByTitle}:Fi
 const {session: {token}} = useAuth()
 const {onAdd} = useEditMentionReducer()

-async function findPublication(searchFor: string) {
+async function findPublication(searchFor: string): Promise<MentionItemProps[]> {
Contributor

I prefer the return type to be inferred by TypeScript from the function's return statements rather than "forced" in the function signature.

}
return []
}
case 'title': {
Contributor

Same suggestion as for the other switch statement: I would use a default case here.

Comment on lines +51 to 69
switch (search.type) {
case 'doi': {
// convert to lower case
const doi = search.term.toLowerCase()
// validate if not already included
const found = mentions.find(mention => mention.doi?.toLowerCase() === doi)
if (found) {
// flag item with DOI already processed
mentionResultPerDoi.set(doi, {doi, status: 'alreadyImported', include: false})
return false
}
return true
}
return true
} else {
// flag invalid DOI entries
mentionResultPerDoi.set(search.term, {doi:search.term, status: 'invalidDoi', include: false})
return false
case 'openalex':
case 'title':
// flag invalid DOI entries
mentionResultPerDoi.set(search.term, {doi: search.term, status: 'invalidDoi', include: false})
return false
}
Contributor

Maybe use a default case or if/else, as it seems only two options exist.
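
A minimal sketch of the if/else variant, assuming the filter only needs to tell DOIs apart from everything else. The `shouldImportDoi` name and the type definitions are hypothetical; the statuses and the `mentionResultPerDoi` map mirror the snippet above.

```typescript
// Hypothetical, simplified types for illustration.
type SearchTermInfo = {type: 'doi' | 'openalex' | 'title', term: string}
type Mention = {doi: string | null}
type DoiReport = {doi: string, status: 'alreadyImported' | 'invalidDoi', include: boolean}

// Decide whether a bulk-import entry should be looked up.
// Entries that are skipped are recorded in mentionResultPerDoi.
function shouldImportDoi(
  search: SearchTermInfo,
  mentions: Mention[],
  mentionResultPerDoi: Map<string, DoiReport>
): boolean {
  if (search.type === 'doi') {
    // DOIs are compared case-insensitively
    const doi = search.term.toLowerCase()
    const found = mentions.some(mention => mention.doi?.toLowerCase() === doi)
    if (found) {
      // flag item with DOI already processed
      mentionResultPerDoi.set(doi, {doi, status: 'alreadyImported', include: false})
      return false
    }
    return true
  }
  // anything that is not a DOI (OpenAlex ID or title) is flagged as invalid
  mentionResultPerDoi.set(search.term, {doi: search.term, status: 'invalidDoi', include: false})
  return false
}
```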

@jmaassen
Member

Works as expected. There are a few small inconsistencies now:

  • When searching for a publication by DOI, the DOI itself (without the http bit) is enough. This is not true for an OpenAlex ID?
  • You can provide a list of DOIs for bulk import, but not a list of OpenAlex IDs, or a mix of both.
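
One way both inconsistencies could be addressed is to normalise each entry before looking it up. This is a sketch under stated assumptions, not behaviour of the merged code: the `parseIdentifier` and `parseIdentifierList` helpers are hypothetical, the DOI pattern is a common simplification rather than a full validator, and only OpenAlex work IDs (a `W` followed by digits) are recognised.

```typescript
type ParsedId = {type: 'doi' | 'openalex' | 'unknown', id: string}

// Simplified DOI pattern: "10.", a 4-9 digit registrant code, "/", a suffix
const doiRegex = /^10\.\d{4,9}\/\S+$/
// OpenAlex work ID without the URL part, e.g. W2741809807
const openAlexRegex = /^W\d+$/i

// Classify one entry, accepting it with or without its URL prefix.
function parseIdentifier(raw: string): ParsedId {
  const trimmed = raw.trim()
  const doi = trimmed.replace(/^https?:\/\/(dx\.)?doi\.org\//i, '')
  if (doiRegex.test(doi)) {
    return {type: 'doi', id: doi.toLowerCase()}
  }
  const openalex = trimmed.replace(/^https?:\/\/openalex\.org\/(works\/)?/i, '')
  if (openAlexRegex.test(openalex)) {
    // store the full URL form of the OpenAlex ID
    return {type: 'openalex', id: `https://openalex.org/${openalex.toUpperCase()}`}
  }
  return {type: 'unknown', id: trimmed}
}

// Parse a bulk-import text with one identifier per line; blank lines are skipped.
function parseIdentifierList(input: string): ParsedId[] {
  return input
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(parseIdentifier)
}
```

With a helper like this, a mixed list could be split by `type` and each group sent to the matching lookup, while `unknown` entries are reported back to the user.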

Quality Gate failed for 'scrapers'

Failed conditions
14.1% Duplication on New Code (required ≤ 3%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

@ewan-escience ewan-escience merged commit 70833bb into main Oct 14, 2024
4 of 5 checks passed
@ewan-escience ewan-escience deleted the 1312-openalex-support branch October 21, 2024 08:22