Improve Equality check for Journal abbreviations #8948

Closed
Siedlerchr opened this issue Jul 3, 2022 · 16 comments · Fixed by #9288

Comments

@Siedlerchr
Member

I have a question regarding the use of ampersands (&). I often come across \& in my .bib files when an ampersand is used, and JabRef doesn't seem to recognize the journaltitle if the name is not exactly the same. Do we need to add entries for all occurrences of &, \&, and "and" as well?

Originally posted by @jhossbach in JabRef/abbrv.jabref.org#100 (comment)

@koppor
Member

koppor commented Jul 4, 2022

  • Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo
  • JabRef should be able to handle both & and \& when abbreviating and unabbreviating
  • When putting a result, \& should be used

Background:

  • LaTeX requires & to be escaped.
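
The three points above could be sketched as a small normalization helper. This is only an illustration under stated assumptions, not JabRef's actual code: the class `AmpersandNormalizer` and its methods are hypothetical names, and the sketch ignores edge cases such as a literal backslash directly before an ampersand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JabRef.
final class AmpersandNormalizer {
    // Matches an ampersand that is not directly preceded by a backslash.
    private static final Pattern UNESCAPED_AMPERSAND = Pattern.compile("(?<!\\\\)&");

    private AmpersandNormalizer() {
    }

    // Turns "\&" into "&": the form assumed for entries in the abbreviation list.
    static String unescape(String text) {
        return text.replace("\\&", "&");
    }

    // Turns "&" into "\&": the form LaTeX requires. Already escaped ampersands stay as they are.
    static String escape(String text) {
        return UNESCAPED_AMPERSAND.matcher(text).replaceAll(Matcher.quoteReplacement("\\&"));
    }

    public static void main(String[] args) {
        // Both spellings collapse to the same lookup key.
        System.out.println(unescape("ACS Applied Materials \\& Interfaces"));
        System.out.println(unescape("ACS Applied Materials & Interfaces"));
        // When writing a result, the escaped form is produced.
        System.out.println(escape("ACS Applied Materials & Interfaces"));
    }
}
```

With a helper like this, a lookup would unescape first so that both spellings hit the same list entry, and escaping would only be applied when a result is written back.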

@AkshatJain9
Contributor

Hi,

I am with a group of university students looking to close this issue for a university assignment. Could I confirm that no one is actively working on this, and could I possibly be assigned? Thanks!

@ThiloteE
Member

ThiloteE commented Oct 6, 2022

Welcome and thanks for your interest :-)

As general advice for newcomers: Check out https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md for a start. Also, https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace is worth having a look at. Feel free to ask if you have any questions here on GitHub or also at JabRef's Gitter chat.

Try to open a (draft) pull request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

@ThiloteE ThiloteE moved this from Free to take to Reserved in Candidates for University Projects Oct 9, 2022
@AkshatJain9
Contributor

AkshatJain9 commented Oct 11, 2022

Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo
JabRef should be able to handle both & and \& when abbreviating and unabbreviating

Would you be able to clarify what you mean by these first two points? I am currently working on the issue and have so far gathered that whenever we write an &, it should appear as \& in the bib file (unless we are in a \url{...} command). I am just a little confused about what you are proposing with the first two points. Thanks!

@AkshatJain9
Contributor

Looking into this, I have implemented a basic translation from & -> \&, which looks like the following. Notice that the & in the Journal field is parsed as \& in the BibTeX source, as required:
[Two screenshots omitted]

My only question at this stage: we have the parsing working as intended (that is, in line with BibTeX standards), so should I add parsing for other escaped characters, e.g. \ for \ etc.?

Also, I will link my commit. Right now this logic is placed directly in BibWriter.java; should it be moved somewhere else?

@Siedlerchr
Member Author

@AkshatJain9 The idea is only one part of the solution, and it does not belong in the BibWriter. It belongs somewhere in the journal abbreviations formatter itself. We already have EscapeAmpersandsFormatter, which you can use.

@AkshatJain9
Contributor

I should have clarified: my teammate has already worked on the mechanisms for reading from the abbreviations repo and treating & and \& as equal. My job is just to make sure & is written correctly in the BibTeX.

I saw EscapeAmpersandsFormatter but struggled to understand where it is currently used. Its functionality seems to be nested in a lot of other classes, which made it a little difficult to reason about. Do you have any guidance?

@koppor
Member

koppor commented Oct 14, 2022

Please do NOT change the BibTeX reading and writing! JabRef tries to keep the .bib file as is.

We have the formatters in place, which can be configured by the user at library properties - and also at Quality -> cleanup entries. See https://docs.jabref.org/finding-sorting-and-cleaning-entries/cleanupentries for details.

The ideas were as follows:

  1. Journal abbreviations have the ampersand stored unescaped
  2. In the BibEntry, the ampersand can be stored escaped or unescaped
  3. When the user wants to write the field "correctly", they should configure saving actions

For idea 1:

The issue for that is JabRef/abbrv.jabref.org#107

For idea 2:

  • There is a LaTeX-free field - org.jabref.model.entry.BibEntry#getResolvedFieldOrAliasLatexFree
  • This could have the ampersand stored unescaped

For idea 3:

@koppor
Member

koppor commented Oct 14, 2022

@AkshatJain9 Maybe a good start would be to work on JabRef#585. This is a very focussed issue.

@AkshatJain9
Contributor

Thanks, I'll have a look!

@ANUu7312578
Contributor

Hi! I'm working on this project with @AkshatJain9 and was hoping to clarify my understanding of the issue. Does the point "JabRef should be able to handle both & and \& when abbreviating and unabbreviating" mentioned in #8948 (comment) refer to the ability for a journal title to be abbreviated even when the & is escaped?

For example, given that "ACS Applied Materials & Interfaces" can be abbreviated as "ACS Appl. Mater. Interfaces", is the idea that "ACS Applied Materials \& Interfaces" should also be able to be abbreviated as "ACS Appl. Mater. Interfaces"?

@Siedlerchr
Member Author

Siedlerchr commented Oct 15, 2022

@AkshatJain9 Yes. As you can see, JabRef stores the journal names and abbreviations together in a database (created from the CSV files) and then does a lookup for the journal name and the abbreviation.

  1. Unescaped: ACS Applied Materials & Interfaces -> ACS Appl. Mater. Interfaces
  2. Escaped: ACS Applied Materials \& Interfaces -> ACS Appl. Mater. Interfaces

For the other direction, the unabbreviation:
ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces
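
The lookup in both directions might be sketched as follows. This is a minimal in-memory illustration, not JabRef's repository implementation: the class `JournalLookupSketch` and its methods are hypothetical, and it assumes the list stores full names with an unescaped & so that an escaped query only needs to be unescaped before the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the abbreviation database; for illustration only.
final class JournalLookupSketch {
    private final Map<String, String> abbreviationByName = new HashMap<>();
    private final Map<String, String> nameByAbbreviation = new HashMap<>();

    // Assumption: full names are stored with an unescaped ampersand.
    void add(String fullName, String abbreviation) {
        abbreviationByName.put(fullName, abbreviation);
        nameByAbbreviation.put(abbreviation, fullName);
    }

    // Accepts "&" as well as "\&" in the query by unescaping before the lookup.
    Optional<String> abbreviate(String journal) {
        String key = journal.replace("\\&", "&").trim();
        return Optional.ofNullable(abbreviationByName.get(key));
    }

    // Returns the stored full name, which carries the unescaped ampersand.
    Optional<String> unabbreviate(String abbreviation) {
        return Optional.ofNullable(nameByAbbreviation.get(abbreviation.trim()));
    }

    public static void main(String[] args) {
        JournalLookupSketch lookup = new JournalLookupSketch();
        lookup.add("ACS Applied Materials & Interfaces", "ACS Appl. Mater. Interfaces");

        System.out.println(lookup.abbreviate("ACS Applied Materials & Interfaces"));
        System.out.println(lookup.abbreviate("ACS Applied Materials \\& Interfaces"));
        System.out.println(lookup.unabbreviate("ACS Appl. Mater. Interfaces"));
    }
}
```

Normalizing the query rather than storing both spellings keeps the list itself free of duplicate entries.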

@ANUu7312578
Contributor

Hi, I was just wondering whether this comment (#8948 (comment)) was actually replying to my question. Additionally, could I confirm that in the unabbreviation direction, the & is always unescaped?

@AkshatJain9 Yes. As you can see, JabRef stores the journal names and abbreviations together in a database (created from the CSV files) and then does a lookup for the journal name and the abbreviation.

  1. Unescaped: ACS Applied Materials & Interfaces -> ACS Appl. Mater. Interfaces
  2. Escaped: ACS Applied Materials \& Interfaces -> ACS Appl. Mater. Interfaces

For the other direction, the unabbreviation: ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces

@Siedlerchr
Member Author

@ANUu7312578 Yes, sorry, tagged the wrong handle ;)

Yes, I would say for the un-abbreviate we should always use the unescaped variant.
e.g.
ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces

@ANUu7312578
Contributor

Hi! I opened a draft pull request which attempts to address the second of the three ideas in the original issue, namely "JabRef should be able to handle both & and \& when abbreviating and unabbreviating". I chose this one because the first idea ("Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo") is being implemented in the repo which stores all the abbreviations, and the third idea ("When putting a result, \& should be used") seems to be already implemented and just requires the user to enable the option.

Would it be possible for me to get some feedback on the pull request?

@koppor
Member

koppor commented Oct 24, 2022

DevCall label was set because of discussing default save actions of a newly created library.

We decided against it in today's DevCall, because the database is changed "magically". We like the "integrity check" more: https://docs.jabref.org/finding-sorting-and-cleaning-entries/checkintegrity

The next step for the integrity check should be:

  • More checks
  • Remember violations which are OK
  • Show violation count after save ("This library has 5 quality issues, please open the integrity check for a complete list").
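
The "report instead of change" approach could look roughly like this. It is a sketch only, assuming a check that flags unescaped ampersands in journal fields; `AmpersandCheckSketch`, its methods, and the map of citation keys to journal values are hypothetical and do not reflect JabRef's integrity check API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical check for illustration only; it reports findings and never modifies the entries.
final class AmpersandCheckSketch {
    // Matches an ampersand that is not directly preceded by a backslash.
    private static final Pattern UNESCAPED_AMPERSAND = Pattern.compile("(?<!\\\\)&");

    private AmpersandCheckSketch() {
    }

    // Returns one message per entry whose journal value contains an unescaped ampersand.
    static List<String> check(Map<String, String> journalByCitationKey) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, String> entry : journalByCitationKey.entrySet()) {
            if (UNESCAPED_AMPERSAND.matcher(entry.getValue()).find()) {
                violations.add(entry.getKey() + ": journal contains an unescaped &");
            }
        }
        return violations;
    }

    // Builds the summary line suggested above for display after saving.
    static String summary(int violationCount) {
        return "This library has " + violationCount
                + " quality issues, please open the integrity check for a complete list";
    }

    public static void main(String[] args) {
        List<String> violations = check(Map.of(
                "smith2020", "ACS Applied Materials & Interfaces",
                "doe2021", "ACS Applied Materials \\& Interfaces"));
        violations.forEach(System.out::println);
        System.out.println(summary(violations.size()));
    }
}
```

The library content stays untouched; the user decides whether to act on the reported entries.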

Repository owner moved this from Reserved to Done in Candidates for University Projects Nov 2, 2022