Issue 120 fix #128

sreenath-tm · 2023-04-07T07:37:12Z

Solves issue #120.
The script reads all CSV files other than "journal_abbreviations_general" and removes from "journal_abbreviations_general" any entry that is also present in one of those files.

The script was executed once, and the resulting "journal_abbreviations_general" file has replaced the older version that contained duplicate entries.
The format of each entry in the CSV file is expected to be ;[;[;]]. However, no entry has all of these fields set. Depending on which fields are set, the entries handled during script development were of three types, as shown below:

Advances in Chemistry Series;Adv. Chem. Ser.;; [The last 2 fields are absent, but the entry still ends with ";;" to signify that those fields are empty]
ACS Applied Nano Materials;ACS Appl. Nano Mater. [The last 2 fields are absent, and there is no ";;" to signify that those fields are empty]
Advances in Cyclic Nucleotide Research;Adv. Cycl. Nucl. Res<d>. [Only the last field is not set, which is signified by a ";"]

Around 80% of the entries follow the second format, 18% follow the first format, and 2% follow the third format. The third format need not be considered, as it is consistent, but when the last 2 fields are not set we needed to decide which format to choose. To streamline this, the output generated by the script uses the first format (the one that ends with ";;"; this can be changed based on discussion).

koppor

The format of each entry in the CSV file is expected to be ;[;[;]].

The [] in the syntax denote that these fields are optional and should only be present if values are present.

A;B or A;B;C or A;B;C;D, not A;B; or A;B;;. Thus, I don't understand why there is a ;;.

Based on our discussion, I removed the frequency field in c422d85.

Thus,

Either A;B or A;B;C should be the content format.

Can you please update the PR so that no final ; is present?

For instance, the following existing entry

Vogelwarte, Die;Vogelwarte

is better than the new entry

Vogelwarte, Die;Vogelwarte;;
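
The requested normalization (no final ";") could be sketched as follows. This is only an illustration, not the script from this PR: the function name `normalize_entry` is hypothetical, and it assumes entries contain no quoted fields with embedded semicolons, so a plain split is sufficient.

```python
def normalize_entry(line: str) -> str:
    """Drop trailing empty fields, so 'A;B;;' and 'A;B;' both become 'A;B'.

    Hypothetical helper: assumes no quoted fields containing ';'.
    """
    fields = line.rstrip("\n").split(";")
    # Keep at least the first two fields (title and abbreviation).
    while len(fields) > 2 and fields[-1].strip() == "":
        fields.pop()
    return ";".join(fields)


if __name__ == "__main__":
    print(normalize_entry("Vogelwarte, Die;Vogelwarte;;"))
```

Entries that already have the form A;B or A;B;C pass through unchanged.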

koppor · 2023-04-07T13:38:44Z

If any frequency exists, it can just be removed!

koppor · 2023-04-07T13:42:17Z

journals/journal_abbreviations_general.csv

@@ -28,1779 +26,756 @@ ACM Transactions on Knowledge Discovery from Data;ACM Trans. Knowl. Discovery Da
 ACM Transactions on Management Information Systems;ACM Trans. Manage. Inf. Syst.;;
 ACM Transactions on Mathematical Software;ACM Trans. Math. Software;;
 ACM Transactions on Multimedia Computing Communications and Applications;ACM Trans. Multimedia Comput. Commun. Appl.;;
-ACM Transactions on Parallel Computing;ACM Trans. Parallel Comput.;;
 ACM Transactions on Programming Languages and Systems;ACM Trans. Program. Lang. Syst.;;


I think your script does not read journal_abbreviations_webofscience-dots.csv, or it keeps different abbreviations.

Could you update the script to be more aggressive?

If the journal name appears in another list, remove it from the journal_abbreviations_general.csv.

sreenath-tm · 2023-04-07

Yes, as you mentioned, I will make the script's criterion a bit tighter: I will check only the Title, and if the Title is common, I will remove the entry from the CSV file.

koppor · 2023-04-07

Sounds good; let's check what the result looks like.

You could output the entries where the abbreviation in the general list is shorter than in the other ones (but still proceed with removing them; I would like to see the general list being very small. Maybe we can even delete it).
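
A minimal sketch of such a report, assuming each row is already parsed into a list of fields with the title first and the abbreviation second. The name `shorter_in_general` is hypothetical, and the case-insensitive title match is an assumption of this sketch rather than something stated at this point in the discussion.

```python
def shorter_in_general(general_rows, other_rows):
    """Return (title, general_abbrev, other_abbrev) triples where the
    general-list abbreviation is shorter than the one in another list."""
    other_abbrevs = {}
    for row in other_rows:
        if len(row) >= 2:
            key = row[0].strip().casefold()  # assumed: titles compared case-insensitively
            other_abbrevs.setdefault(key, []).append(row[1].strip())

    report = []
    for row in general_rows:
        if len(row) < 2:
            continue
        title, abbrev = row[0].strip(), row[1].strip()
        for other in other_abbrevs.get(title.casefold(), []):
            if len(abbrev) < len(other):
                report.append((title, abbrev, other))
    return report
```

The report is purely informational; the entries it lists would still be removed from the general list.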

sreenath-tm · 2023-04-09

This issue was basically due to case differences in the Title. I will handle that and raise a new PR.

sreenath-tm · 2023-04-11

This issue has been handled.

sreenath-tm · 2023-04-07T14:59:34Z

If any frequency exists, it can just be removed!

I modified the script to check for entries with the frequency field set. I can confirm that no such entries exist.
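
Such a check might look like the sketch below. It assumes the frequency is the fourth semicolon-separated field (the D in A;B;C;D) and that fields contain no embedded semicolons; `entries_with_frequency` is a hypothetical name, not a function from the PR's script.

```python
def entries_with_frequency(lines):
    """Return the lines whose fourth field (assumed to be the frequency) is non-empty."""
    found = []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) >= 4 and fields[3].strip():
            found.append(line)
    return found
```

An empty result corresponds to the situation described above: no entry has the frequency set.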

sreenath-tm · 2023-04-09T16:13:51Z

The modified script matches only on the Title column, and the comparison is case-insensitive. The entries have been reduced to 1891 lines and, as discussed, each entry has only 3 columns; the frequency column is not included.
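
The title-only, case-insensitive removal described here can be sketched as follows. This is not the merged script: the function names are hypothetical, the files are assumed to be UTF-8 and semicolon-delimited, and dropping trailing empty fields is an assumption taken from the review request above.

```python
import csv


def read_rows(path):
    """Read a semicolon-delimited abbreviation file into a list of field lists."""
    with open(path, newline="", encoding="utf-8") as handle:  # encoding is an assumption
        return [row for row in csv.reader(handle, delimiter=";") if row]


def titles_of(rows):
    """Collect the case-folded titles (first column) of the given rows."""
    return {row[0].strip().casefold() for row in rows if row and row[0].strip()}


def remove_duplicates(general_rows, other_titles):
    """Keep only general-list rows whose title is absent from the other lists.

    Kept rows are cut to at most 3 columns (no frequency), and trailing
    empty fields are dropped.
    """
    kept = []
    for row in general_rows:
        if not row or row[0].strip().casefold() in other_titles:
            continue
        fields = row[:3]
        while len(fields) > 2 and not fields[-1].strip():
            fields.pop()
        kept.append(fields)
    return kept
```

In use, `other_titles` would be built by applying `titles_of` to the rows of every CSV file except the general one, and the rows returned by `remove_duplicates` would then be written back with `csv.writer(..., delimiter=";")`.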

koppor · 2023-04-12T11:16:36Z

Thank you for working on this. A good next step.

sreenath-tm added 2 commits April 7, 2023 10:44

Added the deletion script and the updated version of general file

06012c9

Adding the missed journal

43b0fc3

koppor requested changes Apr 7, 2023

koppor reviewed Apr 7, 2023

Case-insensitive checking based only on title

60d9e3e

koppor approved these changes Apr 12, 2023

koppor merged commit 885db2c into JabRef:main Apr 12, 2023

koppor mentioned this pull request Apr 12, 2023

Remove duplicates from general list #120

Closed

koppor mentioned this pull request Apr 12, 2023

Improve journal list handling JabRef/jabref-koppor#48

Closed

8 tasks
