Issue 120 fix #128

Merged
merged 3 commits into from
Apr 12, 2023

Contributor

@sreenath-tm sreenath-tm commented Apr 7, 2023

Solves issue #120.
The script reads all CSV files other than "journal_abbreviations_general" and removes from "journal_abbreviations_general" every entry that is also present in one of those files.

The script was executed once, and the resulting "journal_abbreviations_general" file has replaced the older version, which contained duplicate entries.
The format of each entry in the CSV file is expected to be <full name>;<abbreviation>[;<shortest unique abbreviation>[;<frequency>]]. However, no entry has all of these fields set. Depending on how the unset fields are written, the entries handled during script development fall into the following three types:

  • Advances in Chemistry Series;Adv. Chem. Ser.;; [the last two fields are not set, but ";;" is still present to signify that they are empty]
  • ACS Applied Nano Materials;ACS Appl. Nano Mater. [the last two fields are not set, and there is no ";;" to signify that they are empty]
  • Advances in Cyclic Nucleotide Research;Adv. Cycl. Nucl. Res<d>. [only the last field is not set, which is signified by a ";"]

Around 80% of the entries follow the second format, 18% follow the first format, and 2% follow the last format. The third format needs no special handling because it is consistent, but for entries where the last two fields are not set we had to decide which format to use. To make this uniform, the script writes such entries in the first format (the one that ends with ";;"; this can be changed based on discussion).
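
The three entry shapes described above can be told apart by counting the fields after splitting on ";". The following is only a minimal sketch and not the script from this PR: the helper name `classify_entry`, the category labels, and the third sample line (`Example Journal;Ex. J.;EJ;`) are hypothetical and chosen for illustration.

```python
from collections import Counter


def classify_entry(line: str) -> str:
    """Label an entry by how it writes its unset trailing fields (hypothetical helper)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) == 2:
        # A;B  (no separators for the unset fields)
        return "no_trailing_separator"
    if len(fields) == 4 and fields[2] == "" and fields[3] == "":
        # A;B;;  (both unset fields marked by separators)
        return "both_empty_with_separators"
    if len(fields) == 4 and fields[2] != "" and fields[3] == "":
        # A;B;C;  (only the last field unset)
        return "last_empty_with_separator"
    return "other"


sample = [
    "Advances in Chemistry Series;Adv. Chem. Ser.;;",
    "ACS Applied Nano Materials;ACS Appl. Nano Mater.",
    "Example Journal;Ex. J.;EJ;",  # made-up entry, for illustration only
]

# Tally how many sample entries fall into each shape.
counts = Counter(classify_entry(line) for line in sample)
print(counts)
```

Running such a tally over a whole file is one way percentages like the ones quoted above could be obtained, assuming no field contains a literal ";".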

Member

@koppor koppor left a comment

The format of each entry in the CSV file is expected to be <full name>;<abbreviation>[;<shortest unique abbreviation>[;<frequency>]].

The [] in the syntax denotes that these fields are optional; they should only be present if they have values.

Valid forms are A;B, A;B;C, or A;B;C;D, not A;B; or A;B;;. Thus, I don't understand why ;; is there.

Based on our discussion, I removed the frequency field in c422d85.

Thus, the content format should be either A;B or A;B;C.

Can you please update the PR so that no final ; is present?

For instance, the following existing entry

Vogelwarte, Die;Vogelwarte

is better than the new entry

Vogelwarte, Die;Vogelwarte;;
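
As an illustration of the requested output format, the sketch below drops a fourth (frequency) field and any trailing empty fields, so that an entry is written as A;B or A;B;C with no final ";". The function name `normalize_entry` and the `max_fields` parameter are assumptions made for this example; they are not taken from the PR's script. Plain splitting on ";" is used, which assumes that no field contains a quoted semicolon.

```python
def normalize_entry(line: str, max_fields: int = 3) -> str:
    """Return the entry as A;B or A;B;C without trailing separators (hypothetical helper)."""
    fields = [field.strip() for field in line.rstrip("\n").split(";")]
    # Keep at most the first three fields, which drops a frequency column if present.
    fields = fields[:max_fields]
    # Remove empty optional fields from the end, but always keep title and abbreviation.
    while len(fields) > 2 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)


print(normalize_entry("Vogelwarte, Die;Vogelwarte;;"))
print(normalize_entry("Vogelwarte, Die;Vogelwarte"))
```

Both calls print the same line, so already-clean entries pass through unchanged.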

@koppor
Member

koppor commented Apr 7, 2023

If any frequency exists, it can just be removed!

@@ -28,1779 +26,756 @@ ACM Transactions on Knowledge Discovery from Data;ACM Trans. Knowl. Discovery Da
ACM Transactions on Management Information Systems;ACM Trans. Manage. Inf. Syst.;;
ACM Transactions on Mathematical Software;ACM Trans. Math. Software;;
ACM Transactions on Multimedia Computing Communications and Applications;ACM Trans. Multimedia Comput. Commun. Appl.;;
ACM Transactions on Parallel Computing;ACM Trans. Parallel Comput.;;
ACM Transactions on Programming Languages and Systems;ACM Trans. Program. Lang. Syst.;;
Member

I think your script either does not read journal_abbreviations_webofscience-dots.csv or keeps entries whose abbreviations differ.

Could you update the script to be more aggressive?

If the journal name appears in another list, remove it from the journal_abbreviations_general.csv.

Contributor Author

Yes, as you mentioned, I will tighten the script's criterion: I will compare only the Title, and if the Title also appears in another list, I will remove the entry from the CSV file.

Member

Sounds good; let's check what the result looks like.

You could output the entries where the abbreviation in the general list is shorter than in the other ones (but still proceed with removing them; I would like to see the general list become very small, and maybe we can even delete it).
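
The suggested report could look roughly like the sketch below. It assumes each list has already been loaded into a dictionary mapping title to abbreviation; the function name `shorter_in_general`, the case-insensitive title comparison, and the sample abbreviations are assumptions for illustration, not data from the repository.

```python
def shorter_in_general(general: dict[str, str], others: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (title, general abbreviation, other abbreviation) where the general one is shorter."""
    # Index the other lists by case-folded title so the lookup ignores letter case.
    other_by_title = {title.casefold(): abbr for title, abbr in others.items()}
    report = []
    for title, abbr in general.items():
        other_abbr = other_by_title.get(title.casefold())
        if other_abbr is not None and len(abbr) < len(other_abbr):
            report.append((title, abbr, other_abbr))
    return report


# Made-up sample data, for illustration only.
general = {"Journal of Examples": "J. Ex.", "Annals of Samples": "Ann. Samples"}
others = {"journal of examples": "J. Examples", "Annals of Samples": "Ann. Sam."}

for title, general_abbr, other_abbr in shorter_in_general(general, others):
    print(f"{title}: general '{general_abbr}' is shorter than '{other_abbr}'")
```

The report is informational only; the entries it lists would still be removed from the general list, as requested above.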

Contributor Author

This issue was basically due to differences in the letter case of the Title. I will handle that and raise a new PR.

Contributor Author

This issue has been handled.

@sreenath-tm
Contributor Author

If any frequency exists, it can just be removed!

I modified the script to check for any entries with the frequency field. I can confirm that no entries have the frequency field set.

@sreenath-tm
Contributor Author

The modified script matches entries based only on the Title column, and the comparison is case-insensitive. The file has been reduced to 1891 lines and, as discussed, each entry uses only 3 columns; the frequency column is not included.
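
A minimal sketch of this kind of title-based, case-insensitive pruning is shown below. It is not the script from this PR: the names `title_of` and `prune_general` are hypothetical, the lists are passed in as in-memory lines rather than read from the CSV files, and the abbreviation in the "other" list is made up for the example.

```python
def title_of(line: str) -> str:
    """Return the case-folded title (first field) of an entry."""
    return line.split(";", 1)[0].strip().casefold()


def prune_general(general_lines: list[str], other_lists: list[list[str]]) -> list[str]:
    """Drop general entries whose title occurs in any other list, ignoring letter case."""
    seen_titles = {
        title_of(line)
        for lines in other_lists
        for line in lines
        if line.strip()
    }
    kept = []
    for line in general_lines:
        if not line.strip():
            continue  # skip blank lines
        if title_of(line) in seen_titles:
            continue  # title already covered by another list
        # Keep at most three columns and write no trailing separators.
        fields = line.strip().split(";")[:3]
        kept.append(";".join(fields).rstrip(";"))
    return kept


general = [
    "ACM Transactions on Parallel Computing;ACM Trans. Parallel Comput.;;",
    "Vogelwarte, Die;Vogelwarte",
]
# Made-up entry standing in for another abbreviation list.
others = [["acm transactions on parallel computing;ACM T. Parall. Comput."]]

print(prune_general(general, others))
```

To run the same logic on files, the line lists could be obtained with something like `pathlib.Path(name).read_text(encoding="utf-8").splitlines()`; the file names and encoding would need to match the repository.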

@koppor koppor merged commit 885db2c into JabRef:main Apr 12, 2023
@koppor
Member

koppor commented Apr 12, 2023

Thank you for working on this. A good next step.
