Issue 120 fix #128
The format of each entry in the CSV file is expected to be ;[;[;]].
The [] in the syntax denotes that these fields are optional and are only present if they have values: "A;B" or "A;B;C" or "A;B;C;D", not "A;B;" or "A;B;;". Thus, I don't understand why there is a trailing ";;".
Based on our discussion, I removed the field frequency in c422d85.
Thus, either "A;B" or "A;B;C" should be the content format.
Can you please update the PR so that no final ";" is present?
For instance, the existing entry
Vogelwarte, Die;Vogelwarte
is better than the new entry
Vogelwarte, Die;Vogelwarte;;
If any frequency exists, it can just be removed!
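The requested cleanup (no trailing ";") could be sketched roughly as follows. This is only an illustration, not the script from this PR: the function name `strip_trailing_empty_fields` is hypothetical, and it assumes a plain split on ";" with no quoting or escaping.

```python
def strip_trailing_empty_fields(line: str, sep: str = ";") -> str:
    """Drop empty fields at the end of an entry, e.g. 'A;B;;' -> 'A;B'.

    Assumption: fields never contain the separator themselves.
    """
    fields = line.rstrip("\n").split(sep)
    # Remove empty fields from the right, but always keep the first field.
    while len(fields) > 1 and fields[-1].strip() == "":
        fields.pop()
    return sep.join(fields)


if __name__ == "__main__":
    print(strip_trailing_empty_fields("Vogelwarte, Die;Vogelwarte;;"))
```

Entries that already have the "A;B" or "A;B;C" shape pass through unchanged.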
@@ -28,1779 +26,756 @@ ACM Transactions on Knowledge Discovery from Data;ACM Trans. Knowl. Discovery Da
ACM Transactions on Management Information Systems;ACM Trans. Manage. Inf. Syst.;;
ACM Transactions on Mathematical Software;ACM Trans. Math. Software;;
ACM Transactions on Multimedia Computing Communications and Applications;ACM Trans. Multimedia Comput. Commun. Appl.;;
ACM Transactions on Parallel Computing;ACM Trans. Parallel Comput.;;
ACM Transactions on Programming Languages and Systems;ACM Trans. Program. Lang. Syst.;;
Yes, as you mentioned, I will tighten the script's criteria: I will compare only the Title, and if the Title is common to both files, I will remove the entry from the CSV file.
Sounds good; let's check what the result looks like.
You could output the entries where the abbreviation in the general list is shorter than in the other lists. (But still proceed with removing them. I would like to see the general list being very small; maybe we can even delete it.)
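One possible way to produce that report is sketched below. It is not the script from this PR: the helper names `parse_entries` and `shorter_in_general` are hypothetical, titles are assumed to be matched case-insensitively, and the sample journal in the usage lines is made up.

```python
def parse_entries(lines):
    """Map case-folded title -> (title, abbreviation) for 'Title;Abbrev[;...]' lines."""
    entries = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) >= 2 and fields[0].strip():
            title = fields[0].strip()
            entries[title.casefold()] = (title, fields[1].strip())
    return entries


def shorter_in_general(general_lines, other_lines):
    """Return (title, general_abbrev, other_abbrev) where the general one is shorter."""
    general = parse_entries(general_lines)
    other = parse_entries(other_lines)
    report = []
    for key, (title, abbrev) in general.items():
        if key in other and len(abbrev) < len(other[key][1]):
            report.append((title, abbrev, other[key][1]))
    return report


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    general = ["Journal of Examples;J. Ex."]
    other = ["Journal of Examples;J. Examples"]
    for row in shorter_in_general(general, other):
        print(";".join(row))
```

The report is independent of the removal step, so the duplicates can still be deleted after it has been printed.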
This issue was basically due to the case disparity of the Title. I will handle that and raise a new PR.
This issue has been handled
I modified the script to check for any entries with the frequency field. I can confirm that no entries have the frequency field set.
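A check of this kind could look roughly like the sketch below. It is not the modified script itself: the function name `entries_with_frequency` is hypothetical, and it assumes frequency is the fourth ";"-separated field, as in the "A;B;C;D" form discussed above.

```python
def entries_with_frequency(lines, sep=";"):
    """Return the entries whose fourth field (assumed to be frequency) is non-empty."""
    found = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= 4 and fields[3].strip():
            found.append(line.rstrip("\n"))
    return found


if __name__ == "__main__":
    sample = ["A;B", "A;B;;", "A;B;C"]
    # An empty result means no entry has the frequency field set.
    print(entries_with_frequency(sample))
```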
The modified script now matches on the Title column only, and the comparison is case-insensitive. The entries have been reduced to 1891 lines and, as discussed, each entry has only 3 columns; the frequency column is not included.
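The title-only, case-insensitive matching described here could be sketched as follows. This is an illustration under stated assumptions, not the script from this PR: the names `title_key` and `remove_duplicates` are hypothetical, the title is assumed to be everything before the first ";", and the second sample journal is made up.

```python
def title_key(line: str) -> str:
    """Case-folded title: the text before the first ';' of an entry."""
    return line.split(";", 1)[0].strip().casefold()


def remove_duplicates(general_lines, other_lines):
    """Keep only the general-list entries whose title is absent from the other lists."""
    seen = {title_key(line) for line in other_lines if line.strip()}
    return [line for line in general_lines if title_key(line) not in seen]


if __name__ == "__main__":
    general = [
        "ACM Transactions on Parallel Computing;ACM Trans. Parallel Comput.",
        "Journal of Examples;J. Ex.",  # made-up entry
    ]
    others = ["acm transactions on parallel computing;ACM Trans. Parallel Comput."]
    for line in remove_duplicates(general, others):
        print(line)
```

Using `casefold()` rather than `lower()` is a design choice for more robust matching of non-ASCII titles; either would address the case disparity mentioned above.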
Thank you for working on this. A good next step.
Solves the issue #120
The script reads all the CSV files other than "journal_abbreviations_general"; any entry in those files that is also present in "journal_abbreviations_general" is removed from "journal_abbreviations_general".
The script was executed once, and the resulting "journal_abbreviations_general" file has replaced the older version, which contained duplicate entries.
The format of each entry in the CSV file is expected to be ;[;[;]]. However, no entry has all of these fields set, and depending on which fields are set, the entries in the CSV file that were handled during script development were of three types, which are as below
Around 80% of entries follow the second format, 18% follow the first format, and 2% follow the last format. The third format need not be considered, as it is consistent, but when the last 2 fields are not set we needed to decide which format to choose. To streamline this, the output generated by the script will be in the first format (the one that ends with ";;"; this can be changed based on discussion).