-
-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strengthening an existing plugin “Non-ASCII Equivalents” #387
Conversation
I added 41 new characters to the existing “Non-ASCII Equivalents” plugin: 18 Polish diacritics 23 "My others" special characters that were junk in the ID3 tag of my MP3 collection. Co-Authored-By: Anderson Mesquita <[email protected]>
Co-Authored-By: Anderson Mesquita <[email protected]>
Co-Authored-By: Anderson Mesquita <[email protected]>
I don't know how to fix the Codacy warning. I don't want to break the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your changes. In general the additions make sense, but I have two concerns here:
-
I think this plugin would be more useful as a separate plugin if it would provide a scripting function instead of just doing the replacement in 3 arbitrary tags. See my comment below.
Otherwise the extended replacements should become part of the existing plugin. I don't see why we need an essentially identical plugin in the plugin list, this just causes confusion. -
The simple accented characters that just get replaced with their unaccented base character should be removed from the static replacements. Instead Picard's built-in
unaccent
function should be run on the string. This will take care of all the accented cases here and all those that are probably missing.
return char | ||
|
||
|
||
def ascii(word): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ascii
is a built-in Python function, and the custom function is hiding that. Hence the warning. I'd recommend renaming this function to something else (replace_ascii_equivalents
?)
"・": "-", | ||
"☆": "-", | ||
"★": "-", | ||
"/": ",", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the four chars below don't seem to belong there. Those are all valid ASCII characters and I don't see why those should be unwanted in tags.
For Windows filenames they are not allowed, but those characters get replaced anyway by Picard in filenames and there are existing settings for those.
"album", | ||
"artist", | ||
"title", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this limitation to those three tags is one of the weaknesses of the current plugin. This seems very arbitrary and limiting.
What I think would be much more useful is if the plugin would register a scripting function, see https://picard-docs.musicbrainz.org/en/appendices/plugins_api.html#tagger-script-functions . If there was a scripting function $ascii(...)
or such (maybe someone has a better name) then users would be flexible to replace characters in whatever tag they want:
$set(composer,$ascii(%composer%))
"→": "-", | ||
"・": "-", | ||
"☆": "-", | ||
"★": "-", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better replace with ASCII star *
"í": "i", "ì": "i", "ï": "i", "î": "i", | ||
"ó": "o", "ò": "o", "ö": "o", "ô": "o", | ||
"ú": "u", "ù": "u", "ü": "u", "û": "u", | ||
"ý": "y", "ỳ": "y", "ÿ": "y", "ŷ": "y", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the above, some of the ones in "Misc Letters" and most of the new "Polish letters" you added can be handled by Picard's built-in function picard.util.textencoding.unaccent
.
from picard.util.textencoding import unaccent
print(unaccent("ÄŽ")) # this will print "AZ"
```
"Ç": "C", | ||
"ç": "c", | ||
"Ñ": "N", | ||
"ñ": "n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above four will be handled by unaccent
"Ł": "L", | ||
"ł": "l", | ||
"Ń": "N", | ||
"ń": "n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for "Ł" and "ł" all the above will be handled by unaccent
|
||
|
||
def ascii(word): | ||
return "".join(sanitize(char) for char in word) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest getting rid of the diacritic characters in the replacement tables as noted above and use Picard's unaccent
function instead as I described above. Then do this here:
word = "".join(sanitize(char) for char in word)
return unaccent(word)
@Echelon666 @finem4n So we currently have two similar PRs, #386 and #387. I think you guys should see what changes we make. If Echelon666 is fine reworking this plugin to provide a scripting function I think this warrants this as a separate plugin. The existing non_ascii_equivalents plugin can still be extended with additional tags as in #386 and reworked to use |
I am entirely unclear why we are creating an entirely new "enhanced" plugin here rather than simply enhancing the existing one. This is like me saying I want to tweak something in Picard, and I do so by publishing Picard Enhanced which does exactly what Picard does but with my additional tweak. This is a route to madness - an exponentially expanding ecosystem of almost identical functionality, but each slightly different from the other. If I wanted to enhance this plugin further, am I supposed to publish a "Non-ASCII Equivalents Enhanced Extended" version - and what happens when (sooner) we run out of synonyms for "Enhanced" OR (later) it literally takes more characters for the name than there are atoms in the Universe? Obviously you need to be wary of introducing backwards incompatibilities that would significantly change the outcome for someone upgrading in ways they wouldn't want. I do NOT think that is the case here, but even if it is was backwardly incompatible, I would expect to see minor incompatibilities handled with an option not by creating an entirely different but extremely similar competing plugin. P.S. It also makes reviewing the code more difficult because you don't get a DIFF showing what has changed from the original. P.P.S. I did suggest to @Echelon666 in the forums BEFORE he created this PR that this should be an update to the existing plugin, but that advice was apparently ignored. See also #384 where I took the time to provide an example of his proposed plugin with a better coding style, but @Echelon666 also ignored advice and review comments. TBH I am somewhat sadly forming the impression that it is complete waste of time reviewing PRs from him, because review comments and advice are never accepted and in the end the PR simply languishes open but unmergeable. @phw said:
I disagree. Adding a scripting function is a non-breaking change. |
My problem with the existing plugin is that it does all this transformation on existing kind of arbitrary existing tags. Adding a function is non-breaking, but removing the automatic conversion of tags would be. So when installing the plugin one would need to accept this automatic conversion. On the other hand we could also add such a function to Picard proper. Overall #386 is the better PR currently, though. |
No - whilst adding a hardcoded expanded list of tags to apply this to would indeed be breaking, instead providing a UI options page where the default set of tags is the existing set would be non-breaking. IMO we really, really need to avoid proliferation of highly similar plugins and instead have one plugin with options. |
Yes, I agree with that. The |
@Echelon666 When you make a commit, please provide an explanatory commit comments so that we can immediately see what your commit is intended to do. |
OK. Thanks. |
I removed Trailing whitespace and repeated Ó. I can't help you more because I don't know Python and Github very well. You have to finish it yourself. |
@Echelon666 I think the earlier PR #386, which now has also been updated, better addresses improving the plugin. With these changes most of what you wanted to have improved, specifically with handling Polish characters, will be solved in the existing plugin. |
But what do you suggest? That I remove my (this) PR? Let it be. ;) Just give me a sign. I will have my plugin for private use anyway, because in addition to Polish characters it has to remove funny characters. |
Philipp is suggesting that you close this PR because: I would imagine that he would prefer that you close it yourself unmerged, rather than for him to do it unilaterally. And don't forget that I predicted this a few days ago in a comment elsewhere. |
@Sophist-UK can you shut up? |
You quite literally asked for someone to tell you what to do. So I did - admittedly pretty bluntly. And then you blame ME and tell me to shut up? Given our shared history of me being blunt with you on several occasions now, did you really think that telling me to "shut up" would work rather than just prompt a further blunt response? FFS, grow up, act more maturely and take responsibility for the consequences of your own words and attitude. |
Go away, you loser. |
Yes, I'll close this. There have been two very identical PRs, and the other one provided the significant changes to the existing plugin. Coding changes can take time, and oftentimes one needs to go one step back to move forward. Also a plugin like this always can be changed and extended for personal use easily if wanted. @Echelon666 It's one thing to argue, but insulting others here is not acceptable. @Sophist-UK, who has been around for a long time and knows Picard and the plugins well, has been trying to help you despite all difficulties and disagreements. I'll close this PR now, but I expect all future discussions here and on the forums to be civilized. |
I added 41 new characters to the existing “Non-ASCII Equivalents” plugin:
18 Polish diacritics
23 "My others" special characters that were junk in the ID3 tag of my MP3 collection.
thread on forum:
https://community.metabrainz.org/t/strengthening-an-existing-plugin-non-ascii-equivalents/722822