Copyright parser edge case #39

Open
julian45 opened this issue Jul 2, 2024 · 2 comments
Labels
copyright (Parse copyright notices script), question (Further information is requested)

Comments

julian45 commented Jul 2, 2024

While filling in copyrights for this release today, the copyright parser handled the © credit without a problem, but it doesn't seem to like the ℗ credit:
℗ «2024 Living,Dining&kitchen Records»

I was hoping to have it parse this as one complete label entry (for me to match with this label), but it seems to pick up only the Living part.

I don't know if there's a way to adjust the parser to handle this in a way that wouldn't negatively affect the function of the parser for more normal cases, but if not, I'm hoping that you might be able to help me figure out a way to make adjustments even just for my personal copy.

I've confirmed that this occurs in the latest version of the parser userscript, v2024.7.1.

kellnerd added the question (Further information is requested) and copyright (Parse copyright notices script) labels Jul 3, 2024
Owner

kellnerd commented Jul 3, 2024

I am afraid that it is not possible to correctly handle this edge case without breaking the detection of other more common cases.

You would have to change the "Advanced configuration" of the credit parser (which is collapsed by default). There are two features you have to work around temporarily:

  1. A comma terminates the credit statement.
  2. An ampersand separates two credited (label) names. (Edit: I forgot that this is my custom setting which I have not made a default so far)

In order to parse the whole rest of the line as a single credited name you have to adapt the "Credit terminator" and the "Name separator" settings. Unless you are familiar with regular expressions, the simplest way to achieve this is to temporarily empty these settings.
After you have parsed the credit you can restore the default values using the reset buttons.
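
To illustrate why emptying both settings yields a single name, here is a minimal sketch in TypeScript. It is not the userscript's code: `ParserSettings`, `parseNames` and the two simplified patterns are hypothetical, and it assumes that an emptied setting simply disables the corresponding feature.

```typescript
// Hypothetical model of the two settings discussed above.
// The names and patterns are illustrative, not the userscript's actual defaults.
interface ParserSettings {
  terminator: RegExp | null;    // where the credit statement ends (null = setting emptied)
  nameSeparator: RegExp | null; // how multiple names are split (null = setting emptied)
}

function parseNames(credit: string, settings: ParserSettings): string[] {
  let statement = credit;
  if (settings.terminator) {
    // Cut the statement at the first terminator match, if there is one.
    const end = statement.search(settings.terminator);
    if (end >= 0) statement = statement.slice(0, end);
  }
  // Split into names only when a separator is configured.
  const names = settings.nameSeparator
    ? statement.split(settings.nameSeparator)
    : [statement];
  return names.map((name) => name.trim()).filter((name) => name.length > 0);
}

const sampleCredit = "Living,Dining&kitchen Records";

// A comma terminates the statement, an ampersand separates names.
const strict: ParserSettings = { terminator: /,/, nameSeparator: /&/ };
// Both settings emptied, as suggested above.
const emptied: ParserSettings = { terminator: null, nameSeparator: null };

console.log(parseNames(sampleCredit, strict));  // ["Living"]
console.log(parseNames(sampleCredit, emptied)); // ["Living,Dining&kitchen Records"]
```

With only the terminator emptied, the ampersand would still split the label into "Living,Dining" and "kitchen Records", which is why both settings are mentioned.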

P.S. For this specific case (which has no space after the comma) you could also replace the `(?=,|` part of the credit terminator setting with `(?=,\s|`.
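
The effect of that lookahead change can be sketched as follows. The two patterns are shortened, hypothetical stand-ins for the credit terminator (only the `(?=,|` fragment of the real setting is quoted in this thread), and `extractName` is an illustrative helper, not part of the userscript.

```typescript
// Simplified stand-ins for the "Credit terminator" setting (assumption:
// the real default has more alternatives than a comma and the end of the line).
const terminatorDefault = /(?=,|$)/;   // stop before any comma
const terminatorRelaxed = /(?=,\s|$)/; // stop only before a comma followed by whitespace

function extractName(credit: string, terminator: RegExp): string {
  // Keep everything up to the first position where the lookahead matches.
  const match = terminator.exec(credit);
  return match ? credit.slice(0, match.index) : credit;
}

const labelCredit = "Living,Dining&kitchen Records";

console.log(extractName(labelCredit, terminatorDefault)); // "Living"
console.log(extractName(labelCredit, terminatorRelaxed)); // "Living,Dining&kitchen Records"

// A conventional list with a space after the comma is still cut at the comma.
console.log(extractName("Label A, Label B", terminatorRelaxed)); // "Label A"
```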

Owner

kellnerd commented Sep 3, 2024

Just for the record: I am still not sure whether a comma that is not followed by a space should prevent splitting by default. There may be more cases like your example, but changing this might also cause other cases to no longer be handled correctly.
I have just found my commit from July again and pushed it to a separate branch for now.
