-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bracing for words containing unbraced capital letters #118
base: master
Are you sure you want to change the base?
Conversation
Thanks for your contributions, and thanks for documenting them so thoroughly!
I'd be in favour of option 1 because (a) it's consistent behaviour for both cases ( Out of interest - what might be the potential pitfalls of double bracing the entire value?
Another consideration (perhaps for a separate PR) is to only wrap words which have uppercase letters not at the start. This way we still allow the bibtex style to make casing consistent except where the uppercase is clearly intentional.
Finally, we have an issue with some bibtex exports containing fully uppercases values. I don't have any examples to hand but I have seen examples like this come from Scholar exports:
It may be prudent to skip wrapping capitals if the entire value is uppercase. Although we do have some that might be valid 🤷:
Option 1 seems overkill - e.g. would it really be needed for abstract? It might mess up author/editor/doi/url/crossref values too, because they have to be formatted in a certain way. Option 2 seems reasonable, but with different defaults. e.g. title, shorttitle, booktitle, journal, publisher.
I agree "escape" is potentially confusing. How about It's worth noting that your current solution appears to be limited to the latin alphabet. I'm OK with this as a limitation because it would be harder to write a regexp for other alphabets, but it would be good to see it noted in the code.
Great idea capturing the philosophy in CONTRIBUTING.md. It was originally very opinionated as it was literally just a script for cleaning up my theses bibliography, but nowadays I tend to cater to requests for greater customisability. I'm not too worried about the UI getting cluttered. We can refine it to hide details. Maintaining backward compatibility is the key thing. So if we get this new option into production, we can't really change how it behaves (unless it's a bug fix). |
Appreciated!
This describes quite well what to do or not to do.
I think we should actually follow that style, as you suggested. The sensible default would be, to not change the bracing of capitals, that are already braced. I don't know however how this should be combined with #120 . Should the combination mean, that there are no braces in the end, or that the bracing is changed to enclose entire words. One of those would happen if both are applied, except we ignore one option if the other is turned on. We should also make sure that the output is stable across runs. I would actually want to be able to unify the bracing style. Maybe a seperate toggle could be something like a "force" flag regarding this option.
I can open a PR if you want.
Isn't this sufficiently handled by the option to remove all caps? I don't know, how we would make the judgement for correct or incorrect bracing.
Let me know if there are any others i should add.
Brace is probably the wording to go for. I will try to come up with something precise, that allows for the implementation of other options regarding bracing without causing unnecessary confusion.
Will be implemented using unicode categories. |
@@ -1,5 +1,14 @@ | |||
import { specialCharacters } from './unicode'; | |||
|
|||
export function braceWordsWithCapital(str: string): string { | |||
return str.replace( | |||
/(?<![\{\p{L}])\p{L}*\p{Lu}\p{L}*(?![\p{L}\}])/gu, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome regexp! Could you add a comment to explain what it does?
Agree with your design choices and code looks good. Unicode categories are cool! Could you document the design choices somewhere in the code (maybe above the braceWordsWithCapital function):
For the option name I'd argue Are you happy to write some tests for this? |
Nice feature. Thank you. @misc{sel4-perf,
- title = {seL4 Performance},
- howpublished = {\url{https://sel4.systems/About/Performance/}}
+ title = {{seL}4 {Performance}},
+ howpublished = {\url{https://sel4.systems/{About}/{Performance}/}}
} |
This looks like a great thread with good ideas of implementation of this feature. I've been running into this issue lately and was wondering if this feature was implemented yet or will be soon? |
Hey, I know I have been working on it, but sadly could not get around to finishing this. This also will probably take a while. So if anyone would like to finish this up, this would be a relief to me! |
Thank you very much for this awesome package. I implemented the option described in #24 with the following behavior:
there are however some things to consider regarding the implementation, which I would like to discuss before merging.
Behaviour
The example shows, that a word that does not have a capital letter at the start, but more than one inside will not be capitalized. If we want to do that, we could just say: escape any group (1-X) of uppercase letters.
A common pattern is also to escape like this:
This will not be changed, however, it is quite common. Should the given pattern be changed to the default of bibtex-tidy? Maybe the option to choose:
[A-Z]
to\w
Allow configuring fields?
As of now, if the option is turned on (default: false), the escaping of the upper case is applied to all fields or rather all fields containing any capital letters. The escaping of capital letters will thus also be applied to Journal, Series, Author, or any other field, which may be undesirable. I think there are 3 ways to deal with this:
sortFields
option. If we can't find sensible defaults, this is probably best, as we could just have the defaults there, but editable (think sort fields option).title
,shorttitle
andbooktitle
. Happy to count any others in! If we can find sensible defaults, this may be preferred due to the required UI space of option 2.Option name
Right now the chosen name for the cli option is
--escape-uppercase
. This can lead to unclear cli arguments likebibtex-tidy --escape-uppercase --no-escape
. The escaping of special characters and of uppercase letters has possibly confusable names throughout the codebase.Tests
I will of course add a testcase, once the desired behaviour is defined!
BTW: Thanks for the understandable structure of the codebase. I don't know the structure of node packages that well, but could easily find the places to add the option, namely
docs/index.html
,docs/index.ts
,src/format.ts
,src/optionDefinition.ts
, andsrc/utils.ts
.Thank you very much for the feedback!
Let me know, whether you fear a cluttered UI due to a way too high amount of options. We may not want to be too opinionated, but that means implementing more options. I might tackle some other issues as well but would like your stance on this. Maybe a
CONTRIBUTING.md
may be a good idea, to also document how to add more options and capture the philosophy.Closes #24