builtin: fix assert '_ISspace'.camel_to_snake() == '_i_sspace'
#21736
```diff
@@ -1529,6 +1529,7 @@ fn test_camel_to_snake() {
 	assert 'BBaa'.camel_to_snake() == 'b_baa'
 	assert 'aa_BB'.camel_to_snake() == 'aa_bb'
 	assert 'JVM_PUBLIC_ACC'.camel_to_snake() == 'jvm_public_acc'
+	assert '_ISspace'.camel_to_snake() == '_i_sspace'
```
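The expected outputs in this test fit an acronym-aware splitting rule: a capital starts a new word when it follows a lowercase letter, or when it is the last capital of a run and is followed by a lowercase letter. The Rust sketch below uses a hypothetical helper (`camel_to_snake_acronym` is not V's implementation, and it only handles ASCII) to show that this rule reproduces the cases above:

```rust
/// Hypothetical helper, not V's `camel_to_snake()`: a sketch of an
/// acronym-aware rule that reproduces the expected outputs of the test.
fn camel_to_snake_acronym(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() + 4);
    for (i, &c) in chars.iter().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            let prev = chars[i - 1];
            let next_is_lower = chars.get(i + 1).map_or(false, |n| n.is_ascii_lowercase());
            // New word when a capital follows a lowercase letter ("aB" -> "a_b"),
            // or when it is the last capital of a run before lowercase ("ABc" -> "a_bc").
            let boundary = prev.is_ascii_lowercase() || (prev.is_ascii_uppercase() && next_is_lower);
            // Avoid doubling an underscore that is already present.
            if boundary && !out.ends_with('_') {
                out.push('_');
            }
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

fn main() {
    for name in ["BBaa", "aa_BB", "JVM_PUBLIC_ACC", "_ISspace", "aaBBc"] {
        println!("{name} -> {}", camel_to_snake_acronym(name));
    }
}
```

Under this rule `aaBBc` becomes `aa_b_bc` and `HTMLParser` becomes `html_parser`, because only the last capital of a run moves into the following word.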
Is not `IS` a separate word from `space`? But then `BBaa` became `b_baa` 🤔, so it is consistent.
I have these in my `/usr/include/ctype.h`:
```c
# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)	(1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)	((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif

enum
{
  _ISupper = _ISbit (0),	/* UPPERCASE.  */
  _ISlower = _ISbit (1),	/* lowercase.  */
  _ISalpha = _ISbit (2),	/* Alphabetic.  */
  _ISdigit = _ISbit (3),	/* Numeric.  */
  _ISxdigit = _ISbit (4),	/* Hexadecimal numeric.  */
  _ISspace = _ISbit (5),	/* Whitespace.  */
  _ISprint = _ISbit (6),	/* Printing.  */
  _ISgraph = _ISbit (7),	/* Graphical.  */
  _ISblank = _ISbit (8),	/* Blank (usually SPC and TAB).  */
  _IScntrl = _ISbit (9),	/* Control character.  */
  _ISpunct = _ISbit (10),	/* Punctuation.  */
  _ISalnum = _ISbit (11)	/* Alphanumeric.  */
};
#endif /* ! _ISbit  */
```
And later:

```
/usr/include/ctype.h:197:# define isspace(c) __isctype((c), _ISspace)
```
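I think the macro is meant to be read as a predicate ("is space"): `isspace(c)` checks whether a character is whitespace. Note that `_ISspace` is not the ASCII code of the space character (32): it is bit 5 of the character-class mask, so `_ISbit(5)` equals 32 only on big-endian machines (8192 on little-endian ones), and the check matches every whitespace character, not just the space.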
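The quoted `_ISbit` macro turns a class index into a bitmask whose value depends on byte order, and `isspace` tests that bit in a per-character class table. A minimal Rust sketch of this mechanism (the names `is_bit`, `IS_SPACE_BIT`, `class_mask` and `is_space` are hypothetical, and only the whitespace class of the C locale is modeled) shows that the value 32 is just the big-endian mask for bit 5:

```rust
/// Bit index of the whitespace class, as in `_ISspace = _ISbit (5)`.
const IS_SPACE_BIT: u32 = 5;

/// Rust version of glibc's `_ISbit` macro; `big_endian` stands in for
/// `__BYTE_ORDER == __BIG_ENDIAN`.
fn is_bit(bit: u32, big_endian: bool) -> u16 {
    if big_endian {
        1u16 << bit
    } else if bit < 8 {
        (1u16 << bit) << 8
    } else {
        (1u16 << bit) >> 8
    }
}

/// Hypothetical stand-in for one entry of the class table: only the
/// whitespace bit is set, for the six C-locale whitespace characters.
fn class_mask(c: u8, big_endian: bool) -> u16 {
    match c {
        b' ' | b'\t' | b'\n' | 0x0B | 0x0C | b'\r' => is_bit(IS_SPACE_BIT, big_endian),
        _ => 0,
    }
}

/// Analogue of `isspace(c)`: test the class bit, not the character code.
fn is_space(c: u8, big_endian: bool) -> bool {
    (class_mask(c, big_endian) & is_bit(IS_SPACE_BIT, big_endian)) != 0
}

fn main() {
    println!("_ISspace mask, big-endian: {}", is_bit(IS_SPACE_BIT, true));
    println!("_ISspace mask, little-endian: {}", is_bit(IS_SPACE_BIT, false));
    println!("isspace('\\t'): {}", is_space(b'\t', false));
}
```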
Right, my impression too is that a conversion to `_is_space` would be more intuitive.
As you mention, the implementation should then be updated to be consistent.
Likely obvious, but to have it noted: since we handle the first character separately, the general handling of capitals followed by lowercase characters would need to change as well. `aaBBc` currently becomes `aa_b_bc`, while `aa_bb_c` would be consistent.
I think the rule may be: in `camelCase`, i.e. a single capital letter `C` followed by lowercase letters, the capital is part of the following word.
However, if there are several capitals in a row, the whole run of capitals is its own independent word (perhaps from an acronym, an abbreviation, or the product of a deranged mind), and the following lowercase letters form their own independent word.
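The rule proposed above can be sketched in Rust as follows. This is a hypothetical helper (`camel_to_snake_runs` is not part of V or of any library, and it only handles ASCII) assuming exactly that rule: a single capital joins the following word, while a run of two or more capitals forms its own word, separated from any lowercase letters that follow:

```rust
/// Hypothetical helper for the proposed rule: a single capital starts the
/// following word; a run of 2+ capitals is a word of its own.
fn camel_to_snake_runs(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() + 4);
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_uppercase() {
            // Find the end of the run of consecutive capitals starting at i.
            let mut j = i;
            while j < chars.len() && chars[j].is_ascii_uppercase() {
                j += 1;
            }
            // A capital (or run of capitals) after a lowercase letter starts a new word.
            if i > 0 && chars[i - 1].is_ascii_lowercase() {
                out.push('_');
            }
            for &u in &chars[i..j] {
                out.push(u.to_ascii_lowercase());
            }
            // A run of 2+ capitals is its own word: separate it from trailing lowercase.
            if j - i >= 2 && j < chars.len() && chars[j].is_ascii_lowercase() {
                out.push('_');
            }
            i = j;
        } else {
            out.push(c);
            i += 1;
        }
    }
    out
}

fn main() {
    for name in ["_ISspace", "aaBBc", "BBaa", "camelCase", "JVM_PUBLIC_ACC"] {
        println!("{name} -> {}", camel_to_snake_runs(name));
    }
}
```

The trade-off shows up when an acronym runs straight into a capitalized word: under this rule `HTMLParser` becomes `htmlp_arser`, whereas a rule that moves only the last capital of a run into the next word (as the current tests expect, e.g. `BBaa` → `b_baa`) yields `html_parser`.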
From a short search for common conventions in camel-to-snake conversion: one of the most widely used Rust crates, convert_case (https://crates.io/crates/convert_case), also produces `_i_sspace`:
```rust
use convert_case::{Case, Casing}; // `Casing` provides `from_case` / `to_case`

fn main() {
    let snake_str = "_ISspace".from_case(Case::Camel).to_case(Case::Snake);
    dbg!(snake_str); // `[main.rs:5:5] snake_str = "_i_sspace"`
    let snake_str = "AAbb".from_case(Case::Camel).to_case(Case::Snake);
    dbg!(snake_str); // `[main.rs:7:5] snake_str = "a_abb"`
}
```
If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?
Looks like the Rust crate took the "easy" way out; counting the capitals is a bit more work.
I'd prefer the method that looks more correct; it can always be changed if it causes problems.
> If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?
Yes, that is a good idea. The version on master is definitely broken, since it loses a letter, while the version here does not.
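The letter-loss problem can be caught with a simple invariant: a camel-to-snake conversion should only lowercase letters and insert underscores, so removing underscores and lowercasing both the input and the output must give the same string. A minimal Rust sketch of that check (`preserves_letters` is a hypothetical helper; the lossy string `_i_space` is a made-up example, not master's actual output):

```rust
/// Hypothetical sanity check: strip underscores and lowercase both sides;
/// a conversion that drops or changes a letter makes them differ.
fn preserves_letters(input: &str, output: &str) -> bool {
    let normalize = |s: &str| -> String {
        s.chars()
            .filter(|&c| c != '_')
            .map(|c| c.to_ascii_lowercase())
            .collect()
    };
    normalize(input) == normalize(output)
}

fn main() {
    // Both conventions discussed in this thread keep every letter.
    println!("{}", preserves_letters("_ISspace", "_i_sspace"));
    println!("{}", preserves_letters("_ISspace", "_is_space"));
    // A made-up lossy output (one 's' dropped) fails the check.
    println!("{}", preserves_letters("_ISspace", "_i_space"));
}
```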
Ref.: #21722 (comment)