builtin: fix assert '_ISspace'.camel_to_snake() == '_i_sspace' #21736

Merged 2 commits on Jun 27, 2024
10 changes: 7 additions & 3 deletions vlib/builtin/string.v
@@ -2662,9 +2662,13 @@ pub fn (s string) camel_to_snake() string {
 		}
 		lower_first_c, lower_second_c
 	} else {
-		lower_first_c := s[0]
-		second_c := if s[1].is_capital() { u8(`_`) } else { s[1] }
-		lower_first_c, second_c
+		first_c := s[0]
+		second_c := if s[1].is_capital() {
+			if first_c == `_` { s[1] + 32 } else { u8(`_`) }
+		} else {
+			s[1]
+		}
+		first_c, second_c
 	}
 	unsafe {
 		b[0] = first_char
1 change: 1 addition & 0 deletions vlib/builtin/string_test.v
@@ -1529,6 +1529,7 @@ fn test_camel_to_snake() {
 	assert 'BBaa'.camel_to_snake() == 'b_baa'
 	assert 'aa_BB'.camel_to_snake() == 'aa_bb'
 	assert 'JVM_PUBLIC_ACC'.camel_to_snake() == 'jvm_public_acc'
+	assert '_ISspace'.camel_to_snake() == '_i_sspace'
Member

Isn't IS a separate word from space?
Then again, BBaa becomes b_baa 🤔, so this is at least consistent.

Member

I have these in my /usr/include/ctype.h:

# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)   (1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)   ((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif

enum
{
  _ISupper = _ISbit (0),    /* UPPERCASE.  */
  _ISlower = _ISbit (1),    /* lowercase.  */
  _ISalpha = _ISbit (2),    /* Alphabetic.  */
  _ISdigit = _ISbit (3),    /* Numeric.  */
  _ISxdigit = _ISbit (4),   /* Hexadecimal numeric.  */
  _ISspace = _ISbit (5),    /* Whitespace.  */
  _ISprint = _ISbit (6),    /* Printing.  */
  _ISgraph = _ISbit (7),    /* Graphical.  */
  _ISblank = _ISbit (8),    /* Blank (usually SPC and TAB).  */
  _IScntrl = _ISbit (9),    /* Control character.  */
  _ISpunct = _ISbit (10),   /* Punctuation.  */
  _ISalnum = _ISbit (11)    /* Alphanumeric.  */
};
#endif /* ! _ISbit  */

And later:

/usr/include/ctype.h:197:# define isspace(c)    __isctype((c), _ISspace)

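I think the macro is meant to be read as a predicate (is space): it tests whether a character is whitespace. Note that _ISbit (5) selects a flag bit in the character-class mask; it is not the ASCII code of the space character (32), even though 2^5 is also 32.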
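For reference, the _ISbit macro quoted above can be evaluated by hand. The sketch below mirrors its two branches in Rust; the function names are made up for illustration and are not part of glibc or V.

```rust
/// Mirrors `_ISbit(bit)` for the big-endian branch of the quoted header.
/// Hypothetical helper name, used only for this illustration.
fn is_bit_big_endian(bit: u32) -> u32 {
    1u32 << bit
}

/// Mirrors `_ISbit(bit)` for the little-endian branch of the quoted header.
/// Hypothetical helper name, used only for this illustration.
fn is_bit_little_endian(bit: u32) -> u32 {
    if bit < 8 {
        (1u32 << bit) << 8
    } else {
        (1u32 << bit) >> 8
    }
}
```

With _ISspace = _ISbit (5) this gives 0x2000 on a little-endian machine and 0x20 on a big-endian one: a flag bit that isspace tests through __isctype, rather than a character code.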

Member Author @ttytm, Jun 26, 2024

Right, I get the impression too that a conversion to _is_space would be more intuitive.

As you mention, the implementation should then be updated to be consistent.

Likely obvious, but to have it noted: since we handle the first two characters separately, the general implementation for capitals followed by lowercase characters should be updated as well. aaBBc currently becomes aa_b_bc, while aa_bb_c would be consistent.
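The two outcomes mentioned for aaBBc differ only in where a run of capitals is split. As a rough model of the behaviour quoted in this thread (not the code in vlib/builtin/string.v), the Rust sketch below attaches the last capital of a run to the lowercase letters that follow it. The function name is hypothetical, and only ASCII capitals are treated as uppercase.

```rust
/// Hypothetical model: the last capital of a run starts the next word.
/// An underscore is inserted before a capital when
///   - the previous character is neither a capital nor `_`, or
///   - the previous character is a capital and the next one is lowercase.
fn snake_last_capital_joins_next(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() * 2);
    for (i, &c) in chars.iter().enumerate() {
        if c.is_ascii_uppercase() {
            let prev = if i > 0 { Some(chars[i - 1]) } else { None };
            let next_is_lower = chars
                .get(i + 1)
                .map_or(false, |n| n.is_ascii_lowercase());
            let boundary = match prev {
                // No separator at the very start or right after an existing `_`.
                None | Some('_') => false,
                // Inside a run of capitals: split only before the last one.
                Some(p) if p.is_ascii_uppercase() => next_is_lower,
                // Lowercase (or other) character followed by a capital.
                Some(_) => true,
            };
            if boundary {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}
```

Under this model aaBBc gives aa_b_bc, BBaa gives b_baa, and _ISspace gives _i_sspace, which matches the test added in this PR.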

Member

I think the rule may be that if there is camelCase, i.e. a single capital letter C, followed by lower case letters, then the capital is part of the following word.

However, if there are several capitals one after the other, then the whole span of capitals is its own independent word (perhaps from an acronym, or an abbreviation, or the product of a deranged mind), and the following lower case letters form their own independent word.
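Read literally, the rule above could be sketched as follows. This is a minimal illustration in Rust, not a proposed patch: the function name is hypothetical, only ASCII capitals count as uppercase, and an existing _ is never doubled.

```rust
/// Hypothetical model: a run of two or more capitals is a word of its own;
/// a single capital starts the word formed with the lowercase letters after it.
fn snake_capital_run_is_word(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() * 2);
    let mut i = 0;
    while i < chars.len() {
        if !chars[i].is_ascii_uppercase() {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Find the end of the current run of capitals.
        let start = i;
        while i < chars.len() && chars[i].is_ascii_uppercase() {
            i += 1;
        }
        // Separate the run from what came before, unless `_` is already there.
        if !out.is_empty() && !out.ends_with('_') {
            out.push('_');
        }
        for &c in &chars[start..i] {
            out.push(c.to_ascii_lowercase());
        }
        // A run of several capitals is closed off before following lowercase letters.
        if i - start >= 2 && i < chars.len() && chars[i].is_ascii_lowercase() {
            out.push('_');
        }
    }
    out
}
```

This gives _is_space for _ISspace and aa_bb_c for aaBBc. One consequence of the literal reading is that HTTPServer comes out as https_erver, since the capital S is counted as part of the run.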

Member Author @ttytm, Jun 26, 2024

A quick search for conventions in camel-to-snake conversion shows that one of the most widely used Rust crates also produces _i_sspace:

https://crates.io/crates/convert_case

use convert_case::{Case, Casing}; // `Casing` provides from_case/to_case on strings

fn main() {
	let snake_str = "_ISspace".from_case(Case::Camel).to_case(Case::Snake);
	dbg!(snake_str); // `[main.rs:5:5] snake_str = "_i_sspace"`
	let snake_str = "AAbb".from_case(Case::Camel).to_case(Case::Snake);
	dbg!(snake_str); // `[main.rs:7:5] snake_str = "a_abb"`
}

Member Author @ttytm, Jun 26, 2024

If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?

Contributor

Looks like Rust took the "easy" way out... counting the capitals is a bit more work.

I'd prefer the method that looks more correct... it can always be changed if it causes problems.

Member

> If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?

Yes, that is a good idea. The version on master is definitely broken, since it loses a letter, while the version here does not.
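To make the lost letter concrete, here is the changed branch from the diff transcribed into Rust. It is only a sketch of the else branch shown in the diff; the function names are hypothetical, and the rest of camel_to_snake is not modelled.

```rust
/// Branch as it was before the fix: a capital second character is
/// replaced by `_` unconditionally.
fn first_two_before(first: u8, second: u8) -> (u8, u8) {
    let second_c = if second.is_ascii_uppercase() { b'_' } else { second };
    (first, second_c)
}

/// Branch after the fix: if the first character is already `_`, the capital
/// is lowercased (ASCII offset 32, as in the diff) instead of being replaced.
fn first_two_after(first: u8, second: u8) -> (u8, u8) {
    let second_c = if second.is_ascii_uppercase() {
        if first == b'_' { second + 32 } else { b'_' }
    } else {
        second
    };
    (first, second_c)
}
```

For the pair (_, I) taken from _ISspace, the old branch yields _ followed by _, dropping the I, while the patched branch yields _ followed by i.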

}

fn test_snake_to_camel() {