builtin: fix assert '_ISspace'.camel_to_snake() == '_i_sspace' #21736

Merged: 2 commits into vlang:master from builtin/fix-snake-to-camel2 on Jun 27, 2024

@ttytm commented Jun 26, 2024

@@ -1529,6 +1529,7 @@ fn test_camel_to_snake() {
 	assert 'BBaa'.camel_to_snake() == 'b_baa'
 	assert 'aa_BB'.camel_to_snake() == 'aa_bb'
 	assert 'JVM_PUBLIC_ACC'.camel_to_snake() == 'jvm_public_acc'
+	assert '_ISspace'.camel_to_snake() == '_i_sspace'

Member

Isn't IS a separate word from space?
But then BBaa became b_baa 🤔, so it is consistent.

Member

I have these in my /usr/include/ctype.h:

# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)   (1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)   ((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif

enum
{
  _ISupper = _ISbit (0),    /* UPPERCASE.  */
  _ISlower = _ISbit (1),    /* lowercase.  */
  _ISalpha = _ISbit (2),    /* Alphabetic.  */
  _ISdigit = _ISbit (3),    /* Numeric.  */
  _ISxdigit = _ISbit (4),   /* Hexadecimal numeric.  */
  _ISspace = _ISbit (5),    /* Whitespace.  */
  _ISprint = _ISbit (6),    /* Printing.  */
  _ISgraph = _ISbit (7),    /* Graphical.  */
  _ISblank = _ISbit (8),    /* Blank (usually SPC and TAB).  */
  _IScntrl = _ISbit (9),    /* Control character.  */
  _ISpunct = _ISbit (10),   /* Punctuation.  */
  _ISalnum = _ISbit (11)    /* Alphanumeric.  */
};
#endif /* ! _ISbit  */

And later:

/usr/include/ctype.h:197:# define isspace(c)    __isctype((c), _ISspace)

I think the intended usage of the macro is as a predicate (is space), for checking whether a character is whitespace. The 5 in _ISbit (5) selects the flag bit for that character class; it is not the ASCII code of the space character (32).
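To make the arithmetic of the quoted `_ISbit` macro concrete, here is a small sketch that mirrors its two branches in Rust. The function name `is_bit` and the `big_endian` parameter are assumptions made for illustration (the parameter stands in for the `__BYTE_ORDER` preprocessor check); nothing here is taken from glibc beyond the expressions quoted above.

```rust
/// Mirrors the two branches of the `_ISbit` macro quoted above.
/// `big_endian` is a hypothetical stand-in for the
/// `__BYTE_ORDER == __BIG_ENDIAN` preprocessor check.
fn is_bit(bit: u32, big_endian: bool) -> u32 {
    if big_endian {
        1 << bit
    } else if bit < 8 {
        // Low flag bits are moved into the upper byte.
        (1 << bit) << 8
    } else {
        // High flag bits are moved into the lower byte.
        (1 << bit) >> 8
    }
}

fn main() {
    // _ISspace = _ISbit (5)
    println!("big endian:    {}", is_bit(5, true)); // 32
    println!("little endian: {}", is_bit(5, false)); // 8192
}
```

Either way the result is a single-bit mask that `__isctype` tests against a per-character table, which supports reading `_ISspace` as the two words "IS" and "space".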

Member Author

@ttytm, Jun 26, 2024

Right, I'm also getting the impression that a conversion to _is_space would be more intuitive.

As you mention, the implementation should be updated to be consistent then.

Likely obvious but to have it noted: Since we handle the first separately, the general implementation for capitals followed by lowercase characters should be updated then. aaBBc currently becomes aa_b_bc, while aa_bb_c would be consistent.

Member

I think the rule may be that if there is camelCase, i.e. a single capital letter C, followed by lower case letters, then the capital is part of the following word.

However, if there are several capitals one after the other, then the whole span of capitals is its own independent word (perhaps from an acronym, or an abbreviation, or the product of a deranged mind), and the following lower case letters form their own independent word.
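The rule described in this comment can be sketched as follows. This is only an illustration of the proposal, written in Rust to match the other snippet in the thread; it is not V's `camel_to_snake` implementation, and the names `camel_to_snake_runs` and `push_sep` are made up for the example. It assumes ASCII input and keeps existing underscores as they are.

```rust
/// Sketch of the proposed rule: a single capital starts the word that
/// follows it, while a run of two or more capitals is a word of its own.
fn camel_to_snake_runs(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() + 4);
    let mut i = 0;
    while i < chars.len() {
        if chars[i].is_ascii_uppercase() {
            // Find the end of the maximal run of capitals starting at i.
            let mut j = i;
            while j < chars.len() && chars[j].is_ascii_uppercase() {
                j += 1;
            }
            // Separate the run from whatever precedes it.
            push_sep(&mut out);
            for c in &chars[i..j] {
                out.push(c.to_ascii_lowercase());
            }
            // Two or more capitals form their own word, so lowercase
            // letters right after the run start a new word.
            if j - i >= 2 && j < chars.len() && chars[j].is_ascii_lowercase() {
                out.push('_');
            }
            i = j;
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

/// Adds an underscore unless the output is empty or already ends with one.
fn push_sep(out: &mut String) {
    if !out.is_empty() && !out.ends_with('_') {
        out.push('_');
    }
}

fn main() {
    for s in ["_ISspace", "aaBBc", "camelCase", "JVM_PUBLIC_ACC"] {
        println!("{} -> {}", s, camel_to_snake_runs(s));
    }
}
```

Under this sketch `_ISspace` becomes `_is_space` and `aaBBc` becomes `aa_bb_c`. The trade-off is that an acronym directly followed by a capitalized word is no longer split before the last capital, so the whole run of capitals is treated as one word.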

Member Author

@ttytm, Jun 26, 2024

After a short search for commonalities and conventions regarding camel to snake conversions: it would be _i_sspace for one of the most widely used Rust crates too:

https://crates.io/crates/convert_case

use convert_case::{Case, Casing};

fn main() {
	let snake_str = "_ISspace".from_case(Case::Camel).to_case(Case::Snake);
	dbg!(snake_str); // `[main.rs:5:5] snake_str = "_i_sspace"`
	let snake_str = "AAbb".from_case(Case::Camel).to_case(Case::Snake);
	dbg!(snake_str); // `[main.rs:7:5] snake_str = "a_abb"`
}
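For readers without the crate at hand, the splitting behaviour seen in these outputs can be approximated with a dependency-free sketch. This is not the `convert_case` implementation and not V's; the function name `camel_to_snake_boundaries` is hypothetical, and the two boundary rules are an assumption chosen because they reproduce the results quoted in this thread for ASCII input.

```rust
/// Sketch of a boundary-based rule: split between a lowercase letter and
/// a capital, and before the last capital of a run when a lowercase
/// letter follows it.
fn camel_to_snake_boundaries(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len() + 4);
    for (i, &c) in chars.iter().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            let prev = chars[i - 1];
            let next_is_lower = chars
                .get(i + 1)
                .map_or(false, |n| n.is_ascii_lowercase());
            // Boundary 1: a lowercase letter followed by a capital (aB).
            // Boundary 2: a capital preceded by a capital and followed by
            // a lowercase letter (the S in "ISs").
            let boundary = prev.is_ascii_lowercase()
                || (prev.is_ascii_uppercase() && next_is_lower);
            if boundary {
                out.push('_');
            }
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

fn main() {
    for s in ["_ISspace", "AAbb", "BBaa", "aaBBc"] {
        println!("{} -> {}", s, camel_to_snake_boundaries(s));
    }
}
```

The second boundary is what attaches the last capital of a run to the lowercase letters after it, which is why `_ISspace` ends up as `_i_sspace` rather than `_is_space`.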

Member Author

@ttytm, Jun 26, 2024

If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?

Contributor

Looks like Rust took the "easy" way out... counting the capitals is a bit more work.

I'd prefer the method that looks more correct... it can always be changed if it causes problems.

Member

> If it's time to break a convention, maybe first merge the fix to have it in the history for a potential rollback, then make the change?

Yes, that is a good idea. The version on master is definitely broken, since it loses a letter, while the version here does not.

@spytheman spytheman merged commit 6ecfc6f into vlang:master Jun 27, 2024
76 checks passed
@ttytm ttytm deleted the builtin/fix-snake-to-camel2 branch June 27, 2024 07:02
raw-bin pushed a commit to raw-bin/v that referenced this pull request Jul 2, 2024