Skip to content

Improve readability of construct_internal #1552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

sffc
Copy link
Member

@sffc sffc commented Jan 27, 2022

Follow-up to #1540

@sffc sffc requested a review from robertbastian January 27, 2022 21:05
@sffc sffc requested a review from Manishearth as a code owner January 27, 2022 21:05
@sffc
Copy link
Member Author

sffc commented Jan 27, 2022

I know readability is subjective, but I find this to match better with my mental model.

@sffc sffc removed the request for review from Manishearth January 27, 2022 21:06
@robertbastian
Copy link
Member

I don't like this. Currently all outgoing edges of a state are together, this puts them all over the place.

Comment on lines 160 to 184
state = match (state, content) {
(Start | Body0, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
Body0
}
(AfterChar, Some(b'/')) => AfterSlash,
(AfterChar, _) => return Err(("[a-zA-z0-9=_/]", i)),

(AfterSlash, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
AfterCharAfterSlash
(Body0, Some(b'/')) => Slash,
(Slash | Body1, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
Body1
}
(AfterSlash, _) => return Err(("[a-zA-Z0-9=_]", i)),

(
AfterCharAfterSlash,
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'='),
) => AfterCharAfterSlash,
(AfterCharAfterSlash, Some(b'/')) => AfterSlash,
(AfterCharAfterSlash, Some(b'@')) => AfterAt,
(AfterCharAfterSlash, _) => return Err(("[a-zA-z0-9=_/@]", i)),

(AfterAt, Some(b'0'..=b'9')) => AfterDigit,
(AfterAt, _) => return Err(("[0-9]", i)),

(AfterDigit, Some(b'0'..=b'9')) => AfterDigit,
(AfterDigit, Some(_)) => return Err(("[0-9]", i)),
(AfterDigit, None) => {
(Body1, Some(b'/')) => Slash,
(Body1, Some(b'@')) => AtSign,
(AtSign | Body2, Some(b'0'..=b'9')) => Body2,

// Success:
(Body2, None) => {
return Ok(Self {
path,
hash: ResourceKeyHash::compute_from_str(path),
})
}

// Errors:
(Start | Slash, _) => return Err(("[a-zA-Z0-9=_]", i)),
(Body0, _) => return Err(("[a-zA-z0-9=_/]", i)),
(Body1, _) => return Err(("[a-zA-z0-9=_/@]", i)),
(AtSign | Body2, _) => return Err(("[0-9]", i)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make it even more structured and nicer imho:

Suggested change
state = match (state, content) {
(Start | Body0, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
Body0
}
(AfterChar, Some(b'/')) => AfterSlash,
(AfterChar, _) => return Err(("[a-zA-z0-9=_/]", i)),
(AfterSlash, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
AfterCharAfterSlash
(Body0, Some(b'/')) => Slash,
(Slash | Body1, Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=')) => {
Body1
}
(AfterSlash, _) => return Err(("[a-zA-Z0-9=_]", i)),
(
AfterCharAfterSlash,
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'='),
) => AfterCharAfterSlash,
(AfterCharAfterSlash, Some(b'/')) => AfterSlash,
(AfterCharAfterSlash, Some(b'@')) => AfterAt,
(AfterCharAfterSlash, _) => return Err(("[a-zA-z0-9=_/@]", i)),
(AfterAt, Some(b'0'..=b'9')) => AfterDigit,
(AfterAt, _) => return Err(("[0-9]", i)),
(AfterDigit, Some(b'0'..=b'9')) => AfterDigit,
(AfterDigit, Some(_)) => return Err(("[0-9]", i)),
(AfterDigit, None) => {
(Body1, Some(b'/')) => Slash,
(Body1, Some(b'@')) => AtSign,
(AtSign | Body2, Some(b'0'..=b'9')) => Body2,
// Success:
(Body2, None) => {
return Ok(Self {
path,
hash: ResourceKeyHash::compute_from_str(path),
})
}
// Errors:
(Start | Slash, _) => return Err(("[a-zA-Z0-9=_]", i)),
(Body0, _) => return Err(("[a-zA-z0-9=_/]", i)),
(Body1, _) => return Err(("[a-zA-z0-9=_/@]", i)),
(AtSign | Body2, _) => return Err(("[0-9]", i)),
state = match state {
Start => match byte {
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=') => AfterChar,
_ => return Err(("[a-zA-Z0-9=_]", i)),
},
AfterChar => match byte {
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=') => AfterChar
Some(b'/') => AfterSlash,
_ => return Err(("[a-zA-z0-9=_/]", i)),
}
AfterSlash => match byte {
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=') => AfterCharAfterSlash
_ => return Err(("[a-zA-Z0-9=_]", i)),
}
AfterCharAfterSlash => match byte {
Some(b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=') => AfterCharAfterSlash,
Some(b'/') => AfterSlash,
Some(b'@') => AfterAt,
_ => return Err(("[a-zA-z0-9=_/@]", i)),
}
AfterAt => match byte {
Some(b'0'..=b'9') => AfterDigit,
_ => return Err(("[0-9]", i)),
}
AfterDigit => match byte {
Some(b'0'..=b'9') => AfterDigit,
Some(_) => return Err(("[0-9]", i)),
None => {
return Ok(Self {
path,
hash: ResourceKeyHash::compute_from_str(path),
})
}
}

The error states are much more obviously correct because they're the fallthrough from the lines directly above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this more readable than what you did in #1540, but this still has the problem of the b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'=' and stuff being duplicated.

@sffc
Copy link
Member Author

sffc commented Jan 27, 2022

This change keeps the enum and error values you introduced in #1540, but puts the flow of the state machine back to something closer to what it had been before. The flow got changed in the last commit to #1540 which I didn't have a chance to review before it got merged. I prefer the logic written out this way because:

  1. The patterns of accepted characters are not duplicated "all over the place"
  2. Centralizes around destination states more than source states
  3. Fewer lines of code since there is less duplication

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i like this too

AfterCharAfterSlash,
AfterAt,
AfterDigit,
Body0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (nb): while you're here, do you want to add some doc comments explaining each state qualitatively?

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel strongly about this go ahead

@robertbastian
Copy link
Member

I incorporated this in #1555

@sffc sffc deleted the construct_internal branch February 1, 2022 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants