Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Use JS regular expression syntax, not path-to-regexp, in RegexRouter #3170

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jswalden
Copy link
Contributor

@jswalden jswalden commented Dec 8, 2024

I sniped myself. 🙂

I initially used literal regular expressions, in part because it'd make things like case-insensitivity explicit. (Maybe TypeScript will generate precise types as regards named capture groups sometime.) But regular expression patterns aren't composable, so I quickly switched to template literals. (This also meant I could change Element's definition when I learned that existing tests weren't nearly as restrictive in the enforced character sets for particular names as I'd assumed.)

TextRequiringPrecedingSpace is defined as legibly as I could make it, albeit somewhat verbosely. (I don't get the sense lookbehind assertions are in common enough use to be self-explanatory if I just inlined them.)

/**
* Add a route to the router, using the express style path syntax
*/
addPath(path: string, handler: PathRouteHandler): void {
Copy link
Member

@Julusian Julusian Dec 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we should be keeping pathToRegexp and this method to use for the osc api. Because that api is using typical slash separated components, so wouldn't be fighting the library?

The reason I started using pathToRegexp was to make the regexes simple and resilient with minimal effort (the library will have much better redos and similar protections than we will think to write)

I don't know though, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My taste -- I emphasize this is a taste as this is your turf 🙂 -- is to prefer built-in language function over libraries, when they do similar things. Regular expression syntax is in the JS author's toolkit. A new regular expression language is not. (And theoretically could have its own corner cases.)

I also prefer there be fewer ways to do it, not more. Two distinct pattern mechanisms seems worse than one -- perhaps than either one.

Regular expression pathological cases happen when you have repetition with possibility of matching multiple ways. None of these cases have that. ReDOS simply isn't a worry here.

Honestly, tho, sequentially testing a list of regular expressions isn't the right way for either of these cases compared to actual parsing. For OSC I'd address.split('/') and iterate through the components. For TCP I'd incrementally tokenize the string. I'd use some switches to invoke the intended operation -- and return informative errors for contextual errors.

But I'd never start writing that patch if I didn't know it'd be accepted. 🙂 Regular expressions aren't right for this, but for a fixed list of a couple dozen syntaxes they're bearable, in a patch with the least code most likely to be accepted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new regular expression language is not. (And theoretically could have its own corner cases.)
Honestly, tho, sequentially testing a list of regular expressions isn't the right way for either of these cases compared to actual parsing.

True, but I found pathToRegexp because I was curious how express (or koa?) did its route matching. its a pretty widely used and known syntax because of both of those.
Which I think is where I started on this; wanting to setup the osc api in a way that is familiar to adding routes to express.

I think that this regex implementation is much more easily readable than tokenizing would be. Thats the nice thing with this pathToRegexp implementation is that with just a little knowledge on the syntax of the paths, it is easy to skim read each rule being checked and find what you are looking for.

I think this is slightly lost in this PR because it hides some things (format of location), but it makes it more readable in other ways as it highlights which bits of the string are matching something while keeping the exact syntax out of it.

So I am tempted by this, just not convinced by tokenizing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants