Aaron Stringer-Usdan #3
base: master
@@ -1,18 +1,21 @@
# Reflection on implementing regular expressions of a DSL

## Which operators were easiest to implement and why?
I found all of the operators pretty easy to implement, but I guess the `*` and `+` operators were the easiest. `*` was of course easy because all it took was calling the `Star` constructor. `+` was nearly as easy since its definition wasn't much more complex and was entirely in terms of other operators.

## Which operators were most difficult to implement and why?
The `~` and `||` operators were somewhat more difficult, by which I mean that I spent more time figuring out how to make them work, not that I had to write significantly more complex code for them. For reasons I'm not entirely sure of (I guess it's because of the pattern matching?), it was not possible to use pattern matching on a `Union` or `Concat` object created with the operator when the definition looked like
```Scala
def ||(other:RegularExpression) = Union(this, other)
```
I looked at the sample solution to see why this might not be working, and found "unapply" methods. After reading about them online, I figured out what they do, but I wouldn't have thought to use them, so I wanted to see if I could allow pattern matching in a different way. For reasons I don't understand at all (and this might be super hacky and I shouldn't have done it), specifying the return type of these operators as `RegularExpression` made the pattern matching work without extractor methods.
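
A minimal sketch of the annotated-return-type approach described above. The case classes here are stand-ins that only mirror the assignment's names, and `describe` is a hypothetical helper added for illustration. The sketch does not try to reproduce the original compile error, whose cause is not established in this thread; it only shows that the annotated operators produce values that can be pattern matched using the `unapply` the compiler generates for every case class.

```scala
object ReturnTypeSketch {
  // Stand-in AST (assumption): not the course-provided file.
  sealed abstract class RegularExpression {
    // The explicit result type widens the static type from Union/Concat
    // to RegularExpression.
    def ||(other: RegularExpression): RegularExpression = Union(this, other)
    def ~(other: RegularExpression): RegularExpression = Concat(this, other)
  }
  case class Literal(c: Char) extends RegularExpression
  case class Union(left: RegularExpression, right: RegularExpression) extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression

  // Hypothetical helper: matches on the case classes without any
  // hand-written extractor.
  def describe(r: RegularExpression): String = r match {
    case Union(a, b)  => "(" + describe(a) + "|" + describe(b) + ")"
    case Concat(a, b) => describe(a) + describe(b)
    case Literal(c)   => c.toString
  }
}
```

Because `~` binds more tightly than `||` under Scala's operator precedence rules, `a || b ~ c` groups as `a || (b ~ c)`.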

## Comment on the design of this internal DSL
Now that I've got the whole thing working, it seems to have a lot of the features you'd want in a DSL. From a strictly theoretical point of view, you can write any regular expression you could want. All the operators are there, and represented by no more than three characters, too (unless you count the number of characters in the number _n_ you supply to the repetition operator).

However, the formal mathematical definition of regular expressions doesn't have wildcard characters so far as I can recall, and neither does this DSL. It _is_ possible to get a wildcard expression by taking the union of all characters that could possibly be matched against, but this is _really_ not practical for someone using the DSL.

Review comment: I don't think we should be too concerned about the formal mathematical definition of regular expressions. What's important is making it easy for users to write regular expressions, and I agree that adding wildcards would be a good next step. In addition to the general wildcard, it could also be nice to have symbols that would match against all digits, all letters, etc.

It might be nice if the language had some kind of wildcard object (just like there's an empty regex object). It probably couldn't be called "." since, I'm pretty sure, that's against the rules for identifiers in Scala. You could still call it something like "WILD", or use a Unicode character, and just that would make it much easier to use wildcards.

Review comment: Hmm, would there be a way to use a Unicode character without having users type out the hex value? WILD seems like it'd be okay, though it's a bit long to type out.

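
One possible reading of the wildcard and character-class suggestions above, as a sketch only. It builds each class as a union of literals, so it would need no changes to a matcher. The names `anyOf`, `DIGIT`, `LOWER`, and `matchesChar` are hypothetical, the case classes are stand-ins for the assignment's, and limiting `WILD` to printable ASCII is an assumption made to keep the example small.

```scala
object WildcardSketch {
  // Stand-in AST (assumption): only the pieces this example needs.
  sealed abstract class RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Union(left: RegularExpression, right: RegularExpression) extends RegularExpression

  // Hypothetical helper: the union of a non-empty sequence of characters.
  def anyOf(chars: Seq[Char]): RegularExpression = {
    require(chars.nonEmpty, "anyOf needs at least one character")
    chars.map(c => (Literal(c): RegularExpression)).reduceLeft((l, r) => Union(l, r))
  }

  val DIGIT: RegularExpression = anyOf('0' to '9')
  val LOWER: RegularExpression = anyOf('a' to 'z')
  // Assumption: "any character" means printable ASCII here.
  val WILD: RegularExpression = anyOf(' ' to '~')

  // Tiny single-character check, included only to exercise the definitions.
  def matchesChar(r: RegularExpression, c: Char): Boolean = r match {
    case Literal(x)  => x == c
    case Union(a, b) => matchesChar(a, c) || matchesChar(b, c)
  }
}
```

On the hex-value question: Scala source files can contain Unicode characters directly, so a symbolic name would not have to be written as an escape. Whether that is convenient depends on the user's keyboard and editor, which is an argument for a plain name such as `WILD`.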
Write a few brief paragraphs that discuss:
  + What works about this design? (For example, what things seem easy and
    natural to say, using the DSL?)
  + What doesn't work about this design? (For example, what things seem
    cumbersome to say?)
  + Think of a syntactic change that might make the language better. How would
    you implement it _or_ what features of Scala would prevent you from
    implementing it? (You don't have to write code for this part. You could say
    "I would use literal extension to..." or "Scala's rules for valid
    identifiers prevent...")
@@ -1,5 +1,8 @@
package dsls.regex

import scala.language.implicitConversions
import scala.language.postfixOps

/**
 * Modify this file to implement an internal DSL for regular expressions.
 *

@@ -11,6 +14,21 @@ package dsls.regex
abstract class RegularExpression {
  /** returns true if the given string matches this regular expression */
  def matches(string: String) = RegexMatcher.matches(string, this)
  def matches(char: Char) = RegexMatcher.matches(char.toString, this)

  def ||(other:RegularExpression):RegularExpression = Union(this, other)
  def ~(other:RegularExpression):RegularExpression = Concat(this, other)
  //Specifying the return type of the above functions means we don't need extractors.

Review comment: What kind of error message did you get when you didn't specify the return type? Curious because, as stated above, I didn't include the return type in my code and it worked fine.

Reply: If I remember right, it said that objects of type Union and Concat don't have matches as a member function. I don't know why it couldn't figure out to typecast them to RegularExpression, but it couldn't, apparently. edit: or for that matter why they didn't inherit matches to begin with

  def <*> = Star(this)
  def * = this<*>
  def <+> = this ~ (this <*>)
  def + = this<+>
Review comment: no comments...

  def apply(n:Int):RegularExpression = {
    if(n==0) EPSILON
    else this~this{n-1}
  }
Review comment: More concise than my version. You probably want to do something about negative values, though (i.e., throw an error). No one likes stack overflows.

}
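
A sketch of the negative-count guard suggested in the review comment above, under the assumption that throwing an `IllegalArgumentException` via `require` is an acceptable way to reject bad input. The AST below is a stand-in for the assignment's classes, and the error message is illustrative.

```scala
object RepeatSketch {
  // Stand-in AST (assumption): only the pieces this example needs.
  sealed abstract class RegularExpression {
    def ~(other: RegularExpression): RegularExpression = Concat(this, other)

    // n-fold repetition. A negative n is rejected up front instead of
    // recursing until the stack overflows.
    def apply(n: Int): RegularExpression = {
      require(n >= 0, s"repetition count must be non-negative, got $n")
      if (n == 0) EPSILON else this ~ this(n - 1)
    }
  }
  case object EPSILON extends RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression
}
```

With this guard a call with a negative count fails immediately with a descriptive message, while `n == 0` still yields `EPSILON` as in the version under review.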

/** a regular expression that matches nothing */

@@ -34,3 +52,12 @@ case class Concat(val left: RegularExpression, val right: RegularExpression)
 * expression
 */
case class Star(val expression: RegularExpression) extends RegularExpression

object RegularExpression {
  implicit def charToRegex(c: Char): RegularExpression = Literal(c)

  implicit def strToRegex(s: String): RegularExpression = {
    ((EPSILON:RegularExpression) /: s)((x,y)=>Concat(x,y))
  }
}
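
The `/:` operator used in `strToRegex` is an alias for `foldLeft` and is deprecated as of Scala 2.13. The sketch below spells out the same left fold with `foldLeft`, using a stand-in AST and a plain (non-implicit) function so that it is self-contained; the explicit `Literal(c)` takes the place of the implicit `charToRegex` conversion.

```scala
object StringConversionSketch {
  // Stand-in AST (assumption): only the pieces this example needs.
  sealed abstract class RegularExpression
  case object EPSILON extends RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression

  // Left fold over the string, starting from EPSILON and appending one
  // literal per character.
  def strToRegex(s: String): RegularExpression =
    s.foldLeft(EPSILON: RegularExpression)((acc, c) => Concat(acc, Literal(c)))
}
```

Either spelling leaves a leading `EPSILON` in the result: `"ab"` becomes `Concat(Concat(EPSILON, Literal('a')), Literal('b'))`, and the empty string becomes `EPSILON` alone.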

Review comment: That's weird. My Union/Concat methods were working fine without me specifying the return type, and I didn't use any extractor methods.