Andrew Fishberg #13
base: master
@@ -2,8 +2,37 @@

## Which operators were easiest to implement and why?

The easiest operators to implement were ||, ~, and <*>. These operators were
literally just wrappers around the Union(), Concat(), and Star() case classes
respectively. The only real difference is that these operators were implemented
within the RegularExpression class so they could be infix operators (for ||
and ~) and a postfix operator (for <*>). The <+> operator was still pretty
easy, but unlike the previous examples, it was defined in terms of other
operators (i.e. ~ and <*>) rather than just being a wrapper around a case
class.

Overall, these operators were easy to implement because all the behavior
was already defined by RegexMatcher. This means the operators were really
only responsible for properly nesting input inside predefined
RegularExpression case classes.
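
To make the "operators only nest case classes" point concrete, here is a minimal, self-contained sketch. It re-declares a stand-in for the hierarchy in this PR (RegexMatcher is left out because it is not needed to compare trees, EMPTY and EPSILON are declared as case objects, and the Star field name is assumed) and checks that an expression written with the DSL operators equals the same tree written by hand with the case classes.

```scala
import scala.language.implicitConversions
import scala.language.postfixOps

// Stand-in for the PR's hierarchy: each operator only wraps its operands.
sealed abstract class RegularExpression {
  def ||(other: RegularExpression): RegularExpression = Union(this, other)
  def ~(other: RegularExpression): RegularExpression = Concat(this, other)
  // The space before ':' matters, otherwise "<*>:" lexes as one identifier.
  def <*> : RegularExpression = Star(this)
  // Built from ~ and <*>; the parentheses keep the whole Concat from being starred.
  def <+> : RegularExpression = this ~ (this <*>)
}

case object EMPTY extends RegularExpression
case object EPSILON extends RegularExpression
case class Literal(literal: Char) extends RegularExpression
case class Union(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Star(expression: RegularExpression) extends RegularExpression

object RegularExpression {
  implicit def charToRegex(c: Char): RegularExpression = Literal(c)
}

object OperatorTrees {
  import RegularExpression._

  // Written with the DSL operators...
  val viaOperators: RegularExpression = ('a' ~ 'b') || ('c' <+>)

  // ...and the same tree written directly with the case classes.
  val viaCaseClasses: RegularExpression =
    Union(Concat(Literal('a'), Literal('b')),
          Concat(Literal('c'), Star(Literal('c'))))
}
```

Because case classes compare structurally, the equality of the two trees shows that ||, ~ and <+> add no behavior of their own; RegexMatcher only ever sees nested Union, Concat and Star nodes.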

## Which operators were most difficult to implement and why?

Although none of the operators were particularly difficult, apply (i.e. the {}
operator) was certainly the most difficult of the five operators. This is
because, unlike the other operators, it required more than just nesting case
classes or using our other operators. Rather, the {} operator needed some type
of repetitive logical structure. Although I chose to use recursion (i.e. the {}
operator recursively uses itself until it hits the base case), a _for loop_
could have provided this same functionality.
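
A _for loop_ version of the {} operator could look like the sketch below. It is a hypothetical, trimmed-down stand-in (only ~, EPSILON, Literal and Concat are kept, and repeatWithLoop is an illustrative name) showing that the recursive apply and a loop build exactly the same right-nested Concat tree. The sketch also treats a non-positive count as zero repetitions, so a negative argument terminates instead of recursing forever.

```scala
sealed abstract class RegularExpression {
  def ~(other: RegularExpression): RegularExpression = Concat(this, other)

  // Recursive form, mirroring the PR's apply (the {} operator).
  def apply(num: Int): RegularExpression =
    if (num <= 0) EPSILON else this ~ this.apply(num - 1)

  // Hypothetical loop-based equivalent: start from EPSILON and wrap it in
  // num Concat layers, building the same tree from the inside out.
  def repeatWithLoop(num: Int): RegularExpression = {
    var result: RegularExpression = EPSILON
    for (_ <- 1 to num) result = this ~ result
    result
  }
}

case object EPSILON extends RegularExpression
case class Literal(literal: Char) extends RegularExpression
case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression

object RepetitionDemo extends App {
  val c = Literal('c')
  // Both forms print the same Concat(..., EPSILON) chain.
  println(c(3))
  println(c.repeatWithLoop(3))
  println(c(3) == c.repeatWithLoop(3))
}
```

The loop needs no explicit base case because EPSILON is its starting value; the recursion reaches the same tree by bottoming out at EPSILON instead.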

In other words, although most of the behavior of the {} operator is defined
in RegexMatcher, this operator required a logical construction of the
case class nesting rather than simply relying on a case class to embody all
of this information. Although a dedicated case class could embody this
information, defining the {} operator in terms of other operators and case
classes keeps the behavior code in RegexMatcher simpler.

## Comment on the design of this internal DSL

Write a few brief paragraphs that discuss:

@@ -15,4 +44,63 @@ Write a few brief paragraphs that discuss:
you implement it _or_ what features of Scala would prevent you from
implementing it? (You don't have to write code for this part. You could say
"I would use literal extension to..." or "Scala's rules for valid
identifiers prevent...")

This design works very well for a simple implementation of a Scala regex
internal DSL. That is, this implementation allows Scala users to use the most
common regex features natively within Scala. For instance, if Java were to
attempt a similar structure for regex (i.e. using classes rather than encoding
each pattern as a string, as java.util.regex actually does), it could not be
this expressive. Because Java does not allow implicit conversions, operator
overloading, or omitting () in method calls, such Java code would be forced to
look like the initial Scala code we were provided. Although the line between
an internal DSL and an API is not clearly defined, this final Scala code feels
much more DSL-like than anything Java would be able to produce.
Review comment: This is true, and I'd hope that it'd be the case, given that it's what Scala was really designed for.

That being said, there are parts of this DSL that are a little clunky.
Java's regex implementation (java.util.regex, which String.matches uses) comes
by default with some very useful predefined regular expressions that match
common "character sets". Although we managed to implement digits in our code
(the equivalent of '\d' in Java), we had to do it by manually unioning all the
numeric characters. This was not too bad thanks to the || operator, but it
quickly becomes unrealistic for "character sets" that include more than 10
characters (e.g. the set of non-digits '\D'). Although not implementing these
saves the implementer work, it is very undesirable for users.
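
The manual unioning can at least be shortened with a helper that folds a collection of characters into nested Union nodes. The sketch below is self-contained and hypothetical: anyOf, CharSets and the printable-ASCII alphabet are illustrative names, and Matcher is a small backtracking stand-in for RegexMatcher, whose code is not part of this PR. Because NONDIGIT is built from unions, it can only cover a finite, assumed alphabet, which is exactly why '\D' is awkward without matcher support.

```scala
// Stand-in hierarchy mirroring the case classes in this PR.
sealed abstract class RegularExpression
case object EMPTY extends RegularExpression
case object EPSILON extends RegularExpression
case class Literal(literal: Char) extends RegularExpression
case class Union(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Star(expression: RegularExpression) extends RegularExpression

object CharSets {
  // Hypothetical helper: union every character in chars instead of writing
  // '0' || '1' || ... by hand. No characters means nothing matches (EMPTY).
  def anyOf(chars: Iterable[Char]): RegularExpression =
    chars
      .map(c => (Literal(c): RegularExpression))
      .reduceOption[RegularExpression]((a, b) => Union(a, b))
      .getOrElse(EMPTY)

  // Assumed alphabet: printable ASCII (' ' through '~'). A union-based
  // negated set such as '\D' can only enumerate a finite alphabet.
  val alphabet: Seq[Char] = ' ' to '~'

  val DIGIT: RegularExpression = anyOf('0' to '9')
  val NONDIGIT: RegularExpression = anyOf(alphabet.filterNot(_.isDigit))
}

object Matcher {
  // Backtracking stand-in for RegexMatcher; k receives the unconsumed input.
  def matches(input: String, regex: RegularExpression): Boolean =
    go(regex, input.toList, _.isEmpty)

  private def go(r: RegularExpression, s: List[Char], k: List[Char] => Boolean): Boolean =
    r match {
      case EMPTY        => false
      case EPSILON      => k(s)
      case Literal(c)   =>
        s match {
          case head :: tail if head == c => k(tail)
          case _                         => false
        }
      case Union(a, b)  => go(a, s, k) || go(b, s, k)
      case Concat(a, b) => go(a, s, rest => go(b, rest, k))
      // Either stop repeating, or match a once (consuming input) and repeat.
      case Star(a)      => k(s) || go(a, s, rest => rest.length < s.length && go(r, rest, k))
    }
}
```

The helper only shortens the spelling: NONDIGIT is still a union of 85 literals (95 printable characters minus the 10 digits), so for a large or open-ended alphabet a dedicated case class that the matcher checks directly would scale much better.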
Review comment: But presumably, the implementers for Java's regex had to do the same thing - so we, as implementers, could be nice to our users and give them some of these sets if we're willing to put in the effort to define them. I guess this could be an implementer vs user tradeoff to some degree.

Additionally, since this internal DSL only tackles the most common regex
operations, there are many features that are simply not available currently.
For instance, the '[^...]' subexpression, '\A', and '\z' in Java cannot be
implemented with the existing set of operations without further expanding
this DSL and the RegexMatcher. ('[^...]' matches any single character not
included in the brackets; '\A' matches the beginning of the string; '\z'
matches the end of the string.) Again, although not having to implement these
operations saves the implementer work, it is very undesirable for users.

That being said, nothing about Scala prevents us from defining these
operations and character sets. Just as we internally defined the objects EMPTY
and EPSILON, which match nothing and the empty string respectively, we could
define objects like DIGIT, NONDIGIT, BEGIN, and END, which match digits,
non-digits, the start of a string, and the end of a string respectively.
Although this would complicate the implementation, these additional features
would be necessary for this internal DSL to measure up against most modern
regex libraries.

Similarly, although we would probably need to define it as a postfix operator,
we could make something that matches the functionality of '[^...]'. Although
this change would certainly make the DSL better, it is debatable whether it
can be incorporated into the syntax (this is a very important point). Although
things like NONDIGIT could be incorporated into our internal DSL so that it is
functionally and almost syntactically interchangeable with the Java regex
implementation (at least in terms of ordering), there are certain
restrictions Scala imposes that we cannot change. That is, as long as our regex
implementation remains an internal DSL, there is a set of features we cannot
change. For instance, Scala only allows four unary prefix operators (+, -, !,
and ~, defined as unary_+, unary_-, unary_!, and unary_~). It is for this
reason that we would probably want to implement '[^...]' as a postfix operator
of some sort. Similarly, + and * are normally binary operators in Scala, so we
use <+> and <*> as our unary postfix operators. The only way to overcome these
shortcomings would be to implement an external DSL, which adds a lot more work
for the developer. Furthermore, forcing users to relearn a standard (i.e.
regex) just because of a shortcoming in your implementation is a little
unreasonable.
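
Following the postfix idea above (and speaking to the review question that follows), here is a hypothetical sketch of '[^...]'. It adds a new case class, NoneOf, which the matcher checks directly, and a postfix <^> on strings via an implicit class, so ("aeiou"<^>) plays the role of '[^aeiou]'. NoneOf, <^>, NegatedClass, NegationDemo and Matcher are all illustrative names; the matcher is a small backtracking stand-in for RegexMatcher.

```scala
import scala.language.postfixOps

// Stand-in hierarchy mirroring the PR's case classes.
sealed abstract class RegularExpression
case object EMPTY extends RegularExpression
case object EPSILON extends RegularExpression
case class Literal(literal: Char) extends RegularExpression
case class Union(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Concat(left: RegularExpression, right: RegularExpression) extends RegularExpression
case class Star(expression: RegularExpression) extends RegularExpression
// Hypothetical addition for '[^...]': one character NOT in excluded.
case class NoneOf(excluded: Set[Char]) extends RegularExpression

object NegatedClass {
  // Hypothetical postfix syntax: ("abc"<^>) stands in for [^abc].
  implicit class Negatable(chars: String) {
    def <^> : RegularExpression = NoneOf(chars.toSet)
  }
}

object Matcher {
  // Backtracking stand-in for RegexMatcher; k receives the unconsumed input.
  def matches(input: String, regex: RegularExpression): Boolean =
    go(regex, input.toList, _.isEmpty)

  private def go(r: RegularExpression, s: List[Char], k: List[Char] => Boolean): Boolean =
    r match {
      case EMPTY        => false
      case EPSILON      => k(s)
      case Literal(c)   =>
        s match {
          case head :: tail if head == c => k(tail)
          case _                         => false
        }
      // The new case: consume exactly one character that is not excluded.
      case NoneOf(excluded) =>
        s match {
          case head :: tail if !excluded(head) => k(tail)
          case _                               => false
        }
      case Union(a, b)  => go(a, s, k) || go(b, s, k)
      case Concat(a, b) => go(a, s, rest => go(b, rest, k))
      case Star(a)      => k(s) || go(a, s, rest => rest.length < s.length && go(r, rest, k))
    }
}

object NegationDemo {
  import NegatedClass._
  val notVowel: RegularExpression = ("aeiou"<^>)
}
```

Because NoneOf stores the excluded characters instead of enumerating every allowed one, it needs no fixed alphabet; the price is one extra case in the matcher, consistent with the point above that '[^...]' requires expanding RegexMatcher.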
Review comment: True - that said, regexes aren't already perfectly standardized (I think), so we'll have to pick some kind of standard to go with. I'm curious what Scala features might let us implement '[^...]', or if something would prevent us. I think I'd have to see an example of it in use for that though. It does seem like most of this can be viewed as tradeoffs between implementer and user, though, which I think is a good point.
@@ -3,22 +3,23 @@ package dsls.regex
object Program extends App {

/****************************************************************************
* TODO: Extend characters to support regular expressions
*
* Make it possible to replace the definition of the numbers with:
* val zero = '0'
* etc.
* Allows us to use our DSL in this Program object
***************************************************************************/
val zero = Literal('0')
val one = Literal('1')
val two = Literal('2')
val three = Literal('3')
val four = Literal('4')
val five = Literal('5')
val six = Literal('6')
val seven = Literal('7')
val eight = Literal('8')
val nine = Literal('9')
import RegularExpression._

/****************************************************************************
* Implicitly casts characters to a RegularExpression
***************************************************************************/
val zero = '0'
val one = '1'
val two = '2'
val three = '3'
val four = '4'
val five = '5'
val six = '6'
val seven = '7'
val eight = '8'
val nine = '9'

require(zero matches "0")
require(one matches "1")

@@ -30,25 +31,21 @@ object Program extends App {
require(seven matches "7")
require(eight matches "8")
require(nine matches "9")
println("Passed individual digit requires!")

/****************************************************************************
* TODO: Extend strings to support regular expressions
*
* Make it possible to replace the definition of answer with:
* val answer = "42"
***************************************************************************/
val answer = Concat(four, two)

* Implicitly casts strings to a RegularExpression
****************************************************************************/
val answer = "42"

require(answer matches "42")
println("Passed implicit string conversion require!")

/****************************************************************************
* TODO: Add the union operator for regular expressions
*
* Make it possible to replace the definition of digit with:
* val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'
* Implicitly defines a RegularExpression with characters and the ||
* operator that matches to any digit
***************************************************************************/
val digit = Union(zero, Union(one, Union(two, Union(three, Union(four,
Union(five, Union(six, Union(seven, Union(eight, nine)))))))))
val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'

require(digit matches "0")
require(digit matches "1")

@@ -59,103 +56,77 @@ object Program extends App {
require(digit matches "6")
require(digit matches "7")
require(digit matches "8")
require(digit matches "9")
println("Passed digit requires!")

/****************************************************************************
* TODO: Add the concatenation operator for regular expressions
*
* Make it possible to replace the definition of digit with:
* val pi = '3' ~ '1' ~ '4'
* Uses the ~ operator for concatenation
***************************************************************************/
val pi = Concat(Literal('3'), Concat(Literal('1'), Literal('4')))
val pi = '3' ~ '1' ~ '4'

require(pi matches "314")
println("Passed pi requires!")

/****************************************************************************
* TODO: Add the star operator for regular expressions
*
* Make it possible to replace the definition of zeroOrMoreDigits with:
* val zeroOrMoreDigits = digit <*>
* Uses the <*> operator
***************************************************************************/
val zeroOrMoreDigits = Star(digit)
val zeroOrMoreDigits = digit<*>

require(zeroOrMoreDigits matches "")
require(zeroOrMoreDigits matches "0")
require(zeroOrMoreDigits matches "9")
require(zeroOrMoreDigits matches "09")
require(zeroOrMoreDigits matches "987651234")
println("Passed zeroOrMoreDigits requires!")

/****************************************************************************
* TODO: Add the plus operator for regular expressions
*
* Make it possible to replace the definition of number with:
* val number = digit <+>
* Uses the <+> operator
***************************************************************************/
val number = Concat(digit, zeroOrMoreDigits)
val number = digit<+>

require(!(number matches ""))
require(number matches "0")
require(number matches "9")
require(number matches "09")
require(number matches "987651234")
println("Passed number requires!")

/****************************************************************************
* TODO: Add the repetition operator for regular expressions
*
* Make it possible to replace the definition of cThree with:
* val cThree = 'c'{3}
* Uses the {} operator for repetition
***************************************************************************/
val cThree = Concat(Literal('c'), Concat(Literal('c'), Literal('c')))
val cThree = 'c'{3}

require(cThree matches "ccc")
println("Passed repetition requires!")

/****************************************************************************
* Additional pattern
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val pattern = "42" || ( ('a' <*>) ~ ('b' <+>) ~ ('c'{3}))
* Uses multiple operators together
***************************************************************************/
val aStar = Star(Literal('a'))
val bPlus = Concat(Literal('b'), Star(Literal('b')))
val pattern = Union(answer, Concat(aStar, Concat(bPlus, cThree)))
val pattern = "42" || (('a'<*>) ~ ('b'<+>) ~ ('c'{3}))

require(pattern matches "42")
require(pattern matches "bccc")
require(pattern matches "abccc")
require(pattern matches "aabccc")
require(pattern matches "aabbccc")
require(pattern matches "aabbbbccc")
println("Passed pattern requires!")

/****************************************************************************
* Additional pattern
*
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val helloworld = ("hello" <*>) ~ "world"
* Uses multiple operators together (again)
***************************************************************************/
val hello = Concat(Literal('h'), Concat(Literal('e'), Concat(Literal('l'),
Concat(Literal('l'), Literal('o')))))

val world = Concat(Literal('w'), Concat(Literal('o'), Concat(Literal('r'),
Concat(Literal('l'), Literal('d')))))

val helloworld = Concat(Star(hello), world)
val helloworld = ("hello"<*>) ~ "world"

require(helloworld matches "helloworld")
require(helloworld matches "world")
require(helloworld matches "hellohelloworld")
println("Passed hello world requires!")

/****************************************************************************
* Additional pattern
*
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4}
* Uses multiple operators together (again, again)
***************************************************************************/
val threeDigits = Concat(digit, Concat(digit, digit))
val fourDigits = Concat(threeDigits, digit)
val areaCode = Concat(Literal('('), Concat(threeDigits, Literal(')')))
val telNumber = Concat(areaCode, Concat(threeDigits, Concat(Literal('-'), fourDigits)))
val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4}

require(telNumber matches "(202)456-1111")
println("Passed telephone requires!")
Review comment: These printlns are a nice idea - I was unsure a couple of times when I ran it and everything passed and no output was printed whether it had actually worked.
}
@@ -1,16 +1,57 @@
package dsls.regex

/**
* Modify this file to implement an internal DSL for regular expressions.
*
* You're allowed to add anything you want to this file, but you're not allowed
* to *remove* anything that currently appears in the file.
*/
import scala.language.implicitConversions
import scala.language.postfixOps

/** The top of a class hierarchy that encodes regular expressions. */
abstract class RegularExpression {
/** returns true if the given string matches this regular expression */
def matches(string: String) = RegexMatcher.matches(string, this)

// operators
/****************************************************************************
* Defines the || operator using the Union case class
***************************************************************************/
def ||(other: RegularExpression): RegularExpression = Union(this, other)

/****************************************************************************
* Defines the ~ operator using the Concat case class
***************************************************************************/
def ~(other: RegularExpression): RegularExpression = Concat(this, other)

/****************************************************************************
* Defines the <*> operator using the Star case class
***************************************************************************/
def <*> = Star(this)

/****************************************************************************
* Defines the <+> operator in terms of the ~ and <*> operators. Since
* this<*> also matches EPSILON, we prevent the <+> operator from matching
* EPSILON by concatenating one required occurrence of this in front of
* this<*>. The parentheses matter: a postfix operator binds more loosely than
* any infix operator, so this ~ this<*> would parse as (this ~ this)<*>.
***************************************************************************/
def <+> = this ~ (this<*>)

/****************************************************************************
* Recursively defines the {} operator by concatenating itself to itself
* num times and terminating with an EPSILON. A non-positive num yields
* EPSILON, so a negative count cannot recurse forever.
***************************************************************************/
def apply(num:Int): RegularExpression =
if (num <= 0) EPSILON else this ~ this{num - 1}
}

object RegularExpression {
/****************************************************************************
* Defines the implicit conversion from char to RegularExpression
***************************************************************************/
implicit def charToRegex(c:Char) = Literal(c)

/****************************************************************************
* Defines the implicit recursive conversion from string to
* RegularExpression. Modified to use the implicit character conversion
* and the ~ operator. The empty string converts to EPSILON instead of
* failing on s.head.
***************************************************************************/
implicit def strToRegex(s:String): RegularExpression =
if (s.isEmpty) EPSILON else if (s.length() == 1) s.head else s.head ~ s.tail
Review comment: There are some interesting slight syntactic differences between our solutions - for instance you just said def <> while I said def <>(). Apart from that looks pretty straightforward. Your organization and header comments are nice.
}

/** a regular expression that matches nothing */

@@ -23,12 +64,10 @@ object EPSILON extends RegularExpression
case class Literal(val literal: Char) extends RegularExpression

/** a regular expression that matches either one expression or another */
case class Union(val left: RegularExpression, val right: RegularExpression)
extends RegularExpression
case class Union(val left: RegularExpression, val right: RegularExpression) extends RegularExpression

/** a regular expression that matches one expression followed by another */
case class Concat(val left: RegularExpression, val right: RegularExpression)
extends RegularExpression
case class Concat(val left: RegularExpression, val right: RegularExpression) extends RegularExpression

/** a regular expression that matches zero or more repetitions of another
* expression

Review comment: This all makes sense. What about the char and string casts? I guess they aren't really operators, but they're still things we had to figure out. That said I still agree that none of the operators were really particularly difficult.