Andrew Fishberg #13

Open
wants to merge 1 commit into
base: master
90 changes: 89 additions & 1 deletion reflection.md
@@ -2,8 +2,37 @@

## Which operators were easiest to implement and why?

The easiest operators to implement were ||, ~, and <*>. These operators are
literally just wrappers around the Union(), Concat(), and Star() case classes
respectively. The only real difference is that they are defined as methods
of the RegularExpression class so they can be used as infix operators (||
and ~) and as a postfix operator (<*>). The <+> operator was still pretty
easy, but unlike the previous examples, it is defined in terms of other
operators (i.e. ~ and <*>) rather than just being a wrapper around a case
class.

Overall, these operators were so easy to implement since all the behavior
was already defined by RegexMatcher. This means the operators were really
only responsible for properly nesting input inside predefined
RegularExpression case classes.
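
To make the "just wrappers" point concrete, here is a minimal sketch of how
operator expressions desugar into nested case classes. The `Regex` trait and
the case classes are simplified stand-ins for the ones in
RegularExpression.scala, and `WrapperSketch` is a name made up for this
example; the matcher is left out entirely.

```scala
// Simplified stand-ins for the PR's class hierarchy (assumed names).
sealed trait Regex {
  def ||(other: Regex): Regex = Union(this, other)   // thin wrapper
  def ~(other: Regex): Regex = Concat(this, other)   // thin wrapper
  def <*> : Regex = Star(this)                       // thin wrapper
  def <+> : Regex = Concat(this, Star(this))         // built from ~ and <*>
}
case class Literal(c: Char) extends Regex
case class Union(left: Regex, right: Regex) extends Regex
case class Concat(left: Regex, right: Regex) extends Regex
case class Star(inner: Regex) extends Regex

object WrapperSketch {
  val a: Regex = Literal('a')
  val b: Regex = Literal('b')

  val either: Regex = a || b       // Union(a, b)
  val both: Regex = a ~ b          // Concat(a, b)
  val many: Regex = a.<*>          // Star(a)
  val oneOrMore: Regex = a.<+>     // Concat(a, Star(a))

  // Scala's precedence rules make ~ bind tighter than ||.
  val mixed: Regex = a || b ~ a    // Union(a, Concat(b, a))
}
```

Because every operator only builds a tree, the results can be compared
structurally without running a matcher at all.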

## Which operators were most difficult to implement and why?

Although none of the operators were particularly difficult, apply (i.e. the {}
operator) was certainly the most difficult of the 5 operators. This is because
unlike the other operators, it required more than just nesting case classes
or using our other operators. Rather, the {} operator needed some type of
repetitive logical structure. Although I chose to use recursion (i.e. the {}
operator recursively uses itself until it hits the base case) a _for loop_
could have provided this same functionality.

In other words, although most of the behavior of the {} operator is defined
in RegexMatcher, this operator required a logical construction of the
case class nesting rather than simply relying on a case class to embody all
of this information. Although there could be a case class object to embody
this information, by defining the {} operator in terms of other operators and
case classes, we can simplify the behavior code in RegexMatcher.
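
As a rough illustration of the recursion versus loop point, the sketch below
puts both constructions side by side. `Regex`, `Epsilon`, and `repeat` are
assumed names used only here (the PR itself uses `apply` and `EPSILON`). The
loop version also rejects negative counts, since a recursion that only stops
at 0 would presumably never terminate for a negative argument.

```scala
// Simplified stand-ins for the PR's classes (assumed names).
sealed trait Regex {
  def ~(other: Regex): Regex = Concat(this, other)

  // Recursive form, mirroring the PR: r{n} = r ~ r{n-1}, with r{0} = Epsilon.
  def apply(n: Int): Regex =
    if (n == 0) Epsilon else this ~ this.apply(n - 1)

  // Loop form (hypothetical alternative): builds the same right-nested tree.
  def repeat(n: Int): Regex = {
    require(n >= 0, "repetition count must be non-negative")
    var acc: Regex = Epsilon
    for (_ <- 1 to n) acc = Concat(this, acc)
    acc
  }
}
case object Epsilon extends Regex
case class Literal(c: Char) extends Regex
case class Concat(left: Regex, right: Regex) extends Regex
```

Both forms produce `Concat(c, Concat(c, Concat(c, Epsilon)))` for a count of
3, so the choice between them is mostly a matter of style.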

This all makes sense. What about the char and string casts? I guess they aren't really operators, but they're still things we had to figure out. That said I still agree that none of the operators were really particularly difficult.
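
On the char and string casts: one detail worth deciding is what the empty
string should convert to, because a conversion that bottoms out at length 1
and calls `s.head` would throw on `""`. The following is only a sketch under
assumed names (`Regex`, `Epsilon`, `ConversionSketch`); it maps `""` to the
empty-string regex and otherwise builds the same right-nested Concat chain
that `s.head ~ s.tail` would.

```scala
import scala.language.implicitConversions

// Simplified stand-ins for the PR's classes (assumed names).
sealed trait Regex {
  def ~(other: Regex): Regex = Concat(this, other)
}
case object Epsilon extends Regex
case class Literal(c: Char) extends Regex
case class Concat(left: Regex, right: Regex) extends Regex

object ConversionSketch {
  // A single character becomes a Literal.
  implicit def charToRegex(c: Char): Regex = Literal(c)

  // A string becomes a right-nested chain of Concat nodes;
  // the empty string becomes Epsilon instead of failing.
  implicit def strToRegex(s: String): Regex =
    if (s.isEmpty) Epsilon
    else s.init.foldRight(Literal(s.last): Regex) { (c, acc) =>
      Concat(Literal(c), acc)
    }
}
```

With `import ConversionSketch._` in scope, `val answer: Regex = "42"` would
pick up the string conversion in the same way the PR's `val answer = "42"`
does once an operator or expected type forces it.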

## Comment on the design of this internal DSL

Write a few brief paragraphs that discuss:
@@ -15,4 +44,63 @@ Write a few brief paragraphs that discuss:
you implement it _or_ what features of Scala would prevent you from
implementing it? (You don't have to write code for this part. You could say
"I would use literal extension to..." or "Scala's rules for valid
identifiers prevent...")

This design works very well for a simple implementation of a Scala regex
internal DSL. That is, this implementation lets Scala users write the most
common regex features natively within Scala. For instance, if Java were to
attempt a similar structure for regex (i.e. using classes rather than
encoding patterns as strings, like it actually does), it would not be able
to be this expressive. Java, which does not allow implicit conversions,
operator overloading, or () omission in method calls, would be forced to
look similar to the initial Scala code we were provided. Although the line
between internal DSL and API is not clearly defined, the feel of this final
Scala code is much more DSL-like than anything Java would be able to
produce.

This is true, and I'd hope that it'd be the case, given that it's what Scala was really designed for.


That being said, there are parts of this DSL that are a little clunky.
Java's regex implementation, where patterns are written as strings, comes by
default with some very useful predefined "character sets". Although we
managed to implement digits in our code (e.g. the set of digits '\d' in
Java), we needed to do it by manually unioning all the numeric characters.
Although this was not too bad thanks to the || operator, this quickly
becomes unrealistic for "character sets" that include many more than 10
characters (e.g. the set of non-digits '\D'). Although not having to
implement this saves the implementer work, it is very undesirable for
users.

But presumably, the implementers for Java's regex had to do the same thing - so we, as implementers, could be nice to our users and give them some of these sets if we're willing to put in the effort to define them. I guess this could be an implementer vs user tradeoff to some degree.
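
One way an implementer could take on that work once is a small helper that
unions a whole range of characters. This is a sketch only: `anyOf`, `Regex`,
`Empty`, and `CharSetSketch` are names invented for the example, and the
case classes are simplified stand-ins for the PR's.

```scala
// Simplified stand-ins for the PR's classes (assumed names).
sealed trait Regex
case object Empty extends Regex                        // matches nothing
case class Literal(c: Char) extends Regex
case class Union(left: Regex, right: Regex) extends Regex

object CharSetSketch {
  // Hypothetical helper: the union of every character in `chars`.
  // An empty input yields Empty, the regex that matches nothing.
  def anyOf(chars: Iterable[Char]): Regex = {
    val literals: List[Regex] = chars.toList.map(Literal(_))
    literals.reduceOption(Union(_, _)).getOrElse(Empty)
  }

  val digit: Regex = anyOf('0' to '9')   // same shape as '0' || '1' || ... || '9'
  val lower: Regex = anyOf('a' to 'z')   // 26 characters without 25 || operators
}
```

The result is left-nested, like a chain of || operators, so users get the
convenience while the matcher sees nothing new.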


Additionally, since this internal DSL only tackles the most common regex
operations, there are many features that are simply not available currently.
For instance, the '[^...]' subexpression, '\A', and '\z' in Java cannot be
implemented with the existing set of operations without further expanding
this DSL and the RegexMatcher. ('[^...]' matches any single character not
included in the brackets; '\A' matches the beginning of the string; '\z'
matches the end of the string.) Again, although not having to implement these
operations saves the implementer work, it is very undesirable for users.

That being said, nothing about the Scala environment prevents us from
defining these operations and character sets. Just like we internally
defined the objects EMPTY and EPSILON, which match nothing and the empty
string respectively, we could define objects like DIGIT, NONDIGIT, BEGIN,
and END, which match digits, non-digits, string starts, and string ends
respectively. Although this would require the implementer to complicate the
implementation, the additional features would be necessary for this
internal DSL to match up against most common modern regex libraries.
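
A hedged sketch of what DIGIT and NONDIGIT could look like if the hierarchy
gained one predicate-based case class. Everything here is an assumption for
illustration: `CharClass` does not exist in the PR, `Regex` and the other
case classes are simplified stand-ins, and the small derivative-based
`matches` function only stands in for RegexMatcher, whose code is not shown.
BEGIN and END are not covered.

```scala
// Simplified stand-ins for the PR's classes (assumed names).
sealed trait Regex
case object Empty extends Regex                          // matches nothing
case object Epsilon extends Regex                        // matches ""
case class Literal(c: Char) extends Regex
case class CharClass(accepts: Char => Boolean) extends Regex  // hypothetical
case class Union(left: Regex, right: Regex) extends Regex
case class Concat(left: Regex, right: Regex) extends Regex
case class Star(inner: Regex) extends Regex

object PredefinedSets {
  val DIGIT: Regex = CharClass(c => c >= '0' && c <= '9')
  val NONDIGIT: Regex = CharClass(c => !(c >= '0' && c <= '9'))

  // True when the regex accepts the empty string.
  def nullable(r: Regex): Boolean = r match {
    case Empty | Literal(_) | CharClass(_) => false
    case Epsilon | Star(_)                 => true
    case Union(l, r2)                      => nullable(l) || nullable(r2)
    case Concat(l, r2)                     => nullable(l) && nullable(r2)
  }

  // The regex that remains after consuming character c.
  def derive(r: Regex, c: Char): Regex = r match {
    case Empty | Epsilon => Empty
    case Literal(d)      => if (c == d) Epsilon else Empty
    case CharClass(p)    => if (p(c)) Epsilon else Empty
    case Union(l, r2)    => Union(derive(l, c), derive(r2, c))
    case Concat(l, r2) =>
      val first = Concat(derive(l, c), r2)
      if (nullable(l)) Union(first, derive(r2, c)) else first
    case Star(inner)     => Concat(derive(inner, c), Star(inner))
  }

  // Whole-string match: consume every character, then check for acceptance.
  def matches(r: Regex, s: String): Boolean =
    nullable(s.foldLeft(r)(derive))
}
```

The cost to the implementer is one new case class and one new matcher case;
the user gets NONDIGIT without unioning every non-digit character by hand.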

Similarly, although we would probably need to define it as a postfix operator,
we could make something that matches the functionality of '[^...]'. Although
this change would certainly make the DSL better, it is debatable whether it
can be incorporated into the syntax, which is an important point. Things like
NONDIGIT could be incorporated into our internal DSL so that it is
functionally and almost syntactically interchangeable with the Java regex
implementation (i.e. at least in terms of ordering), but there are certain
restrictions Scala imposes that we cannot change. That is, as long as our
regex implementation remains an internal DSL, there is a set of features we
cannot change. For instance, Scala limits us to a small set of unary prefix
operators (+, -, !, and ~). It is for this reason we would probably want to
implement '[^...]' as a postfix operator of some sort. Similarly, + and * are
normally binary operators in Scala, thus we use <+> and <*> as our unary
postfix operators. The only way to overcome these shortcomings would be to
implement an external DSL, which adds a lot more work for the developer.
Furthermore, forcing users to relearn a standard (i.e. regex) just because of
a shortcoming in your implementation is a little unreasonable.

True - that said, regexes aren't already perfectly standardized (I think), so we'll have to pick some kind of standard to go with. I'm curious what Scala features might let us implement '[^...]', or if something would prevent us. I think I'd have to see an example of it in use for that though.

It does seem like most of this can be viewed as tradeoffs between implementer and user, though, which I think is a good point.
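
For an example of '[^...]' in use, here is one possible shape, offered as a
sketch rather than a proposal for this PR. It leans on the fact that Scala
lets a class define the prefix operators +, -, !, and ~ through `unary_`
methods, so `!` can play the role of the leading `^`. `CharSet`, `oneOf`,
and `NegatedSetSketch` are hypothetical names, and a matcher would still
need a new case that calls `accepts`.

```scala
// Simplified stand-in for the PR's base class (assumed name).
sealed trait Regex

// Hypothetical bracket-style set; `negated` models the leading ^ in [^...].
case class CharSet(chars: Set[Char], negated: Boolean = false) extends Regex {
  // A character is accepted when membership and negation disagree.
  def accepts(c: Char): Boolean = chars.contains(c) != negated

  // Prefix ! flips the set, so !oneOf("abc") reads like [^abc].
  def unary_! : CharSet = copy(negated = !negated)
}

object NegatedSetSketch {
  // Hypothetical helper: oneOf("aeiou") reads like [aeiou].
  def oneOf(chars: String): CharSet = CharSet(chars.toSet)

  val vowel: CharSet = oneOf("aeiou")
  val notVowel: CharSet = !oneOf("aeiou")
}
```

The syntax is not identical to Java's, which supports the point about
internal DSLs being bound by the host language, but the ordering (negation
first, then the set) is at least preserved.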

125 changes: 48 additions & 77 deletions src/main/scala/dsls/regex/Program.scala
@@ -3,22 +3,23 @@ package dsls.regex
object Program extends App {

/****************************************************************************
* TODO: Extend characters to support regular expressions
*
* Make it possible to replace the definition of the numbers with:
* val zero = '0'
* etc.
* Allows us to use our DSL in this Program object
***************************************************************************/
val zero = Literal('0')
val one = Literal('1')
val two = Literal('2')
val three = Literal('3')
val four = Literal('4')
val five = Literal('5')
val six = Literal('6')
val seven = Literal('7')
val eight = Literal('8')
val nine = Literal('9')
import RegularExpression._

/****************************************************************************
* Implicitly casts characters to a RegularExpression
***************************************************************************/
val zero = '0'
val one = '1'
val two = '2'
val three = '3'
val four = '4'
val five = '5'
val six = '6'
val seven = '7'
val eight = '8'
val nine = '9'

require(zero matches "0")
require(one matches "1")
@@ -30,25 +31,21 @@ object Program extends App {
require(seven matches "7")
require(eight matches "8")
require(nine matches "9")
println("Passed individual digit requires!")

/****************************************************************************
* TODO: Extend strings to support regular expressions
*
* Make it possible to replace the definition of answer with:
* val answer = "42"
***************************************************************************/
val answer = Concat(four, two)

* Implicitly casts strings to a RegularExpression
****************************************************************************/
val answer = "42"

require(answer matches "42")
println("Passed implicit string conversion require!")

/****************************************************************************
* TODO: Add the union operator for regular expressions
*
* Make it possible to replace the definition of digit with:
* val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'
* Implicitly defines a RegularExpression with characters and the ||
* operator that matches to any digit
***************************************************************************/
val digit = Union(zero, Union(one, Union(two, Union(three, Union(four,
Union(five, Union(six, Union(seven, Union(eight, nine)))))))))
val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'

require(digit matches "0")
require(digit matches "1")
@@ -59,103 +56,77 @@ object Program extends App {
require(digit matches "6")
require(digit matches "7")
require(digit matches "8")
require(digit matches "9")
println("Passed digit requires!")

/****************************************************************************
* TODO: Add the concatenation operator for regular expressions
*
* Make it possible to replace the definition of digit with:
* val pi = '3' ~ '1' ~ '4'
* Uses the ~ operator for concatenation
***************************************************************************/
val pi = Concat(Literal('3'), Concat(Literal('1'), Literal('4')))
val pi = '3' ~ '1' ~ '4'

require(pi matches "314")
println("Passed pi requires!")

/****************************************************************************
* TODO: Add the star operator for regular expressions
*
* Make it possible to replace the definition of zeroOrMoreDigits with:
* val zeroOrMoreDigits = digit <*>
* Uses the <*> operator
***************************************************************************/
val zeroOrMoreDigits = Star(digit)
val zeroOrMoreDigits = digit<*>

require(zeroOrMoreDigits matches "")
require(zeroOrMoreDigits matches "0")
require(zeroOrMoreDigits matches "9")
require(zeroOrMoreDigits matches "09")
require(zeroOrMoreDigits matches "987651234")
println("Passed zeroOrMoreDigits requires!")

/****************************************************************************
* TODO: Add the plus operator for regular expressions
*
* Make it possible to replace the definition of number with:
* val number = digit <+>
* Uses the <+> operator
***************************************************************************/
val number = Concat(digit, zeroOrMoreDigits)
val number = digit<+>

require(!(number matches ""))
require(number matches "0")
require(number matches "9")
require(number matches "09")
require(number matches "987651234")
println("Passed number requires!")

/****************************************************************************
* TODO: Add the repetition operator for regular expressions
*
* Make it possible to replace the definition of cThree with:
* val cThree = 'c'{3}
* Uses the {} operator for repetition
***************************************************************************/
val cThree = Concat(Literal('c'), Concat(Literal('c'), Literal('c')))
val cThree = 'c'{3}

require(cThree matches "ccc")
println("Passed repetition requires!")

/****************************************************************************
* Additional pattern
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val pattern = "42" || ( ('a' <*>) ~ ('b' <+>) ~ ('c'{3}))
* Uses multiple operators together
***************************************************************************/
val aStar = Star(Literal('a'))
val bPlus = Concat(Literal('b'), Star(Literal('b')))
val pattern = Union(answer, Concat(aStar, Concat(bPlus, cThree)))
val pattern = "42" || (('a'<*>) ~ ('b'<+>) ~ ('c'{3}))

require(pattern matches "42")
require(pattern matches "bccc")
require(pattern matches "abccc")
require(pattern matches "aabccc")
require(pattern matches "aabbccc")
require(pattern matches "aabbbbccc")
println("Passed pattern requires!")

/****************************************************************************
* Additional pattern
*
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val helloworld = ("hello" <*>) ~ "world"
* Uses multiple operators together (again)
***************************************************************************/
val hello = Concat(Literal('h'), Concat(Literal('e'), Concat(Literal('l'),
Concat(Literal('l'), Literal('o')))))

val world = Concat(Literal('w'), Concat(Literal('o'), Concat(Literal('r'),
Concat(Literal('l'), Literal('d')))))

val helloworld = Concat(Star(hello), world)
val helloworld = ("hello"<*>) ~ "world"

require(helloworld matches "helloworld")
require(helloworld matches "world")
require(helloworld matches "hellohelloworld")
println("Passed hello world requires!")

/****************************************************************************
* Additional pattern
*
* Once you've added all the operators, it should be possible to replace
* the following several definitions with:
* val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4}
* Uses multiple operators together (again, again)
***************************************************************************/
val threeDigits = Concat(digit, Concat(digit, digit))
val fourDigits = Concat(threeDigits, digit)
val areaCode = Concat(Literal('('), Concat(threeDigits, Literal(')')))
val telNumber = Concat(areaCode, Concat(threeDigits, Concat(Literal('-'), fourDigits)))
val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4}

require(telNumber matches "(202)456-1111")
println("Passed telephone requires!")

These printlns are a nice idea - I was unsure a couple of times when I ran it and everything passed and no output was printed whether it had actually worked.

}
59 changes: 49 additions & 10 deletions src/main/scala/dsls/regex/RegularExpression.scala
@@ -1,16 +1,57 @@
package dsls.regex

/**
* Modify this file to implement an internal DSL for regular expressions.
*
* You're allowed to add anything you want to this file, but you're not allowed
* to *remove* anything that currently appears in the file.
*/
import scala.language.implicitConversions
import scala.language.postfixOps

/** The top of a class hierarchy that encodes regular expressions. */
abstract class RegularExpression {
/** returns true if the given string matches this regular expression */
def matches(string: String) = RegexMatcher.matches(string, this)

// operators
/****************************************************************************
* Defines the || operator using the Union case class
***************************************************************************/
def ||(other: RegularExpression): RegularExpression = Union(this, other)

/****************************************************************************
* Defines the ~ operator using the Concat case class
***************************************************************************/
def ~(other: RegularExpression): RegularExpression = Concat(this, other)

/****************************************************************************
* Defines the <*> operator using the Star case class
***************************************************************************/
def <*> = Star(this)

/****************************************************************************
* Defines the <+> operator in terms of the ~ and <*> operators. Since the
* <*> operator matches to EPSILON, we can prevent the <+> operator from
* matching to EPSILON by ~ an occurrence of this to the this<*>.
***************************************************************************/
def <+> = this ~ this<*>

/****************************************************************************
* Recursively defines the {} operator by concatenating itself to itself
* num times and terminating with an EPSILON.
***************************************************************************/
def apply(num:Int): RegularExpression =
if (num == 0) EPSILON else this ~ this{num - 1}
}

object RegularExpression {
/****************************************************************************
* Defines the implicit conversion from char to RegularExpression
***************************************************************************/
implicit def charToRegex(c:Char) = Literal(c)

/****************************************************************************
* Defines the implicit recursive conversion from string to
* RegularExpression. Modified to use the implicit character conversion
* and the ~ operator.
***************************************************************************/
implicit def strToRegex(s:String): RegularExpression =
if (s.length() == 1) s.head else s.head ~ s.tail

There are some interesting slight syntactic differences between our solutions - for instance you just said def <*> while I said def <*>(). Apart from that looks pretty straightforward. Your organization and header comments are nice.

}

/** a regular expression that matches nothing */
@@ -23,12 +64,10 @@ object EPSILON extends RegularExpression
case class Literal(val literal: Char) extends RegularExpression

/** a regular expression that matches either one expression or another */
case class Union(val left: RegularExpression, val right: RegularExpression)
extends RegularExpression
case class Union(val left: RegularExpression, val right: RegularExpression) extends RegularExpression

/** a regular expression that matches one expression followed by another */
case class Concat(val left: RegularExpression, val right: RegularExpression)
extends RegularExpression
case class Concat(val left: RegularExpression, val right: RegularExpression) extends RegularExpression

/** a regular expression that matches zero or more repetitions of another
* expression