From 0755f8a098a3d5c629745286d1ab0b933d804d07 Mon Sep 17 00:00:00 2001 From: Andrew Fishberg Date: Tue, 1 Mar 2016 23:58:44 -0800 Subject: [PATCH] complete hw5 --- reflection.md | 90 ++++++++++++- src/main/scala/dsls/regex/Program.scala | 125 +++++++----------- .../scala/dsls/regex/RegularExpression.scala | 59 +++++++-- 3 files changed, 186 insertions(+), 88 deletions(-) diff --git a/reflection.md b/reflection.md index 0719389..34e3515 100644 --- a/reflection.md +++ b/reflection.md @@ -2,8 +2,37 @@ ## Which operators were easiest to implement and why? +The easiest operators to implement were ||, ~, and <*>. These operators were +literally just wrappers around the Union(), Concat(), and Star() case classes +respectively. The only real difference is these operators were implemented +within the RegularExpression class so they could be infix operators (for || +and ~) and a postfix operator (for <*>). Although the <+> operator was still pretty +easy, unlike the previous examples, it was defined in terms of other +operators (i.e. ~ and <*>) rather than just being a wrapper around a case +class. + +Overall, these operators were so easy to implement since all the behavior +was already defined by RegexMatcher. This means the operators were really +only responsible for properly nesting input inside predefined +RegularExpression case classes. + ## Which operators were most difficult to implement and why? +Although none of the operators were particularly difficult, apply (i.e. the {} +operator) was certainly the most difficult of the 5 operators. This is because +unlike the other operators, it required more than just nesting case classes +or using our other operators. Rather, the {} operator needed some type of +repetitive logical structure. Although I chose to use recursion (i.e. the {} +operator recursively uses itself until it hits the base case) a _for loop_ +could have provided this same functionality. 
+ +In other words, although most of the behavior of the {} operator is defined +in RegexMatcher, this operator required a logical construction of the +case class nesting rather than simply relying on a case class to embody all +of this information. Although there could be a case class object to embody +this information, by defining the {} operator in terms of other operators and +case classes, we can simplify the behavior code in RegexMatcher. + ## Comment on the design of this internal DSL Write a few brief paragraphs that discuss: @@ -15,4 +44,63 @@ Write a few brief paragraphs that discuss: you implement it _or_ what features of Scala would prevent you from implementing it? (You don't have to write code for this part. You could say "I would use literal extension to..." or "Scala's rules for valid - identifiers prevent...") \ No newline at end of file + identifiers prevent...") + +This design works very well for a simple implementation of a Scala regex +internal DSL. That is, this implementation allows Scala users to natively use +the most common regex features within Scala. For instance, if Java was going to +attempt to implement a similar structure for regex (i.e. using classes rather +than packing it within the String class, like it actually does), you would +not be able to be this expressive. Java code, by not allowing implicit +conversions, operator overloads, and () omission in function calls, would be +forced to look similar to the initial Scala code we were provided. Although the +line between internal DSL and API is not clearly defined, the feel of this +final Scala code is much more DSL-like than anything Java would be able to +produce. + +That being said, there are parts of this DSL that are a little clunky. +Java's implementation of regex, built into the String class, comes by default +with some very useful predefined regular expressions that match common +"character sets". Although we managed to implement digits in our code (e.g. 
+the set of digits '\d' in Java), we needed to do it by manually unioning all +the numeric characters. Although this was not too bad thanks to the || +operator, this quickly becomes unrealistic for "character sets" that +include more than 10 characters (e.g. the set of non-digits '\D'). Although +not having to implement this saves the implementer work, it is very +undesirable for users. + +Additionally, since this internal DSL only tackles the most common regex +operations, there are many features that are simply not available currently. +For instance, the '[^...]' subexpression, '\A', and '\z' in Java cannot be +implemented with the existing set of operations without further expanding +this DSL and the RegexMatcher. ('[^...]' matches any single character not +included in the brackets; '\A' matches the beginning of the string; '\z' +matches the end of a string). Again, although not having to implement these +operations saves the implementer work, it is very undesirable for users. + +That being said, nothing about the Scala environment prevents there from +being ways to define these operations and character sets. Just like we +internally defined the objects EMPTY and EPSILON which match nothing +and the empty string respectively, we could define objects like DIGIT, +NONDIGIT, BEGIN, and END which match to digits, non-digits, string starts, and +string ends respectively. Although this would require the implementer to +complicate this implementation, the additional features would be necessary for +this internal DSL to match up against most common modern regex libraries. + +Similarly, although we would probably need to define it as a postfix operator, +we could make something that matches the functionality of '[^...]'. Although +this change would certainly make the DSL better, it is debatable whether this +can be incorporated into the syntax (this is a very important point). 
Although +things like NONDIGIT could be incorporated into our internal DSL so it is +functionally and almost syntactically interchangeable with the Java regex +implementation (i.e. at least in terms of ordering), there are certain +restrictions Scala imposes that we cannot change. That is, as long as our regex +implementation remains an internal DSL, there is a set of features we cannot +change. For instance, Scala limits us to a small set of unary prefix +operators (+, -, !, and ~). It is for this reason we would probably want to implement '[^...]' +as a postfix operator of some sort. Similarly, + and * are normally binary +operators in Scala, thus we use <+> and <*> as our unary postfix operators. +The only way to overcome these shortcomings would be to implement an +external DSL, which adds a lot more work for the developer. Furthermore, forcing +users to relearn a standard (i.e. regex) just because of a shortcoming in your +implementation is a little unreasonable. diff --git a/src/main/scala/dsls/regex/Program.scala b/src/main/scala/dsls/regex/Program.scala index 38a0302..bc91e78 100644 --- a/src/main/scala/dsls/regex/Program.scala +++ b/src/main/scala/dsls/regex/Program.scala @@ -3,22 +3,23 @@ package dsls.regex object Program extends App { /**************************************************************************** - * TODO: Extend characters to support regular expressions - * - * Make it possible to replace the definition of the numbers with: - * val zero = '0' - * etc. 
+ * Allows us to use our DSL in this Program object ***************************************************************************/ - val zero = Literal('0') - val one = Literal('1') - val two = Literal('2') - val three = Literal('3') - val four = Literal('4') - val five = Literal('5') - val six = Literal('6') - val seven = Literal('7') - val eight = Literal('8') - val nine = Literal('9') + import RegularExpression._ + + /**************************************************************************** + * Implicitly casts characters to a RegularExpression + ***************************************************************************/ + val zero = '0' + val one = '1' + val two = '2' + val three = '3' + val four = '4' + val five = '5' + val six = '6' + val seven = '7' + val eight = '8' + val nine = '9' require(zero matches "0") require(one matches "1") @@ -30,25 +31,21 @@ object Program extends App { require(seven matches "7") require(eight matches "8") require(nine matches "9") + println("Passed individual digit requires!") /**************************************************************************** - * TODO: Extend strings to support regular expressions - * - * Make it possible to replace the definition of answer with: - * val answer = "42" - ***************************************************************************/ - val answer = Concat(four, two) - + * Implicitly casts strings to a RegularExpression + ****************************************************************************/ + val answer = "42" + require(answer matches "42") + println("Passed implicit string conversion require!") /**************************************************************************** - * TODO: Add the union operator for regular expressions - * - * Make it possible to replace the definition of digit with: - * val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9' + * Implicitly defines a RegularExpression with characters and the || + * operator that matches to any digit 
***************************************************************************/ - val digit = Union(zero, Union(one, Union(two, Union(three, Union(four, - Union(five, Union(six, Union(seven, Union(eight, nine))))))))) + val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9' require(digit matches "0") require(digit matches "1") @@ -59,65 +56,53 @@ object Program extends App { require(digit matches "6") require(digit matches "7") require(digit matches "8") - require(digit matches "9") + require(digit matches "9") + println("Passed digit requires!") /**************************************************************************** - * TODO: Add the concatenation operator for regular expressions - * - * Make it possible to replace the definition of digit with: - * val pi = '3' ~ '1' ~ '4' + * Uses the ~ operator for concatenation ***************************************************************************/ - val pi = Concat(Literal('3'), Concat(Literal('1'), Literal('4'))) + val pi = '3' ~ '1' ~ '4' require(pi matches "314") + println("Passed pi requires!") /**************************************************************************** - * TODO: Add the star operator for regular expressions - * - * Make it possible to replace the definition of zeroOrMoreDigits with: - * val zeroOrMoreDigits = digit <*> + * Uses the <*> operator ***************************************************************************/ - val zeroOrMoreDigits = Star(digit) + val zeroOrMoreDigits = digit<*> require(zeroOrMoreDigits matches "") require(zeroOrMoreDigits matches "0") require(zeroOrMoreDigits matches "9") require(zeroOrMoreDigits matches "09") require(zeroOrMoreDigits matches "987651234") + println("Passed digit requires!") /**************************************************************************** - * TODO: Add the plus operator for regular expressions - * - * Make it possible to replace the definition of number with: - * val number = digit <+> + * Uses the <+> operator 
***************************************************************************/ - val number = Concat(digit, zeroOrMoreDigits) + val number = digit<+> require(!(number matches "")) require(number matches "0") require(number matches "9") require(number matches "09") require(number matches "987651234") + println("Passed number requires!") /**************************************************************************** - * TODO: Add the repetition operator for regular expressions - * - * Make it possible to replace the definition of cThree with: - * val cThree = 'c'{3} + * Uses the {} operator for repetition ***************************************************************************/ - val cThree = Concat(Literal('c'), Concat(Literal('c'), Literal('c'))) + val cThree = 'c'{3} require(cThree matches "ccc") + println("Passed repetition requires!") /**************************************************************************** - * Additional pattern - * Once you've added all the operators, it should be possible to replace - * the following several definitions with: - * val pattern = "42" || ( ('a' <*>) ~ ('b' <+>) ~ ('c'{3})) + * Uses multiple operators together ***************************************************************************/ - val aStar = Star(Literal('a')) - val bPlus = Concat(Literal('b'), Star(Literal('b'))) - val pattern = Union(answer, Concat(aStar, Concat(bPlus, cThree))) + val pattern = "42" || (('a'<*>) ~ ('b'<+>) ~ ('c'{3})) require(pattern matches "42") require(pattern matches "bccc") @@ -125,37 +110,23 @@ object Program extends App { require(pattern matches "aabccc") require(pattern matches "aabbccc") require(pattern matches "aabbbbccc") + println("Passed pattern requires!") /**************************************************************************** - * Additional pattern - * - * Once you've added all the operators, it should be possible to replace - * the following several definitions with: - * val helloworld = ("hello" <*>) ~ "world" + * Uses multiple 
operators together (again) ***************************************************************************/ - val hello = Concat(Literal('h'), Concat(Literal('e'), Concat(Literal('l'), - Concat(Literal('l'), Literal('o'))))) - - val world = Concat(Literal('w'), Concat(Literal('o'), Concat(Literal('r'), - Concat(Literal('l'), Literal('d'))))) - - val helloworld = Concat(Star(hello), world) + val helloworld = ("hello"<*>) ~ "world" require(helloworld matches "helloworld") require(helloworld matches "world") require(helloworld matches "hellohelloworld") + println("Passed hello world requires!") /**************************************************************************** - * Additional pattern - * - * Once you've added all the operators, it should be possible to replace - * the following several definitions with: - * val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4} + * Uses multiple operators together (again, again) ***************************************************************************/ - val threeDigits = Concat(digit, Concat(digit, digit)) - val fourDigits = Concat(threeDigits, digit) - val areaCode = Concat(Literal('('), Concat(threeDigits, Literal(')'))) - val telNumber = Concat(areaCode, Concat(threeDigits, Concat(Literal('-'), fourDigits))) + val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4} require(telNumber matches "(202)456-1111") + println("Passed telephone requires!") } diff --git a/src/main/scala/dsls/regex/RegularExpression.scala b/src/main/scala/dsls/regex/RegularExpression.scala index 199fe1e..e32e1c4 100644 --- a/src/main/scala/dsls/regex/RegularExpression.scala +++ b/src/main/scala/dsls/regex/RegularExpression.scala @@ -1,16 +1,57 @@ package dsls.regex -/** - * Modify this file to implement an internal DSL for regular expressions. - * - * You're allowed to add anything you want to this file, but you're not allowed - * to *remove* anything that currently appears in the file. 
- */ +import scala.language.implicitConversions +import scala.language.postfixOps /** The top of a class hierarchy that encodes regular expressions. */ abstract class RegularExpression { /** returns true if the given string matches this regular expression */ def matches(string: String) = RegexMatcher.matches(string, this) + + // operators + /**************************************************************************** + * Defines the || operator using the Union case class + ***************************************************************************/ + def ||(other: RegularExpression): RegularExpression = Union(this, other) + + /**************************************************************************** + * Defines the ~ operator using the Concat case class + ***************************************************************************/ + def ~(other: RegularExpression): RegularExpression = Concat(this, other) + + /**************************************************************************** + * Defines the <*> operator using the Star case class + ***************************************************************************/ + def <*> = Star(this) + + /**************************************************************************** + * Defines the <+> operator in terms of the ~ and <*> operators. Since the + * <*> operator matches to EPSILON, we can prevent the <+> operator from + * matching to EPSILON by ~ an occurrence of this to the this<*>. + ***************************************************************************/ + def <+> = this ~ this<*> + + /**************************************************************************** + * Recursively defines the {} operator by concatenating itself to itself + * num times and terminating with an EPSILON. 
+ ***************************************************************************/ + def apply(num:Int): RegularExpression = + if (num == 0) EPSILON else this ~ this{num - 1} +} + +object RegularExpression { + /**************************************************************************** + * Defines the implicit conversion from char to RegularExpression + ***************************************************************************/ + implicit def charToRegex(c:Char) = Literal(c) + + /**************************************************************************** + * Defines the implicit recursive conversion from string to + * RegularExpression. Modified to use the implicit character conversion + * and the ~ operator. + ***************************************************************************/ + implicit def strToRegex(s:String): RegularExpression = + if (s.length() == 1) s.head else s.head ~ s.tail } /** a regular expression that matches nothing */ @@ -23,12 +64,10 @@ object EPSILON extends RegularExpression case class Literal(val literal: Char) extends RegularExpression /** a regular expression that matches either one expression or another */ -case class Union(val left: RegularExpression, val right: RegularExpression) - extends RegularExpression +case class Union(val left: RegularExpression, val right: RegularExpression) extends RegularExpression /** a regular expression that matches one expression followed by another */ -case class Concat(val left: RegularExpression, val right: RegularExpression) - extends RegularExpression +case class Concat(val left: RegularExpression, val right: RegularExpression) extends RegularExpression /** a regular expression that matches zero or more repetitions of another * expression
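Review note on the `apply` and `strToRegex` definitions in the RegularExpression.scala hunk above: as committed, `apply` keeps recursing until the stack overflows when `num` is negative, and `strToRegex` calls `s.head` on an empty string, which throws. What follows is a minimal, self-contained sketch of one possible tightening, not the submitted code. It assumes a cut-down hierarchy (`Regex`, `Epsilon`, `Literal`, `Concat`) and two hypothetical helpers, `repeat` and `Sketch.fromString`, none of which appear in the patch. `repeat` also illustrates the for-loop alternative that reflection.md mentions.

```scala
// Cut-down stand-ins for the patch's RegularExpression hierarchy.
// All names here are hypothetical and chosen only for this sketch.
sealed abstract class Regex {
  def ~(other: Regex): Regex = Concat(this, other)

  // Recursive repetition in the style of the patch, with a guard so that
  // a negative count fails fast instead of recursing without end.
  def apply(num: Int): Regex = {
    require(num >= 0, "repetition count must be non-negative")
    if (num == 0) Epsilon else this ~ apply(num - 1)
  }

  // Loop-based equivalent: builds the same right-nested Concat chain,
  // ending in Epsilon, by prepending `this` once per iteration.
  def repeat(num: Int): Regex = {
    require(num >= 0, "repetition count must be non-negative")
    var acc: Regex = Epsilon
    for (_ <- 1 to num) acc = this ~ acc
    acc
  }
}

case object Epsilon extends Regex
final case class Literal(literal: Char) extends Regex
final case class Concat(left: Regex, right: Regex) extends Regex

object Sketch {
  // Explicit (non-implicit) string conversion. The empty string maps to
  // Epsilon; otherwise the characters are concatenated right-to-left so the
  // nesting matches what a head ~ tail recursion would produce.
  def fromString(s: String): Regex =
    if (s.isEmpty) Epsilon
    else s.init.foldRight(Literal(s.last): Regex)((c, acc) => Literal(c) ~ acc)
}
```

With these definitions, `Literal('c').repeat(3)` and `Literal('c')(3)` build structurally equal trees, and `Sketch.fromString("")` yields `Epsilon` rather than throwing. Whether the conversion should stay implicit, as in the patch, is a separate design choice that this sketch leaves open.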