Aaron Stringer-Usdan #3

Open
wants to merge 2 commits into
base: master
23 changes: 13 additions & 10 deletions reflection.md
@@ -1,18 +1,21 @@
# Reflection on implementing regular expressions of a DSL

## Which operators were easiest to implement and why?
I found all of the operators pretty easy to implement, but I guess the `*` and `+` operators were the easiest. `*` was of course easy because all it took was calling the Star constructor. `+` was nearly as easy, since its definition wasn't much more complex and was written entirely in terms of other operators.

## Which operators were most difficult to implement and why?
The `~` and `||` operators were somewhat more difficult: I spent more time figuring out how to make them work, though the code itself is not significantly more complex. For reasons I'm not entirely sure of (I guess it has to do with how pattern matching works?), it was not possible to pattern match on a Union or Concat object created with the operator when the definition looked like
```Scala
def ||(other:RegularExpression) = Union(this, other)
```
I looked at the sample solution to see why this might not be working, and found "unapply" methods. After reading about them online, I figured out what they do, but I wouldn't have thought to use them, so I wanted to see if I could allow pattern matching in a different way. For reasons I don't understand at all (this might be super hacky, and maybe I shouldn't have done it), specifying the return type of these operators as `RegularExpression` made the pattern matching work without
extractor methods.

Review comment:

That's weird. My Union/Concat methods were working fine without me specifying the return type, and I didn't use any extractor methods.
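
Neither the reflection nor the review comment establishes why the annotation helped, so the following is only a minimal sketch of what the annotation changes, not a diagnosis. With `: RegularExpression` on the operator, the result has the general static type rather than the narrower `Union` or `Concat`, and a match over all cases type-checks. The object name `ReturnTypeSketch`, the helper `describe`, and the cut-down class hierarchy are assumptions made for illustration; they are not the PR's code.

```scala
object ReturnTypeSketch {
  sealed abstract class RegularExpression {
    // Explicit return type: callers see a RegularExpression,
    // not the narrower Union or Concat.
    def ||(other: RegularExpression): RegularExpression = Union(this, other)
    def ~(other: RegularExpression): RegularExpression = Concat(this, other)
  }
  case class Literal(c: Char) extends RegularExpression
  case class Union(left: RegularExpression, right: RegularExpression)
      extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression)
      extends RegularExpression

  // Hypothetical helper: names the outermost constructor of an expression.
  def describe(re: RegularExpression): String = re match {
    case Literal(c)   => s"literal $c"
    case Union(_, _)  => "union"
    case Concat(_, _) => "concat"
  }

  val aOrB: RegularExpression = Literal('a') || Literal('b')
  val aThenB: RegularExpression = Literal('a') ~ Literal('b')
}
```

Case classes already generate their own extractors, which is why no hand-written `unapply` appears here.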

## Comment on the design of this internal DSL
Now that I've got the whole thing working, it seems to have a lot of the features you'd want in a DSL. From a strictly theoretical point of view, you can write any regular expression you could want. All the operators are there, and represented by no more than three characters, too (unless you count the number of characters in the number _n_ you supply to the repetition operator).

However, the formal mathematical definition of regular expressions doesn't have wildcard characters so far as I can recall, and neither does this DSL. It _is_ possible to get a wildcard expression by taking the union of all characters that could possibly be matched against, but this is _really_ not practical for someone using the DSL.

Review comment:

I don't think we should be too concerned about the formal mathematical definition of regular expressions. What's important is making it easy for users to write regular expressions, and I agree that adding wildcards would be a good next step. In addition to the general wildcard, it could also be nice to have symbols that would match against all digits, all letters, etc.

It might be nice if the language had some kind of wildcard object (just like there's an empty regex object). It probably couldn't be called "." since, I'm pretty sure, that's against the rules for identifiers in Scala. You could still call it something like "WILD", or use a
Unicode character; that alone would make wildcards much easier to use.

Review comment:

Hmm, would there be a way to use a Unicode character without having users type out the hex value? WILD seems like it'd be okay, though it's a bit long to type out.
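
The "union of all characters" idea and the suggested digit and letter classes could be sketched roughly as below. This is one possible shape under stated assumptions, not something the PR implements: the helper `anyOf`, the names `DIGIT` and `LETTER`, the object `WildcardSketch`, and the choice of printable ASCII as the alphabet for `WILD` are all hypothetical.

```scala
object WildcardSketch {
  sealed abstract class RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Union(left: RegularExpression, right: RegularExpression)
      extends RegularExpression

  // Hypothetical helper: the union of every character in a non-empty alphabet.
  def anyOf(alphabet: Seq[Char]): RegularExpression = {
    require(alphabet.nonEmpty, "alphabet must not be empty")
    alphabet
      .map(c => Literal(c): RegularExpression)
      .reduceLeft((l, r) => Union(l, r))
  }

  // Assumed character classes built from that helper.
  val DIGIT: RegularExpression = anyOf('0' to '9')
  val LETTER: RegularExpression = anyOf(('a' to 'z') ++ ('A' to 'Z'))
  // Assumption: "any character" here means printable ASCII only.
  val WILD: RegularExpression = anyOf(' ' to '~')
}
```

A user would then write `WILD` where a conventional regex has `.`, at the cost of a large union tree behind the scenes.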

Write a few brief paragraphs that discuss:
+ What works about this design? (For example, what things seem easy and
natural to say, using the DSL?)
+ What doesn't work about this design? (For example, what things seem
cumbersome to say?)
+ Think of a syntactic change that might make the language better. How would
you implement it _or_ what features of Scala would prevent you from
implementing it? (You don't have to write code for this part. You could say
"I would use literal extension to..." or "Scala's rules for valid
identifiers prevent...")
53 changes: 23 additions & 30 deletions src/main/scala/dsls/regex/Program.scala
@@ -1,6 +1,10 @@
package dsls.regex

import scala.language.postfixOps

object Program extends App {

import RegularExpression._

/****************************************************************************
* TODO: Extend characters to support regular expressions
@@ -9,16 +13,16 @@ object Program extends App {
* val zero = '0'
* etc.
***************************************************************************/
val zero = Literal('0')
val one = Literal('1')
val two = Literal('2')
val three = Literal('3')
val four = Literal('4')
val five = Literal('5')
val six = Literal('6')
val seven = Literal('7')
val eight = Literal('8')
val nine = Literal('9')
val zero = '0'
val one = '1'
val two = '2'
val three = '3'
val four = '4'
val five = '5'
val six = '6'
val seven = '7'
val eight = '8'
val nine = '9'

require(zero matches "0")
require(one matches "1")
@@ -37,7 +41,7 @@ object Program extends App {
* Make it possible to replace the definition of answer with:
* val answer = "42"
***************************************************************************/
val answer = Concat(four, two)
val answer = "42"

require(answer matches "42")

@@ -47,8 +51,7 @@ object Program extends App {
* Make it possible to replace the definition of digit with:
* val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'
***************************************************************************/
val digit = Union(zero, Union(one, Union(two, Union(three, Union(four,
Union(five, Union(six, Union(seven, Union(eight, nine)))))))))
val digit = '0' || '1' || '2' || '3' || '4' || '5' || '6' || '7' || '8' || '9'

require(digit matches "0")
require(digit matches "1")
@@ -67,7 +70,7 @@ object Program extends App {
* Make it possible to replace the definition of pi with:
* val pi = '3' ~ '1' ~ '4'
***************************************************************************/
val pi = Concat(Literal('3'), Concat(Literal('1'), Literal('4')))
val pi = '3' ~ '1' ~ '4'

require(pi matches "314")

@@ -77,7 +80,7 @@ object Program extends App {
* Make it possible to replace the definition of zeroOrMoreDigits with:
* val zeroOrMoreDigits = digit <*>
***************************************************************************/
val zeroOrMoreDigits = Star(digit)
val zeroOrMoreDigits = digit*

require(zeroOrMoreDigits matches "")
require(zeroOrMoreDigits matches "0")
@@ -91,7 +94,7 @@ object Program extends App {
* Make it possible to replace the definition of number with:
* val number = digit <+>
***************************************************************************/
val number = Concat(digit, zeroOrMoreDigits)
val number = digit+

require(!(number matches ""))
require(number matches "0")
@@ -105,7 +108,7 @@ object Program extends App {
* Make it possible to replace the definition of cThree with:
* val cThree = 'c'{3}
***************************************************************************/
val cThree = Concat(Literal('c'), Concat(Literal('c'), Literal('c')))
val cThree = 'c'{3}

require(cThree matches "ccc")

@@ -115,9 +118,7 @@ object Program extends App {
* the following several definitions with:
* val pattern = "42" || ( ('a' <*>) ~ ('b' <+>) ~ ('c'{3}))
***************************************************************************/
val aStar = Star(Literal('a'))
val bPlus = Concat(Literal('b'), Star(Literal('b')))
val pattern = Union(answer, Concat(aStar, Concat(bPlus, cThree)))
val pattern = "42"||(('a'<*>)~('b'<+>)~('c'{3}))

require(pattern matches "42")
require(pattern matches "bccc")
@@ -133,13 +134,8 @@ object Program extends App {
* the following several definitions with:
* val helloworld = ("hello" <*>) ~ "world"
***************************************************************************/
val hello = Concat(Literal('h'), Concat(Literal('e'), Concat(Literal('l'),
Concat(Literal('l'), Literal('o')))))

val world = Concat(Literal('w'), Concat(Literal('o'), Concat(Literal('r'),
Concat(Literal('l'), Literal('d')))))

val helloworld = Concat(Star(hello), world)
val helloworld = ("hello"<*>)~"world"

require(helloworld matches "helloworld")
require(helloworld matches "world")
@@ -152,10 +148,7 @@ object Program extends App {
* the following several definitions with:
* val telNumber = '(' ~ digit{3} ~ ')' ~ digit{3} ~ '-' ~ digit{4}
***************************************************************************/
val threeDigits = Concat(digit, Concat(digit, digit))
val fourDigits = Concat(threeDigits, digit)
val areaCode = Concat(Literal('('), Concat(threeDigits, Literal(')')))
val telNumber = Concat(areaCode, Concat(threeDigits, Concat(Literal('-'), fourDigits)))
val telNumber = '('~digit{3}~')'~digit{3}~'-'~digit{4}

require(telNumber matches "(202)456-1111")
}
27 changes: 27 additions & 0 deletions src/main/scala/dsls/regex/RegularExpression.scala
@@ -1,5 +1,8 @@
package dsls.regex

import scala.language.implicitConversions
import scala.language.postfixOps

/**
* Modify this file to implement an internal DSL for regular expressions.
*
@@ -11,6 +14,21 @@ package dsls.regex
abstract class RegularExpression {
/** returns true if the given string matches this regular expression */
def matches(string: String) = RegexMatcher.matches(string, this)
def matches(char: Char) = RegexMatcher.matches(char.toString, this)

def ||(other:RegularExpression):RegularExpression = Union(this, other)
def ~(other:RegularExpression):RegularExpression = Concat(this, other)
//Specifying the return type of the above functions means we don't need extractors.

Review comment:

What kind of error message did you get when you didn't specify the return type? Curious because, as stated above, I didn't include the return type in my code and it worked fine.

Author reply:

If I remember right, it said that objects of type Union and Concat don't have `matches` as a member function. I don't know why it couldn't figure out to typecast them to RegularExpression, but apparently it couldn't. Edit: or, for that matter, why they didn't inherit `matches` to begin with.

def <*> = Star(this)
def * = this<*>
def <+> = this ~ (this <*>)
def + = this<+>
Review comment:

no comments...


def apply(n:Int):RegularExpression = {
if(n==0) EPSILON
else this~this{n-1}
}
Review comment:

More concise than my version. You probably want to do something about negative values, though (i.e., throw an error). No one likes stack overflows.
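
One way to act on this suggestion is to reject negative counts up front with `require`, which throws an `IllegalArgumentException` instead of recursing without end. The sketch below is an assumption-laden illustration rather than the PR's code: the object `RepeatSketch` and the trimmed class hierarchy are hypothetical, and `EPSILON` is assumed to be a case object.

```scala
object RepeatSketch {
  sealed abstract class RegularExpression {
    def ~(other: RegularExpression): RegularExpression = Concat(this, other)

    /** n copies of this expression, concatenated; n must be >= 0. */
    def apply(n: Int): RegularExpression = {
      // Fail fast on negative input rather than recursing forever.
      require(n >= 0, s"repetition count must be non-negative, got $n")
      if (n == 0) EPSILON else this ~ this(n - 1)
    }
  }
  // Assumption: EPSILON is modelled as a case object in this sketch.
  case object EPSILON extends RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression)
      extends RegularExpression

  val c: RegularExpression = Literal('c')
  val cTwice: RegularExpression = c(2)
}
```

The recursion is unchanged from the PR, so very large counts would still build a deep structure; only the negative case is handled here.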

}

/** a regular expression that matches nothing */
@@ -34,3 +52,12 @@ case class Concat(val left: RegularExpression, val right: RegularExpression)
* expression
*/
case class Star(val expression: RegularExpression) extends RegularExpression

object RegularExpression {
implicit def charToRegex(c: Char): RegularExpression = Literal(c)

implicit def strToRegex(s: String): RegularExpression = {
((EPSILON:RegularExpression) /: s)((x,y)=>Concat(x,y))
}
}
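
The `/:` operator used in `strToRegex` is an alias for `foldLeft`, and to my knowledge it is deprecated from Scala 2.13 onward. A sketch of the same conversion spelled with `foldLeft` follows; the object `StringConversionSketch` and the trimmed hierarchy are assumptions for illustration, and `Literal(c)` is written out explicitly instead of relying on the implicit `Char` conversion.

```scala
import scala.language.implicitConversions

object StringConversionSketch {
  sealed abstract class RegularExpression
  // Assumption: EPSILON is modelled as a case object in this sketch.
  case object EPSILON extends RegularExpression
  case class Literal(c: Char) extends RegularExpression
  case class Concat(left: RegularExpression, right: RegularExpression)
      extends RegularExpression

  // Start from EPSILON and append one Literal per character, left to right.
  implicit def strToRegex(s: String): RegularExpression =
    s.foldLeft(EPSILON: RegularExpression)((acc, c) => Concat(acc, Literal(c)))

  // The conversion applies because a RegularExpression is expected here.
  val answer: RegularExpression = "42"
}
```

As in the PR, the result is left-nested and begins with `EPSILON`, so `"42"` becomes `Concat(Concat(EPSILON, Literal('4')), Literal('2'))`.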