Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce scalding-quotation sub-project #1755

Merged
merged 1 commit into from
Dec 18, 2017

Conversation

fwbrasil
Copy link
Contributor

See #1754 for more details

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reading, but wanted to share some initial comments.

package com.twitter.scalding.quotation

sealed trait Projection {
def andThen(prop: String, tpe: String): Projection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move the default implementation here? Looks like both cases have the same implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we should use some AnyVal wrappers here instead of String. Something like:

case class Accessor(asString: String) extends AnyVal
case class TypeName(asString: String) extends AnyVal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"method with params isn't considered as projection" in {
test
.function[Person, String](_.name.substring(1))._1
.projections.set mustEqual Set(name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused what name is here? How is this compiling?

trait Test extends FreeSpec with MustMatchers

val person = TypeReference(cls[Person])
val name = person.andThen("name", cls[String])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, here it is. Can we not do this and instead add an explicit import? It makes it harder to read the tests without a clear benefit if you ask me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the projections to the Person companion object

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments.

PS: in my view, the idea PR is about 400 lines. This is better since it is smaller, but still quite large.

def of(tpe: String, superClass: Class[_]): Projections = {

def byType(p: Projection) = {
def loop(p: Projection): Boolean =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add the @annotation.tailrec here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

false
}

def loop(p: Projection): Either[Projection, Option[Projection]] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add @annotation.tailrec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not tail recursive

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duh. yep.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeName in comment instead of tpe?

Either.cond(!isSubclass(tpe), None, p)
case p @ Property(path, name, tpe) =>
loop(path) match {
case Left(path) =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use Left(_) => here since we are not using path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

loop(p)
}

def bySuperClass(p: Projection) = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put the return type? I think Option[Projection]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

p match {
case TypeReference(tpe) =>
base match {
case TypeReference(`tpe`) => Some(p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some(p) == Some(base) in this branch right? Can we write it that way so we can see in both branches we are returning Some(base) or None. In fact, maybe we should return Boolean instead so it is easier to follow and use filter below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop here does more than just filter the projection. It reconstructs the projections on top of the base projections using the type as the matching criteria. Take a look at line 99, where it uses option.map to recreate the property on top of the new base.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. I missed that branch. But line 95 could be Some(base) right (since we are checking equality)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could. It doesn't seem to matter, though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is easier to read that both branches are returning the same thing. I feel that is a value to someone understanding the algorithm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've changed it

def apply(set: Set[Projection]) = {
new Projections(
set.filter {
case Property(path, name, tpe) => !set.contains(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear to me this algorithm works. It seems the order you add things in matters here. It seems you need to add shortest paths first and just going through the set may cause you to add a path that later you will not want in there.

Do you follow my point?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the property you want to maintain, how are we maintaining it through ++? Can't you have two different projections in two different sets that need to have one filtered after a ++? Can you spell out the invariant above the class in the comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by ordering since it filters an unordered set based on the same complete unordered set, but indeed this algorithm has a bug. It doesn't consider nested intersections. For instance, person.contact and person.contact.phone.number would return both projections, which is wrong. I've just fixed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invariant holds for ++ since it also uses the apply method. I'm not sure the invariant needs to be in the class comments since it's already in the apply method, but I don't feel strongly about it.

})
}

def flatten(list: List[Projections]): Projections =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not Iterable here since ++ is commutative right? We don't care about an particular order do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@CLAassistant
Copy link

CLAassistant commented Dec 12, 2017

CLA assistant check
All committers have signed the CLA.

false
}

def loop(p: Projection): Either[Projection, Option[Projection]] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duh. yep.

p match {
case TypeReference(tpe) =>
base match {
case TypeReference(`tpe`) => Some(p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. I missed that branch. But line 95 could be Some(base) right (since we are checking equality)?

val paramsProjecttions = params.flatMap(Projection.unapply)
q"""
$func match {
case f: com.twitter.scalding.quotation.QuotedFunction =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to use _root_.com here? I thought we needed to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to have it for hygiene. If the user has a com object it'll fail, for instance. Note that even _root_ doesn't guarantee hygiene because that's a value that the user could define, but that should never happen :)

I'll add _root_ to all trees


val inputSymbols = inputs.map(_.symbol).toSet

object Projection {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this an object rather than just an def unapply(t: Tree): Option[Tree] method? Can you comment?

}

def functionCall(func: Tree, params: List[Tree]) = {
val paramsProjecttions = params.flatMap(Projection.unapply)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Projections is spelled wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

functionCall(func, params)
case q"$func(..$params)" if isFunction(func) =>
functionCall(func, params)
case t @ Projection(p) =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I guess you are using unapply here. Can we just comment on the object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to ProjectionExtractor. Is it clear enough to avoid the comment?

.exists(c.enclosingPosition.source.path.contains)

def isScalding(sym: Symbol): Boolean =
sym.fullName.contains("com.twitter.scalding") || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want startsWith here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Fixed

"non-quoted" in {
val function = (p: Person) => p.name
test.function[Person, String](function)._1
.projections.set mustEqual Set(Person.typeReference)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, does typeReference mean we are keeping the whole thing (when we see that, we can't do any projections)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it must assume that all properties are used since the function isn't introspectable

val p = Projections(Set(p1)) ++ Projections(Set(p2))
p.set mustEqual Set(p1, p2)
}
"with merge" in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we could do a scalacheck test here. We could test that for all List[Projections] then if we ++ all of them, we have a final result where there are not Projections which are "suffixes" of another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of property-based testing. Is there something that would have better coverage if I use it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHAT!?!? Who is not a big fan of property based checks!? Why don't you like it? I love it. It gives you an ability to clearly state properties and a decent approach at checking if what you claim is true actually is. Of course, with poor generators, it often doesn't hit tricky cases, but I have found it to be hugely helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code is refactored into simple functions that don't require complex structures, unit tests are easier to implement and reason about. Property-based tests take longer to implement, are much harder to debug, and don't bring many benefits if compared to simple unit tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the reason I brought it up is that I thought I noticed a bug which I think a scalacheck would have caught.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if I had thought about that scenario and checked the property :) The same is valid for unit tests. I can implement the property-based test if you feel strongly about it, though.

Projections.empty.toString mustEqual "Projections()"
}
"non-empty" in {
Projections(Set(p1, p2)).toString mustEqual "Projections(T1.p1, T2.p2)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another scalacheck might be that Projections(someSet).toSet.size <= someSet.size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@fwbrasil fwbrasil force-pushed the scalding-quotation branch 2 times, most recently from 7cc1994 to c0d1897 Compare December 13, 2017 04:43
* collected tree.
*/
def collect[T](tree: Tree)(f: PartialFunction[Tree, T]): List[T] = {
var res = List[T]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using a List.newBuilder[T] and using += on it which I think gives O(1) append (because they cheat and mutate the list). Currently I think this code is O(N^2) since it is using append on a list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


"nested transitive projection" in pendingUntilFixed {
test.function[Person, Option[String]](_.alternativeContact.map(_.phone))._1.projections.set mustEqual
Set(Person.typeReference.andThen(Accessor("alternativeContact"), typeName[Option[Contact]]).andThen(Accessor("phone"), typeName[String]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this confuses me. How do we distinguish an andThen projection inside or outside an Option? Seems like we need to record the input type we are dealing with also. We have Option[Contact] as the output of the previous Projection, but how do we jump across the .map on the Option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a failing test with pendingUntilFixed. The projections in this scenario are Person.alternativeContact and Contact.phone, which are unrelated and thus don't get filtered out by Projections. Ideally, the macro should apply a transformation similar to beta reduction to produce only the projection Person.alternativeContact.phone.

@fwbrasil
Copy link
Contributor Author

@johnynek @benpence @dieu @ttim Could someone merge this? I don't have access. I'd like to move forward and submit the second PR.

@johnynek johnynek merged commit 0614b51 into twitter:develop Dec 18, 2017
Copy link
Collaborator

@ttim ttim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only real concern from me - Projections sounds more like Map[ParamName, Set[Projections]] and not like Set[Projection].

Feel free to address comments in subsequent reviews.


import scala.reflect.macros.blackbox.Context

trait Liftables {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add comment on what Liftable means?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* A projection property (e.g. `Person.name`)
*/
final case class Property(path: Projection, accessor: Accessor, typeName: TypeName) extends Projection {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeName is type of this Projection? It sound like something which belongs to any Projection, does it make sense to put it on the level of Projection then?

Copy link
Contributor Author

@fwbrasil fwbrasil Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, there isn't immediate benefit in abstracting it and the typeName of a TypeReference doesn't really have the same meaning as the typeName of a Property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point more about typeName being part of any Projection in some way (I guess it's type of Projection). I definitely don't want to introduce it as an abstraction, more like a property which we have for any Projection. If you can see it as something non consistent between implementations I'm ok with it.

false
}

def loop(p: Projection): Either[Projection, Option[Projection]] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeName in comment instead of tpe?

* Returns the projections that are based on `tpe` and limits projections
* to only properties that extend from `superClass`.
*/
def of(typeName: TypeName, superClass: Class[_]): Projections = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of sounds like method on companion object for me and I was reading this method this way. Maybe filterBySuperClass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterBySuperClass doesn't represent well what this method does. Would you have another suggestion? I find of a good name to be honest.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like of, but I perceive it as something on companion objection like Projections.of(clazz).

*/
def of(typeName: TypeName, superClass: Class[_]): Projections = {

def byType(p: Projection) = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do rootProjection(Projection): TypeReference and match on it? I think logic will be clearer then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

Projections(set.filter(byType).flatMap(bySuperClass))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this couple of times and have issues with understanding what bySuperClass does. Can we add a comment on it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And/or rename it and put on Projection itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* returns the projection
* `Person.name.contact`
*/
def basedOn(base: Set[Projection]): Projections = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying for myself: we loose correspondence between function arguments and their Projections and because of that we do this match between all base and all set projections?

Can we rename loop(base, p) and put it to Projection class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


val nestedList =
params.flatMap {
case param @ q"(..$inputs) => $body" =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda hard to read because you can't easily see alternatives together, can we extract bodies into separate functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@fwbrasil
Copy link
Contributor Author

@ttim I addressed your feedback on #1761

I guess the only real concern from me - Projections sounds more like Map[ParamName, Set[Projections]] and not like Set[Projection].

We could use beta reded uction to deal with projections based on the actual parameters. It doesn't seem necessary, though. It would make the code much more complex without clear benefit since the projections are aggregated at the end and it doesn't really matter from which parameter they came from.

@ttim
Copy link
Collaborator

ttim commented Dec 20, 2017

We could use beta reded uction to deal with projections based on the actual parameters. It doesn't seem necessary, though. It would make the code much more complex without clear benefit since the projections are aggregated at the end and it doesn't really matter from which parameter they came from.

I don't think it complicates a logic anyhow significantly, but it makes part where you combine projections between function call and function body more straightforward and correct.

@fwbrasil
Copy link
Contributor Author

I don't think it complicates a logic anyhow significantly, but it makes part where you combine projections between function call and function body more straightforward and correct.

The projection logic is based on type references right now. In order to do detect which projection is from which parameter, we'd need to introduce a logic similar to beta reduction. I don't see how it'd be more correct or precise than the current implementation.

ttim pushed a commit that referenced this pull request Dec 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants