-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] introduce new scalding-quotation module #1754
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exciting! Here are some small comments. Still reading.
I personally would like say 3 smaller PRs if possible, maybe one adding scalding-quotation, then 2 more using? I don't know. I will take a longer look later.
project/plugins.sbt
Outdated
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.0") | ||
addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "1.0") | ||
addSbtPlugin("org.wartremover" % "sbt-wartremover" % "2.1.1") | ||
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to add this, or can you put it in your personal ~/.sbt
directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's convenient for me because I work on projects that require different versions of the plugin so it can't be on ~/.sbt
. It's fine, I'll revert this change.
@@ -36,7 +37,7 @@ sealed trait CoGroupable[K, +R] extends HasReducers with HasDescription with jav | |||
/** | |||
* This is the list of mapped pipes, just before the (reducing) joinFunction is applied | |||
*/ | |||
def inputs: List[TypedPipe[(K, Any)]] | |||
def inputs(implicit q: Quoted): List[TypedPipe[(K, Any)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment why we need the q
here? This is not a method users should really ever be using (only people writing backends).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added it because FilterKeys.input
creates a TypedPipe
. I reverted these methods and used Quoted.internal
.
@@ -48,7 +49,7 @@ sealed trait CoGroupable[K, +R] extends HasReducers with HasDescription with jav | |||
* how to achieve, and since it is an internal function, not clear it | |||
* would actually help anyone for it to be type-safe | |||
*/ | |||
private[scalding] def joinFunction: (K, Iterator[Any], Seq[Iterable[Any]]) => Iterator[R] | |||
private[scalding] def joinFunction(implicit q: Quoted): (K, Iterator[Any], Seq[Iterable[Any]]) => Iterator[R] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question. Can you add a comment to why we need q here?
f0e8dc4
to
136e758
Compare
IDK if it makes sense to copy, but scalatest calls something similar a |
I have |
@fwbrasil I think it's a good idea to split it in a way you say. |
@isnotinvain @ttim Quotation is a principled concept like Monoid, Monad, etc. I think it's just a question of having documentation |
@johnynek @ttim @dieu @isnotinvain I've just created the first PR: #1755 |
|
||
private val log = LoggerFactory.getLogger(this.getClass) | ||
|
||
def projectionMeta[T: ClassTag](superClass: Class[_]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put the return type on this public method?
I'm confused where Quotations are coming into the picture here. Can you comment?
@@ -657,6 +661,24 @@ object OptimizationRules { | |||
} | |||
} | |||
|
|||
object ApplyProjectionPushdown extends PartialRule[TypedPipe] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, this is pretty nice...
Okay...
Problem
Scalding could leverage more information on the user's code to apply optimizations and provide a better description of the jobs.
Solution
Introduce the new module
scalding-quotation
that provides a mechanism to quote the method parameters through an implicit parameter. Example:The quoted implicit is materialized by a macro that has access to the parameters used to invoke the method. This first version has three features:
Quoted functions
The macro has access only to the code under compilation. For instance, given this code:
The macro doesn't know the tree of
phone
, so it must assume that all fields are used and thus the projection isp.contact
instead ofp.contact.phone
. To address this limitation, the user can quote functions:Even though the macro doesn't have the projections of
phone
statically, it falls back to a runtime tree that checks if the function is quoted and determines the proper projection (p.contact.phone
in this case).Quoted propagation
The quotation must be done by the methods that are called by the user code. Any internal transformation done by scalding should use the initial quoted from the user call. For instance, this
TypedPipe
method:calls other quoted methods but it doesn't materialize new
Quoted
instances for them, reusing the one coming from the user call.As a safety measure, the macro rejects scalding sources that try to materialize a
Quoted
since that's something that shouldn't happen in scalding internally. The compilation fails with the following error if a quotation is missing and it's a scalding source file:As the error message informs, it's possible to use
Quoted.internal
as a workaround for internal transformations that don't produce projections:The macro uses a simple whitelist to allow
Quoted
materialization for source file paths that have specific substrings ("test", "example", "tutorial" for now). It's a weak rule, but it seems good enough.Enabling the automatic projection
The projection is done through an optimization phase that is disabled by default and can be enabled via the
scalding.experimental.automatic_projection_pushdown
flag. I added theexperimental
so we can promote the feature later if it's stable.Note that the automatic projection happens at an earlier phase than the manual projections configuration, so it won't have an effect if users are already using the manual projection pushdown.
New typed pipe description
The previous mechanism that provided descriptions based on stack traces will now output the information from
Quoted
instead. Example description:Interfaces like DrScalding that render the descriptions will need rework. The description could be long since the user's code can have an arbitrary length.
Backwards compatibility
All user-facing methods need to take the extra implicit Quoted. This makes the changeset binary-incompatible, but it's mostly source-compatible. It'll not compile only if the user has a method call that specifies implicits explicitly. Example:
Notes
.gitignore
. It's just a convenience for me; I can remove if you prefer.transitiveDependentsOf
made me realize how convenient it is :)