Field rules
In the map phase (map/flatMap/pack/unpack), the rule is: if the target fields are new (disjoint from the source fields), they are appended to the pipe. If the source or target fields are a subset of the other, the source fields are dropped and only the results are kept in their place; fields not involved in the operation are untouched. Otherwise (the two overlap but neither is a subset of the other), you get an exception at the flow planning stage.
If you use mapTo/flatMapTo/packTo/unpackTo, only the results are kept and every other field is dropped.
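The map-phase rules above can be modeled in plain Scala as a rough sketch. Nothing here is Scalding API: `mapFields` and `mapToFields` are hypothetical helpers that track field names only, and the sketch assumes the target names do not clash with unrelated fields already in the pipe.

```scala
object MapFieldRules {
  // Hypothetical model (not Scalding API) of which field names survive
  // map/flatMap/pack/unpack. Assumes `from` is contained in `input` and that
  // `to` does not clash with unrelated fields in `input`.
  def mapFields(
      input: Set[String],
      from: Set[String],
      to: Set[String]
  ): Either[String, Set[String]] =
    if ((from intersect to).isEmpty)
      Right(input ++ to) // disjoint: results are appended
    else if (from.subsetOf(to) || to.subsetOf(from))
      Right((input -- from) ++ to) // subset: sources swapped out for results
    else
      Left(s"overlap without subset relationship: $from vs $to") // planning error

  // Hypothetical model of mapTo/flatMapTo/packTo/unpackTo: only results remain.
  def mapToFields(to: Set[String]): Set[String] = to
}
```

For example, with input fields x, y, z, mapping y to a new field w keeps all four names, while mapping (x, y) to x leaves only x and z.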
In a groupBy, the grouping keys are always kept, and apart from the keys only the target fields of the aggregations are kept. So, groupBy('x) { _.sum('y -> 'ys).sum('z -> 'zs) } will return a pipe with three columns: ('x, 'ys, 'zs).
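As a plain Scala collections analogue of that groupBy (a sketch only, not Scalding code), the source columns y and z disappear and just the key plus the two sums remain. The `Row` and `Grouped` case classes and the `sumByX` helper are hypothetical names introduced for illustration.

```scala
object GroupByFieldsSketch {
  // Input rows carry three columns: x, y, z.
  final case class Row(x: String, y: Int, z: Int)
  // Output rows carry the key x plus the aggregation targets ys and zs.
  final case class Grouped(x: String, ys: Int, zs: Int)

  // Group on x and sum y and z within each group; y and z themselves are gone.
  def sumByX(rows: Seq[Row]): Set[Grouped] =
    rows
      .groupBy(_.x)
      .map { case (x, rs) => Grouped(x, rs.map(_.y).sum, rs.map(_.z).sum) }
      .toSet
}
```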
Joins keep all the fields. For inner joins, the field names can collide on the fields you are joining, and in that case only one copy is kept in the result. Otherwise, all field names must be distinct.
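The join rule can be sketched the same way. The `joinFields` helper below is hypothetical (not Scalding API) and only tracks field names: shared names are tolerated on the join keys of an inner join, and any other shared name is reported as an error.

```scala
object JoinFieldRules {
  // Hypothetical model (not Scalding API) of the field names produced by a join.
  def joinFields(
      left: Set[String],
      right: Set[String],
      leftKeys: Set[String],
      rightKeys: Set[String],
      inner: Boolean
  ): Either[String, Set[String]] = {
    // Only inner joins may reuse a name, and only on the join keys.
    val allowedShared =
      if (inner) leftKeys intersect rightKeys else Set.empty[String]
    val clashes = (left intersect right) -- allowedShared
    if (clashes.nonEmpty) Left(s"field names must be distinct: $clashes")
    else Right(left ++ right) // a shared key name appears once in the result
  }
}
```

Under this model, an inner join of (id, name) with (id, score) on id yields (id, name, score), whereas two pipes that both carry a non-key field called name are rejected until one side is renamed.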
The documentation in FieldConversions.scala might be helpful for Cascading experts.
/**
* Rather than give the full power of cascading's selectors, we have
* a simpler set of rules encoded below:
* 1) if the input is non-definite (ALL, GROUP, ARGS, etc...) ALL is the output.
* Perhaps only fromFields=ALL will make sense
* 2) If one of from or to is a strict super set of the other, SWAP is used.
* 3) If they are equal, REPLACE is used.
* 4) Otherwise, ALL is used.
*/
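The four rules in that comment can be read as a small decision function. The sketch below is a stand-alone model, not the code in FieldConversions.scala: the `Selector` type and `chooseSelector` are hypothetical names, and a non-definite input (ALL, GROUP, ARGS, ...) is represented here as `None`.

```scala
object SelectorRules {
  // Stand-ins for the Cascading selectors named in the comment.
  sealed trait Selector
  case object All extends Selector
  case object Swap extends Selector
  case object Replace extends Selector

  // `from` is None when the input fields are non-definite (ALL, GROUP, ARGS, ...).
  def chooseSelector(from: Option[Set[String]], to: Set[String]): Selector =
    from match {
      case None                => All     // rule 1: non-definite input
      case Some(f) if f == to  => Replace // rule 3: equal field sets
      case Some(f) if f.subsetOf(to) || to.subsetOf(f) =>
        Swap                              // rule 2: strict superset (equality handled above)
      case Some(_)             => All     // rule 4: everything else
    }
}
```

Checking equality before the subset test is what makes the remaining subset case strict, matching rule 2.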