-
Notifications
You must be signed in to change notification settings - Fork 706
Scalding Commons
Sam Ritchie edited this page Jun 6, 2013
·
2 revisions
(This page contains the README for the former scalding-commons library. All scalding-commons code has been merged into the main Scalding repo as of June 6th, 2013.)
Common extensions to the Scalding MapReduce DSL.
Scalding-Commons includes Scalding Sources for use with the dfs-datastores project.
This library provides a VersionedKeyValSource
that allows Scalding to write out key-value pairs of any type into a binary sequencefile. Serialization is handled with the bijection-core
library's Injection trait.
VersionedKeyValSource
allows multiple writes to the same path,as write creates a new version. Optionally, given a Monoid on the value type, VersionedKeyValSource
allows for versioned incremental updates of a key-value database.
import com.twitter.scalding.source.VersionedKeyValSource
import VersionedKeyValSource._
// ## Sink Example
// The bijection library provides implicit Injections
// from String -> Array[Byte] and Int -> Array[Byte].
val versionedSource = VersionedKeyValSource[String,Int]("path")
// creates a new version on each write
someScaldingFlow.write(versionedSource)
// because Scalding provides an implicit Monoid[Int],
// the writeIncremental method will add new integers into
// each value on every write:
someScaldingFlow.writeIncremental(versionedSource)
// ## Source Examples
//
// This Source produces the most recent set of kv pairs from the VersionedStore
// located at "path":
VersionedKeyValSource[String,Int]("path")
// This source produces version 12345:
VersionedKeyValSource[String,Int]("path", Some(12345))
- Scaladocs
- Getting Started
- Type-safe API Reference
- SQL to Scalding
- Building Bigger Platforms With Scalding
- Scalding Sources
- Scalding-Commons
- Rosetta Code
- Fields-based API Reference (deprecated)
- Scalding: Powerful & Concise MapReduce Programming
- Scalding lecture for UC Berkeley's Analyzing Big Data with Twitter class
- Scalding REPL with Eclipse Scala Worksheets
- Scalding with CDH3U2 in a Maven project
- Running your Scalding jobs in Eclipse
- Running your Scalding jobs in IDEA intellij
- Running Scalding jobs on EMR
- Running Scalding with HBase support: Scalding HBase wiki
- Using the distributed cache
- Unit Testing Scalding Jobs
- TDD for Scalding
- Using counters
- Scalding for the impatient
- Movie Recommendations and more in MapReduce and Scalding
- Generating Recommendations with MapReduce and Scalding
- Poker collusion detection with Mahout and Scalding
- Portfolio Management in Scalding
- Find the Fastest Growing County in US, 1969-2011, using Scalding
- Mod-4 matrix arithmetic with Scalding and Algebird
- Dean Wampler's Scalding Workshop
- Typesafe's Activator for Scalding