Sample Code converting a log from JSON to Avro binary format.
The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe) framework also supports working with Avro as its native format for Java map-reduce, but lets you read in JSON or text files as input to mappers as well, making it fairly easy to use for this kind of conversion job. For example, the heart of the program would be just this:
// Define a phase that reads an hour of JSON-formatted logs and writes the same records back out as Avro
ColFile inlogs = ColFile.at("/dfs/logs/json/" + hr /* e.g. 2011/01/28/03 */).of(Log.class).jsonFormat();
ColFile outlogs = ColFile.at("/dfs/logs/avro/" + hr).of(Log.class);
ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs).map(IdentityMapper.class)
        .groupBy("timestamp").reduce(IdentityReducer.class);
ColPipe conversion = new ColPipe(getClass()).named("log conversion");
conversion.produces(outlogs);
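For this to run, Log needs to be a class that Avro can read and write, with a timestamp field to match the groupBy("timestamp") call above. The exact form isn't shown here; as a rough sketch (an assumption for illustration, not Colossal Pipe's required shape; in practice you would typically generate Log from an Avro schema), a plain Java bean might look like:
public class Log {
    // Hypothetical fields for illustration; a real Log class would usually be
    // generated from an .avsc schema by the Avro compiler.
    private long timestamp;   // referenced by groupBy("timestamp") above
    private String level;
    private String message;

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public String getLevel() { return level; }
    public void setLevel(String level) { this.level = level; }
    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
}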
You currently have to define an identity mapper and reducer yourself (soon the framework will default to these):
public static class IdentityMapper extends BaseMapper<Log, Log> {
@Override
public void map(Log in, Log out, ColContext<Log> context) {
super.map(in, out, context);
}
}
public static class IdentityReducer extends BaseReducer<Log, Log> {
@Override
public void reduce(Iterable<Log> in, Log out, ColContext<Log> context) {
super.reduce(in, out, context);
}
}
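Since both methods simply delegate to the base class, every record passes through the map and reduce steps unchanged; the groupBy("timestamp") in the phase above only controls how records are grouped on their way into the identity reduce, while the format conversion itself comes from reading the input as JSON and writing the output as Avro.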