-
Notifications
You must be signed in to change notification settings - Fork 2
Sample Code converting a log from JSON to Avro binary format.
The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe) framework also supports working with Avro as its native format for Java map-reduce, but it also lets you read in JSON or text files as input to mappers, making it fairly easy to use for this kind of conversion job. E.g., the heart of the program would be just this:
ColFile inlogs = ColFile.at("/dfs/logs/json/"+hr /2011/01/28/03/).of(Log.class).jsonFormat(); ColFile outlogs = ColFile.at("/dfs/logs/avro/"+hr).of(Log.class); ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs).map(IdentityMapper.class). groupBy("timestamp").reduce(IdentityReducer.class);
ColPipe conversion = new ColPipe(getClass()).named("log conversion"); Conversion.produces(outlogs);
You'd currently define an identity mapper and reducer (soon it will default to those):
public static class IdentitMapper extends BaseMapper<Log, Log> { @Override public void map(Log in, Log out, ColContext context) { super.map(in, out, context); } }
public static class IdentityReducer extends BaseReducer<Log, Log> { @Override public void reduce(Iterable in, Log out, ColContext context) { super.reduce(in, out, context); } }