Skip to content

Sample Code converting a log from JSON to Avro binary format.

rbodkin edited this page Jan 29, 2011 · 2 revisions

The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe) framework also supports working with Avro as its native format for Java map-reduce, but it also lets you read in JSON or text files as input to mappers, making it fairly easy to use for this kind of conversion job. E.g., the heart of the program would be just this:

ColFile inlogs = ColFile.at("/dfs/logs/json/"+hr /2011/01/28/03/).of(Log.class).jsonFormat(); ColFile outlogs = ColFile.at("/dfs/logs/avro/"+hr).of(Log.class); ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs).map(IdentityMapper.class). groupBy("timestamp").reduce(IdentityReducer.class);

ColPipe conversion = new ColPipe(getClass()).named("log conversion"); Conversion.produces(outlogs);

You'd currently define an identity mapper and reducer (soon it will default to those):

public static class IdentitMapper extends BaseMapper<Log, Log> { @Override public void map(Log in, Log out, ColContext context) { super.map(in, out, context); } }

public static class IdentityReducer extends BaseReducer<Log, Log> { @Override public void reduce(Iterable in, Log out, ColContext context) { super.reduce(in, out, context); } }

Clone this wiki locally