
Sample code for converting a log from JSON to Avro binary format.


The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe) framework also supports Avro as its native format for Java map-reduce, and it can read JSON or text files as input to mappers, which makes it well suited to this kind of conversion job. The heart of the program is just this:

// Read the hour's JSON-formatted logs and write the same records back out in Avro binary format.
ColFile inlogs = ColFile.at("/dfs/logs/json/" + hr /* e.g. 2011/01/28/03 */).of(Log.class).jsonFormat();
ColFile outlogs = ColFile.at("/dfs/logs/avro/" + hr).of(Log.class);

// A single map-reduce phase that passes records through unchanged, grouped by timestamp.
ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs)
        .map(IdentityMapper.class).groupBy("timestamp").reduce(IdentityReducer.class);

ColPipe conversion = new ColPipe(getClass()).named("log conversion");
conversion.produces(outlogs);
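
The snippet above assumes a Log record class. Its exact fields depend on your log format, but a minimal, purely hypothetical sketch might look like the following; the only field the pipeline itself relies on is timestamp, which the groupBy("timestamp") call refers to:

// Hypothetical Log record; field names and types are assumptions, not part of the framework.
public class Log {
    public long timestamp;   // epoch millis of the log event; used by groupBy("timestamp")
    public String level;     // e.g. "INFO", "WARN"
    public String message;   // free-form log text

    public Log() {}          // no-arg constructor for serialization
}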

You'd currently have to define an identity mapper and reducer yourself (soon the framework will default to these):

// Identity mapper: the base class already copies each input record to the output.
public static class IdentityMapper extends BaseMapper<Log, Log> {
	@Override
	public void map(Log in, Log out, ColContext<Log> context) {
	    super.map(in, out, context);
	}
}

// Identity reducer: the base class emits each grouped record unchanged.
public static class IdentityReducer extends BaseReducer<Log, Log> {
	@Override
	public void reduce(Iterable<Log> in, Log out, ColContext<Log> context) {
	    super.reduce(in, out, context);
	}
}
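
For comparison, outside of map-reduce the same JSON-to-Avro-binary conversion can be done directly with Avro's generic APIs. The following standalone sketch is illustrative only: the schema, the sample record, and the output file name are assumptions matching the hypothetical Log above, not anything defined by Colossal Pipe. It decodes one JSON-encoded record and appends it to an Avro data file:

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class JsonToAvro {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema matching the Log sketch above.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Log\",\"fields\":["
          + "{\"name\":\"timestamp\",\"type\":\"long\"},"
          + "{\"name\":\"level\",\"type\":\"string\"},"
          + "{\"name\":\"message\",\"type\":\"string\"}]}");

        // Decode one JSON-encoded record (the sample data is made up for illustration).
        String json = "{\"timestamp\":1296270000000,\"level\":\"INFO\",\"message\":\"started\"}";
        DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord record = reader.read(null, decoder);

        // Append it to an Avro binary data file.
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("logs.avro"));
        writer.append(record);
        writer.close();
    }
}

In the map-reduce job above, Colossal Pipe performs the equivalent decode and encode steps for every record in the input directory.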