srisatish edited this page Aug 22, 2013 · 5 revisions

Example coding tests/tasks for future hackers @ 0xdata or open-source collaborators. All contributions will be recognized, and contributors' names will be added as Collaborators.

(TBD - Attach JIRA-### to this)

1. "Python REST Quantile GitHub Task Description

Objective:

Add a quantile algorithm to H2O and call it from Python using the REST API.

  • Implement a quantile algorithm in H2O by building on top of H2O’s existing MapReduce. Calculate the quantiles at the following splits: 5%, 10%, 15%, 85%, 90%, 95%. The algorithm should require only a fixed number of passes over the data (ideally 1) to calculate N quantiles, regardless of N. Calculating an approximation is fine (provide a reference to the algorithm you chose).

  • Add a REST API to call the new algorithm. This should provide a JSON response.

  • Write a Python script to access the new algorithm via HTTP and parse the JSON.

  • From Python, via the REST API, run the H2O quantile algorithm on a big dataset (say, 100M rows) on distributed H2O (at least three nodes). (Write a script that randomly generates the big dataset, and provide the script and the random seed for repeatability.)

  • Print the quantile result from the JSON response in Python.

  • Extra credit: Add a web page for use by the browser. (This can be painful. This is really extra credit!)
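
To make the "fixed number of passes" requirement concrete, here is a minimal sketch in plain Python of one possible approach: an equal-width histogram built in two map/reduce passes (one for the min/max range, one for the bin counts), followed by linear interpolation inside the bin that contains each target rank. This is an illustration only, not H2O's MapReduce API. The function names, the bin count of 1024, and the list-of-chunks stand-in for data spread across nodes are all assumptions made for this example.

```python
import random
from functools import reduce

# Splits requested in the task description.
PROBS = [0.05, 0.10, 0.15, 0.85, 0.90, 0.95]

def map_range(chunk):
    # Pass 1, map step: local (min, max) of one chunk.
    return (min(chunk), max(chunk))

def reduce_range(a, b):
    # Pass 1, reduce step: combine two (min, max) pairs.
    return (min(a[0], b[0]), max(a[1], b[1]))

def map_hist(chunk, lo, hi, nbins):
    # Pass 2, map step: local equal-width histogram over [lo, hi].
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in chunk:
        i = min(int((x - lo) / width), nbins - 1)  # clamp x == hi into the last bin
        counts[i] += 1
    return counts

def reduce_hist(a, b):
    # Pass 2, reduce step: histograms merge by element-wise addition.
    return [x + y for x, y in zip(a, b)]

def quantiles_from_hist(counts, lo, hi, probs):
    # Interpolate linearly inside the bin where the cumulative count
    # first reaches the target rank p * n.
    n = sum(counts)
    width = (hi - lo) / len(counts)
    out = []
    for p in probs:
        target = p * n
        cum = 0
        for i, c in enumerate(counts):
            if c > 0 and cum + c >= target:
                out.append(lo + (i + (target - cum) / c) * width)
                break
            cum += c
    return out

def approx_quantiles(chunks, probs=PROBS, nbins=1024):
    # Two passes over the data, independent of how many quantiles are requested.
    lo, hi = reduce(reduce_range, map(map_range, chunks))
    if lo == hi:  # constant column: every quantile is that value
        return [lo] * len(probs)
    counts = reduce(reduce_hist, (map_hist(c, lo, hi, nbins) for c in chunks))
    return quantiles_from_hist(counts, lo, hi, probs)

def generate_chunks(seed, n_chunks=3, rows_per_chunk=20000):
    # Seeded generator so a run can be repeated; the sizes are illustrative only.
    rng = random.Random(seed)
    return [[rng.uniform(0, 100) for _ in range(rows_per_chunk)]
            for _ in range(n_chunks)]

if __name__ == "__main__":
    for p, q in zip(PROBS, approx_quantiles(generate_chunks(seed=42))):
        print("%.2f -> %.3f" % (p, q))
```

The error of each estimate is bounded by roughly one bin width, so more bins buy more accuracy at the cost of a larger reduce payload. A true single-pass variant would need a sketch that does not require the range up front.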

Notes:

  • The quantile algorithm should take as input an already-parsed dataset in memory (a HEX key). The output can be rolled up into a JSON response directly, if you like. It’s not a requirement to store the output in the Distributed Key/Value store.

  • The REST API port default is 54321. (This is the same as the browser port.)"
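
The Python side of the task might look like the sketch below, which uses only the standard library. The port 54321 comes from the notes above. Everything else is hypothetical, since the endpoint does not exist until the task is done: the path `Quantiles.json`, the query parameters `source_key` and `probs`, and the response shape `{"quantiles": {"0.05": ...}}` are placeholders to be replaced by whatever the new REST API actually exposes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

H2O_HOST = "localhost"  # assumption: a node of the cluster runs locally
H2O_PORT = 54321        # default REST API port, per the notes above

def build_url(hex_key, probs, host=H2O_HOST, port=H2O_PORT):
    # "Quantiles.json", "source_key" and "probs" are hypothetical names.
    query = urlencode({
        "source_key": hex_key,
        "probs": ",".join(str(p) for p in probs),
    })
    return "http://%s:%d/Quantiles.json?%s" % (host, port, query)

def parse_response(body):
    # Assumed response shape: {"quantiles": {"<prob>": <value>, ...}}
    doc = json.loads(body)
    return {float(p): float(v) for p, v in doc["quantiles"].items()}

def fetch_quantiles(hex_key, probs, timeout=60):
    # Issue the HTTP GET and decode the JSON body.
    with urlopen(build_url(hex_key, probs), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # "data.hex" is a placeholder for the key of an already-parsed dataset.
    result = fetch_quantiles("data.hex", [0.05, 0.10, 0.15, 0.85, 0.90, 0.95])
    print(json.dumps(result, indent=2, sort_keys=True))
```

Keeping URL construction and JSON parsing in separate functions means both can be checked without a running cluster.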
