Hacker Interviews
All contributions will be recognized, and your name will be added to the list of collaborators.
(TBD - Attach JIRA-### to this)
**1. "PythonRESTQuantile GitHub Task Description**
Objective:
Add a quantile algorithm to H2O and call it from Python using the REST API.
- Implement a quantile algorithm in H2O by building on top of H2O’s existing MapReduce. Calculate quantiles at the following splits: 5%, 10%, 15%, 85%, 90%, 95%. The algorithm should require only a fixed number of passes over the data (ideally 1) to calculate N quantiles. Calculating an approximation is fine (provide a reference to the algorithm you chose).
- Add a REST API to call the new algorithm. This should provide a JSON response.
- Write a Python script to access the new algorithm via HTTP and parse the JSON.
- Run the H2O quantile algorithm on a big dataset (say, 100M rows) on distributed H2O (at least three nodes), driving it from Python through the REST API. Write a script that generates the big dataset randomly, and provide the script and the random seed for repeatability.
- Print the quantile JSON response from Python.
- Extra credit: Add a web page for use by the browser. (This can be painful. This is really extra credit!)
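To make the "fixed number of passes" requirement concrete, here is a minimal sketch in plain Python of one possible approximation: an equal-width histogram built in two passes (one for min/max, one for bin counts). It is not H2O code and not necessarily the algorithm a submission should use. The chunk lists stand in for data held on different nodes, and every function name, the bin count, and the seed are assumptions made for illustration. Each pass has a per-chunk "map" step and an associative "reduce" step, which is the shape a MapReduce task needs.

```python
import random
from functools import reduce

# Splits requested in the task description.
PROBS = [0.05, 0.10, 0.15, 0.85, 0.90, 0.95]


def generate_chunks(seed, n_chunks, rows_per_chunk):
    """Seeded random data, split into chunks (hypothetical stand-in for nodes)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(rows_per_chunk)] for _ in range(n_chunks)]


def map_min_max(chunk):
    """Pass 1, map step: local minimum and maximum of one chunk."""
    return min(chunk), max(chunk)


def reduce_min_max(a, b):
    """Pass 1, reduce step: combine two (min, max) pairs."""
    return min(a[0], b[0]), max(a[1], b[1])


def map_histogram(chunk, lo, hi, nbins):
    """Pass 2, map step: count values of one chunk into equal-width bins."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in chunk:
        # The maximum value would land one past the end, so clamp it.
        i = min(int((x - lo) / width), nbins - 1)
        counts[i] += 1
    return counts


def reduce_histogram(a, b):
    """Pass 2, reduce step: element-wise sum of two count vectors."""
    return [x + y for x, y in zip(a, b)]


def quantiles_from_histogram(counts, lo, hi, probs):
    """Walk the cumulative counts and interpolate linearly inside the bin."""
    total = sum(counts)
    width = (hi - lo) / len(counts)
    result = []
    for p in probs:
        target = p * total
        cum = 0
        for i, c in enumerate(counts):
            if c > 0 and cum + c >= target:
                frac = (target - cum) / c
                result.append(lo + (i + frac) * width)
                break
            cum += c
    return result


def approx_quantiles(chunks, probs=PROBS, nbins=1000):
    """Two passes over the data, regardless of how many quantiles are requested."""
    lo, hi = reduce(reduce_min_max, (map_min_max(c) for c in chunks))
    if lo == hi:
        # Constant column: every quantile is that single value.
        return [lo] * len(probs)
    counts = reduce(reduce_histogram,
                    (map_histogram(c, lo, hi, nbins) for c in chunks))
    return quantiles_from_histogram(counts, lo, hi, probs)
```

For example, `approx_quantiles(generate_chunks(42, 3, 20000))` returns six values close to the requested probabilities, because the generated data is uniform on [0, 1). The error of this approach is bounded by the bin width, so a wider value range or heavier skew needs more bins or a different sketch.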
Notes:
- The quantile algorithm should take as input an already-parsed dataset in memory (a HEX key). The output can be rolled up into a JSON response directly, if you like. It’s not a requirement to store the output in the Distributed Key/Value store.
- The REST API port default is 54321. (This is the same as the browser port.)"
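The Python side of the task could look roughly like the sketch below, which uses only the standard library. The endpoint path `/Quantile.json`, the query parameter names `source_key` and `probs`, and the response fields `probs` and `quantiles` are all hypothetical, since the endpoint is what the task asks you to design. Only the default port 54321 and the HEX key input come from the notes above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DEFAULT_PORT = 54321          # default REST API port, per the notes
ENDPOINT = "/Quantile.json"   # hypothetical path for the new endpoint


def build_quantile_url(host, hex_key, probs, port=DEFAULT_PORT):
    """Build the request URL; the parameter names are assumptions."""
    query = urlencode({
        "source_key": hex_key,
        "probs": ",".join(str(p) for p in probs),
    })
    return f"http://{host}:{port}{ENDPOINT}?{query}"


def parse_quantile_response(body):
    """Parse a JSON body shaped like {"probs": [...], "quantiles": [...]}."""
    payload = json.loads(body)
    return dict(zip(payload["probs"], payload["quantiles"]))


def fetch_quantiles(host, hex_key, probs, port=DEFAULT_PORT, timeout=60):
    """Call the endpoint over HTTP and return a {probability: quantile} dict."""
    url = build_quantile_url(host, hex_key, probs, port)
    with urlopen(url, timeout=timeout) as resp:
        return parse_quantile_response(resp.read().decode("utf-8"))
```

With a cluster running, something like `print(fetch_quantiles("localhost", "data.hex", [0.05, 0.10, 0.15, 0.85, 0.90, 0.95]))` would print the parsed result, where `data.hex` is a placeholder for the key of the already-parsed dataset. Keeping URL construction and JSON parsing in separate functions lets both be checked without a live H2O node.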