THESE DOCS ARE DEPRECATED SEE ActionML.com/docs
The Guides have moved.
The markdown templates are now in https://github.com/actionml/docs.actionml.com. Changes there are automatically published to the live site: actionml.com/docs. Please make any PRs against that new repo.
#PredictionIO Standalone Server Guide
This is a guide to setting up the PredictionIO EventServer and Universal Recommender PredictionServer in a standalone fashion so all constituent services run on a single machine. At the end of this guide we will spin up a Spark cluster and offload the majority of training work to the cluster, then take it offline so it costs nothing while idle.
##Pre-requisites
Follow the Small HA Cluster setup instructions, with the differences noted below. First, remember that we will be setting up only one machine.
##Build the Artifact
As per step 7 of the basic cluster-setup instructions, build the PredictionIO artifact. This produces a gzipped tarball (PredictionIO-0.9.6.tar.gz, modulo the version number). The installation will require this tarball, as well as a few other files. While the all-in-one instructions build the artifact on the same host as the target installation, this is not necessary.
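For reference, producing the tarball on a build host looks roughly like this (a sketch, assuming a PredictionIO source checkout and the make-distribution.sh build script referenced by the cluster-setup instructions):

```
# Build the distribution tarball from a PredictionIO source checkout
# (sketch; the version in the output name depends on the sources)
cd PredictionIO
./make-distribution.sh   # produces PredictionIO-0.9.6.tar.gz
```

The resulting tarball can then be copied to the target machine.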
##Common Installation
The EventServer and the PredictionServer run from the same artifact jar. There are common installation steps that are the same for both, and then additional installation steps required for the PredictionServer.
###Java
You'll need a JDK. It may be possible to just use a JRE, but that hasn't been tested. The easiest thing to do is to obtain an rpm (or pkg, deb, etc., as appropriate for your Linux distro) and install it.
These instructions were tested with Java 8 (jdk-8u65-linux-x64.rpm on CentOS).
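For example, installing and verifying the rpm on CentOS looks like this (a sketch; the filename is the build tested above):

```
# Install the JDK rpm and confirm the version (sketch for CentOS)
sudo rpm -ivh jdk-8u65-linux-x64.rpm
java -version   # should report 1.8.0_65
```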
###PredictionIO
- Create a user named "pio".
- In pio's home directory, create a directory named "pio".
- Untar the PredictionIO tarball into the pio directory, creating a PredictionIO-x.y.z directory.
- Create a symbolic link to the PIO directory for convenience: `ln -s PredictionIO-x.y.z PredictionIO`
- Inside the PredictionIO directory, create the directory path vendors/hbase-1.0.0/conf (use `mkdir -p`).
- Place the following hbase-site.xml file in the hbase conf directory:

        <configuration>
          <property>
            <name>hbase.zookeeper.quorum</name>
            <value>zk1,zk2,zk3</value>
          </property>
          <property>
            <name>hbase.zookeeper.property.clientPort</name>
            <value>2181</value>
          </property>
        </configuration>

  This will get used by the HBase client code to find HBase. Ideally the code will someday be fixed not to require this, but instead to get these values from PIO variables and pass them directly to the client.
- Modify PredictionIO/conf/pio-env.sh in the following way (a consolidated sketch of the setup follows this list):
- Comment out SPARK_HOME
- Comment out POSTGRES_JDBC_DRIVER
- Comment out MYSQL_JDBC_DRIVER
- HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
- PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
- PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
- PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=ELASTICSEARCH
- Comment out PIO_STORAGE_SOURCES_PGSQL_*
- PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
- PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch # has to match the value of cluster.name in ElasticSearch's configuration
- PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es1,es2,es3 # comma-separated list of ElasticSearch hostnames
- PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
- PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
- PIO_STORAGE_SOURCES_HBASE_HOSTS=hb1,hb2,hb3 # comma-separated list of ZooKeeper hosts that know where HBase is
- PIO_STORAGE_SOURCES_HBASE_PORTS=0,0,0 # unknown, but must be list of same size as _HBASE_HOSTS
- Open port 7070 on this host; that's where events will get sent (7070 is the default port, but you can change it when launching the EventServer, in which case open the desired port here instead).
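Putting the steps above together, the filesystem setup looks roughly like this (a sketch; version numbers are placeholders, as above):

```
# One-time setup for the pio user (sketch; versions are placeholders)
sudo useradd -m pio
sudo su - pio
mkdir pio && cd pio
tar -xzf /path/to/PredictionIO-0.9.6.tar.gz        # creates PredictionIO-0.9.6/
ln -s PredictionIO-0.9.6 PredictionIO              # convenience symlink
mkdir -p PredictionIO/vendors/hbase-1.0.0/conf     # will hold hbase-site.xml
```

and the relevant fragment of pio-env.sh after the edits reads:

```
# PredictionIO/conf/pio-env.sh (fragment; hostnames are the placeholders above)
HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es1,es2,es3
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOSTS=hb1,hb2,hb3
PIO_STORAGE_SOURCES_HBASE_PORTS=0,0,0
```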
###Other Configuration
####.bash_profile
You'll need the following in your .bash_profile:

```
# setup Java
export JAVA_HOME=/usr/java/jdk1.8.0_65
# setup specific to PIO
export PIO_HOME=/home/pio/pio/PredictionIO
export PATH=$PIO_HOME/bin:$PATH
export JAVA_OPTS="-Xmx4g"
export SPARK_HOME=/usr/local/spark # but there's nothing there
```
####Open Ports
In order for the PIO servers to communicate with them, the hosts running other services must have certain ports open. Sometimes there are additional requirements. The following lists the other services and the ports that must be open on those services' hosts for PIO to reach them.
- ElasticSearch Hosts: open port 9300.
- HBase Hosts: HBase uses four ports that its clients need access to at various times. The defaults seem to vary across distributions, so the safest thing to do is to specify them explicitly in the HBase installation (in hbase-site.xml), and then open those ports (60000, 60010, 60020, 60030) on the HBase hosts.
Beware that HBase is fussy over host IP resolution; setting up /etc/hosts or DNS may require extra care.
Note that if you're trying to run against a standalone installation of HBase, this won't work. In that mode, HBase seems to assign random ports each time it is started and relies on its clients to query for the ports via ZooKeeper. This stymies attempts to use it from off the standalone host because you won't know which ports to open in advance; the settings in hbase-site.xml seem to be ignored in this mode.
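As an illustration, opening these ports with firewalld on CentOS (the distribution used for testing above) might look like the following; the tool and zone defaults are assumptions, so adjust for whatever firewall you run:

```
# On the ElasticSearch hosts (firewalld sketch):
sudo firewall-cmd --permanent --add-port=9300/tcp
# On the HBase hosts:
for port in 60000 60010 60020 60030; do
  sudo firewall-cmd --permanent --add-port=${port}/tcp
done
# Apply on each host:
sudo firewall-cmd --reload
```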
##Automated Setup
These procedural setup instructions could be used to construct a Docker container for running instances of the EventServer. Alternatively, the required bash, HBase, and pio-env.sh changes could be packaged up along with the original tarball contents into a new package that can be installed in an automated fashion.
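A repackaging script along those lines might look like this (purely illustrative; the pre-edited config files and the output tarball name are hypothetical):

```
#!/bin/bash
# Bundle site-specific config into a new installable tarball
# (hypothetical sketch; file names are illustrative)
set -e
tar -xzf PredictionIO-0.9.6.tar.gz
mkdir -p PredictionIO-0.9.6/vendors/hbase-1.0.0/conf
cp hbase-site.xml PredictionIO-0.9.6/vendors/hbase-1.0.0/conf/  # pre-made HBase client config
cp pio-env.sh PredictionIO-0.9.6/conf/                          # pre-edited PIO config
tar -czf PredictionIO-0.9.6-site.tar.gz PredictionIO-0.9.6
```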
##Starting the EventServer
Start the EventServer with:

```
pio eventserver
```
Note that this requires already having created your engine-model as per the all-in-one single-host instructions. A server started in this way can be tested as described elsewhere; for example, this one was tested for a Universal Recommender using the same examples/import_handmade.py script (see "Import Sample Data" on that page), with the addition of the --url parameter to specify this standalone EventServer's host.
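That test invocation looks roughly like this (a sketch; the host is a placeholder, and the --access_key argument follows the Universal Recommender examples):

```
# Import the sample events against the standalone EventServer
# (sketch; host and access key are placeholders)
python examples/import_handmade.py \
  --access_key <your-app-access-key> \
  --url http://eventserver-host:7070
```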
The EventServer is multi-tenant, and multiple instances can be run as a scalable tier by using this installation procedure.
##Installing the PredictionServer
The PredictionServer requires the following additional setup on top of the common piece described above for the EventServer.
###Standard PredictionIO Install
Follow the usual template installation from http://docs.prediction.io. Then copy the entire engine/template directory to each PredictionServer.
###PredictionIO with Multi-tenancy
Using the standard pio workflow, the resource id for the engine instance is auto-generated and placed in manifest.json when the template/engine is built with `pio build`. Then `pio train` will build a model in shared storage such as Elasticsearch (the model storage for the Universal Recommender). The entire directory must be copied to every PredictionServer and `pio deploy` must be run on each.
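In shell terms the flow is roughly (a sketch; run from the engine/template directory):

```
# On one machine, in the engine/template directory:
pio build    # generates the resource id in manifest.json
pio train    # writes the model to shared storage (Elasticsearch for the UR)
# Copy the entire engine directory to every PredictionServer, then on each:
pio deploy
```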
###Deploying The Universal Recommender from Jars (only for PredictionIO with Multi-tenancy)
Note: This workflow completely replaces the standard pio workflow and should not be mixed with it on the same engine/template.
####Create the Universal Recommender Jars
- Get the code for the Universal Recommender into a new directory and run `pio build`.
- Create a directory at `pio-home/plugins`.
- Copy the jars: `cp universal-recommender-home/target/template* pio-home/plugins`
- Copy the jars to each PredictionServer; only one `pio build` is required.
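Consolidated, the jar preparation is roughly (a sketch; pio-home and universal-recommender-home stand for the actual install and source directories, as above):

```
# Build once, then distribute the jars (sketch; paths are placeholders)
cd universal-recommender-home
pio build                              # only needed once
mkdir -p pio-home/plugins
cp target/template* pio-home/plugins
# Copy the jars in pio-home/plugins to the same path on each PredictionServer.
```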
####On-boarding New Tenants and Training
Run:

```
pio deploy --resource-id <some-resource-id>
```

This will deploy an empty tenant in the PredictionServer, which will return empty results if queried.
- Copy the directory to all PredictionServers and repeat the `pio deploy --resource-id <some-resource-id>`.
- Run `pio train` on one PredictionServer; the model will be deployed to each empty engine instance.