This project allows to run Samza jobs on Mesos cluster. The Samza jobs can either be packaged in the traditional tarball or in a Docker image.
##Status
Early development. Not tested in production. Hints/issues/PRs are welcome.
##Building
To build and install this package to local repo, run:
mvn clean install
After this you should be able to reference it like this:
<dependency>
<groupId>eu.inn</groupId>
<artifactId>samza-mesos</artifactId>
<version>0.1.0-SNAPSHOT</version>
</dependency>
##Deploying Samza jobs to Marathon
Each Samza job is a Mesos framework. This framework creates one Mesos task for each Samza container. Although not required, it is convenient to use Marathon to run the Samza job's Mesos framework.
###Samza jobs in tarball
Samza jobs are traditionally deployed in a tarball. This archive should contain the following as top-level directories:
- bin - contains standard Samza distributed shell scripts (see hello-samza)
- config - with your job .properties file(s)
- lib - contains all .jar files
Example JSON to submit to Marathon to run a Samza job in a tarball may look like this:
{
"id": "samza-jobs.my-job",
"uris": [
"http://myrepository.com/my-job.tgz"
],
"cmd": "bin/run-job.sh --config-path=file://$PWD/config/my-job.properties --config=job.factory.class=eu.inn.samza.mesos.MesosJobFactory --config=mesos.master.connect=zk://myzookeeper.com:2181/mesos --config=mesos.package.path=http://myrepository.com/my-job.tgz --config=mesos.executor.count=1",
"cpus": 0.1,
"mem": 64,
"instances": 1,
"env": {
"JAVA_HEAP_OPTS": "-Xms64M -Xmx64M"
}
}
Note that the mesos.package.path
provides the location of the tar archive.
This JSON can be submitted to Marathon via curl:
curl -X POST -H "Content-Type: application/json" -d my-job.json http://mymarathon.com:8080/v2/apps
###Samza jobs in Docker
You can also package your Samza jobs in a Docker image, instead of a tarball. The Docker image should have a root /samza
directory, containing the same bin
, config
and lib
directories as the tarball. Building this Docker image is as simple as building the tarball and then adding it to the image at /samza. In the Samza job config, use mesos.docker.image
instead of mesos.package.path
. banno/samza-mesos provides a convenient base Docker image for you to build your Samza job's Docker image on.
Example JSON to submit to Marathon to run a Samza job in a Docker container may look like this:
{
"id": "samza-jobs.my-job",
"container": {
"docker": {
"image": "myregistry.com/my-job:latest"
},
"type": "DOCKER"
},
"cmd": "/samza/bin/run-job.sh --config-path=file:///samza/conf/my-job.properties --config=job.factory.class=eu.inn.samza.mesos.MesosJobFactory --config=mesos.master.connect=zk://myzookeeper.com:2181/mesos --config=mesos.docker.image=myregistry.com/my-job:latest --config=mesos.executor.count=1",
"cpus": 0.1,
"mem": 64,
"instances": 1,
"env": {
"JAVA_HEAP_OPTS": "-Xms64M -Xmx64M"
}
}
If your Docker image does not use the standard Samza run-job.sh and run-container.sh startup scripts, but instead uses its own ENTRYPOINT to run either the Samza framework or the Samza container, then you can use the mesos.docker.entrypoint.arguments
config option.
##Configuration reference
Property | Required? | Default value | Description |
---|---|---|---|
mesos.master.connect | yes | Mesos master URL | |
mesos.package.path | yes* | Job package URI (file, http, hdfs) | |
mesos.docker.image | yes* | Docker image (registry/my-jobs:latest) | |
mesos.docker.entrypoint.arguments | Arguments for Docker image ENTRYPOINT | ||
mesos.executor.count | 1 | Number of Samza containers to run job in | |
mesos.executor.memory.mb | 1024 | Mesos task memory constraint | |
mesos.executor.cpu.cores | 1 | Mesos task CPU cores constraint | |
mesos.executor.disk.mb | 1024 | Mesos task disk constraint | |
mesos.executor.attributes.* | Slave attributes reqs (regex expressions) | ||
mesos.scheduler.user | System user for starting executors | ||
mesos.scheduler.role | Mesos role to use for this scheduler | ||
mesos.scheduler.failover.timeout | a lot (Long.MaxValue) | Framework failover timeout |
** either mesos.package.path
or mesos.docker.image
is required.
##Acknowledgements
This project is based on Jon Bringhurst's prototype.