Warning: Because this repo is based on VirtualBox, which isn't available for Apple Silicon based Macs, I have to deprecate this repo.
Update 2023: There are test builds of VirtualBox for Apple Silicon, but so far they are not stable enough.
If you need a local cluster providing Kafka, Cassandra, Spark, Hadoop (YARN and HDFS) and Flink, you're in the right place.
- Apache Kafka 2.7.0
- Apache Spark 3.0.2
- Apache Cassandra 4.0-beta4
- Apache Hadoop 3.3.0
- Apache Flink 1.12.1
- Vagrant (tested with 2.2.14)
- VirtualBox (tested with 6.1.18)
- Ansible (tested with 2.10.5)
- The VMs take approximately 18 GB of RAM in total, so your host should have more than that.
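The 18 GB figure is the sum of the per-VM settings in the VM table below: 3 × 1024 MB (Kafka) + 3 × 1024 MB (Cassandra) + 3 × 4096 MB (Hadoop) = 18,432 MB. A minimal pre-flight sketch, assuming a macOS or Linux host; the function names are hypothetical and not part of the repo:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repo): compare host RAM
# with the memory the nine VMs request.

# Per-VM memory in MB, taken from the VM table of this README.
VM_RAM_MB=(1024 1024 1024   # kafka-1..3
           1024 1024 1024   # cassandra-1..3
           4096 4096 4096)  # hadoop-1..3

# Sum of all VM allocations in MB.
required_ram_mb() {
  local total=0 mb
  for mb in "${VM_RAM_MB[@]}"; do
    total=$((total + mb))
  done
  echo "$total"
}

# Total physical memory of the host in MB (macOS or Linux only).
host_ram_mb() {
  case "$(uname -s)" in
    Darwin) echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 )) ;;
    Linux)  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo ;;
    *)      return 1 ;;
  esac
}

# Succeeds only if the host has strictly more RAM than the VMs need,
# leaving some headroom for the host OS and VirtualBox itself.
has_enough_ram() {
  local host_mb=$1
  (( host_mb > $(required_ram_mb) ))
}
```

Usage: `has_enough_ram "$(host_ram_mb)" && echo "enough RAM" || echo "not enough RAM"`. How much headroom you actually need above 18 GB depends on what else runs on the host.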
vagrant-hostsupdater is used to make the VMs reachable by their hostnames from your host.
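The plugin is installed once with `vagrant plugin install vagrant-hostsupdater`. The sketch below adds a hypothetical guard that installs it only when it is missing. It assumes that `vagrant plugin list` prints one plugin per line, starting with the plugin name followed by a space (e.g. `vagrant-hostsupdater (<version>, global)`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given plugin appears in the output of
# `vagrant plugin list` (read from stdin). Assumes each line starts with the
# plugin name followed by a space.
has_vagrant_plugin() {
  local name=$1
  awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'
}

# Install vagrant-hostsupdater only if it is not installed yet.
ensure_hostsupdater() {
  if ! vagrant plugin list | has_vagrant_plugin vagrant-hostsupdater; then
    vagrant plugin install vagrant-hostsupdater
  fi
}
```

Calling `ensure_hostsupdater` before `vagrant up` makes the setup repeatable on a fresh machine.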
git clone https://github.com/markush81/fastdata-cluster.git
cd fastdata-cluster
vagrant up
If everything goes well, the result should be:
IP | Hostname | Description | Settings |
---|---|---|---|
192.168.10.2 | kafka-1 | running a kafka broker | 1024 MB RAM |
192.168.10.3 | kafka-2 | running a kafka broker | 1024 MB RAM |
192.168.10.4 | kafka-3 | running a kafka broker | 1024 MB RAM |
192.168.10.5 | cassandra-1 | running a cassandra node | 1024 MB RAM |
192.168.10.6 | cassandra-2 | running a cassandra node | 1024 MB RAM |
192.168.10.7 | cassandra-3 | running a cassandra node | 1024 MB RAM |
192.168.10.8 | hadoop-1 | running a yarn resourcemanager and nodemanager, hdfs namenode, spark distribution, flink distribution | 4096 MB RAM |
192.168.10.9 | hadoop-2 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |
192.168.10.10 | hadoop-3 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |
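vagrant-hostsupdater writes these names into `/etc/hosts` for you. If you prefer to manage the entries yourself, the hypothetical helper below prints standard hosts-file lines (`IP hostname`) for the nine VMs in the table above; it is a sketch, not something the repo provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts lines for the nine cluster VMs,
# using the IP/hostname pairs from the table above.
cluster_hosts() {
  local names=(kafka-1 kafka-2 kafka-3
               cassandra-1 cassandra-2 cassandra-3
               hadoop-1 hadoop-2 hadoop-3)
  local i
  for i in "${!names[@]}"; do
    # IPs start at 192.168.10.2 and increase by one per VM.
    printf '192.168.10.%d %s\n' "$((i + 2))" "${names[$i]}"
  done
}
```

Usage: `cluster_hosts | sudo tee -a /etc/hosts`. Don't combine this with vagrant-hostsupdater, otherwise you end up with duplicate entries.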
Service | Address |
---|---|
Zookeeper | kafka-1:2181,kafka-2:2181,kafka-3:2181 |
Kafka Brokers | kafka-1:9092,kafka-2:9092,kafka-3:9092 |
Cassandra Hosts | cassandra-1,cassandra-2,cassandra-3 |
YARN Resource Manager | http://hadoop-1:8088 |
HDFS Namenode UI | http://hadoop-1:9870 |
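To check from the host that the Zookeeper and Kafka endpoints listed above accept TCP connections, here is a minimal sketch. It assumes bash (for `/dev/tcp`) and the `timeout` command from GNU coreutils, which is not installed on macOS by default; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a comma-separated "host:port" list such as the
# Zookeeper or Kafka entries above into one "host port" pair per line.
split_endpoints() {
  local IFS=,
  local ep
  for ep in $1; do
    printf '%s %s\n' "${ep%%:*}" "${ep##*:}"
  done
}

# Try to open a TCP connection to each endpoint via bash's /dev/tcp,
# giving up after 3 seconds per endpoint (needs coreutils `timeout`).
probe_endpoints() {
  local host port
  while read -r host port; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
    fi
  done < <(split_endpoints "$1")
}
```

Usage: `probe_endpoints kafka-1:9092,kafka-2:9092,kafka-3:9092`. The web UIs (ports 8088 and 9870) are easier to check with a browser or `curl`.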
lucky:~ markus$ vagrant ssh cassandra-1
[vagrant@cassandra-1 ~]$ cqlsh
Connected to analytics at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 4.0-beta4 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> CREATE KEYSPACE example WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
cqlsh> USE example;
cqlsh:example> CREATE TABLE users (id UUID PRIMARY KEY, lastname text, firstname text );
cqlsh:example> INSERT INTO users (id, lastname, firstname) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'Mustermann','Max') USING TTL 86400 AND TIMESTAMP 123456789;
cqlsh:example> SELECT * FROM users;
id | firstname | lastname
--------------------------------------+-----------+------------
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | Max | Mustermann
(1 rows)
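The same statements can also be run non-interactively with `cqlsh -e "<statement>"`. When values come from shell variables, single quotes inside a CQL string literal must be doubled (`'O''Brien'`). A minimal sketch with hypothetical helper names, reusing the `example.users` table and the 24-hour TTL (86400 s) from above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a shell string into a CQL string literal by
# doubling embedded single quotes (O'Brien -> 'O''Brien').
cql_quote() {
  local q="'"
  local escaped=${1//$q/$q$q}
  printf '%s%s%s' "$q" "$escaped" "$q"
}

# Hypothetical helper: build an INSERT for example.users with a TTL of
# 86400 s (24 h). The id is inserted verbatim, so it may be a UUID literal
# or a CQL function call such as uuid().
cql_insert_user() {
  local id=$1 lastname=$2 firstname=$3
  printf 'INSERT INTO example.users (id, lastname, firstname) VALUES (%s, %s, %s) USING TTL 86400;\n' \
    "$id" "$(cql_quote "$lastname")" "$(cql_quote "$firstname")"
}

# Usage (on cassandra-1):
#   cqlsh -e "$(cql_insert_user 'uuid()' "O'Brien" 'Max')"
#   cqlsh -e "SELECT firstname, TTL(firstname), WRITETIME(firstname) FROM example.users;"
```

For the row inserted above with `USING TTL 86400 AND TIMESTAMP 123456789`, `WRITETIME(firstname)` should return 123456789 and `TTL(firstname)` the seconds remaining until the row expires.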
Check Cluster Status:
[vagrant@cassandra-1 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.10.5 105.69 KiB 16 ? 74e6aff4-3561-4f48-bdbb-d030a9da0c01 rack1
UN 192.168.10.7 100.65 KiB 16 ? 3b428824-a9f2-4a49-ae1d-3639fc584e92 rack1
UN 192.168.10.6 100.66 KiB 16 ? 4418963f-5e94-4046-9cc1-f9614c6eae6e rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
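For scripted health checks, you can count the nodes whose status column reads `UN` (Up/Normal) in the `nodetool status` output shown above. A minimal sketch; both function names are hypothetical:

```bash
#!/usr/bash/env bash
# Hypothetical helper: count nodes reported as Up/Normal ("UN") in the
# output of `nodetool status`, read from stdin.
count_un_nodes() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Succeed only if exactly the expected number of nodes are Up/Normal.
cluster_is_healthy() {
  local expected=$1
  [ "$(nodetool status | count_un_nodes)" -eq "$expected" ]
}
```

Usage on cassandra-1: `cluster_is_healthy 3 && echo "all 3 nodes UN"`. The `?` in the Owns column goes away if you pass a keyspace, e.g. `nodetool status example`, which reports effective ownership for that keyspace's replication settings.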
[vagrant@kafka-1 ~]$ zookeeper-shell.sh kafka-1:2181/
Connecting to kafka-1:2181/
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
ls /
[admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
ls /brokers/ids
[0, 1, 2]
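`zookeeper-shell.sh` also accepts a single command after the connect string, e.g. `zookeeper-shell.sh kafka-1:2181 ls /brokers/ids`, which prints the same `[...]` list without an interactive session. A minimal sketch that counts the registered brokers from that output; the helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count broker IDs in the last "[...]" line of
# `zookeeper-shell.sh ... ls /brokers/ids` output, read from stdin.
count_broker_ids() {
  awk '/^\[.*\]$/ { line = $0 }
       END {
         gsub(/\[|\]| /, "", line)    # strip brackets and spaces
         print split(line, ids, ",")  # "" -> 0, "0,1,2" -> 3
       }'
}
```

Usage on kafka-1: `zookeeper-shell.sh kafka-1:2181 ls /brokers/ids | count_broker_ids` should print 3 when all brokers are registered.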
lucky:~ markus$ vagrant ssh kafka-1
[vagrant@kafka-1 ~]$ kafka-topics.sh --create --zookeeper kafka-1:2181 --replication-factor 2 --partitions 6 --topic sample
Created topic "sample".
[vagrant@kafka-1 ~]$ kafka-topics.sh --zookeeper kafka-1:2181 --topic sample --describe
Topic:sample PartitionCount:6 ReplicationFactor:2 Configs:
Topic: sample Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: sample Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3
Topic: sample Partition: 2 Leader: 3 Replicas: 3,1 Isr: 3,1
Topic: sample Partition: 3 Leader: 1 Replicas: 1,3 Isr: 1,3
Topic: sample Partition: 4 Leader: 2 Replicas: 2,1 Isr: 2,1
Topic: sample Partition: 5 Leader: 3 Replicas: 3,2 Isr: 3,2
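With a replication factor of 2 on three brokers, each partition survives the loss of one broker. A partition is under-replicated when its `Isr` list is shorter than its `Replicas` list. `kafka-topics.sh --describe --under-replicated-partitions` reports this directly; the hypothetical helper below only illustrates how to read those two columns from the describe output shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kafka-topics.sh --describe` output from stdin
# and print every partition whose ISR list is shorter than its replica list.
under_replicated() {
  awk '/Partition:/ {
    part = ""; replicas = ""; isr = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") part = $(i + 1)
      if ($i == "Replicas:")  replicas = $(i + 1)
      if ($i == "Isr:")       isr = $(i + 1)
    }
    if (split(isr, a, ",") < split(replicas, b, ",")) print "partition " part
  }'
}
```

Usage on kafka-1: `kafka-topics.sh --zookeeper kafka-1:2181 --topic sample --describe | under_replicated` prints nothing while all replicas are in sync.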
[vagrant@kafka-1 ~]$
[vagrant@kafka-1 ~]$ kafka-console-producer.sh --bootstrap-server kafka-1:9092,kafka-3:9092 --topic sample
Hey, is Kafka up and running?
[vagrant@kafka-1 ~]$ kafka-console-consumer.sh --bootstrap-server kafka-1:9092,kafka-3:9092 --topic sample --from-beginning
Hey, is Kafka up and running?
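The console producer sends each line read from stdin as one record, so the round trip above can also be scripted. A minimal smoke-test sketch with a hypothetical helper; the consumer's `--timeout-ms` option makes it exit once no new message arrives within the given interval:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print N numbered smoke-test messages, one per line.
# kafka-console-producer.sh sends each line of stdin as one record.
smoke_messages() {
  local n=$1 i
  for ((i = 1; i <= n; i++)); do
    printf 'smoke-test message %d\n' "$i"
  done
}

# Usage (on kafka-1):
#   smoke_messages 10 | kafka-console-producer.sh --bootstrap-server kafka-1:9092 --topic sample
#   kafka-console-consumer.sh --bootstrap-server kafka-1:9092 --topic sample \
#     --from-beginning --timeout-ms 10000 | grep -c '^smoke-test message'
```

The `grep -c` count should equal the number of messages produced, assuming the smoke test was run only once against the topic.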
The YARN ResourceManager UI is available at http://hadoop-1:8088; from there you can navigate to your application.
lucky:~ markus$ vagrant ssh hadoop-1
[vagrant@hadoop-1 ~]$ spark-submit --master yarn --class org.apache.spark.examples.SparkPi --deploy-mode cluster --driver-memory 512M --executor-memory 512M --num-executors 2 /usr/local/spark-3.0.2-bin-without-hadoop/examples/jars/spark-examples_2.12-3.0.2.jar 1000
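Each container YARN allocates for this job is larger than 512 MB: Spark adds a memory overhead, by default max(384 MB, 10% of the heap), and YARN rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (1024 MB by default with the Capacity Scheduler). That this repo keeps those defaults is an assumption. A rough estimator with a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the YARN container size (MB) for a Spark
# driver or executor with the given heap size (MB).
# Assumptions: Spark's default overhead of max(384 MB, 10% of heap), and
# YARN rounding every request up to a multiple of the minimum allocation
# (yarn.scheduler.minimum-allocation-mb, 1024 MB by default).
yarn_container_mb() {
  local heap_mb=$1 min_alloc_mb=${2:-1024}
  local overhead=$(( heap_mb / 10 ))
  (( overhead < 384 )) && overhead=384
  local request=$(( heap_mb + overhead ))
  # Round up to the next multiple of the minimum allocation.
  echo $(( (request + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
}
```

For the SparkPi command above, 512 MB + 384 MB = 896 MB rounds up to 1024 MB. In cluster mode the driver runs inside the ApplicationMaster, so the job needs about 3 × 1024 MB (driver plus two executors) spread across the NodeManagers. The trailing `1000` is the number of slices SparkPi splits its work into.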
To open the Flink UI, go to http://hadoop-1:8088/cluster, click the ID link of the "Flink session cluster" application, and then follow "Tracking URL: ApplicationMaster".
[vagrant@hadoop-1 ~]$ HADOOP_CLASSPATH=$(hadoop classpath) flink run /usr/local/flink-1.12.1/examples/streaming/WordCount.jar
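Instead of clicking through the UI, you can look up the Flink session's application ID with `yarn application -list`, e.g. to stop it with `yarn application -kill <application-id>`. A minimal sketch with a hypothetical helper; it assumes each data row of `yarn application -list` is tab-separated, possibly padded with spaces, with the ID in column 1 and the name in column 2:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the application ID whose name equals $1,
# reading `yarn application -list` output from stdin. Assumes tab-separated
# rows (possibly space-padded) with ID in column 1 and name in column 2.
find_yarn_app_id() {
  local name=$1
  awk -F'\t' -v n="$name" '{
    id = $1; app = $2
    gsub(/^ +| +$/, "", id)    # trim padding around the ID
    gsub(/^ +| +$/, "", app)   # trim padding around the name
    if (app == n) print id
  }'
}
```

Usage on hadoop-1: `yarn application -list | find_yarn_app_id "Flink session cluster"`.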