Add a new example: Redis Cluster Stateful MIG #1282
base: main
Conversation
/gcbrun

/gcbrun
4. Clone the Git repository, which contains the scripts and the manifest files to deploy and configure the example workload:

```
git clone https://github.com/GoogleCloudPlatform/XXX
```
Perhaps include the name of this GitHub repository instead of the `XXX` placeholder.
2. Enable the APIs required for the Packer build:

```
gcloud services enable sourcerepo.googleapis.com compute.googleapis.com servicemanagement.googleapis.com storage-api.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com
```
Do we really need sourcerepo?
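If `sourcerepo.googleapis.com` turns out to be unnecessary, the enable step could shrink to something like the sketch below. The API list is copied from the snippet above minus sourcerepo; verify the Packer community builder's actual dependencies before relying on it.

```shell
# Hedged sketch: the same enable step without sourcerepo.googleapis.com.
# Keeping the list in an array makes it easy to audit and edit.
APIS=(
  compute.googleapis.com
  servicemanagement.googleapis.com
  storage-api.googleapis.com
  cloudbuild.googleapis.com
  artifactregistry.googleapis.com
)

# Print the list for review; to actually enable the services, run:
#   gcloud services enable "${APIS[@]}"
printf '%s\n' "${APIS[@]}"
```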
PROJECT="your-project-id"
REGION="us-central1"

git clone https://github.com/GoogleCloudPlatform/cloud-builders-community.git
You seem to have committed the whole cloud-builders-community repository in this PR; you should remove it.
5. Create a Consul image using Packer and deploy the cluster

```
PROJECT="your-project-id"
```
No need to redefine these environment variables again, right? Thanks.
6. Deploy the Consul cluster

Update `consul-cluster/terraform.tfvars` with your project ID, region, and the Consul source image you created above. To check the image name, run:
It's really helpful if you could include some commands that do this; users often just run commands and don't read the description.
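In the spirit of the comment above, here is a sketch of commands that look up the Packer-built image and generate the tfvars file so users don't have to edit it by hand. The image filter `name~^consul` and the variable names written to `terraform.tfvars` are assumptions; match them to the example's actual image naming and variable declarations.

```shell
#!/usr/bin/env bash
PROJECT="your-project-id"
REGION="us-central1"

# On a real workstation this would query the newest Consul image, e.g.:
#   IMAGE=$(gcloud compute images list --project "$PROJECT" \
#       --filter="name~^consul" --sort-by=~creationTimestamp \
#       --format="value(name)" --limit=1)
IMAGE="consul-1234567890"  # placeholder standing in for the looked-up name

# Generate consul-cluster/terraform.tfvars from the values above.
write_tfvars() {
  cat <<EOF
project_id   = "$PROJECT"
region       = "$REGION"
source_image = "$IMAGE"
EOF
}
write_tfvars
```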
@@ -0,0 +1,20 @@
<!-- |
this whole directory shouldn't be committed.
# update the one ip from internal subnet range.
static_ip_address = var.consul_static_ip_address
static_ip_name    = var.consul_static_ip_name
static_ip_name doesn't seem to be used
instance_group_id = var.consul_instance_group_id
network           = var.consul_network
subnet            = var.consul_subnetwork
project_id        = var.project_id
You need to supply the region with the value `var.region`.
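A sketch of what that fix might look like in the module wiring. The variable names are copied from the snippet above; `region = var.region` follows the suggestion and assumes the root module declares a `region` variable.

```
instance_group_id = var.consul_instance_group_id
network           = var.consul_network
subnet            = var.consul_subnetwork
project_id        = var.project_id
region            = var.region  # supply the region, per this review comment
```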
apt-get install consul -y
cat > /etc/consul.d/redis-master-role.sh <<EOF
#!/bin/bash
if [[ master == \$(redis-cli -h $my_ip info replication | tr -d '\r' | grep role: | cut -d : -f2 ) ]]; then echo 'matched'; else slave; fi
This should be `else exit -1`; otherwise it exits with "command slave not found", which is odd. Still a non-zero exit status, but not as clean.
Thanks
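A sketch of the corrected health-check script along the lines of this comment. `get_role` wraps the redis-cli pipeline from the original script (`MY_IP` stands in for the `$my_ip` template variable), and the role check now exits non-zero cleanly instead of invoking the undefined command `slave`.

```shell
#!/usr/bin/env bash
# Query the node's replication role via redis-cli (requires a reachable
# Redis instance; MY_IP is an assumption standing in for $my_ip).
get_role() {
  redis-cli -h "${MY_IP:-127.0.0.1}" info replication \
    | tr -d '\r' | grep '^role:' | cut -d: -f2
}

# Succeed (print 'matched') only on the master; otherwise exit non-zero
# cleanly, instead of failing via the nonexistent command "slave".
check_master() {
  if [[ "$(get_role)" == "master" ]]; then
    echo 'matched'
  else
    exit 1
  fi
}
```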
}

# Create the Instance Template that will be used to populate the Managed Instance Group.
resource "google_compute_instance_template" "consul_server" { |
The Consul cluster doesn't work for me because dnsmasq is binding to localhost and not to 0.0.0.0/all interfaces. I had to edit /etc/dnsmasq.d/10-consul and delete the line
listen-address=127.0.0.1
and that wasn't the only problem. I'm actually not sure why Consul isn't responding to DNS requests sent to the forwarding rule IP, but it isn't.
This is the last review comment I'm making at the moment, but hopefully you can figure out the issue?
Thanks!
dnsmasq seems to respond only when listen-address contains the forwarding rule IP:
listen-address=10.128.0.32
I couldn't get it to work otherwise.
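For reference, a minimal `/etc/dnsmasq.d/10-consul` along the lines discussed in these comments might look like the fragment below. The forwarding-rule IP 10.128.0.32 comes from the comment above; port 8600 is Consul's default DNS port. This is a sketch of the observed workaround, not a verified fix.

```
# Forward *.consul lookups to the local Consul agent's DNS interface.
server=/consul/127.0.0.1#8600
# Listen on the forwarding-rule IP; per the review, binding only to
# 127.0.0.1 prevented dnsmasq from answering load-balanced queries.
listen-address=10.128.0.32
bind-interfaces
```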
![Redis Architecture](./assets/redis_architecture.png "Redis Architecture")
- At minimum, one set of 3 nodes, consisting of a master node and its replicas, is deployed to ensure high availability. If multiple sets representing shards are deployed, each set's master should reside in a distinct zone. This provides resilience against single-VM failures and zonal failures.
What's the purpose of having one DNS name for all primary nodes and another for all replica nodes?
We are starting 6 nodes, and we configure the cluster with 1 replica for each primary. This means there are 3 shards, each containing a primary and a replica.
Yet we only have two DNS names. How does this work?
Consul isn't shard aware, and even if it were, how would the client know which shard a key is in?
To use Redis, I'll want to get or set a key "X". Which server, holding the shard that contains this key, should I talk to?
Consul will return 3 hosts as the primary, but only one of those nodes contains the shard for key X. How do I know which one to use? Consul isn't telling me, because it doesn't know.
I expect the only way to properly communicate with a Redis Cluster is to use the Redis Cluster protocol. And if we are doing that, what's the point of having different DNS names for the primary and replica nodes, since all service discovery is done via the cluster protocol? Then it doesn't matter if we initially connect to a replica or a primary, or which shard we connect to.
Thanks
Yes, relying on the Redis Cluster protocol or a smart client/proxy is usually the most effective way to handle service discovery and shard routing in a Redis Cluster environment. The primary reason for having separate DNS names for primaries and replicas in this example is to showcase how the reference architecture in High availability stateful clusters on Compute Engine works with real stateful applications like a Redis Cluster.
Although there are tradeoffs, this architecture would be applicable to a massive-scale Redis Cluster that requires manual sharding, meaning multiple sets of Redis Clusters, each with a primary and replicas for a subset of the entire key range, for better performance. While less common, this approach can also offer some basic load balancing for read operations, especially in scenarios where you might want to direct some read traffic to replicas to offload the primaries.
These things should probably be clarified in the README. I'll update it.
> relying on the Redis Cluster protocol or a smart client/proxy is usually the most effective way to handle service discovery and shard routing in a Redis Cluster environment.

It's not a matter of efficiency; I just don't see how it works at all.

> this architecture would be applicable to a massive scale Redis Cluster which requires manual sharding, meaning having multiple sets of Redis Clusters with a primary and replicas per a subset of entire key ranges for better performance.

Once you determine which cluster to talk to, how do you choose the node within that cluster using this example? I can ONLY understand how this Consul/DNS based routing between primary and replica might work when the cluster has a single shard. If so, that is a big limitation, something we should document and explain, and then of course change the example to only provision a single shard.
It will rely on the Redis Cluster protocol once the client has resolved the IP through DNS/Consul, and further communication will happen between the client and the node directly. Perhaps adding an example client with Redis configurations would help readers understand how it works?
Thanks. I do think it will be helpful to show how clients connect, maybe just using redis-cli.
If the Redis Cluster protocol is used, what value is there in having two DNS names?
Thanks
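A sketch of what a redis-cli based client example could look like. The Consul DNS names below are hypothetical placeholders, and the helper only prints the command line so it can be shown without a live cluster. The key detail is `-c`: redis-cli then speaks the Redis Cluster protocol and follows MOVED/ASK redirects, so the DNS name only needs to return some reachable cluster node.

```shell
# Build a cluster-aware redis-cli invocation for a given entry-point host.
redis_cluster_cli() {
  local host=$1
  shift
  printf 'redis-cli -c -h %s -p 6379 %s\n' "$host" "$*"
}

# On a VM that can resolve the Consul names, run these commands directly:
redis_cluster_cli redis-primary.service.consul set mykey hello
redis_cluster_cli redis-replica.service.consul get mykey
```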
/gcbrun

/gcbrun
@shunjikawabata If you can address the comments by @bmenasha, I can get this merged.
Thanks @agold-rh and @bmenasha for the comments/suggestions. We'll take a look and try fixing them.
This example deploys high availability Redis clusters on Compute Engine. It assumes that you're familiar with the reference architecture in High availability stateful clusters on Compute Engine.