GM HA design #6

Open
llhuii opened this issue Jun 2, 2021 · 0 comments

GM HA Design

1. GM functions


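  1. Manager of the Operators for each edge-cloud collaborative resource
  2. GM-LC communication module:
    • downstream: GM syncs edge-cloud collaborative resource objects to the designated LC as messages
    • upstream: LC syncs update operations on edge-cloud collaborative resource objects to GM, and GM then writes them to the k8s API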
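To make the two message flows concrete, here is a minimal sketch of a message envelope and its routing. The `Message` type, its field names, and the `ExampleService` kind are hypothetical and chosen for illustration; they are not GM's actual wire format.

```go
package main

import "fmt"

// Direction distinguishes the two GM<->LC flows described above.
type Direction int

const (
	Downstream Direction = iota // GM -> LC: sync an object to a specific LC
	Upstream                    // LC -> GM: report an update that GM writes to the k8s API
)

// Message is a hypothetical envelope; field names are illustrative only.
type Message struct {
	Dir       Direction
	NodeName  string // target node (downstream) or sending node (upstream)
	Kind      string // collaborative resource kind
	Namespace string
	Name      string
	Operation string // e.g. "update", "delete", "status"
	Content   []byte // serialized object or status
}

// Key identifies the resource object carried by a message.
func (m Message) Key() string {
	return fmt.Sprintf("%s/%s/%s", m.Kind, m.Namespace, m.Name)
}

// route says where a message goes next: an LC connection or the k8s API.
func route(m Message) string {
	if m.Dir == Downstream {
		return "lc:" + m.NodeName
	}
	return "k8s-api:" + m.Key()
}

func main() {
	obj := Message{Kind: "ExampleService", Namespace: "default", Name: "demo", NodeName: "edge-1"}
	down, up := obj, obj
	down.Dir, down.Operation = Downstream, "update"
	up.Dir, up.Operation = Upstream, "status"
	fmt.Println(route(down)) // lc:edge-1
	fmt.Println(route(up))   // k8s-api:ExampleService/default/demo
}
```

Both flows pass through GM, which is why the HA design below treats the Operators and the GM-LC channel separately.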

The GM HA design mainly needs to cover HA for these two modules.

2. HA patterns in the k8s community

HA in k8s falls into two categories:

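  1. HA for stateless components: put a load balancer in front, e.g. the k8s API server (kube-apiserver)
  2. HA for stateful components:
  • leader election via the Raft algorithm, e.g. etcd
  • leader election via client-go's leader-election module, e.g. kube-scheduler, kube-controller-manager
    1. the component integrates client-go's leader-election module directly
    2. sidecar mode (unmaintained since 2018): client-go's leader-election module runs as a sidecar, and the component only queries a designated sidecar port to learn whether it is the leader; doc, code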

Below is a survey of some community solutions.

2.1. HA of kube-controller-manager: active-standby

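kube-controller-manager integrates the leader-election module directly. The instance that acquires the lease lock runs the business logic: it list/watches the k8s API server and then handles Deployment/Job/DaemonSet and other resources.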

For example, the Lease object in a real environment (kubectl get leases -n kube-system kube-controller-manager -o yaml):

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  acquireTime: "2021-05-29T17:07:58.743009Z"
  holderIdentity: kind-control-plane2_c2181a4f-abaf-473c-b799-46e87ca46af6
  leaseDurationSeconds: 15
  leaseTransitions: 1
  renewTime: "2021-05-31T03:53:29.519058Z"

2.2. HA support of CloudCore:

kubeedge/kubeedge#1560
kubeedge/kubeedge#1569
kubeedge/kubeedge#1600

2.2.1 active-standby

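cloudcore HA design:

  1. Leader election: cloudcore integrates the leader-election module directly
  2. Ensuring that edgecore connects to the leader cloudcore:
    1. keepalived: checks the /readyz REST endpoint on cloudcore port 10002 to decide which instance holds the VIP; the endpoint returns OK only on the leader
    2. native k8s load balancing: set the pod readiness (podReadiness) so that only the leader pod is Ready and receives Service traffic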

2.2.2 active-active

kubeedge/kubeedge#1560 (comment)

My understanding: edgecore can connect to any cloudcore instance, and each cloudcore instance only processes messages from the edge nodes connected to it.

3. GM HA

3.1 HA of the collaborative resource Operators

active-standby: since the Operators list-watch and update CRD objects, kube-controller-manager can serve as a reference.

active-active: probably not achievable? (open question)

3.2 HA of the GM-LC communication module

3.2.1 active-standby

Following cloudcore's active-standby mode, LC connects only to the leader GM.

3.2.2 active-active

Following cloudcore's active-active mode, LC can connect to any GM instance, and each GM only processes messages from the LCs connected to it.
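In active-active mode each GM replica has to decide which messages are its own. A minimal sketch, assuming every replica observes all resource events and keeps a set of the LCs (edge nodes) connected to it; `GMInstance`, `Event`, and their methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a change to a collaborative resource object that targets one edge node.
type Event struct {
	Node   string
	Object string
}

// GMInstance is one active GM replica and the LCs currently connected to it.
type GMInstance struct {
	Name      string
	mu        sync.RWMutex
	connected map[string]bool // node name -> LC connected to this replica
}

func NewGMInstance(name string) *GMInstance {
	return &GMInstance{Name: name, connected: map[string]bool{}}
}

func (g *GMInstance) Connect(node string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.connected[node] = true
}

func (g *GMInstance) Disconnect(node string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.connected, node)
}

// Owns reports whether the LC of node is connected to this replica.
func (g *GMInstance) Owns(node string) bool {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.connected[node]
}

// HandleUpstream accepts an LC message only from an LC connected here.
func (g *GMInstance) HandleUpstream(node string) bool {
	return g.Owns(node) // a real GM would then write the update to the k8s API
}

// Downstream keeps only the events whose target LC is connected to this replica.
func (g *GMInstance) Downstream(events []Event) []Event {
	var mine []Event
	for _, e := range events {
		if g.Owns(e.Node) {
			mine = append(mine, e)
		}
	}
	return mine
}

func main() {
	a, b := NewGMInstance("gm-a"), NewGMInstance("gm-b")
	a.Connect("edge-1")
	b.Connect("edge-2")
	events := []Event{{"edge-1", "job-1"}, {"edge-2", "job-2"}, {"edge-3", "job-3"}}
	for _, g := range []*GMInstance{a, b} {
		fmt.Println(g.Name, g.Downstream(events))
	}
	// edge-3 is connected to no replica, so no one sends job-3 yet.
}
```

One consequence visible in the sketch: events for an LC that is momentarily disconnected are handled by nobody, so presumably a replica would need to resync an LC's objects when that LC (re)connects.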

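3.2.3 An alternative approach: relay GM/LC communication through the api-server

Since the KubeEdge edge side provides list-watch capability (see Autonomic Kube-API Endpoint (AKE)):

  1. GM -> LC: GM writes CR data to the api-server, and LC watches the change through AKE
  2. LC -> GM: LC updates the resource status through AKE, and GM watches the change through the k8s api-server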
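The relay replaces direct GM-LC messages with writes and watches on one shared object. The sketch below models this with a single in-memory `Store` standing in for both the cloud api-server and the edge AKE, which in reality are separate endpoints kept in sync by KubeEdge; `Store`, `Object`, and `WatchEvent` are hypothetical types.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a simplified collaborative resource CR: spec written by GM, status by LC.
type Object struct {
	Name     string
	NodeName string
	Spec     string
	Status   string
}

// WatchEvent is delivered to every watcher on each write.
type WatchEvent struct {
	Type string // "ADDED" or "MODIFIED"
	Obj  Object
}

// Store stands in for the api-server (cloud) and AKE (edge) at once.
type Store struct {
	mu       sync.Mutex
	objs     map[string]Object
	watchers []chan WatchEvent
}

func NewStore() *Store { return &Store{objs: map[string]Object{}} }

// Watch registers a watcher; the buffer keeps this demo from blocking writers.
func (s *Store) Watch() <-chan WatchEvent {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan WatchEvent, 16)
	s.watchers = append(s.watchers, ch)
	return ch
}

// Put creates or replaces an object and notifies every watcher.
func (s *Store) Put(o Object) {
	s.mu.Lock()
	defer s.mu.Unlock()
	typ := "MODIFIED"
	if _, ok := s.objs[o.Name]; !ok {
		typ = "ADDED"
	}
	s.objs[o.Name] = o
	for _, w := range s.watchers {
		w <- WatchEvent{Type: typ, Obj: o}
	}
}

func main() {
	api := NewStore()
	lcWatch := api.Watch() // LC side (through AKE in the proposal)
	gmWatch := api.Watch() // GM side (through the k8s api-server)

	// 1. GM -> LC: GM writes the CR and LC sees it.
	api.Put(Object{Name: "job-1", NodeName: "edge-1", Spec: "v1"})
	<-gmWatch // GM also observes its own write
	ev := <-lcWatch
	fmt.Println("LC sees", ev.Type, ev.Obj.Name, ev.Obj.Spec)

	// 2. LC -> GM: LC updates the status and GM sees it.
	ev.Obj.Status = "Running"
	api.Put(ev.Obj)
	<-lcWatch // LC also observes its own write
	ev = <-gmWatch
	fmt.Println("GM sees", ev.Type, ev.Obj.Name, ev.Obj.Status)
}
```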


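Pros:

  1. No need to maintain a long-lived LC -> GM connection

Cons:

  1. Since AKE listens on localhost, LC must run on the host network
  2. Every LC list-watches all edge-cloud collaborative resource objects, which causes a broadcast storm and makes edgecore a performance bottleneck (could field selectors mitigate this?)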
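The cost of the broadcast, and what per-node filtering would save, can be estimated by counting watch deliveries. Two caveats apply. For custom resources the api-server by default supports field selectors only on metadata.name and metadata.namespace (newer Kubernetes versions add CRD selectableFields), so a per-node label with a label selector may be the more practical filter. Whether AKE applies such selectors on the edge side would also need to be checked. The label key `example.io/node-name` and the helpers below are hypothetical.

```go
package main

import "fmt"

// obj is a collaborative resource object labeled with the edge node it targets.
type obj struct {
	name   string
	labels map[string]string
}

const nodeLabel = "example.io/node-name" // hypothetical label key

// matches implements equality-based selection, like the selector
// "example.io/node-name=edge-1".
func matches(o obj, key, value string) bool {
	return o.labels[key] == value
}

// deliveries counts the watch events all LCs receive for one write per object,
// with or without a per-node selector.
func deliveries(objs []obj, nodes []string, filtered bool) int {
	total := 0
	for _, n := range nodes {
		for _, o := range objs {
			if !filtered || matches(o, nodeLabel, n) {
				total++
			}
		}
	}
	return total
}

func main() {
	var nodes []string
	var objs []obj
	for i := 0; i < 100; i++ {
		n := fmt.Sprintf("edge-%d", i)
		nodes = append(nodes, n)
		objs = append(objs, obj{name: "job-" + n, labels: map[string]string{nodeLabel: n}})
	}
	fmt.Println("unfiltered:", deliveries(objs, nodes, false)) // 10000: every LC sees every object
	fmt.Println("filtered:", deliveries(objs, nodes, true))    // 100: each LC sees only its own
}
```

Without filtering, deliveries grow as nodes × objects; with a per-node selector they grow only with the number of objects. This counts events only and says nothing about edgecore's actual CPU or memory load.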