Clarification Needed on Airbyte Job Parallelism and Worker Roles in GKE Deployment #42439
-
Hello, I have Airbyte deployed in GKE using Helm chart 0.248.5 and Airbyte version 0.63.4, and I'm trying to understand how scaling Airbyte works internally. I've read several threads, blogs, and docs about job parallelism in Airbyte, but it's still unclear to me how some of the components mentioned work together.

Here is what I understand so far: when a sync job starts, several pods are initialized (not all at the same time), such as check, orchestrator, discovery, source, and destination. I also understand the role of the MAX_*_WORKERS parameters in the Helm values (a rough sketch of our values is below), and the role of the Temporal ports in communicating with these created pods. There is a limit of 40 TEMPORAL_WORKER_PORTS for communicating with pods, so I assume at most 40 pods can communicate with the Temporal DB at once.

What I don't understand is the role of the airbyte-worker deployment. There seems to be only one persistent pod backing it, and I'm not sure what it does in relation to the actual workers that run the jobs. A Daspire document (https://docs.daspire.com/deploying-airbyte/on-kubernetes/#increasing-job-parallelism) says, "The number of worker pods can be changed by increasing the number of replicas for the airbyte-worker deployment," but I have no idea how old that document is or whether it's still relevant. Additionally, my company has a horizontal pod autoscaler configured on the airbyte-worker deployment, scaling from 1 to 50 replicas, but when I've monitored it during periods with a high number of jobs, the replica count always stays at 1, so I'm not even sure it's necessary.

I've also noticed that Airbyte sometimes uses the words "jobs" and "workers" interchangeably and sometimes with specific meanings, so I'm confused about the airbyte-worker pod in Kubernetes. Is it really a worker, or are only the pods created to execute sync jobs considered workers? What does airbyte-worker do in Kubernetes? And finally, is it still relevant to create more airbyte-worker replicas to increase job concurrency? Thank you for your help!
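For reference, here is roughly what the relevant part of our values.yaml looks like. This is a sketch from memory: the key paths (worker.extraEnv, etc.) and the numbers are illustrative and may differ between chart versions.

```yaml
# Sketch of our Helm values (illustrative; exact key paths and defaults
# vary between chart versions, so verify against your chart's schema).
worker:
  replicaCount: 1              # the persistent airbyte-worker deployment
  extraEnv:
    - name: MAX_SYNC_WORKERS
      value: "10"              # max concurrent sync jobs per worker
    - name: MAX_CHECK_WORKERS
      value: "5"               # max concurrent connection-check jobs
    - name: MAX_DISCOVER_WORKERS
      value: "5"               # max concurrent schema-discovery jobs
    - name: TEMPORAL_WORKER_PORTS
      value: "9001,9002,9003"  # comma-separated list; ~40 ports by default
```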
-
In Kubernetes deployments, the airbyte-worker deployment has fewer responsibilities because it delegates to a newer job called the container orchestrator (whereas in a docker compose deployment this is all on the worker). More on that here: https://docs.airbyte.com/understanding-airbyte/jobs#decoupling-worker-and-job-processes

(Some of the "whys" of the orchestrator pod listed there will also probably give you an idea of which of the sources you're reading are current or out of date.)

This leaves the worker primarily responsible for initiating the orchestrator (which in turn launches the appropriate source/read and destination/write pods), monitoring state, and handling any additional interventions needed. When this change was made, the need to scale the worker deployment became much less common. In our case we still occasionally see contention or timeouts when initiating/running many concurrent jobs (>100), so we run 2 replicas of the worker deployment (sketch below).
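If it helps, bumping the worker replica count is just a Helm values change. A minimal sketch, assuming a recent chart version where the worker is configured under a `worker` key (key paths vary between chart versions, so check your chart's values schema):

```yaml
# values.yaml (sketch): run two airbyte-worker replicas.
# The exact key path depends on your Helm chart version.
worker:
  replicaCount: 2
```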
The most current docs on scaling are probably the Self-Managed Enterprise docs. Those apply equally well in general to OSS deployments (it's the same codebase), but their team seems to be working hard on bringing all the other docs current as well (most recently updating and greatly improving the Deployment docs).

P.S. Regarding the interchangeable usage of terms, I think some of it comes from generic concepts vs. deployment names. But also, naming things is the hardest part of development and docs writing 😂