This repository was archived by the owner on Dec 29, 2022. It is now read-only.

Add support for SQL Proxy connections in our workers #57

Closed
jcunhasilva opened this issue Jan 29, 2019 · 6 comments

@jcunhasilva
Contributor

Currently, if we want to connect to a Cloud SQL database from our workers/DAGs, we need to create a connection using the database's public or private IP.
The best way to connect to these databases is through SQL Proxy side containers as described here:
https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine

We should have a way of specifying predefined SQL Proxy connections to be attached to the scheduler or worker pods. We could define these connections directly in the cluster configuration.

The configuration spec should be similar to the one currently available in the base YAML configuration:

spec:
  sqlproxy:
    project: kubeflow-193622
    region: us-central1
    instance: testsql-cluster
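
For illustration, the three fields in the spec above map onto the instance connection name that the Cloud SQL Proxy expects, which has the form project:region:instance. The following is a minimal sketch, not the operator's actual code: the helper names and the local TCP port 5432 are assumptions, and the flag format is the one used by the v1 cloud_sql_proxy binary.

```python
def instance_connection_name(project: str, region: str, instance: str) -> str:
    """Join the spec fields into Cloud SQL's project:region:instance form."""
    return f"{project}:{region}:{instance}"


def proxy_instances_flag(connection_name: str, port: int = 5432) -> str:
    """Build the -instances argument for the v1 cloud_sql_proxy binary.

    The port is an assumption (the Postgres default); adjust as needed.
    """
    return f"-instances={connection_name}=tcp:{port}"


# Values taken from the spec in this comment.
spec = {
    "project": "kubeflow-193622",
    "region": "us-central1",
    "instance": "testsql-cluster",
}
name = instance_connection_name(**spec)
flag = proxy_instances_flag(name)
```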
@barney-s
Contributor

Is the Cloud SQL instance different from the DB specified in AirflowBase?
If you use AirflowBase with Cloud SQL (which is not functional at this point), all Celery-based workers connect to this instance to report back task status.
If it is the same Cloud SQL instance, you could use the same sqlproxy that the workers use.
The sqlproxy is deployed as part of the base cluster.

Or are you asking for a general pattern for injecting sidecars into workers?

@jcunhasilva
Contributor Author

Yes, that was my first approach (using the same SQL Proxy as the one used in AirflowBase), but then I stumbled upon the Cloud SQL issue and couldn't test it further.

However, the base configuration specifies the instance that holds the Airflow database, which in our case is not the same instance we want to fetch data from inside our DAGs. In this case, we need a different sidecar for our workers.

@barney-s
Contributor

I am wondering whether this is the right way to do it. Having a sidecar would create one sqlproxy per worker.

What about creating a separate deployment just for the sqlproxy to your custom Cloud SQL instance, and passing that as a parameter to the tasks?
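
As a rough sketch of that idea: if the standalone sqlproxy deployment were exposed through a Kubernetes Service, tasks could receive its in-cluster DNS name as an ordinary parameter. The service name `custom-sqlproxy`, the namespace `airflow`, the helper names, and the default `cluster.local` cluster domain are all assumptions made for illustration.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def task_db_params(service: str, namespace: str, port: int = 5432) -> dict:
    """Connection parameters a task could be given for the shared proxy."""
    return {"host": service_dns_name(service, namespace), "port": port}


# Hypothetical names; nothing in this thread defines them.
params = task_db_params("custom-sqlproxy", "airflow")
```

With this layout every worker talks to a single proxy deployment instead of running its own sidecar.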

@barney-s
Contributor

FYI, I have added a fix for Cloud SQL with #56.

Please note that when setting up Cloud SQL, you need to ensure the default username is postgres and the password is added to the secret in the samples folder.

@jcunhasilva
Contributor Author

@barney-s since CloudSQL is now working with Postgres, I was able to create a new Airflow database inside our existing instance and use the same SQL Proxy to connect to several databases. I guess this ticket can be closed.

Thank you for your help!
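
A minimal sketch of the resolution described above, one proxy endpoint serving several databases on the same instance. The database names, the placeholder password, and the proxy address 127.0.0.1:5432 are assumptions; only the `postgres` username comes from this thread.

```python
from urllib.parse import quote


def postgres_uri(user: str, password: str, database: str,
                 host: str = "127.0.0.1", port: int = 5432) -> str:
    """Build a PostgreSQL connection URI that points at the proxy endpoint.

    Credentials and the database name are percent-encoded so that
    characters such as '@' or '/' do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Hypothetical database names on the same Cloud SQL instance.
databases = ["airflow", "analytics"]
uris = {db: postgres_uri("postgres", "example-password", db) for db in databases}
```

Only the database segment of the URI changes between connections; the host and port stay those of the shared proxy.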

@barney-s barney-s self-assigned this Jan 30, 2019
@barney-s
Contributor

Thanks for confirming. The bigger pattern, injecting sidecars into workers, is still open; I will capture it in a separate doc.
