Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat(metrics): PoC of running metrics indexer migrations against spanner #37532

Closed
wants to merge 3 commits into from

Conversation

untitaker
Copy link
Member

@untitaker untitaker commented Aug 8, 2022

Quick test as to whether the Django ORM can be used with spanner. The answer is yes, I think. We will have to get rid of sequences, otherwise it Just Works.

As is tradition with Google products, there are two solutions to using Django with Cloud Spanner. When creating a database, one has to specify whether the database is supposed to run in postgres or "google standard sql" mode. This setting cannot be changed after, at least not through the web UI.

I suppose with PostgreSQL mode, one could use psycopg2 and psql and queries are translated serverside, but this PR is actually using the django-google-spanner Django app which constructs "standard sql" queries clientside. So there is no actual postgres dialect involved.

I followed https://cloud.google.com/blog/topics/developers-practitioners/introducing-django-cloud-spanner-database to set this up. In order to try this PR:

  1. make install-py-dev and pip install git+https://github.com/googleapis/python-spanner-django

  2. create database in cloud console, do not set the dialect type to postgres.

  3. follow https://cloud.google.com/docs/authentication/production to get a JSON file that contains your credentials

  4. set the following environment variables in your shell:

    GOOGLE_CLOUD_PROJECT=search-and-storage
    GOOGLE_APPLICATION_CREDENTIALS=/Users/untitaker/Downloads/search-and-storage-XXXX.json

  5. run sentry django migrate --database metrics-spanner and sentry django migrate

  6. get and set some strings

todo

  • decide what to do with sequences. the django_spanner driver just rolls a random ID for any Django AutoField, through monkeypatching Django.
  • fix existing metrics-indexer migrations
  • I am not sure if the django_migrations table is supposed to exist on the spanner db? isn't one in postgres enough?
  • fix projectcounter migration so it actually works with multiple dbs at all (not sure how we manage to deal with this in prod)
  • investigate the actual postgresql mode of spanner. Apparently the compat layer is a separate Java app? not sure https://cloud.google.com/spanner/docs/psql-connect
  • all backend tests are broken because something in pytest-django doesn't work correctly on test setup

@untitaker
Copy link
Member Author

untitaker commented Aug 8, 2022

The official PGAdapter from google cloud, in combination with Django, delivers this:

django.db.utils.InternalError: RaiseException('INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: [ERROR] relation "pg_catalog.pg_class" does not exist - Statement: \'SELECT c.relname,\n            CASE WHEN FALSE THEN \'p\' WHEN c.relkind IN (\'m\', \'v\') THEN \'v\' ELSE \'t\' END\n            FROM pg_catalog.pg_class c\n            LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace\n            WHERE c.relkind IN (\'f\', \'m\', \'p\', \'r\', \'v\')\n                AND n.nspname NOT IN (\'pg_catalog\', \'pg_toast\')\n                AND pg_catalog.pg_table_is_visible(c.oid)\'\n')
SQL:
            SELECT c.relname,
            CASE WHEN FALSE THEN 'p' WHEN c.relkind IN ('m', 'v') THEN 'v' ELSE 't' END
            FROM pg_catalog.pg_class c
            LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
            WHERE c.relkind IN ('f', 'm', 'p', 'r', 'v')
                AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
                AND pg_catalog.pg_table_is_visible(c.oid)

So I think it may be better to use django-spanner here, even though it does suspicious things like monkeypatching django's autofield

here is how to get pgadapter running locally:

docker run -d -p 5050:5432     -v /Users/untitaker/Downloads/search-and-storage-XXXX.json:/acct_credentials.json     gcr.io/cloud-spanner-pg-adapter/pgadapter:latest     -p search-and-storage -i markus-test-spanner-pg -d markus-test-spanner-pg-db     -c /acct_credentials.json -q -x

patch on top of this PR is just:

     "metrics-spanner": {
-        "ENGINE": "django_spanner",
+        "ENGINE": "sentry.db.postgres",
         "PROJECT": "search-and-storage",
-        "INSTANCE": "markus-test-spanner",
-        "NAME": "markus-test-spanner-db",
+        "INSTANCE": "markus-test-spanner-pg",
+        "NAME": "markus-test-spanner-db-pg",
+        "PORT": "5050",
+        "HOST": "localhost",
     },
 }

@@ -66,7 +66,7 @@ class BoundedBigAutoField(models.AutoField): # type: ignore
MAX_VALUE = 9223372036854775807

def db_type(self, connection: BaseDatabaseWrapper) -> str:
return "bigserial"
return "int64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be fixed. somehow this type makes it into the generated google standard sql statement without any sort of translation. So we need to either make db_type check the database backend somehow, or stop using our own bigautofield.

@@ -102,4 +102,4 @@ phabricator>=0.7.0
# test dependencies, but unable to move to requirements-test until
# sentry.utils.pytest and sentry.testutils are moved to tests/
selenium>=4.1.5
sqlparse>=0.2.4
sqlparse>=0.3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to import this as well ?
https://github.com/googleapis/python-spanner-django

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but i couldn't figure out how to add a git dependency in our requirements. We need a fix from master to get anywhere at all

Comment on lines 149 to 154
"metrics-spanner": {
"ENGINE": "django_spanner",
"PROJECT": "search-and-storage",
"INSTANCE": "markus-test-spanner",
"NAME": "markus-test-spanner-db",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect to move this out of the self hosted settings ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about it yet. Perhaps we can move this database config into getsentry entirely again, where the db router also already exists.

@@ -27,16 +27,4 @@ class Migration(CheckedMigration):
("sentry", "0283_extend_externalissue_key"),
]

operations = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw sql statements mentioning sequence don't work. We cannot run this migration as-is, so I think we would need to edit it to skip doing that for spanner, or create a new table name etc for spanner where this migration doesn't run.

@untitaker
Copy link
Member Author

This PR would be blocked on googleapis/python-spanner-django#784 and it's not clear whether we want to invest as much into spanner to fix that issue. Postponing.

@github-actions
Copy link
Contributor

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@untitaker
Copy link
Member Author

Let's revisit when we actually need it. I don't think this is safe to merge in any capacity. Even if we end up moving to spanner, unclear if we want ot use this package.

@untitaker untitaker closed this Aug 31, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Sep 16, 2022
@asottile-sentry asottile-sentry deleted the feat/cloud-spanner-postgres-indexer-poc branch December 1, 2023 16:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants