Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is asyncpg doing type introspection on json types? #1206

Closed
swanysimon opened this issue Nov 1, 2024 · 8 comments · Fixed by #1243
Closed

Why is asyncpg doing type introspection on json types? #1206

swanysimon opened this issue Nov 1, 2024 · 8 comments · Fixed by #1243

Comments

@swanysimon
Copy link

  • asyncpg version: 0.30.0
  • PostgreSQL version: 15.3
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : RDS, and yes
  • Python version: 3.12.6
  • Platform: MacOS and Linux
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: No, poetry
  • If you built asyncpg locally, which version of Cython did you use?: n/a
  • Can the issue be reproduced under both asyncio and uvloop?: Have not tried. Happy to if you think it would be beneficial

Spinning out of #1138 (comment) because it feels like a different discussion.


I'm running a FastAPI service that connects to AWS RDS, and needs to refresh credentials every 15 minutes. Normally, the type introspection queries don't take up much time because they run once per connection, but I have a lot of churn in my connection pool so run them a decent number of times. Recently I'm seen more traffic and thus more connections being created, and with more connections, the more often we're likely to see slow queries on things that are normally fast.

At a very high level, my service is set to connect to the database with:

engine = create_async_engine(
    postgres_url(use_asyncpg=True),
    pool_size=10,
    max_overflow=25,
    pool_recycle=600,  # IAM credentials expire after 15 mins
    pool_pre_ping=True,
)

@event.listens_for(engine.sync_engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams) -> None:
    cparams["password"] = boto3.client("rds").generate_db_auth_token(
        config.POSTGRES_HOST, config.POSTGRES_PORT, config.POSTGRES_USER,
    )

Even abnormally slow type introspection queries aren't horrible but they are noticeable, as in the example below these 2 queries took more than 50% of the service's total response time.

Screenshot 2024-10-31 at 17 57 49

Debugging locally a little with command: ["postgres", "-c", "log_statement=all"] in my docker-compose.yml, I can see what type asyncpg needs to examine:

2024-11-01 20:52:52.239 UTC [491] LOG:  execute __asyncpg_stmt_1__: SELECT
            t.oid,
            t.typelem     AS elemtype,
            t.typtype     AS kind
        FROM
            pg_catalog.pg_type AS t
        WHERE
            t.oid = $1

2024-11-01 20:52:52.239 UTC [491] DETAIL:  parameters: $1 = '114'
2024-11-01 20:52:52.240 UTC [491] LOG:  execute __asyncpg_stmt_2__: SELECT
            t.oid,
            t.typelem     AS elemtype,
            t.typtype     AS kind
        FROM
            pg_catalog.pg_type AS t
        WHERE
            t.oid = $1

2024-11-01 20:52:52.240 UTC [491] DETAIL:  parameters: $1 = '3802'

These correspond to the JSON and JSONB types, respectively, not even custom types.


The actual question: how can I pre-register the JSON and JSONB types in each connection so I don't have to keep running the introspection query? I've tried the json_{de,}serializer argument to the SQLAlchemy engine, as well as trying to hook into SQLAlchemy events to intercept connection creation and set the codecs.

@nathan-gage
Copy link

bumping this -- we are experiencing this issue as well. the strange part is the latency is ok for sometime, before completely shitting the bed:

CleanShot 2024-11-12 at 14 30 51

@swanysimon
Copy link
Author

Hey team, would appreciate any thoughts on how this could be accomplished as we're seeing this problem scale with the number of connections. I took a little bit of a stab in the codebase here to see if I could get something working but believe I'm firmly out of my depth.

@swanysimon
Copy link
Author

Some more looking: it seems that JSON and JSONB should already be in the type map. So the question remains: why is there type introspection happening? Is it because my Python type for the column is Mapped[dict[str, Any]] instead of str?

@suhjohn
Copy link

suhjohn commented Mar 14, 2025

#1138
#1242
#1082
Despite this, it's not fixed

@elprans
Copy link
Member

elprans commented Mar 14, 2025

This seems to me like an RDS/Aurora bug where access to the system catalog is sometimes this slow. I mean this:

SELECT
            t.oid,
            t.typelem     AS elemtype,
            t.typtype     AS kind
        FROM
            pg_catalog.pg_type AS t
        WHERE
            t.oid = $1

is a trivial query looking thing up by an indexed key. It should run in well under a millisecond.

That said, it should be possible to avoid running it when setting codecs on builtin types.

elprans added a commit that referenced this issue Mar 15, 2025
Type codec setup functions will no longer attempt to introspect the type
if it's one of the known builtin types.

Fixes: #1206
Fixes: #1138
Fixes: #1242
elprans added a commit that referenced this issue Mar 15, 2025
Type codec setup functions will no longer attempt to introspect the type
if it's one of the known builtin types.

Fixes: #1206
Fixes: #1138
Fixes: #1242
elprans added a commit that referenced this issue Mar 15, 2025
Type codec setup functions will no longer attempt to introspect the type
if it's one of the known builtin types.

Fixes: #1206
Fixes: #1138
Fixes: #1242
@antonmyronyuk
Copy link
Contributor

@elprans Thank you! Looking forward to the next release!

@swanysimon
Copy link
Author

This seems to me like an RDS/Aurora bug where access to the system catalog is sometimes this slow

My understanding is that Aurora is a server less architecture, and so can be victim to cold starts. This is likely what we're seeing, but exactly as you say:

it should be possible to avoid running it when setting codecs on builtin types

I can't say I fully understand the change but I'm excited to see these queries disappear from my dashboard. Thank you!

For the future, if I did have a custom type in the database, is there a way I could I preregister it with asyncpg on connection creation and get the same functionality I get from SQLAlchemy's pools?

@elprans
Copy link
Member

elprans commented Mar 19, 2025

For the future, if I did have a custom type in the database, is there a way I could I preregister it with asyncpg on connection creation and get the same functionality I get from SQLAlchemy's pools?

Yes, this is what the init callback argument on create_pool is for (the documentation even states explicitly that it's where you should register custom codecs in).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants