
QgsPostgresConn::retrieveLayerTypes scans full table by default #58742

Open
pramsey opened this issue Sep 13, 2024 · 1 comment

Comments
pramsey commented Sep 13, 2024

Feature description

const QString tableScanLimit { useEstimatedMetadata ? QStringLiteral( " LIMIT %1" ).arg( GEOM_TYPE_SELECT_LIMIT ) : QString() };

A full scan of the tables is the default action when connecting to a PostGIS database. When a database has some very large tables, or FDW tables (which can be slow to query), that can result in a hung QGIS. "Full scan" is probably the wrong default, and maybe never the right answer.

  • As a first step, just having "Use estimated table metadata" checked by default instead of unchecked would be an improvement.
  • Instead of taking the first 100 rows, using the PostgreSQL TABLESAMPLE facility can get a more representative sample at a lower cost, which might make "sample by default" more palatable in terms of outputs (see the sketch after this list).
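
For illustration, here is a rough sketch of what a sampled variant of the metadata query could look like. The table name and the 1-percent sampling rate are hypothetical, and TABLESAMPLE applies only to ordinary tables and materialized views, so views and foreign tables would still need a fallback:

-- Hypothetical sketch: sample roughly 1% of the table's pages rather than scanning every row
SELECT 0, array_agg(DISTINCT st_srid("geom"::geometry)::text || ':' || UPPER(geometrytype("geom"::geometry)) || ':' || ST_Zmflag("geom"::geometry))
FROM "public"."a_big_table_with_lots_of_rows" TABLESAMPLE SYSTEM (1);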

Additional context

No response


bertday commented Sep 27, 2024

I came to report the same behavior and stumbled on this issue. When I connect to a Postgres DB containing tables with lots of rows, it takes quite a while to list the tables in the QGIS browser.

I took a look in pg_stat_activity and saw QGIS is making a query like this for every table in the database:

SELECT 0, array_agg(DISTINCT st_srid("geom"::geometry)::text || ':' || UPPER(geometrytype("geom"::geometry))  || ':' || ST_Zmflag("geom"::geometry))  FROM (SELECT "geom" FROM "public"."a_big_table_with_lots_of_rows") AS _unused;

I believe that's the same full scan @pramsey described above, just expressed as the underlying SQL. I can see how it makes sense for a complete understanding of the data, but I agree with Paul that the overhead may not be justified in most cases.
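
For comparison, with "Use estimated table metadata" enabled, the tableScanLimit code quoted above appends a LIMIT to the inner query, so the sampled version of the same scan should look roughly like this (assuming GEOM_TYPE_SELECT_LIMIT is the 100 rows Paul mentioned):

SELECT 0, array_agg(DISTINCT st_srid("geom"::geometry)::text || ':' || UPPER(geometrytype("geom"::geometry)) || ':' || ST_Zmflag("geom"::geometry))
FROM (SELECT "geom" FROM "public"."a_big_table_with_lots_of_rows" LIMIT 100) AS _unused;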
