Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading the same schema from PostgreSQL takes a lot longer than from MSSQL #146

Open
Arnagos opened this issue Sep 26, 2022 · 4 comments
Open

Comments

@Arnagos
Copy link

Arnagos commented Sep 26, 2022

I was running my test suite and noticed that the PostgreSQL tests take A LOT longer than the MSSQL tests.
To verify this I wrote a short benchmark with BenchmarkDotNet and got these results:

MSSQL

Method Mean Error StdDev Median Gen0 Gen1 Allocated
ReadAll 765.5 ms 18.91 ms 54.85 ms 750.3 ms 2000.0000 1000.0000 14.47 MB
AllTables 456.9 ms 12.77 ms 37.45 ms 457.6 ms 2000.0000 1000.0000 13.98 MB

PostgreSQL

Method Mean Error StdDev Gen0 Gen1 Allocated
ReadAll 2.835 s 0.0536 s 0.0638 s 1000.0000 - 7.03 MB
AllTables 2.653 s 0.0485 s 0.0454 s - - 4.48 MB

As you can see both ReadAll and AllTables take a lot longer for PostgreSQL (3.7x and 5.8x respectively).
Knowing the performance of PostgreSQL this result is really surprising to me and seems rather strange.

Based on my profiling the culprits seem to be the PrimaryKeys (1.2s) and ForeignKeys (1s) methods of the PostgreSQLAdapter.

@Arnagos
Copy link
Author

Arnagos commented Sep 26, 2022

I took a quick look at the SQL used to load the constraints and think a faster solution to the problem would be to load all the constraints simultaneously instead of executing the same (very expensive) SQL multiple times. Every run of the constraints SQL takes 70-100ms, which results in the above numbers if applied to every single loaded table and constraint type. Loading all constraints at once on the other hand only takes ~200ms and results in a few hundred results that can then easily be filtered in C#.

@martinjw
Copy link
Owner

ReadAll internally calls AllTables, which internally calls TableBuilder.

TableBuilder does seeral calls on constraints- for the different types (primary key, foreign key, unique, check, null). For all, it does all tables - so it shouldn't be doing anything table by table.

For PostgreSql, 3 are the constraints use the same query (primary, foreign, unique), so yes, that is repeated, but maximum 3 times irregardless of number of tables. It may be possible to optimize this in the postgreSqlAdaptor class down to one, which is cached and reused between the (consecutive) calls- although you wouldn't want that cache to remain in memory.

@Arnagos
Copy link
Author

Arnagos commented Oct 5, 2022

Do you have an idea, why PostgreSQL takes more than thrice the time of MSSQL? Ordinarily, one'd expect the opposite ...

@martinjw
Copy link
Owner

martinjw commented Oct 5, 2022

I guess the metadata tables are not optimized - for example, no indexes on schema or table name, so every query is a full table scan. Also sometimes the metadata tables contain vast amounts of system data- Oracle is a bad offender here, it is by far the slowest to read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants