Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

underscores in project name kills database performance #1725

Open
XzzX opened this issue Jan 9, 2025 · 6 comments
Open

underscores in project name kills database performance #1725

XzzX opened this issue Jan 9, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@XzzX
Copy link
Contributor

XzzX commented Jan 9, 2025

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'test%';
Time: 0.481 ms

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te_st%';
Time: 394.972 ms

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te\_st%';
Time: 0.607 ms

_ stands for any single character in LIKE statements. https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE
To counter this we need to escape any unwanted % and _ characters.

@XzzX XzzX added the bug Something isn't working label Jan 9, 2025
@jan-janssen jan-janssen linked a pull request Jan 13, 2025 that will close this issue
@jan-janssen
Copy link
Member

Hi @XzzX,

I looked into this and noticed that there are two possible options to change this behavior, either:
https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L337
or:
https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L956

But in both cases it fails in our test suits as we are using SQLite. Maybe you can still give it a try on PostgreSQL.

Best,

Jan

@XzzX
Copy link
Contributor Author

XzzX commented Jan 13, 2025

If it breaks SQLite we should guard it with the TYPE configuration variable? Or do you want to drop support for SQLite?

@XzzX
Copy link
Contributor Author

XzzX commented Jan 13, 2025

@pmrv is the better person to test it. I just saw this kind of queries in the log for long running queries. But I do not know where the originate from.

@pmrv
Copy link
Contributor

pmrv commented Jan 13, 2025

I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form

conn.execute("SELECT foo FROM bar where project = :name", name=...)

rather than

conn.execute("SELECT foo FROM bar where project = %s" % name)

that we currently do, but it wasn't straightforward in the ten minutes I gave it.

@XzzX
Copy link
Contributor Author

XzzX commented Jan 14, 2025

I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form

conn.execute("SELECT foo FROM bar where project = :name", name=...)

rather than

conn.execute("SELECT foo FROM bar where project = %s" % name)

that we currently do, but it wasn't straightforward in the ten minutes I gave it.

Alone for security reasons we should do this. Due to RLS not as bad but this is prone to SQL injection.

@pmrv
Copy link
Contributor

pmrv commented Jan 15, 2025

Turns out I was wrong, we do construct the queries properly in get_items_dict, but not in an older method that somehow was still around. I've removed it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants