If you SELECT a huge number of rows, say 500k or 1M, it's probably not a good idea to build such a huge DataFrame.
I tested with 500k rows, and the DataFrame itself is not really the problem: it consumed around 500MB of memory. The real problem is the number of rows the browser needs to render (my Brave tab was taking 2GB to render 500k rows), so the Lab UI slows down considerably. I've also observed during testing that in some runs pexpect times out while waiting for data from the MariaDB client (the timeout settings are too low in that case).
A potential solution would be to introduce a new config option that specifies a row limit for each SELECT statement, with a default of something like 50k rows.
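A rough sketch of how the cap could be applied, assuming a hypothetical `row_limit` config option with a 50k default. Wrapping the original statement in a derived table avoids having to parse it, though a real implementation would need to handle queries that already carry a LIMIT or are not plain SELECTs:

```python
# Sketch only: cap the number of rows a SELECT can return before we ever
# build a DataFrame. `row_limit` is a hypothetical config value.
DEFAULT_ROW_LIMIT = 50_000

def apply_row_limit(query: str, row_limit: int = DEFAULT_ROW_LIMIT) -> str:
    """Wrap a plain SELECT so the server returns at most `row_limit` rows."""
    stripped = query.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return query  # only cap plain SELECT statements
    return f"SELECT * FROM ({stripped}) AS _capped LIMIT {row_limit};"
```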
There should also be a magic command that issues the SELECT and writes the output directly to disk, since we want users to be able to chart large datasets.
The tricky part is making the charting magic commands work efficiently when a large result set lives on disk.
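One option for the write-to-disk magic could be the server-side `SELECT ... INTO OUTFILE`, or simply streaming the client output to a CSV. Either way, the charting side probably has to stream that file instead of loading it whole. A hedged sketch (file path, column names and sampling rate are illustrative, not part of the kernel today): read the CSV with pandas in chunks and keep only a downsampled slice for plotting.

```python
# Sketch: chart a large on-disk result set without materializing it fully.
import pandas as pd
import matplotlib.pyplot as plt

def chart_from_disk(csv_path: str, x: str, y: str,
                    chunksize: int = 50_000, sample_frac: float = 0.01):
    """Plot x vs y from a large CSV by sampling each chunk."""
    sampled = []
    for chunk in pd.read_csv(csv_path, usecols=[x, y], chunksize=chunksize):
        # Keep a small random sample of each chunk to bound memory usage.
        sampled.append(chunk.sample(frac=sample_frac))
    df = pd.concat(sampled, ignore_index=True)
    df.plot.scatter(x=x, y=y)
    plt.show()
```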
I ran into the timeout as well.
Looks like pexpect defaults to timeout=30 on its run functions and on its constructors.
A good start would probably be to actually pass our own timeout variables (which on our end fortunately default to -1) through to them.
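Something along these lines, as a sketch only (the command line, prompt pattern and config name are made up; per the pexpect docs, `expect()` treats `-1` as "use the `spawn()` timeout" and `None` as "wait indefinitely"):

```python
# Sketch: thread our own timeout through to pexpect instead of relying on
# its 30-second defaults. `client_timeout` is a hypothetical config value.
import pexpect

def start_client(client_timeout=None):
    # Pass the configured timeout to the constructor so expect() calls inherit it.
    child = pexpect.spawn("mariadb --user=root", timeout=client_timeout)
    child.expect("MariaDB .*>")  # uses the constructor's timeout
    return child

def run_statement(child, sql, client_timeout=-1):
    child.sendline(sql)
    # -1 falls back to the spawn() timeout; an explicit value overrides it.
    child.expect("MariaDB .*>", timeout=client_timeout)
    return child.before.decode()
```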