Memory leak when fetching many #986
I suspect a code defect in ibm_db.c (release 3.2.5) at line 15487 and also at line 15529.
Hello @krnhotwings Based on your description, I tried running the following script to reproduce the issue:
I'm getting the expected output without any errors. As per @imavo's suggestion, I also tried modifying the code as follows:
After implementing this change, I'm still getting the expected output without any errors. Could you please confirm if there's anything I might be missing? Or does the change suggested by @imavo appear to resolve the memory leak issue? Thank you!
My approach and code change are both different from yours. I explain further below.
First, my approach was to use (on Linux only; this will not work on MS-Windows) the
However, if your test environment is MS-Windows, then you can use a different module to report the memory consumption per loop iteration, as the
So my test code to try to recreate the symptom (adjusted with the
With the ibm_db release 3.2.5, for every iteration (in my case there were 109 iterations) the screen output from the above script showed that memory usage jumped by at least 6 MB per loop iteration and never reached a steady state, going from approximately 45 MB (start point) to 625 MB (end point). To me this showed there was a leak, and that the leak size per iteration may be similar to the size of the
Next, my code change is different from yours, so I should explain that my intention is that after the
After the code change, the output of the script above showed that the memory consumption reached steady state and never increased for the rest of the loop. So decrementing the reference count of
My code change in ibm_db_fetchmany and ibm_db_fetchall differs from yours and is below:
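As an aside on the measurement approach described in this comment (this is not the C code change): a minimal sketch of sampling memory once per loop iteration to tell a leak from a steady state. It assumes the standard-library `tracemalloc` module, which is not necessarily the module used in the original test, and it replaces the database fetch with two hypothetical stand-in functions (`leaky_fetch`, `clean_fetch`) so that it runs without a Db2 connection.

```python
import tracemalloc


def measure_growth(step, iterations=20):
    """Call step() repeatedly and record traced memory (bytes) after each call."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(iterations):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


# Hypothetical stand-ins for a fetch call; no database is involved.
_leaked = []  # simulates references that are never released


def leaky_fetch():
    rows = [("x" * 100, i) for i in range(1000)]
    _leaked.append(rows)  # the result stays referenced forever: a leak
    return rows


def clean_fetch():
    # The result is dropped by the caller, so memory returns to baseline.
    return [("x" * 100, i) for i in range(1000)]


leaky = measure_growth(leaky_fetch)
clean = measure_growth(clean_fetch)

print("leaky growth (bytes):", leaky[-1] - leaky[0])
print("clean growth (bytes):", clean[-1] - clean[0])
```

A series that keeps climbing across iterations, as the leaky case does, suggests a leak. A series that flattens out after the first few iterations suggests a steady state.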
@imavo I’m working in a Linux environment. I’ve now applied the correct code change, and after doing so, the output of the script showed that the memory consumption reached a steady state and didn’t increase during the remaining iterations of the loop. However, when I apply the script below, the memory consumption does not stabilize. Instead, it continues to increase with each loop iteration. Here’s the script:
The above script internally uses the fetchmany method of the ibm_db_dbi module from the python-ibmdb library to retrieve the data in chunks. Could you please let me know if I’m doing anything wrong or if there's a particular reason the memory usage keeps increasing in this case? I would appreciate any further guidance on this. Thank you!
@bchoudhary6415 This testcode below exhibits a leak per loop iteration on the DBI fetchmany:
The reason that the code change I suggested makes no difference is that the DBI
@krnhotwings states that the 3.2.3 release does not show the leak, but 3.2.4 and 3.2.5 do show a leak with this DBI fetchmany(). I verified this. The biggest change between 3.2.3 and 3.2.4 was the debug logging, but any other changes in called functions still need to be ruled out. So there is more than one leak problem, i.e. more than one code fix might be needed. Investigation continues.
@bchoudhary6415
PyObject_Repr() will return a new reference
This appears to resolve the DBI fetchmany() leak. I will check if this also resolves the pandas issue, but in the meanwhile you might want to work on a code change.
@bchoudhary6415 I checked with the pandas example after my second code change (to dispose of the argsStr objects), and found that it also appears to resolve the leak seen with pandas: memory consumption reaches a steady state.
@imavo Thank you!
@bchoudhary6415 When debug_mode is zero, it would be wise to avoid any heap allocation that exists exclusively for logging. In other words, the PyObject_Repr() call ought to be conditional on debugging being enabled.
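To illustrate the idea of building the log string only when debugging is enabled, here is a minimal Python analogue. The actual change would be made in C inside ibm_db.c and is not shown here. `CountingArgs` and `log_call` are hypothetical names invented for this sketch. They are not part of ibm_db.

```python
class CountingArgs:
    """Hypothetical stand-in for call arguments; counts how often repr() runs."""

    def __init__(self):
        self.repr_calls = 0

    def __repr__(self):
        self.repr_calls += 1
        return "CountingArgs()"


def log_call(func_name, args, debug_mode):
    """Build the log message only when debugging is enabled."""
    if not debug_mode:
        # No repr() call, so no string is allocated purely for logging.
        return None
    return f"{func_name} called with args: {args!r}"


args = CountingArgs()

quiet = log_call("fetchmany", args, debug_mode=0)
calls_when_quiet = args.repr_calls

verbose = log_call("fetchmany", args, debug_mode=1)
calls_when_verbose = args.repr_calls

print(quiet, calls_when_quiet)
print(verbose, calls_when_verbose)
```

With debugging off, the representation is never built, so there is nothing to release afterwards. In C the same guard would also avoid creating the new reference that PyObject_Repr() returns.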
uname: Linux
uname -m: x86_64
PATH: /root/bin:/home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH: /usr/local/lib
Hi folks, there appears to be some kind of memory leak that was introduced in 3.2.4 when fetching. We noticed this when our airflow infrastructure would run out of memory, and we were able to pinpoint it to ibm_db 3.2.4+. 3.2.3 does NOT exhibit this issue.
Init stuff
Issues here
No issues here
It seems unusual that the new ibm_db.fetchmany() API and the existing ibm_db_dbi.Cursor.fetchmany() are both failing since they're independent of one another (the latter depending on ibm_db.fetch_tuple() via ibm_db_dbi.Cursor._fetch_helper()).
More background info
What we're doing is getting a sqlalchemy connection and passing that to pandas.read_sql() w/ a chunksize to stream ETL-related processes. Presumably, pandas.read_sql() calls sqlalchemy.engine.CursorResult.fetchmany(), which in turn presumably calls ibm_db_dbi.Cursor.fetchmany().
Airflow 2.10.4-python3.12 installs sqlalchemy 1.4.54
Presumably, ibm_db_sa imports ibm_db_dbi
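For readers unfamiliar with the chunked-fetch pattern described above, here is a minimal sketch of a DB-API `fetchmany()` loop. It uses the standard-library `sqlite3` module as a stand-in, on the assumption that the pattern is the same for any DB-API cursor. It does not use ibm_db_dbi, sqlalchemy or pandas, and the table `t` and helper `iter_chunks` are hypothetical.

```python
import sqlite3


def iter_chunks(cursor, chunk_size):
    """Yield lists of rows from a DB-API cursor, chunk_size rows at a time."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# In-memory database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, "x" * 10) for i in range(25)],
)

cur = conn.execute("SELECT id, payload FROM t ORDER BY id")
chunk_sizes = [len(chunk) for chunk in iter_chunks(cur, 10)]
conn.close()

print(chunk_sizes)  # [10, 10, 5]
```

Each chunk is released once the loop moves on, so memory use should stay roughly flat across iterations. That is the behaviour we expected from the streaming ETL jobs.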