You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Digital Earth Australia provides a Python Environment on the NCI which includes ODC. It's currently running the very old version 1.8.8 under Python 3.8.13 and we want to upgrade to the latest version. We've created a new environment using ODC core version 1.8.17 running under Python 3.10.8, but in testing we've been getting errors which weren't occuring before.
This issue was reported and discussed internally in Jira Issue DSNS-304, and included code to reproduce the error.
Error
The problem was reported as an unhandled PostgreSQL exception being thrown in a production script, which used to work cleanly.
Original code and stack trace from error
When executing the following script with the new module load dea/20231204
#!/usr/bin/env python3importdatacubemessage="""Will this work? Will this cause the new dea module to error out?"""defmain(product="ga_ls8c_ard_3"):
dc=datacube.Datacube(app="gen-list")
forl1_datasetindc.index.datasets.search(product=product):
withopen('io.txt', 'a') asfile:
file.write(message+'\n')
if__name__=="__main__":
main(product="ga_s2am_ard_3")
Stack Trace
Traceback (most recent call last):
File "/g/data/u46/users/dsg547/sandbox/ard_pipeline_support/tickets/DSNS-304/example/./odc_call.py", line 18, in <module>
main(product="ga_s2am_ard_3")
File "/g/data/u46/users/dsg547/sandbox/ard_pipeline_support/tickets/DSNS-304/example/./odc_call.py", line 14, in main
for l1_dataset in dc.index.datasets.search(product=product):
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_datasets.py", line 548, in search
for product, datasets in self._do_search_by_product(query,
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_datasets.py", line 691, in _do_search_by_product
with self._db_connection() as connection:
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/_transaction.py", line 60, in _db_connection
with self._index._active_connection(transaction=transaction) as conn:
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/index/postgres/index.py", line 157, in _active_connection
with self._db._connect() as conn:
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/datacube/drivers/postgres/_connections.py", line 231, in _connect
connection.close()
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1251, in close
conn.close()
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1166, in close
self._checkin()
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1008, in _checkin
_finalize_fairy(
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 800, in _finalize_fairy
connection_record.checkin()
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 542, in checkin
finalizer(connection)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 677, in _reset_characteristics
characteristic.reset_characteristic(self, dbapi_connection)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/characteristics.py", line 50, in reset_characteristic
dialect.reset_isolation_level(dbapi_conn)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 748, in reset_isolation_level
self.set_isolation_level(dbapi_conn, self.default_isolation_level)
File "/g/data/v10/public/modules/dea/20231204/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 851, in set_isolation_level
connection.set_isolation_level(level)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I believe @Ariana-B did some investigation resulting in #1525 , especially in relation to the connection.set_isolation_level() changes, but it hasn't yet been resolved.
New Findings
I've run some more tests today, and have more of an idea of what the issue is.
The network configuration at the NCI drops idle TCP connections after 5 minutes. This is extremely short, the normal default is 2 hours.
The above code first loads 4Gb of database results into memory (about 3 minutes), then slowly writes lines to disk IO while iterating over DB results (more than 10 minutes for the ~500k results).
Under ODC 1.8.8, the script then exits cleanly.
Under ODC 1.8.17, something attempts to use the db connection to change the isolation level. That fails because the network connection was idle for more than 5 minutes.
I'm not sure whether the change is ODC or SQLAlchemy or psycopg2, but I suspect it's ODC.
I'm about to write up a proposed improvement in a new issue, to allow passing arbitrary connection parameters to libpq, which will allow working around the TCP idle timeout issue here.
The text was updated successfully, but these errors were encountered:
As discussed, any connection parameter that can be included in the database URL can already be user-specfied in 1.9 thanks to the configuration layer changes from EP10.
Background
Digital Earth Australia provides a Python Environment on the NCI which includes ODC. It's currently running the very old version 1.8.8 under Python 3.8.13 and we want to upgrade to the latest version. We've created a new environment using ODC core version 1.8.17 running under Python 3.10.8, but in testing we've been getting errors which weren't occuring before.
This issue was reported and discussed internally in Jira Issue DSNS-304, and included code to reproduce the error.
Error
The problem was reported as an unhandled PostgreSQL exception being thrown in a production script, which used to work cleanly.
Original code and stack trace from error
When executing the following script with the new
module load dea/20231204
Stack Trace
I believe @Ariana-B did some investigation resulting in #1525 , especially in relation to the
connection.set_isolation_level()
changes, but it hasn't yet been resolved.New Findings
I've run some more tests today, and have more of an idea of what the issue is.
I'm not sure whether the change is ODC or SQLAlchemy or psycopg2, but I suspect it's ODC.
I'm about to write up a proposed improvement in a new issue, to allow passing arbitrary connection parameters to
libpq
, which will allow working around the TCP idle timeout issue here.The text was updated successfully, but these errors were encountered: