
mimic_direct_extract.py hangs when extracting data for 100 subjects #1

Open
BoonthichaSaejia opened this issue Sep 17, 2024 · 0 comments


I'm experiencing an issue where the mimic_direct_extract.py script seems to hang when attempting to extract data for a small number of subjects (100 in this case).

Steps to Reproduce:

  1. Followed the setup instructions as per the repository guidelines.

  2. Executed the following command:

python3 mimic_direct_extract.py \
  --duckdb_database=${DATA_DIR}/mimic3.db \
  --duckdb_schema=main \
  --resource_path=./resources \
  --plot_hist=0 \
  --out_path=${DATA_DIR}/extract \
  --pop_size=100 \
  --extract_notes=0

  3. Observed the output and script behavior.
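The sketch below reruns the reproduction command with a hard time limit and keeps any partial output. It assumes the command is launched from the repository root. The helper names build_issue_command and run_with_timeout are hypothetical and not part of the repository. The child runs unbuffered (-u plus PYTHONUNBUFFERED=1) so that output printed before a timeout is not lost when the process is killed.

```python
import os
import subprocess
from typing import List, Optional, Tuple


def build_issue_command(data_dir: str) -> List[str]:
    """Rebuild the reproduction command from this issue as an argv list.

    "-u" makes the child's stdout unbuffered, so partial output survives
    if the run is killed on timeout.
    """
    return [
        "python3", "-u", "mimic_direct_extract.py",
        f"--duckdb_database={data_dir}/mimic3.db",
        "--duckdb_schema=main",
        "--resource_path=./resources",
        "--plot_hist=0",
        f"--out_path={data_dir}/extract",
        "--pop_size=100",
        "--extract_notes=0",
    ]


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], str]:
    """Run cmd with stderr merged into stdout.

    Returns (returncode, output), or (None, partial_output) if the command
    was killed after timeout_s seconds.
    """
    # Redundant with "-u" for the issue's command, but covers any Python child.
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
            env=env,
            text=True,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before re-raising; exc.output holds
        # whatever was captured so far (bytes, or None if nothing was read).
        partial = exc.output or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial
```

For example, run_with_timeout(build_issue_command(os.environ["DATA_DIR"]), timeout_s=600) returns (None, "starting db query with 100 subjects...\n") if the run stalls in the way described here.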

Expected Behavior:

The script should complete the data extraction for 100 subjects relatively quickly (ideally within a few minutes): processing the entire population is expected to take around 5-10 hours, so a 100-subject run should need only a small fraction of that time.

Actual Behavior:

The script outputs: starting db query with 100 subjects...
It then appears to hang indefinitely. I've waited for over 30 minutes without any further output or progress indication.

Environment:

Operating System: Ubuntu 20.04.6 LTS
Python Version: Python 3.10.14
DuckDB Version: unknown; running duckdb in a shell returns "duckdb: command not found" (the DuckDB CLI is not on the PATH)
Hardware Specs: Intel(R) Xeon(R) W-2155 CPU @ 3.30GHz
Memory (free -h):
              total        used        free      shared  buff/cache   available
Mem:          125Gi       6.7Gi       3.9Gi        35Mi       114Gi       117Gi
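"duckdb: command not found" only shows that the DuckDB CLI is missing. The script presumably uses the duckdb Python package instead, which can be installed without the CLI. The sketch below reports the package version. Run it with the same python3 interpreter used for mimic_direct_extract.py. The helper installed_version is hypothetical, and the distribution name "duckdb" is assumed to be the package the script imports.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Version of an installed distribution in *this* interpreter, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python:", sys.version.split()[0])
print("duckdb (Python package):", installed_version("duckdb") or "not installed")
```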

Additional Information:

The database file mimic3.db is accessible and located at ${DATA_DIR}/mimic3.db.
No error messages are displayed; the script simply does not progress past the initial message.
I've verified that all file paths and environment variables are correctly set.
Running the script with --pop_size set to a larger number hasn't been tested due to time constraints.

Questions:

Is there a known issue with the script when using a small --pop_size value?
Are there any additional flags or verbosity options that can help diagnose where the script is stalling?
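Until the maintainers suggest a built-in verbosity flag, Python's standard faulthandler module can show where a stalled run is blocked. The sketch below runs the script in-process through runpy and dumps every thread's Python stack at a fixed interval. A dump that repeatedly points at the same line (for example, a database query call) narrows down the stall. The helper name run_script_with_watchdog is hypothetical. Only Python frames are shown, not native frames inside compiled extensions such as DuckDB.

```python
import faulthandler
import runpy
import sys
from typing import Any, Dict, List, TextIO


def run_script_with_watchdog(
    script_path: str,
    args: List[str],
    interval_s: float = 60.0,
    dump_file: TextIO = sys.stderr,
) -> Dict[str, Any]:
    """Run a Python script in this process, like `python3 script_path args...`.

    While it runs, faulthandler writes the Python stack of every thread to
    dump_file every interval_s seconds. Returns the script's globals.
    """
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=dump_file)
    saved_argv = sys.argv
    sys.argv = [script_path, *args]  # what the script's argument parser will see
    try:
        # run_name="__main__" so the script's `if __name__ == "__main__":` block runs.
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        faulthandler.cancel_dump_traceback_later()
        sys.argv = saved_argv
```

For this issue, a call might look like run_script_with_watchdog("mimic_direct_extract.py", [the flags from the command above, as separate strings], interval_s=120). The script runs in-process rather than in a subprocess so that faulthandler can see its frames.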
Could this behavior be related to system resources or configuration?

Logs:


starting db query with 100 subjects...

What I've Tried:

Restarted the script multiple times with the same result.
Checked system resource usage to see if the script is utilizing CPU or I/O, but resource usage remains minimal.
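To make "resource usage remains minimal" more concrete, the Linux-only sketch below reads the process state and cumulative CPU time from /proc/<pid>/stat. The PID can be found with pgrep -f mimic_direct_extract.py. The helper names are hypothetical. The interpretation is a heuristic, not a diagnosis: state S with near-zero CPU suggests the script is waiting on something, while D usually means it is blocked on disk I/O.

```python
import os
import time
from typing import Tuple


def read_proc_stat(pid: int) -> Tuple[str, float]:
    """Return (state, cumulative CPU seconds) for a Linux process via /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so parse only what follows the last ')'.
    fields = data[data.rindex(")") + 1:].split()
    state = fields[0]  # field 3: R running, S sleeping, D uninterruptible wait, ...
    # Fields 14 and 15: user and system time in clock ticks, summed over the
    # process's threads.
    utime, stime = int(fields[11]), int(fields[12])
    return state, (utime + stime) / os.sysconf("SC_CLK_TCK")


def cpu_usage(pid: int, interval_s: float = 5.0) -> Tuple[str, float]:
    """Sample a process twice; return (latest state, average CPU cores used)."""
    _, cpu_before = read_proc_stat(pid)
    time.sleep(interval_s)
    state, cpu_after = read_proc_stat(pid)
    return state, (cpu_after - cpu_before) / interval_s
```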

Request:

Any guidance on resolving this issue would be greatly appreciated.
If more information is needed, please let me know, and I'll provide it promptly.
Thank you for your assistance!
