Tidy up incomplete files #55

Open
mdavis-xyz opened this issue Jan 29, 2025 · 1 comment

@mdavis-xyz
Contributor

When converting DISPATCHLOAD files to parquet, the process often dies from running out of memory. This can leave behind an empty zero-byte parquet file, or another partially written file. Even if the user reruns nemosis, the library sees that a file exists and doesn't check that it's valid, so the error isn't picked up until later, when we try to read the file.

This can be prevented by wrapping writes like this:

import os

try:
    df.to_parquet(path)  # pandas DataFrame writer
except Exception:
    # The write failed part-way through, so remove the partial file
    # before re-raising, rather than leaving a corrupt file behind.
    if os.path.exists(path):
        os.remove(path)
    raise

(The same applies to CSV, feather, etc.)

This can also happen if the process is terminated for some other reason (e.g. the user clicks stop in Jupyter).
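
To cover all of the writers with one piece of code, a small wrapper could do the cleanup. This is only a sketch; write_or_clean_up is a hypothetical helper, not an existing nemosis function:

import os

def write_or_clean_up(write_func, path):
    # write_func is any callable that writes to path,
    # e.g. lambda: df.to_parquet(path) or lambda: df.to_feather(path)
    try:
        write_func()
    except BaseException:
        # BaseException also covers KeyboardInterrupt (the Jupyter stop
        # button) and SystemExit, which a plain `except Exception` misses.
        if os.path.exists(path):
            os.remove(path)
        raise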

@mdavis-xyz
Contributor Author

Hmm, actually for an out-of-memory error specifically, the process may be killed outright before Python can raise an exception gracefully. But I still think this is worth adding, because under other halting conditions the corrupt file would at least get cleaned up.
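
A complementary approach (again just a sketch, not something nemosis currently does) is to write to a temporary file next to the target and only rename it into place once the write has finished. os.replace is atomic on the same filesystem, so the final path never exists in a half-written state even if the process is killed without warning:

import os

def write_atomically(df, path):
    # Hypothetical helper: write to a sibling temp file, then move it
    # into place. A hard kill can leave the temp file behind, but never
    # a partially written file at `path`.
    tmp_path = path + ".tmp"
    df.to_parquet(tmp_path)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem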
