Tidy up incomplete files #55

Open
mdavis-xyz opened this issue Jan 29, 2025 · 1 comment

@mdavis-xyz
Contributor

When converting DISPATCHLOAD files to parquet, the process often dies from running out of memory. This can leave behind an empty zero-byte parquet file, or another partially written file. Even if the user reruns nemosis, the library sees that a file exists and doesn't check that it's valid, so the error isn't picked up until later, when we try to read the file.

This can be prevented by wrapping writes like this:

import os

try:
    df.to_parquet(path)  # pandas DataFrame writer
except Exception:
    # The write failed part-way through, so remove the partial file
    # before re-raising, rather than leaving a corrupt file behind.
    if os.path.exists(path):
        os.remove(path)
    raise

(The same applies to CSV, feather, etc.)

This can also happen if the process is terminated for some other reason (e.g. the user clicks stop in Jupyter).
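
To cover all of the writers with one piece of code, a small wrapper could do the cleanup. This is only a sketch; write_or_clean_up is a hypothetical helper, not an existing nemosis function:

import os

def write_or_clean_up(write_func, path):
    # write_func is any callable that writes to path,
    # e.g. lambda: df.to_parquet(path) or lambda: df.to_feather(path)
    try:
        write_func()
    except BaseException:
        # BaseException also covers KeyboardInterrupt (the Jupyter stop
        # button) and SystemExit, which a plain `except Exception` misses.
        if os.path.exists(path):
            os.remove(path)
        raise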

@mdavis-xyz
Contributor Author

Hmm, actually for an out-of-memory error specifically, the process may be killed outright before Python can raise an exception gracefully. But I still think this is worth adding, because under other halting conditions the corrupt file would at least get cleaned up.
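
A complementary approach (again just a sketch, not something nemosis currently does) is to write to a temporary file next to the target and only rename it into place once the write has finished. os.replace is atomic on the same filesystem, so the final path never exists in a half-written state even if the process is killed without warning:

import os

def write_atomically(df, path):
    # Hypothetical helper: write to a sibling temp file, then move it
    # into place. A hard kill can leave the temp file behind, but never
    # a partially written file at `path`.
    tmp_path = path + ".tmp"
    df.to_parquet(tmp_path)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem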
