I'm using this project to split the .sql file so that the pg_dump output is in an order that lets backup programs deduplicate against existing data.
The dumped file is more than 1000 GB, which is fairly large. I suspect the data is sorted in memory, so it's easy to run out of RAM.
You're right, the data for individual tables from COPY SQL commands is sorted in memory. The total size of the SQL file doesn't matter as long as individual tables fit in RAM.
One way to solve this on Unix-like systems would be to pipe COPY data into the Unix sort command instead of sorting Python lists in memory (a rough sketch of that idea is below). I'd be fine with that approach, but I guess it would still leave Windows users out of luck, and we'd need separate code paths for different operating systems.
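
To illustrate, here is a minimal sketch of piping data through the external `sort(1)` command, assuming a Unix-like system with `sort` on the PATH. The function name `sort_lines_with_unix_sort` is hypothetical and not part of pg_dump_splitsort; it just shows how the table data could be sorted without ever holding it all in Python memory:

```python
import os
import subprocess
import tempfile


def sort_lines_with_unix_sort(lines):
    """Sort text lines with the external Unix sort(1) command.

    Hypothetical sketch: spools the unsorted lines to a temporary file,
    then streams the sorted output back, so the data never has to fit
    in Python's memory at once.
    """
    # Write the unsorted lines to a temp file first; this avoids any
    # pipe-buffer deadlock between feeding stdin and reading stdout.
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, suffix=".unsorted"
    ) as tmp:
        tmp.writelines(lines)
        tmp_path = tmp.name
    try:
        proc = subprocess.Popen(
            ["sort", tmp_path],
            stdout=subprocess.PIPE,
            # LC_ALL=C gives a stable byte-wise order, independent of locale.
            env={**os.environ, "LC_ALL": "C"},
            text=True,
        )
        for line in proc.stdout:  # stream sorted lines as they arrive
            yield line
        if proc.wait() != 0:
            raise RuntimeError(f"sort exited with status {proc.returncode}")
    finally:
        os.unlink(tmp_path)


# Usage idea: feed one table's COPY data lines and write them back out sorted.
# for line in sort_lines_with_unix_sort(copy_lines):
#     output_file.write(line)
```

The temporary-file spool trades some disk I/O for bounded memory use; `sort` itself already does external merge sorting on disk when its input exceeds available RAM.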
Also, the intended use for pg_dump_splitsort is to allow storing database dumps efficiently in version control. Storing a terabyte's worth of SQL in Git definitely doesn't make sense – @oldcai did you have another use case in mind here?