Use Too Much Memory, Killed by System #1

Closed
oldcai opened this issue Feb 6, 2018 · 4 comments · Fixed by #13

@oldcai
Collaborator

oldcai commented Feb 6, 2018

I'm using this project to split and sort a pg_dump .sql file so that its contents come out in a stable order, which lets backup programs deduplicate data they have already stored.
The dump is more than 1000 GB, which is fairly big. My guess is that the data is sorted in memory, so memory runs out easily and the process gets killed.

@akaihola
Owner

You're right, the data for individual tables from COPY SQL commands is sorted in memory. The total size of the SQL file doesn't matter as long as individual tables fit in RAM.

One way to solve this on Unix-like systems would be to pipe COPY data into the Unix sort command instead of sorting Python lists in memory. I'd be fine with that approach, but I guess that would still leave Windows users out of luck, and we'd need separate code paths for different operating systems.
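
For illustration, piping lines through the external sort command could look roughly like the sketch below. It is not code from pg_dump_splitsort: the function name `sort_with_unix_sort` is hypothetical, and the sketch assumes a `sort` binary on `PATH`, newline-terminated input lines, and that plain byte-wise ordering (`LC_ALL=C`) is acceptable, which may differ from the ordering the project actually uses.

```python
import os
import subprocess
from typing import Iterable, Iterator


def sort_with_unix_sort(lines: Iterable[str]) -> Iterator[str]:
    """Yield the given newline-terminated lines in sorted order.

    Sorting is delegated to the external `sort` command, which spills to
    temporary files on disk instead of holding all rows in Python memory.
    """
    # LC_ALL=C requests byte-wise comparison (an assumption of this sketch).
    env = dict(os.environ, LC_ALL="C")
    proc = subprocess.Popen(
        ["sort"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
        env=env,
    )
    assert proc.stdin is not None and proc.stdout is not None
    # `sort` has to read all of its input before it can emit anything,
    # so writing everything first and reading afterwards does not deadlock.
    for line in lines:
        proc.stdin.write(line)
    proc.stdin.close()
    yield from proc.stdout
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("sort exited with a non-zero status")
```

Calling `list(sort_with_unix_sort(rows))` on the rows of one COPY block would return them sorted. As noted above, this only works where a `sort` command is available, so Windows would still need a different code path.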

@akaihola
Owner

Also, the intended use for pg_dump_splitsort is to allow storing database dumps efficiently in version control. Storing a terabyte's worth of SQL in Git definitely doesn't make sense – @oldcai did you have another use case in mind here?

@oldcai
Collaborator Author

oldcai commented Jun 25, 2020

Not yet. I was trying to make database backups incremental.
In that case, the bigger the file is, the more bytes need to be saved.

@akaihola
Owner

akaihola commented Jul 4, 2020

@oldcai, I'm experimenting with a Python-based merge sort solution in #13.
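
For readers unfamiliar with the technique, a pure-Python external merge sort can be sketched as below. This is a generic illustration and not the code from #13: the function name `external_sort` and the `chunk_size` parameter are hypothetical, and the sketch assumes newline-terminated lines and plain string ordering.

```python
import heapq
import tempfile
from itertools import islice
from typing import IO, Iterable, Iterator, List


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Yield newline-terminated lines in sorted order using bounded memory.

    At most `chunk_size` lines are held in memory at a time. Each sorted
    chunk is written to a temporary file, and the files are then merged.
    """
    runs: List[IO[str]] = []
    iterator = iter(lines)
    try:
        while True:
            # Sort one memory-sized chunk at a time.
            chunk = sorted(islice(iterator, chunk_size))
            if not chunk:
                break
            # newline="\n" keeps stray "\r" characters in the data intact.
            run = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n")
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
        # heapq.merge reads lazily from each already-sorted run.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
```

Because it relies only on the standard library, a sketch like this runs on Windows as well as Unix-like systems. Very large inputs with a small `chunk_size` would open many temporary files at once, so a real implementation might need to merge in several passes.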

@akaihola akaihola self-assigned this Sep 12, 2021
@akaihola akaihola linked a pull request Sep 12, 2021 that will close this issue
@akaihola akaihola added this to the 1.0.1 milestone Apr 20, 2024
@akaihola akaihola modified the milestones: 1.0.1, 1.1.0 Apr 22, 2024