No option to append to tables instead of truncating them #8
Fixed the issue by overriding the copy functions; I wasn't expecting the truncate to delete the full table.
Can you explain the need for using both?
@ziedbouf Apologies for the late reply - we seem to have missed this! Our use case is mostly importing data from scratch into Postgres, so truncating was a natural default for us, but I see how it doesn't make sense for a lot of other use cases. Overriding `truncate()` with an empty function is a decent option for you here. The reason we drop the keys and re-create them is that having indexes on the table slows down insertion speed considerably, especially during bulk loads - so much so that dropping them and regenerating them at the end is often much faster for large loads. In your scenario, where you're importing many source tables into one destination, dropping and recreating the indexes after loading each source table of course doesn't make sense. One option would be to load all the source tables beforehand, concatenate them into one large DataFrame with `pd.concat`, and load that. The other would be to look at the implementation of `DataFrameCopy` here:

and instead write your own child class of `BaseCopy` that drops the keys, loads multiple tables, then recreates them. The code for that part isn't overly complex. Good luck!
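A minimal sketch of that `truncate()` override, assuming only what this thread states (that `DataFrameCopy` is importable from `pandas_to_postgres` and exposes a `truncate()` method); the subclass name is just an illustrative choice:

```python
# Sketch: subclass DataFrameCopy and turn truncate() into a no-op so that
# repeated copies into the same table append rows instead of wiping them first.
from pandas_to_postgres import DataFrameCopy


class AppendingDataFrameCopy(DataFrameCopy):
    def truncate(self):
        # Skip the TRUNCATE step entirely; existing rows are kept.
        pass
```

The same idea extends to the multi-table case described above: a child class of `BaseCopy` could drop the keys once, copy each source table, and recreate the keys at the end.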
(keeping this ticket open as a reminder to implement support for this)
Thanks @makmanalp. In my case I have multiple large files, and I cannot consolidate these CSV files into one pandas DataFrame. In that case truncate would recreate the table from scratch each time. I agree that tweaking the `truncate()` function is the way to go here.
More notes for us: perhaps we could support
I agree with this. However, I think the chunk size should be tuned: there is a trade-off between small and large chunk sizes when it comes to performance. I found the following article interesting. The author stated the following:

What's the best approach to scaling the chunk size given the overall size of the data? It might be something to explore.
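As a rough, hedged illustration of that trade-off (not something from the library or the article), one could derive a row-count chunk size from a target memory budget per chunk; the helper name, the 64 MB budget, and the clamping bounds below are all arbitrary assumptions:

```python
# Sketch: pick a row-count chunk size from a per-chunk memory budget.
import pandas as pd


def pick_chunksize(df: pd.DataFrame, target_bytes: int = 64 * 1024 ** 2,
                   min_rows: int = 10_000, max_rows: int = 1_000_000) -> int:
    # Estimate bytes per row from the DataFrame's own memory usage.
    bytes_per_row = df.memory_usage(deep=True).sum() / max(len(df), 1)
    rows = int(target_bytes / max(bytes_per_row, 1))
    # Clamp so tiny or huge estimates don't produce degenerate chunk sizes.
    return max(min_rows, min(rows, max_rows))
```

Benchmarking a few candidate sizes against the actual table and hardware is still the only reliable way to settle on a value.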
Made a PR based on the code above for similar use cases:
Thanks for the project - it helps with the slow `to_sql` process for DataFrames, and it's more straightforward to use with Odoo or other libraries. For now I have an issue: I am doing a DataFrame copy inside a for loop, but it seems to overwrite the table each time it pushes data. I'm trying to figure out whether I'm missing an option to `append`, similar to `to_sql`, or whether I need to manage the commit myself, but no clue for now. Any help on how to solve this?
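For completeness, a hedged sketch of what the loop described above could look like when combined with the no-op `truncate()` override suggested earlier in the thread. The connection URL, table name, CSV glob, and the `DataFrameCopy(df, conn=..., table_obj=...).copy()` call are illustrative assumptions, not the library's documented API:

```python
# Sketch only: verify the DataFrameCopy constructor and copy() against the
# library source before relying on this.
import glob

import pandas as pd
from sqlalchemy import MetaData, Table, create_engine

from pandas_to_postgres import DataFrameCopy


class AppendingDataFrameCopy(DataFrameCopy):
    def truncate(self):
        pass  # keep existing rows so each iteration appends


engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder URL
target = Table("my_table", MetaData(), autoload_with=engine)         # placeholder table

with engine.connect() as conn:
    for path in glob.glob("data/*.csv"):  # placeholder file pattern
        df = pd.read_csv(path)
        AppendingDataFrameCopy(df, conn=conn, table_obj=target).copy()
```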