-
-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite ODS support based on loxun XMLWriter module #244
base: master
Are you sure you want to change the base?
Conversation
It uses constant memory and is a lot faster than odf and odf3 packages.
@bdauvergne , just out of curiosity, could I find the ods writer lib(Copyright (C) 2005-2016 Entr'ouvert) on pypi or github? |
This code is new, I produced it on my employer (Entr'ouvert) time, it's freely inspired by this package (http://git.entrouvert.org/wcs.git/tree/wcs/qommon/ods.py) also from Entr'ouvert which use ElementTree and so do not have bounded memory consumption for this you need a streaming XmlWriter like API. |
Thanks for your reply. I planned to copy your code to produce a specialised ods writer for pyexcel, as pyexcel-odsw. As you mentioned in this PR, odfpy and ezodf does not use constant memory in writing an ods. I hope you will be OK with my copying. For your information, messy-tables had a better performing ods reader and it inspired pyexcel-odsr. So your code is the missing puzzle to complete ods story: performant writer + performant reader. |
No problem, just keep the copyright. |
self.status = self.INSHEET | ||
self.xmlwriter.endTag() | ||
|
||
def add_cell(self, content, hint=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation here. It is not a bug or anything.
add_cell
does not support other cell data types, such as: int, float but unicode string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, small amerliorations are still possible, I would do it if i had information from the maintainer that a possible integration is possible soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent an invitation to you.
…n from https://github.com/kennethreitz/tablib/pull/244 and adapted for pyexcel
@bdauvergne Why not add loxun in the |
Just thought it was the tablib way, it contains (contained?) so much external dependencies, I did not know they were all not packaged on pypi. |
It uses constant memory and is a lot faster than odf and odf3 packages as the document is not built in memory prior to serialization. OpenDocument is a simple format that should not need many thousand lines of code and gigabytes of memory to export a simple table of tens of thousand of lines.
A temporary file is needed as zipfile does not support streaming directly into it, if it's a problem I can do it in memory with BytesIO augmenting a little bit the memory consumption.
With the current implementation it's nearly impossible to export a 100 000 lines table to ODS in a constrained memory environment (VM with 1 Gb of memory).