Rewrite ODS support based on loxun XMLWriter module #244

Open: wants to merge 2 commits into base: master

bdauvergne

It uses constant memory and is a lot faster than the odf and odf3 packages, because the document is not built in memory prior to serialization. OpenDocument is a simple format that should not need many thousands of lines of code and gigabytes of memory to export a simple table of tens of thousands of lines.

A temporary file is needed because zipfile does not support streaming directly into an archive. If that is a problem, I can do it in memory with BytesIO instead, at the cost of somewhat higher memory consumption.

With the current implementation it is nearly impossible to export a 100,000-line table to ODS in a memory-constrained environment (a VM with 1 GB of memory).
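The temporary-file approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `write_rows_to_zip`, the `<rows>`/`<row>` markup, and the member name are all hypothetical placeholders, and a real ODS file needs additional archive members.

```python
import os
import tempfile
import zipfile
from xml.sax.saxutils import escape


def write_rows_to_zip(zip_path, rows, member_name="content.xml"):
    """Stream rows into a temporary file, then add that file to a zip archive.

    Hypothetical helper: the markup below is a placeholder, not real ODS.
    """
    # Write row by row so only one row is held in memory at a time.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".xml", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("<rows>")
        for row in rows:
            tmp.write("<row>%s</row>" % "|".join(escape(v) for v in row))
        tmp.write("</rows>")
        tmp_path = tmp.name
    try:
        # zipfile reads the finished file from disk and compresses it.
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.write(tmp_path, member_name)
    finally:
        # Always clean up the temporary file, even if zipping fails.
        os.remove(tmp_path)
```

Because `rows` can be any iterable, including a generator, peak memory depends on the size of a single row rather than on the size of the table.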

@chfw

chfw commented Aug 3, 2018

@bdauvergne, just out of curiosity, could I find the ODS writer lib (Copyright (C) 2005-2016 Entr'ouvert) on PyPI or GitHub?

@bdauvergne
Author

This code is new; I produced it on my employer's (Entr'ouvert) time. It is freely inspired by this package (http://git.entrouvert.org/wcs.git/tree/wcs/qommon/ods.py), also from Entr'ouvert, which uses ElementTree and so does not have bounded memory consumption. For that you need a streaming, XmlWriter-like API.
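To illustrate the difference between a tree-building API and a streaming writer, here is a small sketch using the standard library's `xml.sax.saxutils.XMLGenerator` as a stand-in. It is not the loxun-based writer from this PR, and the element names `table`, `row`, and `cell` are placeholders rather than real ODF names.

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_table(out, rows):
    """Emit a table element by element; no document tree is kept in memory."""
    writer = XMLGenerator(out, encoding="utf-8", short_empty_elements=False)
    writer.startDocument()
    writer.startElement("table", {})
    for row in rows:  # rows may be a generator, consumed lazily
        writer.startElement("row", {})
        for value in row:
            writer.startElement("cell", {})
            writer.characters(value)  # text is escaped as it is written
            writer.endElement("cell")
        writer.endElement("row")
    writer.endElement("table")
    writer.endDocument()


# Usage: write two rows into an in-memory text buffer.
buffer = io.StringIO()
stream_table(buffer, (r for r in [["a", "b"], ["c & d"]]))
```

Each start tag, text node, and end tag is written to `out` immediately, so memory use does not grow with the number of rows, unlike an ElementTree document that must be complete before serialization.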

@chfw

chfw commented Aug 4, 2018

Thanks for your reply.

I planned to copy your code to produce a specialised ODS writer for pyexcel, as pyexcel-odsw. As you mentioned in this PR, odfpy and ezodf do not use constant memory when writing an ODS file. I hope you will be OK with my copying it.

For your information, messy-tables had a better-performing ODS reader, and it inspired pyexcel-odsr. So your code is the missing puzzle piece that completes the ODS story: performant writer + performant reader.

@bdauvergne
Author

No problem, just keep the copyright.

        self.status = self.INSHEET
        self.xmlwriter.endTag()

    def add_cell(self, content, hint=None):

Just an observation here. It is not a bug or anything.

add_cell supports only unicode strings; other cell data types, such as int and float, are not supported.
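As a rough sketch of what typed cells could look like, the helper below maps Python values to the OpenDocument cell attributes `office:value-type`, `office:value`, `office:boolean-value`, and `office:date-value`. The function name `cell_attributes` is hypothetical, it is written for Python 3, and it is not part of the `add_cell` method in this PR.

```python
import datetime


def cell_attributes(value):
    """Return (attributes, display_text) for a table cell (hypothetical helper)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        text = "true" if value else "false"
        return {"office:value-type": "boolean", "office:boolean-value": text}, text
    # ODF stores both integers and floating-point numbers as "float".
    if isinstance(value, (int, float)):
        text = repr(value)
        return {"office:value-type": "float", "office:value": text}, text
    # datetime.datetime is a subclass of datetime.date, so both land here.
    if isinstance(value, datetime.date):
        text = value.isoformat()
        return {"office:value-type": "date", "office:date-value": text}, text
    # Everything else falls back to a plain string cell.
    return {"office:value-type": "string"}, str(value)
```

The returned attributes would go on the cell element and the text inside its paragraph; special floats such as NaN or infinity are not handled in this sketch.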

Author


Yep, small improvements are still possible. I would make them if I had word from the maintainer that integration is likely soon.


Sent an invitation to you.

@frallain

@bdauvergne Why not add loxun to requirements.txt, since it is available on PyPI at https://pypi.org/project/loxun/, instead of copy-pasting the whole file into the tablib project?

@bdauvergne
Author

Just thought it was the tablib way: it contains (contained?) so many external dependencies, and I did not know they were not all packaged on PyPI.
