
Added stream to CSV #337

Merged
merged 2 commits into from
Jun 28, 2019


ZuluPro
Contributor

ZuluPro commented Aug 8, 2018

Hello,
Here's a proof of concept of what I talked about in #207.
This PR adds the ability to get the CSV export either as a stream or as its content.
I didn't change the API; I only added seek(0) so the returned file is ready to read.

I use the same code here: https://github.com/django-import-export/django-import-export/pull/821/files#diff-9c335e3895cab39c4af9e510b328c2f0R157
It's a Django app for importing and exporting data from a database.

To integrate well with it, we need to get a file-like object.
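For readers unfamiliar with the pattern, here is a minimal stdlib-only sketch of "return a stream or its content". The helper names `export_csv_stream` and `export_csv_text` are hypothetical and are not tablib's API. The rows are written into an `io.StringIO`, which is then rewound with `seek(0)` so the caller receives a file-like object positioned at the start; the string variant simply calls `getvalue()` on that stream.

```python
import csv
import io
from typing import Iterable, Sequence


def export_csv_stream(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> io.StringIO:
    """Write rows as CSV into an in-memory text stream (hypothetical helper)."""
    stream = io.StringIO()
    writer = csv.writer(stream)
    writer.writerow(headers)
    writer.writerows(rows)
    # Rewind: without this, a caller's read() would start at the end and return "".
    stream.seek(0)
    return stream


def export_csv_text(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Return the same CSV as a plain string (the "content" variant)."""
    return export_csv_stream(headers, rows).getvalue()


if __name__ == "__main__":
    stream = export_csv_stream(["id", "name"], [(1, "Ada"), (2, "Grace")])
    # The stream can go straight to anything that reads from a file object.
    for record in csv.reader(stream):
        print(record)
```

Returning the rewound stream rather than only a string lets integrations such as a Django import/export app treat the export like an opened file, while the string variant remains a one-line wrapper.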

ZuluPro mentioned this pull request Aug 8, 2018
@timofurrer
Member

Could you please rebase onto master? Thanks 🎉

@ZuluPro
Contributor Author

ZuluPro commented Mar 30, 2019

Hey @timofurrer

I rebased from master.

@claudep
Contributor

claudep commented Mar 30, 2019

Could that feature be tested to avoid regressions?

@ZuluPro
Contributor Author

ZuluPro commented Mar 31, 2019

@claudep
I'm going to write tests, of course ;)
But can you confirm that this will be roughly the final API for this feature?

@claudep
Contributor

claudep commented Mar 31, 2019

I'm no maintainer, but if a maintainer asked you to rebase, that's a good sign the API is fine!

@timofurrer
Member

> the final API for this feature?

There won't be a final API, but we'll version releases according to API breaks, so we don't break your code.
However, for now it looks good to me 🎉

@ZuluPro
Contributor Author

ZuluPro commented Jun 27, 2019

@claudep @timofurrer
I added a test

frostming merged commit d25d24a into jazzband:master on Jun 28, 2019
@misli

misli commented Jan 10, 2023

To stream for real, the input and the processing also need to be streaming. I recently implemented the following StreamingDataset which, together with custom streaming formats, can work with an iterator (generator) as its data. It lets me export large database tables with constant memory usage.

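```python
from collections import OrderedDict
from itertools import chain

from tablib.core import Dataset, InvalidDatasetIndex, Row


class SubscriptableIterable:
    """Wrap an iterable so its first item can be peeked via [0] without consuming it."""

    def __init__(self, iterable):
        iterator = iter(iterable)
        try:
            self._first = [next(iterator)]
        except StopIteration:
            self._first = []  # empty input: there is no first row to peek at
        # Re-attach the peeked item so iteration still yields every row.
        self._iter = chain(self._first, iterator)

    def __getitem__(self, index):
        # Only index 0 is supported, which is enough for code that inspects the first row.
        if index == 0 and self._first:
            return self._first[0]
        raise IndexError(index)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter)


class StreamingDataset(Dataset):
    def __init__(self, data=None, **kwargs):
        super().__init__(**kwargs)

        if data is not None:
            self.set_data(data)

    def set_data(self, data):
        # Rows are wrapped lazily; nothing is materialized here.
        self._data = SubscriptableIterable(map(Row, data))

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._data)

    def __repr__(self):
        try:
            return "<%s streaming dataset>" % (self.title.lower())
        except AttributeError:
            return "<streaming dataset>"

    def _apply_formatters(self):
        """Yield rows one at a time with all registered formatters applied."""
        if not self._formatters:
            yield from self._data
            return
        for row in self._data:
            for col, callback in self._formatters:
                try:
                    if col is None:
                        # The formatter applies to every column of the row.
                        for j, c in enumerate(row):
                            row[j] = callback(c)
                    else:
                        row[col] = callback(row[col])
                except IndexError:
                    raise InvalidDatasetIndex
            # Yield once per row, after every formatter has run.
            yield row

    def _package(self, dicts=True, ordered=True):
        """Lazily package the Dataset as a generator of dicts (or a header row followed by rows)."""

        if ordered:
            dict_pack = OrderedDict
        else:
            dict_pack = dict

        data = self._apply_formatters()

        if self.headers:
            if dicts:
                data = (dict_pack(zip(self.headers, data_row)) for data_row in data)
            else:
                data = chain([self.headers], data)

        return data
```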
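The core idea, peeking at the first item and then streaming the rest straight to CSV without building a list, can be illustrated with the standard library alone. This is a sketch, not tablib code: `iter_dict_rows_as_csv` and `_Echo` are hypothetical names, and the input is assumed to be an iterable of dicts (for example rows from a database cursor). The "echo" pseudo-file relies on `csv.writer(...).writerow()` returning whatever the underlying `write()` returns (Python 3.5+), the same trick Django's documentation uses for streaming large CSV responses.

```python
import csv
from itertools import chain
from typing import Any, Iterable, Iterator, Mapping


class _Echo:
    """Pseudo file object: write() hands the formatted line straight back."""

    def write(self, value: str) -> str:
        return value


def iter_dict_rows_as_csv(records: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield CSV lines for `records` one at a time (hypothetical helper).

    Only the current record is held in memory, so a generator over a large
    table can be exported with roughly constant memory.
    """
    iterator = iter(records)
    try:
        first = next(iterator)  # peek to discover the column names
    except StopIteration:
        return  # empty input: nothing to export
    headers = list(first)
    writer = csv.writer(_Echo())
    yield writer.writerow(headers)
    # Put the peeked record back in front of the remaining ones.
    for record in chain([first], iterator):
        yield writer.writerow([record[name] for name in headers])


if __name__ == "__main__":
    rows = ({"n": i, "square": i * i} for i in range(3))
    for line in iter_dict_rows_as_csv(rows):
        print(line, end="")
```

Because the function is a generator, each line can be passed to a streaming HTTP response or written to a file as soon as it is produced; the peek plays the same role as `SubscriptableIterable`, exposing the first row without losing it.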

@harkabeeparolus

If you want to stream large sets of data, I would recommend looking into petl, which is also a pure Python tabular data library.

https://github.com/petl-developers/petl


6 participants