Added stream to CSV #337

ZuluPro · 2018-08-08T14:11:45Z

Hello
Here's a proof of concept of what I talked about in #207.
This PR just adds the possibility to get a stream or its content.
I didn't change the API, just added seek(0) to have a file ready to use.

I use the same code here: https://github.com/django-import-export/django-import-export/pull/821/files#diff-9c335e3895cab39c4af9e510b328c2f0R157
It's a Django app for import/export from a DB.

For good integration, we need to get a file-like object.
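
For readers unfamiliar with the `seek(0)` point, here is a minimal sketch using only the standard library. It is not the code from this PR: the function name `rows_to_csv_stream` is hypothetical, and the sketch only illustrates why rewinding the buffer gives the caller a file-like object that is ready to read.

```python
import csv
import io


def rows_to_csv_stream(headers, rows):
    """Return an in-memory text stream holding the rows as CSV (illustrative helper)."""
    stream = io.StringIO(newline="")
    writer = csv.writer(stream)
    writer.writerow(headers)
    writer.writerows(rows)
    # Rewind so that the next read() starts at the beginning
    # instead of at the end of what was just written.
    stream.seek(0)
    return stream


stream = rows_to_csv_stream(["id", "name"], [[1, "ada"], [2, "linus"]])
content = stream.read()
```

Without the `seek(0)` call, `stream.read()` would return an empty string, because the position would still sit at the end of the written data.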

timofurrer · 2019-03-02T12:04:52Z

Could you please rebase onto master? Thanks 🎉

ZuluPro · 2019-03-30T19:11:32Z

Hey @timofurrer

I rebased onto master.

claudep · 2019-03-30T19:14:35Z

Could that feature be tested to avoid regressions?

ZuluPro · 2019-03-31T13:13:52Z

@claudep
I'm gonna write tests of course ;)
But can you tell me whether this will be more or less the final API for this feature?

claudep · 2019-03-31T13:24:57Z

I'm no maintainer, but if a maintainer asked you to rebase, that's a good sign the API is fine!

timofurrer · 2019-04-12T15:45:35Z

> the final API for this feature ?

There won't be a final API - but we'll adjust the versioning according to API breaks, so we don't break your code.
However, for now it looks good to me 🎉

ZuluPro · 2019-06-27T22:40:43Z

@claudep @timofurrer
I added a test

misli · 2023-01-10T09:38:17Z

In order to do a real stream, there also needs to be streaming input and streaming processing. Recently I implemented the following StreamingDataset, which (in cooperation with custom streaming formats) is capable of working with an iterator (generator) as data. It allows me to export large database tables with constant memory requirements.

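# Imports this snippet relies on; all three tablib names are reachable via tablib.core.
from collections import OrderedDict
from itertools import chain

from tablib.core import Dataset, InvalidDatasetIndex, Row


class SubscriptableIterable: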
    def __init__(self, iterable):
        iterator = iter(iterable)
        self._first_item = next(iterator)
        self._iter = chain([self._first_item], iterator)

    def __getitem__(self, index):
        if index == 0:
            return self._first_item
        raise IndexError

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter)


class StreamingDataset(Dataset):
    def __init__(self, data=None, **kwargs):
        super().__init__(**kwargs)

        if data is not None:
            self.set_data(data)

    def set_data(self, data):
        self._data = SubscriptableIterable(map(Row, data))

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._data)

    def __repr__(self):
        try:
            return "<%s streaming dataset>" % (self.title.lower())
        except AttributeError:
            return "<streaming dataset>"

    def _apply_formatters(self):
        if not self._formatters:
            # Nothing to apply: pass the rows through unchanged and stop.
            yield from self._data
            return
        for row in self._data:
            for col, callback in self._formatters:
                try:
                    if col is None:
                        for j, c in enumerate(row):
                            row[j] = callback(c)
                    else:
                        row[col] = callback(row[col])
                except IndexError:
                    raise InvalidDatasetIndex
            # Yield each row once, after every formatter has been applied.
            yield row

    def _package(self, dicts=True, ordered=True):
        """Packages Dataset into lists of dictionaries for transmission."""

        if ordered:
            dict_pack = OrderedDict
        else:
            dict_pack = dict

        data = self._apply_formatters()

        if self.headers:
            if dicts:
                data = (dict_pack(zip(self.headers, data_row)) for data_row in data)
            else:
                data = chain([self.headers], data)

        return data
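
To illustrate the output side of the same idea, here is a minimal sketch of a streaming CSV writer built only on the standard library. It is not part of tablib or of the StreamingDataset above: `_Echo`, `iter_csv_lines` and `fake_table` are hypothetical names, and `fake_table` merely stands in for a large database cursor. The sketch relies on `csv.writer.writerow` returning whatever the underlying `write` method returns.

```python
import csv


class _Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv_lines(rows, headers=None):
    """Lazily yield one CSV-encoded line per input row."""
    writer = csv.writer(_Echo())
    if headers is not None:
        yield writer.writerow(headers)
    for row in rows:
        # Only the current row is held in memory at any time.
        yield writer.writerow(row)


def fake_table(n):
    # Stand-in for a large database cursor; rows are produced on demand.
    for i in range(n):
        yield (i, f"name-{i}")


lines = iter_csv_lines(fake_table(3), headers=("id", "name"))
first = next(lines)
rest = list(lines)
```

Because both the input and the output are generators, memory use stays roughly constant regardless of how many rows the source yields.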

harkabeeparolus · 2023-01-20T19:17:53Z

If you want to stream large sets of data, I would recommend looking into petl, which is also a pure Python tabular data library.

https://github.com/petl-developers/petl

ZuluPro mentioned this pull request Aug 8, 2018

Streaming responses #207

Open

Added stream to CSV

f55f56a

ZuluPro force-pushed the stream branch from 36d3a40 to f55f56a Compare March 30, 2019 19:09

Added CSV stream test

513bba2

frostming merged commit d25d24a into jazzband:master Jun 28, 2019
