fix large csv reading #585 #605

Open
wants to merge 1 commit into
base: master
25 changes: 13 additions & 12 deletions odo/backends/csv.py
@@ -33,6 +33,8 @@
from ..numpy_dtype import dshape_to_pandas
from .pandas import coerce_datetimes
from functools import partial
+from itertools import chain


dialect_terms = '''delimiter doublequote escapechar lineterminator quotechar
quoting skipinitialspace strict'''.split()
@@ -344,17 +346,16 @@ def _csv_to_dataframe(c, dshape=None, chunksize=None, **kwargs):
header = None

kwargs = keyfilter(keywords(pd.read_csv).__contains__, kwargs)
-    with c.open() as f:
-        return pd.read_csv(f,
-                           header=header,
-                           sep=sep,
-                           encoding=encoding,
-                           dtype=dtypes,
-                           parse_dates=parse_dates,
-                           names=names,
-                           chunksize=chunksize,
-                           usecols=usecols,
-                           **kwargs)
+    return pd.read_csv(c.path,
+                       header=header,
+                       sep=sep,
+                       encoding=encoding,
+                       dtype=dtypes,
+                       parse_dates=parse_dates,
+                       names=names,
+                       chunksize=chunksize,
+                       usecols=usecols,
+                       **kwargs)


@convert.register(chunks(pd.DataFrame), (Temp(CSV), CSV), cost=10.0)
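
One possible reading of the hunk above: when `chunksize` is set, the reader is consumed lazily, so a reader returned from inside a `with` block outlives the file handle it reads from. The PR does not state this motivation, so treat it as an assumption. The sketch below uses the standard-library `csv.reader` as a stand-in for the chunked reader returned by `pd.read_csv`; the helper names and sample data are hypothetical.

```python
import csv
import os
import tempfile


def rows_from_closed_handle(path):
    # Mirrors the old pattern: the lazy reader is returned from inside the
    # `with` block, so the file is already closed when the caller iterates.
    with open(path, newline="") as f:
        return csv.reader(f)


def rows_from_path(path):
    # Mirrors the new pattern: the reader keeps the file open for as long as
    # it is being consumed (here via a generator that owns the handle).
    with open(path, newline="") as f:
        yield from csv.reader(f)


def write_sample_csv():
    # Hypothetical sample data, written to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], ["1", "2"], ["3", "4"]])
    return path


def demo():
    path = write_sample_csv()
    try:
        try:
            next(rows_from_closed_handle(path))
            closed_handle_failed = False
        except ValueError:  # raised for I/O on a closed file
            closed_handle_failed = True
        rows = list(rows_from_path(path))
    finally:
        os.remove(path)
    return closed_handle_failed, rows
```

Calling `demo()` returns a flag saying whether the closed-handle variant failed, plus the rows read by the variant that owns its file.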
@@ -368,7 +369,7 @@ def CSV_to_chunks_of_dataframes(c, chunksize=2 ** 20, **kwargs):
else:
rest = []

-    data = [first] + rest
+    data = chain([first], rest)

@brandonwillard brandonwillard Feb 15, 2018

toolz.concat should be able to do this, no? It's already imported, too.

Author

There is no toolz.chain; the similar functions in toolz just delegate to itertools.
toolz source:

def concat(seqs):
    """ Concatenate zero or more iterables, any of which may be infinite.

    An infinite sequence will prevent the rest of the arguments from
    being included.

    We use chain.from_iterable rather than ``chain(*seqs)`` so that seqs
    can be a generator.

    >>> list(concat([[], [1], [2, 3]]))
    [1, 2, 3]

    See also:
        itertools.chain.from_iterable  equivalent
    """
    return itertools.chain.from_iterable(seqs)



def concatv(*seqs):
    """ Variadic version of concat

    >>> list(concatv([], ["a"], ["b", "c"]))
    ['a', 'b', 'c']

    See also:
        itertools.chain
    """
    return concat(seqs)
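
To make the equivalence under discussion concrete, here is a small sketch using only `itertools`. Going by the source quoted above, `concat` is `chain.from_iterable`, so both spellings should yield the same items. `chunk_stream` and the string chunks are hypothetical stand-ins for the lazily produced DataFrame chunks.

```python
from itertools import chain


def chunk_stream(n):
    # Hypothetical stand-in for the lazily produced `rest` chunks.
    for i in range(1, n + 1):
        yield "chunk-%d" % i


first = "chunk-0"

# The PR's spelling: variadic chain over the two iterables.
via_chain = list(chain([first], chunk_stream(3)))

# What concat does per the quoted source: chain.from_iterable over a
# sequence of iterables.
via_from_iterable = list(chain.from_iterable([[first], chunk_stream(3)]))

# The old spelling only works when `rest` is a list; a lazy iterator cannot
# be added to a list.
try:
    [first] + chunk_stream(3)
    list_concat_failed = False
except TypeError:
    list_concat_failed = True
```

Either lazy spelling keeps `rest` unevaluated until the chunks are consumed, which is the point of replacing the list concatenation.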

@brandonwillard brandonwillard Feb 15, 2018

Sorry, typo; I meant toolz.concat, and it looks like that was already imported.

Author

do you want to change it?

return chunks(pd.DataFrame)(data)

