Add ability to read CSV without header row #77

djalova · 2020-12-15T18:26:06Z

Checklist:

Does this pull request close an issue? We encourage you to open an issue first if this pull request (PR) is not a
minor change.
- No
  - The change in this pull request is minor.
- Yes
  - Close CSV loader should support those without headers #54

For the following questions, only check the boxes that are applicable.

xuhdev · 2020-12-15T19:36:53Z

pydax/loaders/_table.py

@@ -37,6 +37,7 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
               - ``columns`` key specifies the data type of each column. Each data type corresponds to a Pandas'
                 supported dtype. If unspecified, then it is default.
               - ``delimiter`` key specifies the delimiter of the input CSV file.
+               - ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True


It seems that header is treated as True unless it is exactly False, even for values such as empty strings and empty lists, which usually evaluate to False in Python. Could you make this point clear in this documentation?
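
To illustrate the distinction raised here, the following is a minimal sketch and not the code from this PR. Both helper names are hypothetical, and the identity check in the first one is only an assumption about how the loader reads the option, based on the comment above.

```python
from typing import Any, Dict


def has_header_identity(options: Dict[str, Any]) -> bool:
    # Assumed reading of the PR: only the literal False turns the header off.
    return options.get('header', True) is not False


def has_header_truthy(options: Dict[str, Any]) -> bool:
    # Alternative: any falsy value ('', [], 0, None) turns the header off.
    return bool(options.get('header', True))


# Print how each candidate value is interpreted by the two checks.
for value in (True, False, '', [], 0, None):
    opts = {'header': value}
    print(repr(value), has_header_identity(opts), has_header_truthy(opts))
```

Whichever rule the loader follows, the docstring can state it explicitly, for example by saying that any value other than False is treated as True.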

xuhdev · 2020-12-15T20:00:45Z

pydax/loaders/_table.py

@@ -55,9 +56,15 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
            else:
                dtypes[column] = type_

+        names = None


Does names default to None in read_csv? I don't see this in the documentation. Perhaps it's better if we simply do not provide names if header is False?

djalova · 2020-12-15

names is None by default. I thought about doing that, but I wanted to avoid having two versions of the read_csv call.
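
One way to reconcile both concerns is to build the keyword arguments conditionally and keep a single call site. This is a sketch under assumptions, not the PR's implementation: `build_read_csv_kwargs` and its `column_names` parameter are hypothetical. The only pandas behavior it relies on is that `header=None` tells `read_csv` the file has no header row.

```python
from typing import Any, Dict, List, Optional


def build_read_csv_kwargs(header: bool,
                          column_names: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: collect only the arguments that need to be passed."""
    kwargs: Dict[str, Any] = {}
    if header is False:
        # pandas uses header=None to mean "no header row in the file".
        kwargs['header'] = None
        if column_names is not None:
            # Pass names only when there is something to pass.
            kwargs['names'] = column_names
    return kwargs


# A single call site would then look like (not executed here):
#     pandas.read_csv(path, dtype=dtypes, **build_read_csv_kwargs(header, column_names))
print(build_read_csv_kwargs(True))
print(build_read_csv_kwargs(False, ['DATE', 'HOURLYVISIBILITY']))
```

With this shape, arguments that are not needed are simply omitted, so the code does not depend on the documented default of any optional parameter.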

xuhdev · 2020-12-15T20:02:29Z

tests/test_loaders.py

+        with pytest.raises(ValueError):  # Pandas should error from trying to read string as another dtype
+            noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False


Suggested change

-        with pytest.raises(ValueError):  # Pandas should error from trying to read string as another dtype
-            noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
+        with pytest.raises(ValueError):  # Pandas should error from trying to read string as another dtype

Could you also assert a couple of keywords in the exception message?
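
As a sketch of what both requests could look like together, the example below keeps the setup outside the `pytest.raises` block and checks keywords in the message. The function `parse_temperature` is a hypothetical stand-in for the loader call, and the message it produces is Python's own `int()` error, not the message pandas would raise in the real test.

```python
import pytest


def parse_temperature(raw: str) -> int:
    # Hypothetical stand-in for a loader that fails on a non-numeric cell,
    # such as a header string that was read as data.
    return int(raw)


def test_header_row_read_as_data_reports_bad_value():
    # Setup stays outside the context manager so that only the call
    # expected to raise is covered by pytest.raises.
    raw_cell = 'DATE'
    # match= is applied with re.search against str(exception).
    with pytest.raises(ValueError, match=r'invalid literal') as excinfo:
        parse_temperature(raw_cell)
    # Further keywords can be asserted on the message itself.
    assert 'DATE' in str(excinfo.value)
```

In the actual test, the pattern would need to be taken from the message pandas produces for the failing dtype conversion.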

xuhdev · 2020-12-15T22:08:45Z

Could you also remake this PR on a branch in this repo because of #81?

xuhdev · 2020-12-15T22:42:06Z

Closing this in favor of #82

djalova requested review from xuhdev and edwardleardi December 15, 2020 18:26

Add ability to read CSV without headers

5cd8f4d

djalova force-pushed the header branch from 078b3ae to 5cd8f4d Compare December 15, 2020 18:31

djalova added 2 commits December 15, 2020 10:33

Remove print

7c6b5eb

Fix lint

a46e8a2

xuhdev reviewed Dec 15, 2020

xuhdev closed this Dec 15, 2020
