-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to read CSV without header row #77
Conversation
@@ -37,6 +37,7 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic | |||
- ``columns`` key specifies the data type of each column. Each data type corresponds to a Pandas' | |||
supported dtype. If unspecified, then it is default. | |||
- ``delimiter`` key specifies the delimiter of the input CSV file. | |||
- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like as long as header
is not False
, it is treated as True
(even for empty strings, empty lists, which are usually evaluated to False in Python). Could you make this point clear in this document?
@@ -55,9 +56,15 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic | |||
else: | |||
dtypes[column] = type_ | |||
|
|||
names = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does names
default to None
in read_csv
? I don't see this in the document. Perhaps it's better if we simply do not provide names
if header
is False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names
is None
by default. I thought about doing that but I wanted to avoid having 2 versions of read_csv
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype | ||
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype | |
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False | |
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False | |
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also assert a couple of keyword in the exception message?
Could you also remake this PR on a branch in this repo due to #81 ? |
Closing this in favor of #82 |
Checklist:
minor change.
For the following questions, only check the boxes that are applicable.
Ensure that the pull request text above the horizontal line is descriptive.
Add/update relevant tests.
Add/update relevant documents.
Do you have triage permission of this repository?