Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to read CSV without header row #77

Closed
wants to merge 3 commits into from

Conversation

djalova
Copy link
Collaborator

@djalova djalova commented Dec 15, 2020


Checklist:

For the following questions, only check the boxes that are applicable.

  • Ensure that the pull request text above the horizontal line is descriptive.

  • Add/update relevant tests.

  • Add/update relevant documents.

  • Do you have triage permission of this repository?

    • No
      • You are good.
    • Yes
      • Does this pull request close an issue?
        • No, this pull request also acts as an issue by itself.
          • Assign yourself.
          • Label this pull request.
          • Put a milestone.
          • Put this pull request under "In Progress" or "Under Review" pipeline on ZenHub.
          • Put this pull request under the appropriate epic on ZenHub.
          • Put an estimate on ZenHub.
          • Add dependency relationship with other issues/pull requests on ZenHub.
        • Yes, this pull request addresses an issue and does not act as an issue.
          • Connect this pull request to the underlying issue on ZenHub.
          • DON'T:
            • DON'T assign yourself or anyone else.
            • DON'T label.
            • DON'T put a milestone.
            • DON'T put this pull request under any epic on ZenHub.
            • DON'T put an estimate on ZenHub.
      • When the pull request is ready, request review.

@@ -37,6 +37,7 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
- ``columns`` key specifies the data type of each column. Each data type corresponds to a Pandas'
supported dtype. If unspecified, then it is default.
- ``delimiter`` key specifies the delimiter of the input CSV file.
- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like as long as header is not False, it is treated as True (even for empty strings, empty lists, which are usually evaluated to False in Python). Could you make this point clear in this document?

@@ -55,9 +56,15 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
else:
dtypes[column] = type_

names = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does names default to None in read_csv? I don't see this in the document. Perhaps it's better if we simply do not provide names if header is False?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names is None by default. I thought about doing that but I wanted to avoid having 2 versions of read_csv

Comment on lines +253 to +254
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
with pytest.raises(ValueError): # Pandas should error from trying to read string as another dtype

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also assert a couple of keyword in the exception message?

@xuhdev
Copy link
Collaborator

xuhdev commented Dec 15, 2020

Could you also remake this PR on a branch in this repo due to #81 ?

@xuhdev
Copy link
Collaborator

xuhdev commented Dec 15, 2020

Closing this in favor of #82

@xuhdev xuhdev closed this Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV loader should support those without headers
2 participants