-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added support for a dtypes parameter #195
Conversation
closes #173 Signed-off-by: Luka Peschke <[email protected]>
Signed-off-by: Luka Peschke <[email protected]>
Signed-off-by: Luka Peschke <[email protected]>
Sorry for the delay on this one, I was AFK for the last couple of days |
I tried to play with it. Overall looks good!
compared to
But maybe we can merge the PR like that and dig a bit more into "duration", "datetime" support with format, mix of strings + dates... |
@@ -27,6 +29,9 @@ | |||
) | |||
from ._fastexcel import read_excel as _read_excel | |||
|
|||
DType = Literal["null", "int", "float", "string", "boolean", "datetime", "date", "duration"] | |||
DTypeMap: TypeAlias = "dict[str, DType] | dict[int, DType]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to work with my tests
ipdb> excel_file.load_sheet_by_idx(1, dtypes={"__UNNAMED__1": "boolean"}).to_polars()
shape: (24, 12)
┌────────────────────────┬──────────────┬──────────────────────────────┬──────────────┬───┬──────────────┬──────────────┬───────────────┬───────┐
│ Phone Economics Model ┆ __UNNAMED__1 ┆ Asset Type: ┆ __UNNAMED__3 ┆ … ┆ __UNNAMED__8 ┆ __UNNAMED__9 ┆ __UNNAMED__10 ┆ ALL │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ bool ┆ str ┆ str ┆ ┆ null ┆ null ┆ null ┆ str │
╞════════════════════════╪══════════════╪══════════════════════════════╪══════════════╪═══╪══════════════╪══════════════╪═══════════════╪═══════╡
│ null ┆ null ┆ Forecast Error (less than): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ COO │
│ null ┆ null ┆ Production Time (Between): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ DELCO │
│ null ┆ null ┆ Store Call Volume (Between): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ RBD │
│ null ┆ null ┆ Dine-In Volume (Less than): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ RR │
│ null ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ All calls Instore ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Busy Call Forward ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Call Forward No Answer ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Promise Time ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ 100% Call Center ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
└────────────────────────┴──────────────┴──────────────────────────────┴──────────────┴───┴──────────────┴──────────────┴───────────────┴───────┘
ipdb> excel_file.load_sheet_by_idx(1, dtypes={1: "boolean"}).to_polars()
shape: (24, 12)
┌────────────────────────┬───────────────────────────────────┬──────────────────────────────┬──────────────┬───┬──────────────┬──────────────┬───────────────┬───────┐
│ Phone Economics Model ┆ __UNNAMED__1 ┆ Asset Type: ┆ __UNNAMED__3 ┆ … ┆ __UNNAMED__8 ┆ __UNNAMED__9 ┆ __UNNAMED__10 ┆ ALL │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ ┆ null ┆ null ┆ null ┆ str │
╞════════════════════════╪═══════════════════════════════════╪══════════════════════════════╪══════════════╪═══╪══════════════╪══════════════╪═══════════════╪═══════╡
│ null ┆ null ┆ Forecast Error (less than): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ COO │
│ null ┆ null ┆ Production Time (Between): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ DELCO │
│ null ┆ null ┆ Store Call Volume (Between): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ RBD │
│ null ┆ null ┆ Dine-In Volume (Less than): ┆ null ┆ … ┆ null ┆ null ┆ null ┆ RR │
│ null ┆ null ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ All calls Instore ┆ Store answers all calls (no call… ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Busy Call Forward ┆ Calls are forwarded to the call … ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Call Forward No Answer ┆ Calls are forwarded to the call … ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ Promise Time ┆ All calls are forwarded after th… ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ 100% Call Center ┆ Call center answers all calls (M… ┆ null ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
└────────────────────────┴───────────────────────────────────┴──────────────────────────────┴──────────────┴───┴──────────────┴──────────────┴───────────────┴───────┘
- I reckon we could support
dict[str | int, Dtype]
for unnamed columns (instead of recalling to write__UNNAMED__1
. ..)
@@ -64,6 +69,11 @@ def available_columns(self) -> list[str]: | |||
"""The columns available for the given sheet""" | |||
return self._sheet.available_columns | |||
|
|||
@property | |||
def dtypes(self) -> DTypeMap | None: | |||
"""The dtypes specified for the sheet""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we have all the dtypes? (not only the overridden ones)?
Hi @PrettyWood thanks for the testing! About your remarks:
|
Can probably be tackled in another PR (if needed)
Yes, perfect
Not sure if I was explicit enough. If I write
Completely fine with me. Was just a remark. Your proposal seems ok. We can probably write that in an issue, discuss the implementation and then open the PR |
…l columns are selected Signed-off-by: Luka Peschke <[email protected]>
Signed-off-by: Luka Peschke <[email protected]>
@PrettyWood 94e59d4 allows for dtypes to be specified by name or index in case all columns are selected and ef55f27 renames the In the end, I haven't added the |
closes #173