-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ibis bokeh datetime axis #5522
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5522 +/- ##
==========================================
- Coverage 88.07% 87.57% -0.51%
==========================================
Files 302 302
Lines 62283 62335 +52
==========================================
- Hits 54855 54589 -266
- Misses 7428 7746 +318
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I could see that after my change a CategoricalAxis would not be correctly identified. I have tried for 1½ hours now to fix, but the function |
@@ -111,7 +111,13 @@ def nonzero(cls, dataset): | |||
@cached | |||
def range(cls, dataset, dimension): | |||
dimension = dataset.get_dimension(dimension, strict=True) | |||
if cls.dtype(dataset, dimension).kind in 'SUO': | |||
if cls.dtype(dataset, dimension).kind == 'O': | |||
# Can this be done more efficiently? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your previous implementation was better. This will be slow if the database table has a billion rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the previous implementation did not produce the expected results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a problem with ibis. I will probably add an if statement to your previous implementation, checking if the output is a pd.series and take the first value if it is.
import os
import sqlite3
import ibis
import pandas as pd
def create_sqlite(df, name):
filename = "tmp.sqlite"
os.remove(filename)
con = sqlite3.Connection(filename)
df.to_sql(name, con, index=False)
return ibis.sqlite.connect(filename).table(name)
df = pd.DataFrame(
{"string": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]},
)
sqlite_table = create_sqlite(df, "table")
pandas_table = ibis.pandas.connect({"df": df}).table("df")
pandas_table["string"], sqlite_table["string"]
pandas_table["string"].first().execute()
sqlite_table["string"].first().execute()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to figure out that we are working with a datetime column? This approach for getting the first and last value really should be removed entirely. Indeed at some point I planned to consistently return None
for all non-numeric and non-datetime types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not for a datetime column but for a categorical column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcSkovMadsen can correct me. But the fix for the datetime column was done by adding dimension_type
to the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to fully explain, range
is really an operation that should be reserved for numeric and datetime values. Early on I decided that in the categorical/string/object case I would simply grab the first/last value in the array. This was fine when the only interfaces we had were in-memory but for lazy interfaces like SQL or dask backed data the behavior is often ill-defined or actually quite expensive to compute. So the correct thing going forward is to simply abort early and return (None, None)
if the column is categorical. The remaining problem is that some interfaces do not correctly declare datetime columns, reporting an object
dtype. So we will still need to handle that case correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply don't understand what I should do here to resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's understandable, I was just trying to provide context and didn't have any prescription for what should be done.
My question is whether the change to dimension_type
is sufficient to fix the issue? If so I would suggest reverting the change to range
and eventually removing this conditional branch entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not enough. Both the DateTime and Categorical axis where broken. Needs two different fixes
Closes #5521
This is my first real attempt at a contribution to HoloViews. So please be nice to me 😄
I hope an experienced core contributor can show me how this should really be solved or point me in the right direction. It a bit hard to do a real unit test because the functions contains so much code.