TypeError if DataFrame contains duplicated column name (in some cases) #2718
Comments
My sense is that this issue should be handled by pandas, and that it should not be possible to create a DataFrame where two columns have the same name. Have you raised this on their issue tracker?
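For reference, pandas does accept duplicated column names by default. Since pandas 1.2 it also offers an experimental opt-in, `DataFrame.set_flags(allows_duplicate_labels=False)`, which raises `pandas.errors.DuplicateLabelError` when labels repeat. A minimal sketch (the column name `x` is arbitrary):

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

# By default pandas happily builds a frame with two columns named "x".
df = pd.DataFrame([[1, 2]], columns=["x", "x"])
print(list(df.columns))  # ['x', 'x']

# Opting in to the (experimental) flag turns the duplicates into an error.
try:
    df.set_flags(allows_duplicate_labels=False)
except DuplicateLabelError as err:
    print(type(err).__name__)  # DuplicateLabelError
```

So the check exists in pandas, but it is off unless the user asks for it, which is why duplicates can still reach Altair.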
I think it would be nice to raise a more informative error. It came up for me too. In my case, I had a big DataFrame that caused encoding errors if I passed the whole thing in, so I made a list of the subset of columns that I wanted to send to Altair. However, if this list is long or generated by complex logic, it is easy to include a column name twice by mistake. A cartoon of my workflow was roughly as follows; in reality I kept about 10 of 300 columns when sending the data to Altair, and my list of ~10 contained a duplicate:
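The cartoon itself did not survive here, but the failure mode is easy to reproduce. The sketch below is a hypothetical reconstruction, not the commenter's code: the frame `wide` and the list `keep` are invented. It shows that selecting with a list that repeats a name yields duplicated columns, and one way to deduplicate the list (order-preserving, via `dict.fromkeys`) before selecting:

```python
import pandas as pd

# Hypothetical wide frame standing in for the ~300-column original.
wide = pd.DataFrame({f"col{i}": range(3) for i in range(300)})

# A hand-built (or generated) list of columns to keep, with an accidental repeat.
keep = ["col1", "col5", "col7", "col5"]

# Selecting with a repeated label silently produces two "col5" columns.
print(wide[keep].columns.duplicated().any())  # True

# Drop repeats from the list while keeping the original order.
keep_unique = list(dict.fromkeys(keep))
subset = wide[keep_unique]
print(list(subset.columns))  # ['col1', 'col5', 'col7']
```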
Maybe we could introduce something like
@MarcoGorelli, is this resolved by adopting Narwhals? See https://narwhals-dev.github.io/narwhals/pandas_like_concepts/column_names/
Update: Yeah, it is 🙂

```python
import io

import pandas as pd
import altair as alt

df = pd.read_csv(
    io.StringIO("""
a, b, c, d
0, 1, 2, 2022-01-01
2, 3, 4, 2022-01-01
""")
)
df.columns = ["a", "b", "c", "c"]
alt.Chart(df).mark_point().encode(x="a", y="b")
```

```
File /site-packages/narwhals/_pandas_like/dataframe.py:106, in PandasLikeDataFrame._validate_columns(self, columns)
    104 msg += f"\n- '{key}' {value} times"
    105 msg = f"Expected unique column names, got:{msg}"
--> 106 raise ValueError(msg)

ValueError: Expected unique column names, got:
- 'c' 2 times
```
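For older Altair versions, or to fail before a chart is even built, a similar check can be run in user code. This is a minimal sketch: `check_unique_columns` is a hypothetical helper, not part of Altair or Narwhals, that mirrors the message format in the traceback above using `collections.Counter`:

```python
from collections import Counter

import pandas as pd


def check_unique_columns(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError naming every duplicated column."""
    counts = Counter(df.columns)
    duplicates = {name: n for name, n in counts.items() if n > 1}
    if duplicates:
        # Same layout as the Narwhals message: one "- 'name' N times" line each.
        msg = "".join(f"\n- '{name}' {n} times" for name, n in duplicates.items())
        raise ValueError(f"Expected unique column names, got:{msg}")


df = pd.DataFrame([[0, 1, 2, 3]], columns=["a", "b", "c", "c"])
try:
    check_unique_columns(df)
except ValueError as err:
    print(err)
```

Listing every offending name with its count, rather than failing on the first one, is what makes the message actionable when the column list is long or generated.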
This is kind of an edge case, but the error message makes it somewhat difficult to identify the underlying issue.
If you have a pandas DataFrame with duplicated column names that are not integers, you get an exception when trying to plot anything. MWE:
results in
Note that this can happen with the result of `toPandas()` after joining two PySpark DataFrames.
Using altair 4.2.0.
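Until a clearer error lands, one workaround is to clean the column names in pandas before handing the frame to Altair. The sketch below uses `pd.concat(..., axis=1)` as a stand-in for the joined `toPandas()` result (PySpark is not involved, and the frames `left` and `right` are invented). It shows two options: keep only the first occurrence of each name, or keep every column under a made-unique name:

```python
import pandas as pd

# Stand-in for toPandas() on a joined PySpark DataFrame:
# both sides contributed a column called "id".
left = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
right = pd.DataFrame({"id": [1, 2], "y": [0.5, 0.7]})
joined = pd.concat([left, right], axis=1)
print(list(joined.columns))  # ['id', 'x', 'id', 'y']

# Option 1: keep only the first occurrence of each column name.
deduped = joined.loc[:, ~joined.columns.duplicated()]
print(list(deduped.columns))  # ['id', 'x', 'y']

# Option 2: keep every column but append a counter to repeated names.
seen = {}
new_names = []
for name in joined.columns:
    count = seen.get(name, 0)
    new_names.append(name if count == 0 else f"{name}_{count}")
    seen[name] = count + 1
renamed = joined.set_axis(new_names, axis="columns")
print(list(renamed.columns))  # ['id', 'x', 'id_1', 'y']
```

Option 1 is only safe when the duplicated columns hold the same data, as a join key usually does. Option 2 keeps everything but could itself collide if a column such as `id_1` already exists. On the PySpark side, joining on a column name (for example `on="id"`) rather than on an equality expression should avoid the duplicate in the first place.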