
Fix Index.equals between object and string #61541


Open. Wants to merge 4 commits into main.


sanggon6107

Description of the code change on Index.equals

On the main branch, Index.equals casts self to object only when self.dtype.na_value is np.nan. As a result, comparing an object index with a string index succeeds when the string dtype's na_value is np.nan, as shown below:

>>> import pandas as pd
>>> import numpy as np

>>> s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
>>> s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
>>> s2.index = s2.index.astype(pd.StringDtype(storage="pyarrow", na_value=np.nan))

>>> print(s1 < s2)
a    True
b    True
c    True
dtype: bool

However, since the docstring states that the dtype is not compared, self should be cast to object regardless of self.dtype.na_value, so that a string index can be compared with an index of another dtype (such as object) as intended.
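The difference between the current and proposed behaviour can be sketched with a toy model. This is not pandas code: ToyIndex and toy_equals are hypothetical names, only the object and string dtypes are modelled, na_value is encoded as the strings "nan" (for np.nan) and "NA" (for pd.NA), and only the call made on the string-dtype side is modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyIndex:
    """Hypothetical, heavily simplified stand-in for pandas.Index (not pandas code)."""
    values: Tuple[str, ...]
    dtype: str          # only "object" or "string" are modelled
    na_value: str = ""  # for "string": "nan" (np.nan) or "NA" (pd.NA)

    def as_object(self) -> "ToyIndex":
        # Casting to object keeps the labels and drops the string-specific dtype
        return ToyIndex(self.values, "object")


def toy_equals(left: ToyIndex, right: ToyIndex, cast_regardless_of_na: bool) -> bool:
    """cast_regardless_of_na=False models main; True models this PR (toy only)."""
    if left.dtype == right.dtype:
        return left.values == right.values
    if left.dtype == "string" and (cast_regardless_of_na or left.na_value == "nan"):
        # Cast self to object, then compare labels only; the dtype is ignored
        return left.as_object().values == right.values
    # Remaining dtype mismatches are reported as "not equal"
    return False
```

With cast_regardless_of_na=False, a pd.NA-backed string index is reported as unequal to an object index with the same labels; with True, both string variants compare equal, matching the docstring's statement that the dtype is not compared.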

Description of the code change on test_mixed_col_index_dtype

using_infer_string has been removed from the test because I think the result should be string regardless of using_infer_string. This follows from the change to Index.equals: since Index.equals now considers df1.columns equal to df2.columns, Index.intersection returns self (which is string). Conversely, with result = df2 + df1, the result becomes object, which is the dtype of df2's columns. On the main branch, by contrast, Index.equals returns False, so both self and other are cast to object by _find_common_type_compat and Index.intersection returns object (see L3287 of pandas/core/indexes/base.py):

elif self.dtype != other.dtype:
    dtype = self._find_common_type_compat(other)
    this = self.astype(dtype, copy=False)
    other = other.astype(dtype, copy=False)
    return this.intersection(other, sort=sort)
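The dtype reasoning for test_mixed_col_index_dtype can be condensed into a toy function. It is not pandas code: toy_intersection_dtype is a hypothetical helper, only string/object pairs are considered, and the assumption that the common type of string and object is object is taken from the description above.

```python
def toy_intersection_dtype(self_dtype: str, other_dtype: str, equals_result: bool) -> str:
    """Hypothetical model of the dtype Index.intersection returns (string/object only)."""
    if equals_result:
        # Index.equals reports the labels as equal, so intersection returns self as-is
        return self_dtype
    if self_dtype != other_dtype:
        # Mirrors the quoted branch: both sides are cast to a common type,
        # assumed here to be object for a string/object pair
        return "object"
    return self_dtype
```

Under this model, df1 + df2 (string columns on the left) yields string once equals returns True, df2 + df1 yields object, and on main, where equals returns False, both orders yield object.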

  • I opened this pull request because @MayurKishorKumar no longer seems to be working on this issue. Please let me know if there will be further activity on the previous PR, in which case I will close this one.

Development

Successfully merging this pull request may close these issues.

BUG: Can only compare identically-labeled Series objects (string vs. object)