Astype keeps nan when converting into string #28176

makbigc · 2019-08-27T14:36:58Z

closes astype(str) / astype_unicode: np.nan converted to "nan" (checknull, skipna) #25353
1 test added
passes black pandas
whatsnew entry

WillAyd · 2019-08-27T14:51:43Z

doc/source/whatsnew/v1.0.0.rst

@@ -145,7 +145,7 @@ Indexing
 Missing
 ^^^^^^^

-
+- When converting into string, :meth:`Series.astype` will keep ``np.nan`` as missing value (:issue:`25353`)


Might want to move this to API breaking - I could foresee some people doing a "if x == 'nan'" after something like this

yeah I think this is ok to actually make this change, but let's make it more prominent

WillAyd · 2019-08-27T14:54:30Z

pandas/tests/series/test_dtypes.py

@@ -147,7 +147,7 @@ def test_astype_datetime64tz(self):
    )
    def test_astype_str_map(self, dtype, series):
        # see gh-4405
-        result = series.astype(dtype)
+        result = series.astype(dtype, skipna=False)


I'm not sure about skipna here as it has a slightly different meaning than the usual context (i.e. ignoring it as part of an aggregation).

Which method is this actually getting passed through to?

The astype method doesn't have a skipna keyword; skipna is passed in via kwargs.

pandas/pandas/core/generic.py

Line 5764 in e0c63b4

def astype(self, dtype, copy=True, errors="raise", **kwargs):

skipna is popped from kwargs if present. If it is absent, skipna=True is set and passed to astype_nansafe. (In astype_nansafe itself, the default is skipna=False.)

pandas/pandas/core/dtypes/cast.py

Line 663 in e0c63b4

def astype_nansafe(arr, dtype, copy=True, skipna=False):

So, the default behaviour after this change is to skip converting nan into string.
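
The keyword flow described in this reply can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the pandas code: `astype_sketch` and `astype_nansafe_sketch` are hypothetical stand-ins that operate on plain lists rather than arrays.

```python
import math


def astype_nansafe_sketch(values, dtype, skipna=False):
    """Hypothetical stand-in for astype_nansafe, working on a plain list."""

    def is_nan(x):
        return isinstance(x, float) and math.isnan(x)

    if dtype is str and skipna:
        # Leave missing values untouched; stringify everything else.
        return [x if is_nan(x) else str(x) for x in values]
    return [dtype(x) for x in values]


def astype_sketch(values, dtype, **kwargs):
    """Hypothetical stand-in for astype: skipna arrives only via **kwargs."""
    # Pop skipna if the caller supplied it; otherwise default to True,
    # which is what makes "keep NaN" the default after this change.
    skipna = kwargs.pop("skipna", True)
    return astype_nansafe_sketch(values, dtype, skipna=skipna)


data = [1.0, float("nan"), 3.0]
kept = astype_sketch(data, str)  # NaN stays a float
stringified = astype_sketch(data, str, skipna=False)  # NaN becomes "nan"
```

The sketch also shows why reviewers objected: a keyword that travels only through `**kwargs` is invisible in the public signature and cannot be documented or validated there.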

we shouldn't be passing non-accepted keywords in kwargs.

TomAugspurger

This likely needs a deprecation cycle.

We also need to consider whether skipna is a valid keyword to .astype and add it to the function signature, document it, etc.

For now, I think a specific keyword for controlling how np.nan is converted with .astype(str) makes the most sense. I'm not sure what the name would be.

TomAugspurger · 2019-08-27T15:12:30Z

pandas/tests/frame/test_dtypes.py

        result = DataFrame([np.NaN]).astype(str)
-        expected = DataFrame(["nan"])


@jreback do you recall, was the intent of this test that np.nan be converted to the string 'nan'?

IIRC wanted to match numpy

WillAyd · 2019-08-27T15:53:45Z

So just some thoughts - I wouldn't consider the astype -> "nan" a feature as much as a bug, so while potentially breaking I somewhat feel like a deprecation cycle for buggy behavior is overkill (though others may differ)

w.r.t. the skipna argument I don't see why users would ever really want "nan" when np.nan is available so maybe not needed?

TomAugspurger · 2019-08-27T16:28:45Z

Right, the skipna argument would only be necessary if we're deprecating.

jschendel · 2019-08-27T18:26:24Z

I wouldn't consider the astype -> "nan" a feature as much as a bug, so while potentially breaking I somewhat feel like a deprecation cycle for buggy behavior is overkill

I'm +1 on having a deprecation cycle. I agree that this behavior feels buggy but it's actually consistent with numpy's behavior, and silently breaking consistency with numpy could be surprising to users.

For example, using astype(str) on an object dtype array converts np.nan -->'nan':

In [1]: import numpy as np; np.__version__
Out[1]: '1.16.4'

In [2]: a = np.array(['foo', 1, np.nan, 3.14], dtype=object)

In [3]: a
Out[3]: array(['foo', 1, nan, 3.14], dtype=object)

In [4]: a.astype(str)
Out[4]: array(['foo', '1', 'nan', '3.14'], dtype='<U4')

Also note that when constructing a as above, if dtype=object isn't explicitly specified the string conversion will silently occur:

In [5]: np.array(['foo', 1, np.nan, 3.14])
Out[5]: array(['foo', '1', 'nan', '3.14'], dtype='<U4')

jreback · 2019-09-02T21:41:57Z

pandas/core/internals/blocks.py

                # _astype_nansafe works fine with 1-d only
                vals1d = values.ravel()
-                values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
+                values = astype_nansafe(
+                    vals1d, dtype, copy=True, skipna=skipna, **kwargs


not sure I understand why you changed this code, is this not already passed thru?

The astype method doesn't have a skipna keyword; skipna is passed in via kwargs.

pandas/pandas/core/generic.py

Line 5764 in e0c63b4

def astype(self, dtype, copy=True, errors="raise", **kwargs):

skipna is popped from kwargs if present. If it is absent, skipna=True is set and passed to astype_nansafe. (In astype_nansafe itself, the default is skipna=False.)

pandas/pandas/core/dtypes/cast.py

Line 663 in e0c63b4

def astype_nansafe(arr, dtype, copy=True, skipna=False):

So, the default behaviour after this change is to skip converting nan into string.

jreback · 2019-09-02T21:42:30Z

pandas/tests/frame/test_dtypes.py

        result = DataFrame([np.NaN]).astype(str)
-        expected = DataFrame(["nan"])


IIRC wanted to match numpy

jreback · 2019-09-07T17:59:06Z

doc/source/whatsnew/v1.0.0.rst

@@ -145,7 +145,7 @@ Indexing
 Missing
 ^^^^^^^

-
+- When converting into string, :meth:`Series.astype` will keep ``np.nan`` as missing value (:issue:`25353`)


yeah I think this is ok to actually make this change, but let's make it more prominent

jreback

need to revise the test where skipna is passed to .astype() which is not allowed. Can you create an issue about this (disallow kwargs in .astype).

jreback · 2019-09-10T11:46:27Z

pandas/tests/series/test_dtypes.py

@@ -147,7 +147,7 @@ def test_astype_datetime64tz(self):
    )
    def test_astype_str_map(self, dtype, series):
        # see gh-4405
-        result = series.astype(dtype)
+        result = series.astype(dtype, skipna=False)


we shouldn't be passing non-accepted keywords in kwargs.

jreback · 2019-09-10T11:47:21Z

doc/source/whatsnew/v1.0.0.rst

+
+ When converting into string, :meth:`Series.astype` will not convert ``np.nan`` into string and keep it as missing value (:issue:`25353`)
+
+ *pandas 0.25.x*


use Previous behavior

jreback · 2019-09-10T11:48:08Z

doc/source/whatsnew/v1.0.0.rst

+   1    nan
+   dtype: object
+
+*pandas 1.0.0*


use New behavior

jreback · 2019-09-10T11:49:17Z

doc/source/whatsnew/v1.0.0.rst

+String conversion of Series with nan
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ When converting into string, :meth:`Series.astype` will not convert ``np.nan`` into string and keep it as missing value (:issue:`25353`)


:meth:Series.astype(str) previously would coerce a np.nan to the string nan. Now pandas will preserve the missing value indicator.

TomAugspurger · 2019-09-16T14:56:56Z

doc/source/whatsnew/v1.0.0.rst

+String conversion of Series with nan
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:meth:Series.astype(str) previously would coerce a np.nan to the string nan. Now pandas will preserve the missing value indicator (:issue:`25353`)


IIUC, @jschendel and I both prefer a deprecation cycle. It looks like you're making a breaking change. Is that correct?

rendorHaevyn · 2019-09-17T01:41:23Z

Just so you're all aware, this non-feature also appears to impact NoneType:

>>> na = np.array(['foo','bar',5,np.nan,None])
>>> [type(x) for x in na]
[<class 'str'>, <class 'str'>, <class 'int'>, <class 'float'>, <class 'NoneType'>]

>>> da = pd.Series(na)
>>> [type(x) for x in da.astype('str')]
[<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>]
>>> [type(x) for x in da.astype('str',skipna=True)]
[<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>]

WillAyd · 2019-09-19T15:44:47Z

Loosely thought about this after reading through #27949 but do we care to differentiate between the following like numpy?

>>> np.array([1, np.nan, 3]).astype(object)
array([1.0, nan, 3.0], dtype=object)

>>> np.array([1, np.nan, 3]).astype(str)
array(['1.0', 'nan', '3.0'], dtype='<U32')

Coercing to an object in NumPy would maintain the actual NA value, but to a string would write as 'nan'. I suppose this makes more sense in that world where object and string types are different, but I'm just wondering what kind of churn we would be causing if we actually did want astype(str) to stringify nan with a proper StringArray in place

TomAugspurger · 2019-09-19T16:53:11Z

I think that the np.nan -> 'nan' behavior is surprising to most people the first time they see it. So by default, I would want to exclude NA values from being astyped (after a deprecation cycle).

pep8speaks · 2019-10-06T15:21:42Z

Hello @makbigc! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-04 06:25:32 UTC

makbigc · 2019-10-07T15:06:36Z

The linting check (black formatting) failed, but the black formatting check passes on my own machine. What did I do wrong?

WillAyd · 2019-11-07T21:12:38Z

@makbigc might want to give a pre-commit hook a try:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#python-pep8-black

In any case can you merge master and re-push?

makbigc · 2019-12-16T13:50:13Z

@jreback Anything else? Please tell me.

jreback · 2019-12-27T19:48:47Z

can you merge master and will look again

makbigc · 2020-01-02T15:47:06Z

The azure-37-numpydev job passes -W error to the pytest command.

pandas/ci/azure/posix.yml

Lines 55 to 59 in 8806ed7

    
    py37_np_dev:
      ENV_FILE: ci/deps/azure-37-numpydev.yaml
      CONDA_PY: "37"
      PATTERN: "not slow and not network"
      TEST_ARGS: "-W error"

Many tests fail because of the FutureWarning. What should I do?
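
For context, `-W error` asks for every warning to be raised as an exception, so any code path that emits the new FutureWarning fails its test. A minimal stdlib-only sketch of that effect follows; `convert` is a hypothetical function standing in for any call that triggers the deprecation, not a pandas API.

```python
import warnings


def convert(value):
    """Hypothetical call that emits a deprecation-style FutureWarning."""
    warnings.warn("behaviour will change in a future version", FutureWarning)
    return str(value)


with warnings.catch_warnings():
    # Similar in spirit to running with -W error: warnings become exceptions.
    warnings.simplefilter("error")
    try:
        convert(1.5)
        raised = False
    except FutureWarning:
        raised = True

with warnings.catch_warnings(record=True) as caught:
    # A test that expects the warning can record it instead of failing.
    warnings.simplefilter("always")
    result = convert(1.5)
```

Under this filter, the only ways to keep the suite green are to stop emitting the warning on default code paths or to have each affected test assert that the warning is expected.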

WillAyd · 2020-01-03T02:58:43Z

@makbigc I think we need to address #28176 (review) first; the problem now is that the warning is getting thrown all over the place because skipna was added to astype

If you can adjust the design per that comment, we can re-evaluate the issues from there

makbigc · 2020-01-04T08:01:25Z

@WillAyd Thanks for your reply. The skipna parameter has been removed, and skipna is now set to True by default in astype_nansafe.

The tests still fail because of the FutureWarning. #28176 (comment)

makbigc · 2020-01-11T07:55:56Z

@jreback The tests are failing because of the FutureWarning issue. #28176 (comment)

Any resolution?

makbigc · 2020-01-18T05:05:23Z

@jreback Any feedback? Thanks.

languitar · 2020-02-04T11:23:41Z

It seems this didn't make it into pandas 1.0 but also skipna has been removed from astype. What is the intended way of casting to string with NaNs in pandas 1.0?
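
One way to get this result without relying on `skipna` is sketched below. It is offered as an assumption-laden workaround rather than an officially recommended approach: it stringifies only the non-missing entries with `Series.map` and lets missing values pass through unchanged.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Stringify only the non-missing entries; missing values pass through.
result = s.map(lambda x: x if pd.isna(x) else str(x))
```

The lambda is called for every element, including NaN, so the missing-value check has to live inside it.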

jorisvandenbossche · 2020-02-06T13:01:43Z

If we are going to deprecate this, I think we need a keyword to opt in to the future behaviour, so you have the possibility to silence the warning.

And if we are adding a keyword, it seems to make sense to use skipna for that, as that could (at least in 0.25) already be used for this purpose (although undocumented)

TomAugspurger · 2020-03-06T17:58:06Z

Where are we at here? Seems like the preference is for

>>> s = pd.Series([1, np.nan])
>>> type(s.astype(str)[1])
UserWarning("Converting NaN to string 'nan'. In the future NaN will remain as NaN. Specify skipna=True to silence this warning and adopt the new behavior, or skipna=False to restore the old behavior").
str

>>> type(s.astype(str, skipna=False)[1])  # no warning
str

>>> type(s.astype(str, skipna=True)[1])  # no warning
float

Is that right? If so, @makbigc are you interested in implementing that?
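
The proposal above could be sketched as follows. This is a pure-Python illustration on plain lists, not pandas code: `astype_str_sketch` and the `_NO_DEFAULT` sentinel are hypothetical names, and warning only when a NaN is actually present is an assumption drawn from the wording of the message.

```python
import math
import warnings

_NO_DEFAULT = object()  # hypothetical sentinel: caller did not pass skipna


def _is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def astype_str_sketch(values, skipna=_NO_DEFAULT):
    """Hypothetical deprecation shim for string conversion of a list."""
    if skipna is _NO_DEFAULT:
        if any(_is_nan(x) for x in values):
            warnings.warn(
                "Converting NaN to string 'nan'. In the future NaN will "
                "remain as NaN. Specify skipna=True to silence this warning "
                "and adopt the new behavior, or skipna=False to restore the "
                "old behavior",
                UserWarning,
                stacklevel=2,
            )
        skipna = False  # old behaviour stays the default while deprecating
    return [x if (skipna and _is_nan(x)) else str(x) for x in values]


data = [1.0, float("nan")]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default = astype_str_sketch(data)  # warns, NaN becomes "nan"
old = astype_str_sketch(data, skipna=False)  # no warning, NaN becomes "nan"
new = astype_str_sketch(data, skipna=True)  # no warning, NaN stays a float
```

Using a sentinel rather than `skipna=None` or `skipna=False` as the default is what lets the function tell "not specified" apart from an explicit opt-out, so that only the unspecified case warns.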

simonjayhawkins · 2020-03-29T11:17:59Z

some time has passed since this PR was opened and pandas 1.0 has since been released.

If we are going to deprecate this, I think we need a keyword to opt in to the future behaviour, so you have the possibility to silence the warning.

And if we are adding a keyword, it seems to make sense to use skipna for that, as that could (at least in 0.25) already be used for this purpose (although undocumented)

So, for me, a deprecation cycle on the old (incorrect) behaviour, keeping the skipna keyword (but deprecating it), is probably now a requirement to keep a stable API through to pandas 2.0.

This approach addresses both the regression in #31708 and the unexpected behaviour of #25353

simonjayhawkins · 2020-03-29T11:19:03Z

@makbigc can you merge master to resolve conflicts, fix-up failing tests and address #28176 (comment)

simonjayhawkins · 2020-05-08T16:40:44Z

@makbigc closing as stale. ping if you want to continue.

NumberPiOso · 2022-02-01T21:12:29Z

Hello, @simonjayhawkins, @TomAugspurger

I would like to continue this PR, as it seems it would solve multiple issues. I just want to list everything that must be done first and to confirm that this new behaviour is still preferred.

Add a skipna keyword, defaulting to False, to the pandas/core/generic.NDFrame.astype method, following the UserWarning proposed in Astype keeps nan when converting into string #28176 (comment)
Document this new behavior.
Start deprecation cycle

Open Issues related

Unmerged PRs related

Fix issue #31708 Series.astype(str, skipna=True) vanished in the 1.0 release #35060

WillAyd reviewed Aug 27, 2019

WillAyd added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Aug 27, 2019

TomAugspurger reviewed Aug 27, 2019

jreback requested changes Sep 2, 2019

jreback requested changes Sep 7, 2019

makbigc force-pushed the missing-25353 branch from 63a8872 to 2cdd5b1 Compare September 10, 2019 02:34

jreback requested changes Sep 10, 2019

jreback mentioned this pull request Sep 12, 2019

NaN is converted to strings when reassigning a column with .loc #28403

Closed

makbigc mentioned this pull request Sep 16, 2019

[TST] Revise test_astype_str_map #28454

Closed

TomAugspurger reviewed Sep 16, 2019

makbigc added 4 commits September 19, 2019 22:41

astype keeps nan when converting into string

294dd2e

Move the entry to API change section and make it prominent

39b4294

Fix entry in v1.0

b82e02f

Move the whatsnew entry to deprecation section

55d1cf7

makbigc force-pushed the missing-25353 branch from 1820165 to 55d1cf7 Compare September 19, 2019 14:52

makbigc added 2 commits October 6, 2019 16:47

merge for update

99dc246

Add skipna keyword into astype

f44afcf

makbigc force-pushed the missing-25353 branch from 5ed92d9 to f44afcf Compare October 6, 2019 15:21

Fix linting adn docstring format

b5428c3

merge for update

aa62364

makbigc added 6 commits December 9, 2019 18:42

Add okwarning to suppress FutureWarning

35fd58f

Add :okwarning: in whatsnew to suppress FutureWarning

1d29cd0

Add :okwarning: to suppress FutureWarning

34c51e0

Add :okwarning: into getting_started/basic.rst

d879778

Add :okwarning: into integer_na.rst

9765497

merge for update

ffed0a0

merge for resolving conflict

c0cbe9a

makbigc added 6 commits January 3, 2020 18:39

merge for update

5ff30e0

Remove skipna parameter and set skipna=True in astype_nansafe

3057073

Change test_astype_str_map

59030b1

fix test_astype_str

11c2015

Fix black format

68c8e85

merge for update

4b80090

jorisvandenbossche mentioned this pull request Feb 6, 2020

Series.astype(str, skipna=True) vanished in the 1.0 release #31708

Closed

simonjayhawkins closed this May 8, 2020

jorisvandenbossche mentioned this pull request Dec 9, 2022

astype(str) / astype_unicode: np.nan converted to "nan" (checknull, skipna) #25353

Open
