API: Rename arg to func in Series.map for consistency #61260

Closed
datapythonista opened this issue Apr 9, 2025 · 2 comments · Fixed by #61264
Labels
API - Consistency Internal Consistency of API/Behavior API Design Apply Apply, Aggregate, Transform, Map
Milestone

Comments

@datapythonista
Member

datapythonista commented Apr 9, 2025

The APIs of methods that take a UDF follow certain patterns that make them consistent and easier to learn and use. There are some small differences, which are listed in #40112 and #61128.

This issue is to rename the arg parameter of Series.map to func, the name used consistently in almost all other methods. In Series.map the argument differs slightly from the others: arg (or func) can also be a dict or a Series, in which case map replaces values using that mapping instead of executing an elementwise UDF.
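For reference, a minimal sketch of the three kinds of argument Series.map accepts today. The data and variable names are made up for illustration, and the argument is passed positionally so the snippet does not depend on whether the keyword is called arg or func in the installed pandas version.

```python
import pandas as pd

# Hypothetical example data, not taken from the pandas code base.
s = pd.Series(["cat", "dog", "cat"])
sizes = {"cat": "small", "dog": "large"}

# 1. A function: applied elementwise, like a UDF in other methods.
by_func = s.map(lambda x: x.upper())

# 2. A dict: values are looked up in the mapping.
by_dict = s.map(sizes)

# 3. A Series: values are looked up in the Series index.
by_series = s.map(pd.Series(sizes))

print(by_func.tolist())
print(by_dict.tolist())
print(by_series.tolist())
```

Every key is present in the mapping here, so the dict and Series forms give the same values. The differences discussed in this issue only appear for missing keys and None values.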

This issue covers only the renaming of the parameter. Making the parameter consistent with other methods such as DataFrame.apply can be considered in another issue, but there are some cases to consider, because map behaves slightly differently when given a mapping than when given a function. In particular, map will use NaN when the mapping returns None, but it will use None when the function returns None. Also, if we stop supporting dictionaries, users in general should just change their code from Series.map(my_dict) to Series.map(my_dict.get). But there are some special cases: for example, when the dictionary is a defaultdict, .get will return None, while the current map implementation with a defaultdict uses the default value.

@datapythonista datapythonista added API Design Apply Apply, Aggregate, Transform, Map API - Consistency Internal Consistency of API/Behavior labels Apr 9, 2025
@datapythonista
Member Author

@pandas-dev/pandas-core I thought that when renaming arg in Series.map to func it would be a good idea to make it accept only a function, for consistency with other functions and so that the name matches the behaviour. I thought that for users it'd be as simple as using my_dict.get instead of my_dict as the argument.

But it seems like there is some more complexity. defaultdict, for example, doesn't work with .get as expected in this case, since .get will still return None and not the default value. So users would have to use my_series.map(lambda x: my_defaultdict[x]) instead of my_series.map(my_defaultdict.get).
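The defaultdict difference can be reproduced in plain Python, without pandas. This is only a sketch: the names my_defaultdict and lookup and the sample keys are made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping with a default value for unknown keys.
my_defaultdict = defaultdict(lambda: "unknown", {"a": "apple"})

# .get never calls the default factory, so a missing key gives None.
via_get = my_defaultdict.get("z")

# Indexing goes through __missing__, so a missing key gives the default.
# Note that this also inserts the key into the dictionary.
via_index = my_defaultdict["z"]


def lookup(key):
    """Function form that keeps the defaultdict behaviour."""
    return my_defaultdict[key]


print(via_get)      # None
print(via_index)    # unknown
print(lookup("a"))  # apple
```

So a function like lookup (or the equivalent lambda) is what would need to be passed to map, rather than my_defaultdict.get, to keep the default value for missing keys.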

Also, when Series.map receives a dictionary, None will be returned as NaN, while when it receives a function, None will be returned as None.

If we were designing the API from scratch I'd still support the consistency of accepting only functions, with a single behaviour. But I'm not too sure it's worth it, given that the expected user code changes, while not too complex, are not as immediate as turning every dictionary into a function with .get. Thoughts?

@rhshadrach
Member

Using a dict with Series.map seems like a very common use case, and I don't think we should make it more inconvenient. Also, from the name alone I'd expect Series.map to accept a dict.

+1 on arg -> func.

@simonjayhawkins simonjayhawkins added this to the 3.0 milestone Apr 14, 2025
3 participants