Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid casting the body to a string when it is already a string #1208

Merged
merged 1 commit into from
Apr 30, 2024

Conversation

ale-rt
Copy link
Member

@ale-rt ale-rt commented Apr 30, 2024

No description provided.

@ale-rt ale-rt merged commit fdb1268 into master Apr 30, 2024
26 checks passed
@ale-rt ale-rt deleted the small-optimization branch April 30, 2024 08:03
@leorochael
Copy link
Contributor

I know I'm late to the game, but I don't see the point of this optimization. Python itself already does this, because str is immutable:

>>> x = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

>>> x is str(x)
True

Do we have any timing results that show that doing an extra isinstance() call is cheaper than letting Python itself do it?

@leorochael
Copy link
Contributor

leorochael commented Apr 30, 2024

I suppose one advantage of this is if you have a subclass of str, then it would avoid upcasting it back to str:

>>> class WordCountString(str):
...     def words(self, separator=None):
...         return len(self.split(separator))
...

>>> x = WordCountString("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")

>>> isinstance(x, str)
True
>>> x is str(x)
False

But in that case, I suggest to add a unit test to make sure we don't lose this change to the next person who decides to "optimize" this, since timeit shows this "optimization" takes double the amount of time in the case where x is exactly an str instance (IPython syntax below):

>> %%timeit x = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
... 
... y = str(x)
... 
 
38.8 ns ± 3.35 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> %%timeit x = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
... 
... if isinstance(x, str):
...   y = str(x)
... else:
...   y = x
... 

72.7 ns ± 2.22 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

@leorochael
Copy link
Contributor

leorochael commented Apr 30, 2024

Even in the case where we're avoiding the upcast, doing the if ourselves seem to be slower:

>>> %%timeit x = WordCountString("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
... 
... if isinstance(x, str):
...   y = str(x)
... else:
...   y = x
... 
85 ns ± 2.89 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> %%timeit x = WordCountString("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
... 
... y = str(x)
... 
... 
54.4 ns ± 0.296 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

@ale-rt
Copy link
Member Author

ale-rt commented Apr 30, 2024

Thanks @leorochael for your insight, I had no clue!
I will revert this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants