-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeds up the Path and derived Parameters #890
base: main
Are you sure you want to change the base?
Conversation
Additional Speed up if using faster attribute lookup.If the implementation of The below was run on The implementation would look like Its a matter of whether you want an 8% speed up over code readability for humans and static analysis tools in your editor. Additional speed up by not looking up
|
|
||
def _get_default_search_paths_as_list()->typing.List[str]: | ||
return [str(path) for path in _get_default_search_paths()] | ||
|
||
class resolve_path(ParameterizedFunction): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere we should tell users that there is a overhead cost of using this class instead of the _resolve_path
function. I don't know if we should even deprecate it?
There are broken tests so I assume this is still work in progress? A test suite is not a benchmark so I'm not surprised we don't see performance penalties or regressions from the test suite itself. We now leverage |
I hoped to get some feedback first. For example its seems Param does not utilize caching today and the focus is minimum of code. This is a change in favor of speed. Thus the first step is really to spark the discussion with this PR. And I would like to know if you would like to accept this kind of change before I spend time finalizing. The tests are breaking on macOS and Windows. To me it looks like the path strings are resolved in different But equivalent ways by os and pathlib. Having thought about it, I assumed pathlib was better as its "modern" Python. But now I'm not sure. At least I should profile that too. |
I need to spend more time to review this (not sure I'll have time this week!). My feedback from working on Param 2.0:
This piece of code breaks with the suggested changes, the last line used to raise an error and it no longer does. It shows that the difficulty with resources like file/directory paths is that they can be removed without Param knowing about it, which is why I think it always resolves. Maybe the way forward is to overload the newly added import os
import param
import pathlib
fp = pathlib.Path('exists.txt')
fp.write_text('')
class P(param.Parameterized):
path = param.Path('exists.txt')
p = P()
print(p.path)
fp.unlink()
print(p.path) |
Thanks for the feedback. I plan to give this a second iteration when I have gotten panel-chemistry working for Panel 1.x. First thing would be adding the use case you describe to the tests. |
I quickly played around with another implementation (compared to diff --git a/param/parameters.py b/param/parameters.py
index ed11b53b..8852b79b 100644
--- a/param/parameters.py
+++ b/param/parameters.py
@@ -2706,9 +2706,11 @@ class Path(Parameter):
----------
search_paths : list, default=[os.getcwd()]
List of paths to search the path from
- check_exists: boolean, default=True
- If True (default) the path must exist on instantiation and set,
- otherwise the path can optionally exist.
+ check_exists: boolean or 'once', default=True
+ If True (default) the path must exist on instantiation and whenever
+ get/set is called, if 'once' the path must only exist on instantiation
+ and on get the first resolved path is returned, if False the path can
+ optionally exist.
"""
__slots__ = ['search_paths', 'check_exists']
@@ -2733,8 +2735,8 @@ class Path(Parameter):
search_paths = []
self.search_paths = search_paths
- if check_exists is not Undefined and not isinstance(check_exists, bool):
- raise ValueError("'check_exists' attribute value must be a boolean")
+ if check_exists is not Undefined and check_exists not in (True, False, 'once'):
+ raise ValueError("'check_exists' attribute value must be a boolean or 'once'")
self.check_exists = check_exists
super().__init__(default,**params)
self._validate(self.default)
@@ -2749,16 +2751,23 @@ class Path(Parameter):
else:
if not isinstance(val, (str, pathlib.Path)):
raise ValueError(f'{_validate_error_prefix(self)} only take str or pathlib.Path types')
+ path = None
try:
- self._resolve(val)
:+ path = self._resolve(val)
except OSError as e:
- if self.check_exists:
+ if self.check_exists is True:
raise OSError(e.args[0]) from None
+ if path is not None and self.check_exists == 'once':
+ # Abusing check_exists to store the resolved path on parameter
+ # instantiation.
+ self.check_exists = ('once', path)
def __get__(self, obj, objtype):
"""
Return an absolute, normalized path (see resolve_path).
"""
+ if isinstance(self.check_exists, tuple):
+ return self.check_exists[1]
raw_path = super().__get__(obj,objtype)
if raw_path is None:
path = None With these benchmarks: class ParameterPathResolveSuite:
def setup(self):
import pathlib
p = pathlib.Path('dummy')
p.write_text('dummy')
class P(param.Parameterized):
x0 = param.Path(str(p))
self.P = P
self.p = P()
def time_resolve_new(self):
p = self.P()
p.x0
def time_resolve_same(self):
self.p.x0
class ParameterPathResolveOnceSuite:
def setup(self):
import pathlib
p = pathlib.Path('dummy')
p.write_text('dummy')
class P(param.Parameterized):
x0 = param.Path(str(p), check_exists='once')
self.P = P
self.p = P()
def time_resolve_new(self):
p = self.P()
p.x0
def time_resolve_same(self):
self.p.x0 Then I started to think about how this might be related to |
Closes #889
resolve_path
function as there is big, big overhead compared to using simple function!If I run
pytest tests/testpathparam.py
, I see.main
: 57 passed in 0.21sthis
: 57 passed in 0.19sSo this is not where you see the benefit. That is because you are only testing new instances and new paths. But that is not how
Path
is being used in practice. For example in Panel thepn.config
is rather static and holds paths to css etc. which is being requested every time a page is loaded.Performance tests
My setup
Python 3.10 on Linux.
Profiling script
You can use the script below.
Results
N=100000
It makes you think about the cost of using Parameterized classes and ParameterizedFunctions! You can compare this to FastApi and Pydantic. Pydantic 2.0 was a reimplementation in Rust to make applications like FastApi fast.