-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maint: parc.find_regions as module method. clarify find_regions #517
base: main
Are you sure you want to change the base?
Conversation
f0427d2
to
827db31
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #517 +/- ##
==========================================
- Coverage 46.00% 45.97% -0.04%
==========================================
Files 75 75
Lines 7232 7224 -8
==========================================
- Hits 3327 3321 -6
+ Misses 3905 3903 -2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly fine.
I have questions where find_region should still be in parcellation.py
and the aggressive caching strategy we are using everywhere.
@@ -364,3 +326,45 @@ def __lt__(self, other): | |||
) | |||
return self.name < other.name | |||
return self.version.__lt__(other.version) | |||
|
|||
|
|||
def find_regions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it feasible that user would try to find_regions
without a parcellation?
(correct me if I am wrong, but this is what this function is doing?)
edit: I think I see why this is needed (it's needed to decode region spec in anatomic anchor)
since it's not used anywhere else, I almost wonder if module level function should go to anatomical anchor (where it is used exclusively)
or at top level under region (where it makes more semantic sense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than internal use, it is also exposed as siibra.find_regions()
. We can potentially use region.find()
in anchor. I think the original implementation also had species as a keyword argument, which made more sense to use there.
It needs access to parcellation.registry so I think it is better in parcellation module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that it needs Parcellation.registry
is not relevant to users.
If I am a user, and expect to find a method called find_regions
(which, in fact, does not depend on any parcellations), I would expect it to find it under region.py
module.
I do like the idea of a class/static method under Region
class, which is proxy to this method (Indeed, I think rather than it being a standalone method, I think it makes more sense to be a static/class method of Region
class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm then it would still be accessible from region and parcellation instances. We can carry it to the region module but if we are going through all parcellations, why shouldn't it be under the parcellation module? And this also creates less confusion given that Region
class has find
method.
Since there is a also a similar function in atlas
, which goes through all parcellations in an atlas, I think the design is reasonable. But we might want to discuss this design with @dickscheid , who initially implemented it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can carry it to the region module but if we are going through all parcellations, why shouldn't it be under the parcellation module?
But the user doesn't know that. (nor should they)
The promise of find_region
, at least to me, is: you provide a string, and I will find you all regions which fits this string.
Whether, internally, we go through all parcellations, and iteratively, go through all regions for that parcellation, OR, we some how have a region instance table, that is not relevant to the user.
Which is why I say, either keep it private (prefix with _
) or it should go in regions module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean now. I thought you were advocating for it to be a class method or static method of Region
not region module. I'll also get the opinion of Timo and make a change accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is better placed here at the module level to avoid the confusion with Region.find()
@@ -364,3 +326,44 @@ def __lt__(self, other): | |||
) | |||
return self.name < other.name | |||
return self.version.__lt__(other.version) | |||
|
|||
|
|||
@lru_cache(maxsize=128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TD: investigate how lru_cache
hashes the inputs to compare with dictionary caching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.python.org/3/library/functools.html#functools.lru_cache
"Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable."
Since the kwargs of find_regions
are either a string a boolean, they are hashable. So essentially, the key of the old implementation (key = (regionspec, filter_children, find_topmost)
) is used but hashed.
Just to explain the original logic - there were different entry points for different scopes:
I think the logic is clear, but I understand that the automatic exposure of the class method |
Thank you. Then, IMO, we should make |
…, region.species does not need to check parcellation's existence
This PR addresses the following issues/points (see):
parcellation.find_regions()
was a static method ofParcellation
but it searches through all parcellations in the registry. For the user, when they have a parcellation instanceparc
, there is no clear difference betweenparc.find_regions()
andparc.find()
while they should be usingfind
to search within the parcellation. (find
is a method ofRegion
).atlas.find_regions()
as an instance method. It behaves differently thanparcellation.find_regions()
. Needs to be clarified.cache
property?lru_cache
might be necessary for long-running systems like siibra-api(Please see the discussions on the code changes for further details)