-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make all imputation methods consistent in regard to encoding requirements #827
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very very much! This is great. I am a fan of simplifying the code (by removing semi important features such as the progress bars in general) and really think that generalizing the imputation tests is an awesome idea.
I have a few questions and concerns though:
- I am not a fan of adding more
_utils_*
modules. I argue that all and any utils modules are a code smell. It is never clear to developers what is actually in there and it should probably be somewhere else. Let's just have descriptive module names (or scripts) that better describe what can be found in there. - I don't know how I feel about the removal of the imputation spinners. We had added them because some imputation functions can run for a looooong time and it just gives feedback that the code is still running. I know - it's not a progress bar but it's at least something? I don't have the strongest opinion on this but wonder what other people think @Imipenem @eroell
- Removing some docstrings about raised Errors is okay with me because the errors themselves can document the behavior when raised. The docstrings are more useful for other developers if they want to build on top because they don't have to check the whole code for any potential raised Errors.
- Instead of printing to stdout and returning a bool in
_base_check_imputation
I wonder whether this function should just raise errors? prints are not for errors - they are for informative messages. Pytest will resolve the stack and point to the part that raised the error.
Co-authored-by: Lukas Heumos <[email protected]>
@Zethson Apply suggestions from code review part 2
for more information, see https://pre-commit.ci
Well, this is always a big topic. In a way you're right, having these utils is a hint something could be better organized. But it doesn't mean it is in our control. For example:
I never saw it spinning 😆 But maybe it's an issue with my stdout.
Again a bit topic 😄 As I said, it sounds like a good idea, but almost not maintenable in practice. From my experience.
This is what I would have done for production code, but usually for tests, I like that kind of output. But alright, I'll modify that. |
Looks the |
Nothing to do with us, its a dependency & they're already on it :) |
I'm split... if we were to keep it, I'd move it to a decorator? That would make the functions way more readable, and reduce redundancy
You run usually scripts if I recall correctly? If you run it from a jupyter notebook, on this 6 dots on the image, you can see something like a snake walking around for the spinning display |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example: _is_val_missing is a function that would better fit in AnnData itself. So I would consider those as "extensions" more than "utils".
Then @nicolassidoux they should be in https://github.com/theislab/ehrapy/blob/main/ehrapy/anndata/anndata_ext.py right? I still expect a lot of that to eventually make it into @eroell ehrdata but for now it should probably be there.
Could you please move the new functions?
I'm okay with the spinners being removed unless you deem them worthwhile now @nicolassidoux and want to write a decorator.
After the functions have been moved, I'd love to merge this.
Thank you very much!
Oh well I should have noticed this file indeed 😄
Let me assess how much work it is to make that done. |
for more information, see https://pre-commit.ci
Hm of course all notebooks tests or almost fail because of |
Rich is much much more popular and also used in packages such as |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Ok, it doesn't work well on my stdout, but ok in notebooks. |
for more information, see https://pre-commit.ci
Spinners work, cool. Thanks for making the decorator! Nice to have these additional tests, aim of the PR is fulfilled. I have one final point on the return type. Before, all imputation methods returned a new Right now, it has been changed to all imputation methods returning Even if this could mean a small breaking change for users doing If you agree with this or convince me otherwise, this looks ready to merge for me! |
Fine for me, anyway I advocated for removing this copy thing, but that's a separate debate. |
Done, I also noticed |
Hmm pre-commit doesn't like there is no return statement if copy=False. Should i modify like: ...
if copy:
return adata
else:
return |
Also how scanpy does it: https://github.com/scverse/scanpy/blob/main/src/scanpy/tools/_dpt.py#L199C5-L199C35 |
Thanks for the final polish here - Like your suggestion, or like scanpy does it, both fine! |
Done! |
The RTD build seems to time out because of dependencies somehow urgh |
I'll merge this now and I'll take a look myself soon. |
See #824 for the details of the change.