Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validation_external.py: Index mismatch between clustering results and metadata introduces NaNs #61

Open
bednarsky opened this issue Nov 18, 2024 · 4 comments · May be fixed by #65
Open
Labels
documentation Improvements or additions to documentation

Comments

@bednarsky
Copy link

bednarsky commented Nov 18, 2024

  • I am testing edge cases (not on purpose haha)
  • I have indices like this: Epi_Abs_BEST4+__H335__M21_00014
  • The clustering or some earlier step seems to replace the + with .
  • Full error for findability
Traceback (most recent call last):
  File "/path/to/proj.snakemake/scripts/tmph6wb_w6l.validation_external.py", line 68, in <module>
    idx_dfs["AMI"].loc[clustering, metadata] = metrics.adjusted_mutual_info_score(categorical_metadata[metadata], clustering_results[clustering])
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/metrics/cluster/_supervised.py", line 1000, in adjusted_mutual_info_score
    labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/metrics/cluster/_sup
ervised.py", line 50, in check_clusterings
    labels_pred = check_array(
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/utils/validation.py", line 959, in check_array
    _assert_all_finite(
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/utils/validation.py"
, line 124, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/path/to/conda/ea520d7177268389aea78d788e0c3541_/lib/python3.9/site-packages/sklearn/utils/validation.py"
, line 173, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input contains NaN.
@sreichl
Copy link
Collaborator

sreichl commented Nov 18, 2024

ah yes that's an R issue that is not changeable. We can not manage this exception effectively, because even if you fix it in one script you have to take the "fix" along to other scripts, which introduces a dependency hell to maintain. Only solution I can think of: Do not use special characters (here, and in general).

@sreichl sreichl added the documentation Improvements or additions to documentation label Nov 18, 2024
@bednarsky
Copy link
Author

  • I agree about dependency hell, and also hard to automatically change the annotations (could be that this creates duplicates, e.g., if you have two index names: BEST4+ and BEST4-
  • If we decide to make it 'Don't use special characters' then this should be communicated more clearly.
  • Maybe should also add a test for the index only containing non-special characters somewhere, if not, cause a very clear error (not just some NaNs emerging somewhere far downstream)
  • Like it was now, wasn't so straightforward to find what causes the error

@sreichl
Copy link
Collaborator

sreichl commented Nov 18, 2024

  • yes, document and communicate more strongly in config/README.md
  • maybe, check but this adds lots of code for every eventuality that might happen, I would rather stick with more clear docs.

For future reference, I would strongly recommend never using +/- nomenclature instead use e.g., pos/neg.

@bednarsky
Copy link
Author

bednarsky commented Nov 18, 2024

Great, will do a PR for (1) (and see if I don't find a one-liner somewhere that asserts (2))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants