Add two step classifier #431
Conversation
alphadia/fdr_utils.py (outdated)
logger = logging.getLogger()


def keep_best(
+1 for reorganizing code, but this makes it hard to spot any changes :-)
Would it be a large effort to move this back to fdr.py for now and do the reordering (= just moving) later (or: before) in a dedicated PR?
To my understanding, the functions were copied as they are from line 171 in da99596 (def keep_best(...)). So there should be no changes here?
Yes exactly, it was just moved over due to a circular import issue. That one has been resolved now, so I moved it back to alphadia/fdr.py for now :)
@GeorgWa I noticed there are duplicates of the functions keep_best(), fdr_to_q_values() and some more in alphadia/fdrx/stats.py. Is that on purpose? If so, why do we have those?
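For context, a minimal sketch of what these two helpers usually do (general pattern only, not necessarily alphadia's exact implementation; the column names are assumptions): keep_best retains the best-scoring row per precursor group, and fdr_to_q_values converts FDR values into monotone q-values via a reverse cumulative minimum.

import numpy as np
import pandas as pd


def keep_best_sketch(df: pd.DataFrame, score_column: str = "proba",
                     group_columns: list[str] | None = None) -> pd.DataFrame:
    # keep only the best-scoring row per group; here a lower score is assumed to be better
    group_columns = group_columns or ["precursor_idx"]
    idx = df.groupby(group_columns)[score_column].idxmin()
    return df.loc[idx].copy()


def fdr_to_q_values_sketch(fdr_values: np.ndarray) -> np.ndarray:
    # q-value of an entry = smallest FDR reachable at or below its rank,
    # i.e. the reverse cumulative minimum over entries sorted from best to worst
    return np.minimum.accumulate(fdr_values[::-1])[::-1]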
alphadia/fdrexperimental.py (outdated)
X = df_[x_cols]
y = df_[y_col]
df = df_.copy()
if hasattr(classifier, "fitted") and classifier.fitted:
yes, you're right :)
alphadia/workflow/manager.py (outdated)
  if classifier_hash not in self.classifier_store:
      classifier = deepcopy(self.classifier_base)
-     classifier.from_state_dict(torch.load(os.path.join(path, file)))
+     with contextlib.suppress(Exception):
+         classifier.from_state_dict(torch.load(os.path.join(path, file)))
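For readers less familiar with contextlib, suppress(Exception) is just shorthand for a try/except that ignores the error; a minimal, self-contained illustration (the file name is made up):

import contextlib
import os

# contextlib.suppress(Exception) is equivalent to wrapping the body in try/except and ignoring the error
with contextlib.suppress(Exception):
    os.remove("nonexistent_classifier.pth")

# the spelled-out equivalent
try:
    os.remove("nonexistent_classifier.pth")
except Exception:
    pass

Note that suppressing the broad Exception means a corrupt or incompatible .pth file is silently skipped rather than failing the run.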
@GeorgWa What is this alphadia/constants/classifier/fa9945ae23db872d.pth file that we are loading here, some pretrained model? Shall I store a similar file for the two-step classifier?
Yes exactly 👍🏻 We will do the same with the two step classifier eventually
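For anyone following along, this is the general round trip behind such a .pth file. alphadia restores its classifier via from_state_dict rather than a bare nn.Module, so the following is only a sketch of the idea, with an illustrative file name:

import torch
import torch.nn as nn

# persist a model's learned parameters once, then restore them into a fresh instance later
model = nn.Linear(10, 2)
torch.save(model.state_dict(), "pretrained_classifier.pth")

restored = nn.Linear(10, 2)
restored.load_state_dict(torch.load("pretrained_classifier.pth"))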
@GeorgWa Should I add an e2e or performance test for the two step classifier, or just the unit tests?
LGTM!
self.first_classifier = first_classifier
self.second_classifier = second_classifier
self.first_fdr_cutoff = first_fdr_cutoff
self.second_fdr_cutoff = second_fdr_cutoff

self.min_precursors_for_update = min_precursors_for_update
self.train_on_top_n = train_on_top_n
could those be private? (check also LogisticRegression)
Do you mean self.min_precursors_for_update and self.train_on_top_n? I think they could be private. I went with not private, following the pattern in BinaryClassifierLegacyNewBatching, where only _fitted is private. But I'm ok with either, shall I change it?
Given it's a different class, I would favor correctness over consistency here, so I'd prefer changing it.
f"Stop training after iteration {i}, " | ||
f"due to decreasing target count ({current_target_count} < {best_precursor_count})" |
(nit) "Stopping .." .. ".. decreased .."
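For context, the message above comes from an iterative refit loop that keeps the best iteration and stops once the number of identified targets drops. A minimal sketch of that pattern, incorporating the suggested wording (the fit_and_count callable and max_iterations are placeholders, not the PR's actual API):

import logging

logger = logging.getLogger(__name__)


def iterate_until_no_improvement(fit_and_count, max_iterations: int = 5):
    # fit_and_count(i) is a placeholder returning (result, target_count) for iteration i
    best_result, best_precursor_count = None, -1
    for i in range(max_iterations):
        result, current_target_count = fit_and_count(i)
        if current_target_count < best_precursor_count:
            logger.info(
                f"Stopping training after iteration {i}, "
                f"due to decreased target count ({current_target_count} < {best_precursor_count})"
            )
            break
        best_result, best_precursor_count = result, current_target_count
    return best_result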
return best_result

def preprocess_data(self, df: pd.DataFrame, x_cols: list[str]) -> pd.DataFrame:
please check all methods for being potentially private
""" | ||
self._fitted = state_dict["_fitted"] | ||
|
||
if self.fitted: |
Please check if this should rather be if self._fitted:? If not, add a comment which deconfuses me :-)
I figured, since we implement Classifier, which has the @property def fitted(self): ..., one would use self._fitted for setting but self.fitted for accessing, is that right? If so, I'll add a comment, or otherwise change it 😃
As it's the class accessing an instance variable, it's fine to use self._fitted (this is equivalent in terms of logic, as here the property is a 1:1 wrapper around self._fitted to make it public).
More than "fine" actually: I would prefer it for consistency :-)
@GeorgWa why do we have these properties anyway? They don't seem to be used.
@anna-charlotte just FYI, in case you also want to set your properties, you'd use the @value.setter decorator:

class DummyClass:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        print("getter")
        return self._value

    @value.setter
    def value(self, new_value):
        print("setter")
        self._value = new_value
Yeah, I figured we don't have the setter here, as we would want the fitted attribute to be read-only for the user. Otherwise, would there be an advantage in this property over just having a plain .fitted attribute?
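To make the read-only point concrete: a property without a setter allows external reads but raises on external writes, while a plain attribute allows both. A minimal sketch (hypothetical class, not the PR's code):

class ClassifierSketch:
    def __init__(self):
        self._fitted = False  # internal state, written only by the class itself

    @property
    def fitted(self) -> bool:
        # public, read-only view of the internal flag
        return self._fitted

    def fit(self):
        self._fitted = True


clf = ClassifierSketch()
clf.fit()
print(clf.fitted)    # True
# clf.fitted = False would raise AttributeError; a plain attribute would silently allow it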
Adding a two-step classifier that is to be used with a logistic regression followed by a neural network classifier, as inspired by DIA-NN. With this we aim to increase sensitivity, particularly in samples with only few peptides present, such as single-cell samples.
Steps:
- TwoStepClassifier
- FDRManager.fit_predict(), where the classifier is trained
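To make the intended flow concrete, here is a conceptual sketch of the two-step idea described above. It assumes sklearn-style classifiers (fit/predict_proba) and a compute_q_values helper returning per-row q-values; neither the names nor the signatures reflect the PR's actual API.

import pandas as pd


class TwoStepClassifierSketch:
    # Conceptual sketch: a cheap first classifier pre-filters candidates at a permissive
    # FDR cutoff, and a stronger second classifier is trained and applied on the survivors.

    def __init__(self, first_classifier, second_classifier,
                 first_fdr_cutoff: float, second_fdr_cutoff: float):
        self.first_classifier = first_classifier
        self.second_classifier = second_classifier
        self.first_fdr_cutoff = first_fdr_cutoff
        self.second_fdr_cutoff = second_fdr_cutoff

    def fit_predict(self, df: pd.DataFrame, x_cols: list[str], y_col: str,
                    compute_q_values) -> pd.DataFrame:
        # step 1: e.g. logistic regression on all candidates, keep everything below the loose cutoff
        self.first_classifier.fit(df[x_cols], df[y_col])
        df = df.assign(proba=self.first_classifier.predict_proba(df[x_cols])[:, 1])
        df = df[compute_q_values(df) <= self.first_fdr_cutoff]

        # step 2: e.g. neural network on the reduced, cleaner candidate set, strict cutoff
        self.second_classifier.fit(df[x_cols], df[y_col])
        df = df.assign(proba=self.second_classifier.predict_proba(df[x_cols])[:, 1])
        return df[compute_q_values(df) <= self.second_fdr_cutoff]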