
Add two step classifier #431

Merged · 28 commits merged into main from add-two-step-classifier on Jan 24, 2025
Conversation

@anna-charlotte (Contributor) commented on Jan 16, 2025:

Adding a two-step classifier that applies a logistic regression followed by a neural network classifier, inspired by DIA-NN. With this we aim to increase sensitivity, particularly in samples with only few peptides present, such as single-cell samples. (A rough sketch of the idea follows the list below.)

Steps:

  • add a TwoStepClassifier
  • add a parameter to the config to use the two-step classifier
  • adjust call in FDRManager.fit_predict(), where the classifier is trained
  • add tests
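
As a rough illustration of the intended flow, here is a minimal sketch. This is not the alphadia implementation: the class name, the parameters, and the quantile filter standing in for the first FDR cutoff are all assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier


class TwoStepClassifierSketch:
    """Hypothetical sketch: a cheap linear pre-filter, then a neural network."""

    def __init__(self, first_quantile: float = 0.5):
        self.first_classifier = LogisticRegression(max_iter=1000)
        self.second_classifier = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
        self.first_quantile = first_quantile  # crude stand-in for the first FDR cutoff

    def fit_predict(self, X: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Step 1: the linear classifier scores all candidates.
        self.first_classifier.fit(X, y)
        scores = self.first_classifier.predict_proba(X)[:, 1]
        # Keep candidates above a score quantile (the real code filters by FDR).
        keep = scores >= np.quantile(scores, self.first_quantile)
        # Step 2: the more expressive classifier is trained on the filtered subset.
        self.second_classifier.fit(X[keep], y[keep])
        return self.second_classifier.predict_proba(X)[:, 1]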

@anna-charlotte anna-charlotte marked this pull request as draft January 16, 2025 12:53
logger = logging.getLogger()


def keep_best(

Collaborator:

+1 for reorganizing code, but this makes it hard to spot any changes :-)
would it be a large effort to move this back to fdr.py for now and do the reordering (=just moving) later (or: before) in a dedicated PR?

Collaborator:

To my understanding the functions were copied as they are from

def keep_best(

So there should be no changes here?

Contributor (author):

Yes exactly, it was just moved over due to a circular import issue. That has been resolved now, so I moved it back to alphadia/fdr.py for now :) (A hypothetical illustration of the pattern is sketched below.)
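
For context, a circular import arises when two modules import each other at module level. Purely as a hypothetical illustration (the actual resolution in this PR may differ), one common way to break such a cycle is a function-local import:

# hypothetical sketch, not the actual fix in this PR

def fit_predict(df):
    # A function-local import defers resolution until call time, so
    # alphadia.fdr can itself import this module without a cycle.
    from alphadia.fdr import keep_best
    return keep_best(df)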

Contributor (author):

@GeorgWa I noticed there are duplicates of the functions keep_best(), fdr_to_q_values() and some more in alphadia/fdrx/stats.py. Is that on purpose? If so, why do we have those?
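
For context on what these duplicated helpers conventionally do: fdr_to_q_values turns per-candidate FDR estimates into monotone q-values. A minimal sketch of that standard transformation (an assumption about the function's role, not the alphadia source):

import numpy as np


def fdr_to_q_values(fdr_values: np.ndarray) -> np.ndarray:
    # With candidates sorted from best to worst score, the q-value is the
    # smallest FDR at which a candidate would still be accepted:
    # a reverse cumulative minimum over the FDR values.
    return np.minimum.accumulate(fdr_values[::-1])[::-1]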

X = df_[x_cols]
y = df_[y_col]
df = df_.copy()
if hasattr(classifier, "fitted") and classifier.fitted:

Contributor (author):

yes, you're right :)

Comment on lines 767 to 770
if classifier_hash not in self.classifier_store:
classifier = deepcopy(self.classifier_base)
classifier.from_state_dict(torch.load(os.path.join(path, file)))
with contextlib.suppress(Exception):
classifier.from_state_dict(torch.load(os.path.join(path, file)))

Contributor (author):

@GeorgWa What is this alphadia/constants/classifier/fa9945ae23db872d.pth file that we are loading here, some pretrained model? Shall I store a similar file for the two-step-classifier?

Collaborator:

Yes exactly 👍🏻 We will do the same with the two-step classifier eventually (see the sketch below).
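
For reference, a minimal sketch of the pretrained-weights pattern discussed here, built around plain torch state dicts. The loading side mirrors the diff above; the helper name and the to_state_dict counterpart on the saving side are assumptions:

import os
from copy import deepcopy

import torch


def load_pretrained(classifier_base, path="alphadia/constants/classifier", file="fa9945ae23db872d.pth"):
    # Copy the base classifier so the stored instance stays untouched,
    # then hydrate the copy from the saved state dict (mirrors the diff above).
    classifier = deepcopy(classifier_base)
    classifier.from_state_dict(torch.load(os.path.join(path, file)))
    return classifier


# Saving side (an assumption; a to_state_dict counterpart to from_state_dict):
# torch.save(classifier.to_state_dict(), "alphadia/constants/classifier/fa9945ae23db872d.pth")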

@anna-charlotte (Contributor, author):

@GeorgWa Should I add an e2e or performance test for the two step classifier, or just the unit tests?

@mschwoer (Collaborator) left a comment:

LGTM!

Comment on lines 39 to 45
self.first_classifier = first_classifier
self.second_classifier = second_classifier
self.first_fdr_cutoff = first_fdr_cutoff
self.second_fdr_cutoff = second_fdr_cutoff

self.min_precursors_for_update = min_precursors_for_update
self.train_on_top_n = train_on_top_n

Collaborator:

could those be private? (check also LogisticRegression)

Contributor (author):

Do you mean self.min_precursors_for_update and self.train_on_top_n? I think they could be private; I went with not private, following the pattern in BinaryClassifierLegacyNewBatching, where only _fitted is private. But I'm ok with either, shall I change it?

Collaborator:

Given it's a different class, I would favor correctness over consistency here, so I'd prefer changing it.

Comment on lines 105 to 106
f"Stop training after iteration {i}, "
f"due to decreasing target count ({current_target_count} < {best_precursor_count})"

Collaborator:

(nit) "Stopping .." .. ".. decreased .."
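
The message being nit-picked belongs to an early-stopping check in the iterative training loop. A hedged sketch of that pattern, with the suggested wording applied (the function, its arguments, and the variable names are assumptions based on the surrounding context, not the actual code):

import logging

logger = logging.getLogger()


def train_until_no_gain(df, train_and_apply_fdr, max_iterations=10):
    # Hypothetical helper: run training iterations and keep the best result,
    # stopping as soon as the target (precursor) count decreases.
    best_precursor_count = -1
    best_result = None
    for i in range(max_iterations):
        result = train_and_apply_fdr(df)
        current_target_count = len(result)
        if current_target_count < best_precursor_count:
            logger.info(
                f"Stopping training after iteration {i}, "
                f"due to decreased target count ({current_target_count} < {best_precursor_count})"
            )
            break
        best_precursor_count = current_target_count
        best_result = result
    return best_result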


return best_result

def preprocess_data(self, df: pd.DataFrame, x_cols: list[str]) -> pd.DataFrame:

Collaborator:

please check all methods for being potentially private

"""
self._fitted = state_dict["_fitted"]

if self.fitted:

Collaborator:

Please check if this should rather be if self._fitted:? If not, add a comment to deconfuse me :-)

Contributor (author):

I figured, since we implement Classifier, which has the @property def fitted(self): ..., one would use self._fitted for setting but self.fitted for accessing, is that right? If so, I'll add a comment, or otherwise change it 😃

Collaborator:

As it's the class accessing an instance variable, it's fine to use self._fitted (this is logically equivalent, as the property here is a 1:1 wrapper around self._fitted to make it public).
More than "fine", actually: I'd prefer it for consistency :-)

@GeorgWa why do we have these properties anyway? They don't seem to be used.

Collaborator:

@anna-charlotte just FYI in case you also want to set your properties, you'd use the @value.setter decorator

class DummyClass:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        print("getter")
        return self._value

    @value.setter
    def value(self, new_value):
        print("setter")
        self._value = new_value

Contributor (author):

Yeah, I figured we don't have the setter here, as we would want the fitted attribute to be read-only for the user. Otherwise, would there be an advantage in this property being private versus just having a .fitted attribute? (A minimal sketch of the read-only pattern follows.)
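
To make the trade-off concrete, a minimal sketch of the read-only pattern under discussion (illustrative only, not the alphadia class):

class Classifier:
    def __init__(self):
        self._fitted = False  # internal state, mutated only by the class itself

    @property
    def fitted(self):
        # No setter is defined: clf.fitted is readable everywhere,
        # but clf.fitted = True raises AttributeError outside the class.
        return self._fitted

    def fit(self, X, y):
        ...  # training logic would go here
        self._fitted = True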

@anna-charlotte anna-charlotte marked this pull request as ready for review January 24, 2025 11:24
@anna-charlotte anna-charlotte merged commit 781e34e into main Jan 24, 2025
5 checks passed
@anna-charlotte anna-charlotte deleted the add-two-step-classifier branch January 24, 2025 11:25