Add two step classifier #431
Conversation
alphadia/fdr_utils.py (outdated)
logger = logging.getLogger()


def keep_best(
+1 for reorganizing code, but this makes it hard to spot any changes :-)
Would it be a large effort to move this back to fdr.py for now and do the reordering (= just moving) later (or: before) in a dedicated PR?
To my understanding, the functions were copied as they are from line 171 in da99596 (def keep_best(...)). So there should be no changes here?
Yes exactly, it was just moved over due to a circular import issue. That one has been resolved now, so I moved it back to alphadia/fdr.py for now :)
@GeorgWa I noticed there are duplicates of the functions keep_best(), fdr_to_q_values() and some more in alphadia/fdrx/stats.py. Is that on purpose? If so, why do we have those?
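For context, a minimal sketch of what these two helpers usually do (general pattern only, not necessarily alphadia's exact implementation; the column names are assumptions): keep_best retains the best-scoring row per precursor group, and fdr_to_q_values converts FDR values into monotone q-values via a reverse cumulative minimum.

import numpy as np
import pandas as pd


def keep_best_sketch(df: pd.DataFrame, score_column: str = "proba",
                     group_columns: list[str] | None = None) -> pd.DataFrame:
    # keep only the best-scoring row per group; here a lower score is assumed to be better
    group_columns = group_columns or ["precursor_idx"]
    idx = df.groupby(group_columns)[score_column].idxmin()
    return df.loc[idx].copy()


def fdr_to_q_values_sketch(fdr_values: np.ndarray) -> np.ndarray:
    # q-value of an entry = smallest FDR reachable at or below its rank,
    # i.e. the reverse cumulative minimum over entries sorted from best to worst
    return np.minimum.accumulate(fdr_values[::-1])[::-1]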
alphadia/fdrexperimental.py (outdated)
X = df_[x_cols]
y = df_[y_col]
df = df_.copy()
if hasattr(classifier, "fitted") and classifier.fitted:
yes, you're right :)
alphadia/workflow/manager.py (outdated)
  if classifier_hash not in self.classifier_store:
      classifier = deepcopy(self.classifier_base)
-     classifier.from_state_dict(torch.load(os.path.join(path, file)))
+     with contextlib.suppress(Exception):
+         classifier.from_state_dict(torch.load(os.path.join(path, file)))
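For readers less familiar with contextlib, suppress(Exception) is just shorthand for a try/except that ignores the error; a minimal, self-contained illustration (the file name is made up):

import contextlib
import os

# contextlib.suppress(Exception) is equivalent to wrapping the body in try/except and ignoring the error
with contextlib.suppress(Exception):
    os.remove("nonexistent_classifier.pth")

# the spelled-out equivalent
try:
    os.remove("nonexistent_classifier.pth")
except Exception:
    pass

Note that suppressing the broad Exception means a corrupt or incompatible .pth file is silently skipped rather than failing the run.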
@GeorgWa What is this alphadia/constants/classifier/fa9945ae23db872d.pth file that we are loading here, some pretrained model? Shall I store a similar file for the two-step classifier?
Yes exactly 👍🏻 We will do the same with the two step classifier eventually
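For anyone following along, this is the general round trip behind such a .pth file. alphadia restores its classifier via from_state_dict rather than a bare nn.Module, so the following is only a sketch of the idea, with an illustrative file name:

import torch
import torch.nn as nn

# persist a model's learned parameters once, then restore them into a fresh instance later
model = nn.Linear(10, 2)
torch.save(model.state_dict(), "pretrained_classifier.pth")

restored = nn.Linear(10, 2)
restored.load_state_dict(torch.load("pretrained_classifier.pth"))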
@GeorgWa Should I add an e2e or performance test for the two step classifier, or just the unit tests?
LGTM!
self.first_classifier = first_classifier
self.second_classifier = second_classifier
self.first_fdr_cutoff = first_fdr_cutoff
self.second_fdr_cutoff = second_fdr_cutoff

self.min_precursors_for_update = min_precursors_for_update
self.train_on_top_n = train_on_top_n
could those be private? (check also LogisticRegression)
Do you mean self.min_precursors_for_update and self.train_on_top_n? I think they could be private. I went with not private, following the pattern in BinaryClassifierLegacyNewBatching, where only _fitted is private. But I'm ok with either, shall I change it?
Given it's a different class, I would favor correctness over consistency here, so I'd prefer changing it.
f"Stop training after iteration {i}, " | ||
f"due to decreasing target count ({current_target_count} < {best_precursor_count})" |
(nit) "Stopping .." .. ".. decreased .."
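For context, the message above comes from an iterative refit loop that keeps the best iteration and stops once the number of identified targets drops. A minimal sketch of that pattern, incorporating the suggested wording (the fit_and_count callable and max_iterations are placeholders, not the PR's actual API):

import logging

logger = logging.getLogger(__name__)


def iterate_until_no_improvement(fit_and_count, max_iterations: int = 5):
    # fit_and_count(i) is a placeholder returning (result, target_count) for iteration i
    best_result, best_precursor_count = None, -1
    for i in range(max_iterations):
        result, current_target_count = fit_and_count(i)
        if current_target_count < best_precursor_count:
            logger.info(
                f"Stopping training after iteration {i}, "
                f"due to decreased target count ({current_target_count} < {best_precursor_count})"
            )
            break
        best_result, best_precursor_count = result, current_target_count
    return best_result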
return best_result

def preprocess_data(self, df: pd.DataFrame, x_cols: list[str]) -> pd.DataFrame:
please check all methods for being potentially private
""" | ||
self._fitted = state_dict["_fitted"] | ||
|
||
if self.fitted: |
Please check if this should rather be if self._fitted:? If not, add a comment which deconfuses me :-)
I figured, since we implement Classifier, which has the @property def fitted(self): ..., one would use self._fitted for setting but self.fitted for accessing, is that right? If so, I'll add a comment, or otherwise change it 😃
As it's the class accessing an instance variable, it's fine to use self._fitted (this is equivalent in terms of logic, as here the property is a 1:1 wrapper around self._fitted to make it public).
More than "fine" actually: I would prefer it for consistency :-)
@GeorgWa why do we have these properties anyway? They don't seem to be used.
@anna-charlotte just FYI, in case you also want to set your properties, you'd use the @value.setter decorator:

class DummyClass:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        print("getter")
        return self._value

    @value.setter
    def value(self, new_value):
        print("setter")
        self._value = new_value
Yeah, I figured we don't have the setter here, as we would want the fitted attribute to be read-only for the user. Otherwise, would there be an advantage in this property over just having a plain .fitted attribute?
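To make the read-only point concrete: a property without a setter allows external reads but raises on external writes, while a plain attribute allows both. A minimal sketch (hypothetical class, not the PR's code):

class ClassifierSketch:
    def __init__(self):
        self._fitted = False  # internal state, written only by the class itself

    @property
    def fitted(self) -> bool:
        # public, read-only view of the internal flag
        return self._fitted

    def fit(self):
        self._fitted = True


clf = ClassifierSketch()
clf.fit()
print(clf.fitted)    # True
# clf.fitted = False would raise AttributeError; a plain attribute would silently allow it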
Adding a two-step classifier that is to be used with a logistic regression followed by a neural network classifier, as inspired by DIA-NN. With this we aim to increase sensitivity, particularly in samples with only few peptides present, such as single-cell samples.
Steps:
- TwoStepClassifier
- FDRManager.fit_predict(), where the classifier is trained
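To make the intended flow concrete, here is a conceptual sketch of the two-step idea described above. It assumes sklearn-style classifiers (fit/predict_proba) and a compute_q_values helper returning per-row q-values; neither the names nor the signatures reflect the PR's actual API.

import pandas as pd


class TwoStepClassifierSketch:
    # Conceptual sketch: a cheap first classifier pre-filters candidates at a permissive
    # FDR cutoff, and a stronger second classifier is trained and applied on the survivors.

    def __init__(self, first_classifier, second_classifier,
                 first_fdr_cutoff: float, second_fdr_cutoff: float):
        self.first_classifier = first_classifier
        self.second_classifier = second_classifier
        self.first_fdr_cutoff = first_fdr_cutoff
        self.second_fdr_cutoff = second_fdr_cutoff

    def fit_predict(self, df: pd.DataFrame, x_cols: list[str], y_col: str,
                    compute_q_values) -> pd.DataFrame:
        # step 1: e.g. logistic regression on all candidates, keep everything below the loose cutoff
        self.first_classifier.fit(df[x_cols], df[y_col])
        df = df.assign(proba=self.first_classifier.predict_proba(df[x_cols])[:, 1])
        df = df[compute_q_values(df) <= self.first_fdr_cutoff]

        # step 2: e.g. neural network on the reduced, cleaner candidate set, strict cutoff
        self.second_classifier.fit(df[x_cols], df[y_col])
        df = df.assign(proba=self.second_classifier.predict_proba(df[x_cols])[:, 1])
        return df[compute_q_values(df) <= self.second_fdr_cutoff]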