Merge pull request #4 from zbw/zero-recall
Set predicted recall to zero for docs without predicted labels
Christopher Bartz authored Oct 5, 2022
2 parents f558a75 + 86c7690 commit 34248b7
Showing 57 changed files with 150 additions and 66 deletions.
2 changes: 1 addition & 1 deletion qualle/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/evaluate.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/base.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/combined.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/confidence.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/label_calibration/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/label_calibration/base.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/features/text.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/interface/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/interface/cli.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/interface/config.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/interface/internal.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/interface/rest.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/label_calibration/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/label_calibration/category.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/label_calibration/simple.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/main.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/models.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
72 changes: 62 additions & 10 deletions qualle/pipeline.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from contextlib import contextmanager
-from typing import List, Callable, Any
+from typing import List, Callable, Any, Collection
 
 from sklearn.model_selection import cross_val_predict
 
@@ -65,15 +65,67 @@ def train(self, data: TrainData):
         self._recall_predictor.fit(features_data, true_recall)
 
     def predict(self, data: PredictData) -> List[float]:
-        predicted_no_of_labels = self._label_calibrator.predict(data.docs)
-        label_calibration_data = LabelCalibrationData(
-            predicted_labels=data.predicted_labels,
-            predicted_no_of_labels=predicted_no_of_labels
-        )
-        features_data = self._features_data_mapper(
-            data, label_calibration_data
+        zero_idxs = self._get_pdata_idxs_with_zero_labels(data)
+        data_with_labels = self._get_pdata_with_labels(data, zero_idxs)
+        if data_with_labels.docs:
+            predicted_no_of_labels = self._label_calibrator.predict(
+                data_with_labels.docs
+            )
+            label_calibration_data = LabelCalibrationData(
+                predicted_labels=data_with_labels.predicted_labels,
+                predicted_no_of_labels=predicted_no_of_labels,
+            )
+            features_data = self._features_data_mapper(
+                data_with_labels, label_calibration_data
+            )
+            predicted_recall = self._recall_predictor.predict(
+                features_data
+            )
+            recall_scores = self._merge_zero_recall_with_predicted_recall(
+                predicted_recall=predicted_recall,
+                zero_labels_idx=zero_idxs,
+            )
+        else:
+            recall_scores = [0] * len(data.predicted_labels)
+        return recall_scores
+
+    @staticmethod
+    def _get_pdata_idxs_with_zero_labels(data: PredictData) -> Collection[int]:
+        return [
+            i for i in range(len(data.predicted_labels))
+            if not data.predicted_labels[i]
+        ]
+
+    @staticmethod
+    def _get_pdata_with_labels(
+            data: PredictData, zero_labels_idxs: Collection[int]
+    ) -> PredictData:
+        non_zero_idxs = [
+            i for i in range(len(data.predicted_labels))
+            if i not in zero_labels_idxs
+        ]
+        return PredictData(
+            docs=[data.docs[i] for i in non_zero_idxs],
+            predicted_labels=[data.predicted_labels[i] for i in non_zero_idxs],
+            scores=[data.scores[i] for i in non_zero_idxs],
         )
-        return self._recall_predictor.predict(features_data)
 
+    @staticmethod
+    def _merge_zero_recall_with_predicted_recall(
+            predicted_recall: List[float],
+            zero_labels_idx: Collection[int],
+    ):
+        recall_scores = []
+        j = 0
+        for i in range(
+                len(zero_labels_idx) +
+                len(predicted_recall)):
+            if i in zero_labels_idx:
+                recall_scores.append(0)
+            else:
+                recall_scores.append(predicted_recall[j])
+                j += 1
+        return recall_scores
+
     @contextmanager
     def _debug(self, method_name):
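The new predict() path comes down to index bookkeeping: find the docs whose predicted label list is empty, run the label calibrator and recall predictor only on the remaining docs, then splice a recall of 0 back in at the original positions so the output stays aligned with the input. Below is a minimal, framework-free sketch of that bookkeeping, assuming plain lists instead of the project's PredictData type. The names split_by_empty_labels, merge_zero_recall, predict_recall and toy_predictor are hypothetical, and predict_fn is a stand-in for the calibrator plus recall-predictor chain, not qualle's API.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical stand-in for "label calibrator + recall predictor" (not qualle's API).
PredictFn = Callable[[List[str], List[Sequence[str]]], Sequence[float]]


def split_by_empty_labels(
    predicted_labels: Sequence[Sequence[str]],
) -> Tuple[Set[int], List[int]]:
    """Return (indices of docs without predicted labels, indices of the rest)."""
    zero_idxs = {i for i, labels in enumerate(predicted_labels) if not labels}
    non_zero_idxs = [i for i in range(len(predicted_labels)) if i not in zero_idxs]
    return zero_idxs, non_zero_idxs


def merge_zero_recall(
    predicted_recall: Sequence[float], zero_idxs: Set[int], n_docs: int
) -> List[float]:
    """Put 0.0 at zero_idxs and the model's predictions everywhere else, in doc order."""
    if len(predicted_recall) + len(zero_idxs) != n_docs:
        raise ValueError("predicted_recall and zero_idxs do not cover all docs")
    recall_iter = iter(predicted_recall)
    return [0.0 if i in zero_idxs else next(recall_iter) for i in range(n_docs)]


def predict_recall(
    docs: Sequence[str],
    predicted_labels: Sequence[Sequence[str]],
    predict_fn: PredictFn,
) -> List[float]:
    zero_idxs, non_zero_idxs = split_by_empty_labels(predicted_labels)
    if not non_zero_idxs:
        # No doc has labels: skip the model entirely, like the commit's else-branch.
        return [0.0] * len(docs)
    sub_docs = [docs[i] for i in non_zero_idxs]
    sub_labels = [predicted_labels[i] for i in non_zero_idxs]
    return merge_zero_recall(predict_fn(sub_docs, sub_labels), zero_idxs, len(docs))


def toy_predictor(docs: List[str], labels: List[Sequence[str]]) -> List[float]:
    # Hypothetical model: recall grows with the number of predicted labels.
    return [len(doc_labels) / 10 for doc_labels in labels]


print(predict_recall(["a", "b", "c"], [["x"], [], ["x", "y"]], toy_predictor))
# [0.1, 0.0, 0.2]
```

One design difference from the diff: the sketch keeps the zero indices in a set, so each membership check in the split and merge loops is constant time, whereas the commit's helpers test `i in zero_labels_idx` against a list, which is linear per lookup. The length check in merge_zero_recall is an added safeguard that is not in the commit.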
2 changes: 1 addition & 1 deletion qualle/quality_estimation.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/train.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion qualle/utils.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/common.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/features/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/features/label_calibration/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/features/test_combined.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/features/test_confidence.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/features/test_text.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/interface/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/interface/common.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/interface/test_cli.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/interface/test_internal.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/interface/test_rest.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/label_calibration/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/label_calibration/conftest.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/label_calibration/test_category.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/label_calibration/test_simple.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion tests/test_eval.py
@@ -1,4 +1,4 @@
-# Copyright 2021 ZBW – Leibniz Information Centre for Economics
+# Copyright 2021-2022 ZBW – Leibniz Information Centre for Economics
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.