Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DICOM redactor improvement: Enable selection of redact approach #1113

Merged
merged 29 commits into from
Aug 21, 2023
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
94df5f1
Adding in ability to select redact approach
niwilso Jul 5, 2023
22d18a1
Adding redact_approach arg into tests so they pass correctly
niwilso Jul 5, 2023
1ee7d93
Merge branch 'main' into niwilso/dicom/select-redact-approach
niwilso Jul 6, 2023
5ac0095
Adding test for _get_analyzer_results
niwilso Jul 6, 2023
d79b033
Linting fixes
niwilso Jul 6, 2023
22561ba
Additional linting fixes
niwilso Jul 6, 2023
b94d7e5
Resolving merge conflicts with updated main
niwilso Jul 6, 2023
a2167c1
Making default approach the default
niwilso Aug 7, 2023
6c23418
Merging in main
niwilso Aug 7, 2023
9066349
Linting fix
niwilso Aug 7, 2023
360a386
Replacing patch.object() with patch() in modified tests
niwilso Aug 7, 2023
a3ce5f8
Fixed mocker patch for new _get_analyzer_results in redact_and_return…
niwilso Aug 7, 2023
e2ab725
Removing old assertions brought over from incorrect merge conflict re…
niwilso Aug 7, 2023
ec2649a
Replacing call_count == statements with assert_called_once etc
niwilso Aug 7, 2023
529cbd6
Replacing in-test instantiation with passing in mock_engine
niwilso Aug 7, 2023
212dc8e
Suggested change from PR comment. redact_approach replaced with use_m…
niwilso Aug 17, 2023
490f3d4
Linting fixes
niwilso Aug 17, 2023
fc1f0c2
Additional linting fixes
niwilso Aug 17, 2023
fdd62b0
Merge branch 'main' into niwilso/dicom/select-redact-approach
niwilso Aug 17, 2023
fde5088
Remove output that is not checked
niwilso Aug 17, 2023
916fed6
Commenting out whole unit test file to see impact on build pipeline h…
niwilso Aug 17, 2023
510cbac
Only commenting out get_analyzer_results tests
niwilso Aug 17, 2023
4e7c8a6
Only commenting out happy path test for get_analyzer_results
niwilso Aug 17, 2023
57f2a86
Changing ad_hoc_recognizers type check to raise only TypeError
niwilso Aug 17, 2023
ed2608d
Removing type assertion for exception test
niwilso Aug 17, 2023
a5909a3
Reintroduce the happy path test for get_analyzer_results
niwilso Aug 17, 2023
ad01b43
Commenting out exception test and keeping in happy path test
niwilso Aug 17, 2023
9a1f56b
Removing constants from parameterize for exception test
niwilso Aug 17, 2023
dabc579
Adding argument= in call to get_analyzer_results in happy path
niwilso Aug 17, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
from presidio_image_redactor import ImageRedactorEngine
from presidio_image_redactor import ImageAnalyzerEngine # noqa: F401
from presidio_analyzer import PatternRecognizer
from presidio_image_redactor.entities import ImageRecognizerResult


class DicomImageRedactorEngine(ImageRedactorEngine):
Expand All @@ -30,6 +31,7 @@ def redact(
fill: str = "contrast",
padding_width: int = 25,
crop_ratio: float = 0.75,
redact_approach: Union[str, PatternRecognizer] = "metadata",
ocr_kwargs: Optional[dict] = None,
**text_analyzer_kwargs,
):
Expand All @@ -43,6 +45,8 @@ def redact(
:param padding_width: Padding width to use when running OCR.
:param crop_ratio: Portion of image to consider when selecting
most common pixel value as the background color value.
:param redact_approach: What approach to use when redacting
niwilso marked this conversation as resolved.
Show resolved Hide resolved
("default", "metadata", or a PatternRecognizer object).
:param ocr_kwargs: Additional params for OCR methods.
:param text_analyzer_kwargs: Additional values for the analyze method
in AnalyzerEngine.
Expand Down Expand Up @@ -71,17 +75,9 @@ def redact(
loaded_image = Image.open(png_filepath)
image = self._add_padding(loaded_image, is_greyscale, padding_width)

# Create custom recognizer using DICOM metadata
original_metadata, is_name, is_patient = self._get_text_metadata(instance)
phi_list = self._make_phi_list(original_metadata, is_name, is_patient)
deny_list_recognizer = PatternRecognizer(
supported_entity="PERSON", deny_list=phi_list
)
analyzer_results = self.image_analyzer_engine.analyze(
image,
ocr_kwargs=ocr_kwargs,
ad_hoc_recognizers=[deny_list_recognizer],
**text_analyzer_kwargs,
# Detect PII
analyzer_results = self._get_analyzer_results(
image, instance, redact_approach, ocr_kwargs, **text_analyzer_kwargs
)

# Redact all bounding boxes from DICOM file
Expand All @@ -102,6 +98,7 @@ def redact_from_file(
padding_width: int = 25,
crop_ratio: float = 0.75,
fill: str = "contrast",
redact_approach: Union[str, PatternRecognizer] = "metadata",
ocr_kwargs: Optional[dict] = None,
**text_analyzer_kwargs,
) -> None:
Expand All @@ -115,6 +112,8 @@ def redact_from_file(
:param padding_width : Padding width to use when running OCR.
:param fill: Color setting to use for redaction box
("contrast" or "background").
:param redact_approach: What approach to use when redacting
("default", "metadata", or a PatternRecognizer object).
:param ocr_kwargs: Additional params for OCR methods.
:param text_analyzer_kwargs: Additional values for the analyze method
in AnalyzerEngine.
Expand All @@ -138,6 +137,7 @@ def redact_from_file(
crop_ratio=crop_ratio,
fill=fill,
padding_width=padding_width,
redact_approach=redact_approach,
overwrite=True,
dst_parent_dir=".",
ocr_kwargs=ocr_kwargs,
Expand All @@ -155,6 +155,7 @@ def redact_from_directory(
padding_width: int = 25,
crop_ratio: float = 0.75,
fill: str = "contrast",
redact_approach: Union[str, PatternRecognizer] = "metadata",
ocr_kwargs: Optional[dict] = None,
**text_analyzer_kwargs,
) -> None:
Expand All @@ -170,6 +171,8 @@ def redact_from_directory(
most common pixel value as the background color value.
:param fill: Color setting to use for redaction box
("contrast" or "background").
:param redact_approach: What approach to use when redacting
("default", "metadata", or a PatternRecognizer object).
:param ocr_kwargs: Additional params for OCR methods.
:param text_analyzer_kwargs: Additional values for the analyze method
in AnalyzerEngine.
Expand All @@ -193,6 +196,7 @@ def redact_from_directory(
crop_ratio=crop_ratio,
fill=fill,
padding_width=padding_width,
redact_approach=redact_approach,
overwrite=True,
dst_parent_dir=".",
ocr_kwargs=ocr_kwargs,
Expand Down Expand Up @@ -733,12 +737,74 @@ def _add_redact_box(

return redacted_instance

def _get_analyzer_results(
self,
image: PIL.PngImagePlugin.PngImageFile,
instance: pydicom.dataset.FileDataset,
redact_approach: Union[str, PatternRecognizer],
ocr_kwargs: Optional[dict],
**text_analyzer_kwargs
) -> List[ImageRecognizerResult]:
"""Analyze image with selected redaction approach.

:param image: DICOM pixel data as PIL image.
:param instance: DICOM instance (with metadata).
:param redact_approach: What approach to use when redacting
("default", "metadata", "allow", or a PatternRecognizer object).
niwilso marked this conversation as resolved.
Show resolved Hide resolved
:param ocr_kwargs: Additional params for OCR methods.
:param text_analyzer_kwargs: Additional values for the analyze method
in AnalyzerEngine (e.g., allow_list).

:return: Analyzer results.
"""
# Detect PII
if type(redact_approach) == str:
niwilso marked this conversation as resolved.
Show resolved Hide resolved
if redact_approach.lower() == "default":
# Use default redactor
analyzer_results = self.image_analyzer_engine.analyze(
image,
ocr_kwargs=ocr_kwargs,
**text_analyzer_kwargs,
)
elif redact_approach.lower() == "metadata":
# Create custom recognizer using DICOM metadata
original_metadata, is_name, is_patient = self._get_text_metadata(
instance
)
phi_list = self._make_phi_list(
original_metadata, is_name, is_patient
)
deny_list_recognizer = PatternRecognizer(
supported_entity="PERSON", deny_list=phi_list
)
analyzer_results = self.image_analyzer_engine.analyze(
image,
ocr_kwargs=ocr_kwargs,
ad_hoc_recognizers=[deny_list_recognizer],
**text_analyzer_kwargs,
)
else:
raise ValueError("Please enter valid string or PatternRecognizer object for redact_approach") # noqa: E501
elif type(redact_approach) == PatternRecognizer:
niwilso marked this conversation as resolved.
Show resolved Hide resolved
# Use passed in recognizer
analyzer_results = self.image_analyzer_engine.analyze(
image,
ocr_kwargs=ocr_kwargs,
ad_hoc_recognizers=[redact_approach],
**text_analyzer_kwargs,
)
else:
raise ValueError("Please enter valid string or PatternRecognizer object for redact_approach") # noqa: E501

return analyzer_results

def _redact_single_dicom_image(
self,
dcm_path: str,
crop_ratio: float,
fill: str,
padding_width: int,
redact_approach: Union[str, PatternRecognizer],
overwrite: bool,
dst_parent_dir: str,
ocr_kwargs: Optional[dict] = None,
Expand All @@ -752,6 +818,8 @@ def _redact_single_dicom_image(
:param fill: Color setting to use for bounding boxes
("contrast" or "background").
:param padding_width: Pixel width of padding (uniform).
:param redact_approach: What approach to use when redacting
("default", "metadata", or a PatternRecognizer object).
:param overwrite: Only set to True if you are providing the
duplicated DICOM path in dcm_path.
:param dst_parent_dir: String path to parent directory of where to store copies.
Expand Down Expand Up @@ -789,17 +857,9 @@ def _redact_single_dicom_image(
loaded_image = Image.open(png_filepath)
image = self._add_padding(loaded_image, is_greyscale, padding_width)

# Create custom recognizer using DICOM metadata
original_metadata, is_name, is_patient = self._get_text_metadata(instance)
phi_list = self._make_phi_list(original_metadata, is_name, is_patient)
deny_list_recognizer = PatternRecognizer(
supported_entity="PERSON", deny_list=phi_list
)
analyzer_results = self.image_analyzer_engine.analyze(
image,
ocr_kwargs=ocr_kwargs,
ad_hoc_recognizers=[deny_list_recognizer],
**text_analyzer_kwargs,
# Detect PII
analyzer_results = self._get_analyzer_results(
image, instance, redact_approach, ocr_kwargs, **text_analyzer_kwargs
)

# Redact all bounding boxes from DICOM file
Expand All @@ -822,6 +882,7 @@ def _redact_multiple_dicom_images(
crop_ratio: float,
fill: str,
padding_width: int,
redact_approach: Union[str, PatternRecognizer],
overwrite: bool,
dst_parent_dir: str,
ocr_kwargs: Optional[dict] = None,
Expand All @@ -835,6 +896,8 @@ def _redact_multiple_dicom_images(
:param fill: Color setting to use for bounding boxes
("contrast" or "background").
:param padding_width: Pixel width of padding (uniform).
:param redact_approach: What approach to use when redacting
("default", "metadata", or a PatternRecognizer object).
:param overwrite: Only set to True if you are providing
the duplicated DICOM dir in dcm_dir.
:param dst_parent_dir: String path to parent directory of where to store copies.
Expand Down Expand Up @@ -865,6 +928,7 @@ def _redact_multiple_dicom_images(
crop_ratio,
fill,
padding_width,
redact_approach,
overwrite,
dst_parent_dir,
ocr_kwargs=ocr_kwargs,
Expand Down
Loading