Old school bit-plane overlay handling #201
That's correct - the pixel cleaner is just for the text overlays. If there is some other functionality you'd like, we welcome a PR, or enough detail that someone else could implement it. |
I am happy to make a code contribution, but I am still trying to learn my way around the project at this point so any guidance on how to accomplish this would be appreciated. I am basically looking to remove the old (now deprecated) style of overlays described in the note at the end of page 707 of this document: https://dicom.nema.org/MEDICAL/Dicom/2004/printed/04_03pu3.pdf These overlays can be cleared by zeroing all the bits of each pixel element above the number of "BitsStored". If "BitsStored" == "BitsAllocated" this can be skipped. In theory there should be an "OverlayBitPosition" element present if these types of overlays are being used, but it is safer to just always zero out these "extra" high bits if they exist. So there are basically three types of overlays:
So handling 1 is easy by just deleting the overlay elements in your recipe, and 3 is handled by the current pixel cleaning code. I would like to add support for handling 2. |
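A minimal sketch of the masking described above, for anyone following along. It assumes unsigned pixel data held in a NumPy array (for example from pydicom's `ds.pixel_array`) and that the stored bits sit in the low bits of each element (HighBit == BitsStored - 1). The function name `clear_unused_high_bits` is hypothetical and is not part of deid or pydicom.

```python
import numpy as np


def clear_unused_high_bits(pixels, bits_allocated, bits_stored):
    """Return a copy of `pixels` with every bit at or above `bits_stored` zeroed.

    Assumes unsigned data with the stored bits in the low end of each element.
    """
    if bits_stored >= bits_allocated:
        # No spare bits, so there is nowhere for an embedded overlay to hide.
        return pixels.copy()
    # Example: bits_stored == 12 gives a mask of 0x0FFF.
    mask = pixels.dtype.type((1 << bits_stored) - 1)
    return pixels & mask


# Toy data: 16 bits allocated, 12 stored, some overlay bits set in the top 4.
pixels = np.array([[0x0ABC, 0x1ABC], [0x8001, 0xFFFF]], dtype=np.uint16)
cleaned = clear_unused_high_bits(pixels, bits_allocated=16, bits_stored=12)
```

Writing the result back into the dataset is left out here, since that depends on the transfer syntax and on how deid ends up exposing the option.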
Ah gotcha! So if you have used pydicom before, what you'd want to do is write a little script that shows loading a dicom dataset, and then checking and parsing the attributes. Example images would help here that we can add to tests (small and anonymized ideally). If you don't have example images, then minimally it would be good to send me something I can work with (and I won't put anywhere / will delete when I finish - I just need it to test and develop). Once you have that example and can show me, I can figure out the best UI interaction to add. E.g., it might be a different kind of clean, or something we do by default (and disabled with a flag) given that we find that kind of data. I do like how you've laid out those three categories of overlays, and I think we should add something like that to the docs to explain the options (of course when the time comes). |
https://www.medicalconnections.co.uk/kb/Number-Of-Overlays-In-Image In general, I think the most paranoid behavior should be the default/supported one in deid, i.e. we should verify that all unused bits in the pixel data are empty on output so that they cannot leak any information. I don't think we'd necessarily want to convert old overlays to new. That seems like something that should be added to pydicom itself if there is any demand for it. My recommendation is that deid just destroys old-school overlays until or unless pydicom provides other options for handling them. I'll need to confirm, but my hunch is that old-school overlays were retired when support for compressed/encoded pixel data was added, so we can possibly branch based on whether there are any available bits that can be cleared. |
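The "verify on output" idea could be sketched roughly like this. It is only an illustration: `unused_bits_are_clear` is a hypothetical helper, it takes BitsAllocated from the array's dtype, and it assumes unsigned data with the stored bits in the low end of each element.

```python
import numpy as np


def unused_bits_are_clear(pixels, bits_stored):
    """Return True if no bit at or above `bits_stored` is set anywhere in `pixels`."""
    bits_allocated = pixels.dtype.itemsize * 8
    if bits_stored >= bits_allocated:
        return True  # every allocated bit is a stored bit
    # Mask selecting only the allocated-but-not-stored bits, e.g. 0xF000 for 16/12.
    high_mask = ((1 << bits_allocated) - 1) ^ ((1 << bits_stored) - 1)
    return not bool(np.any(pixels & pixels.dtype.type(high_mask)))


clean = np.array([0x0123, 0x0FFF], dtype=np.uint16)
dirty = np.array([0x0123, 0x1FFF], dtype=np.uint16)  # one bit set above bit 11
```

A check like this could run after cleaning and raise or warn if it fails, which would also give the branch point mentioned above: if there are no spare bits, there is nothing to clear.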
Deleting the overlays (or setting the high bits to zero for old-school overlays) will certainly help to deidentify. I would note that even CTP doesn't clear the high bits when deidentifying so this would give pydicom an advantage! However I think it's overkill for the purposes of removing PII. Can we apply the same rectangle redaction to overlays, as we do to image planes? DICOMs can have
In order to deidentify without damaging anything else we need to redact rectangles where PII text is found, and leave all other parts alone. This means keeping overlays and removing only the sensitive text on them. I'd like to be able to say "remove rectangle (x,y,w,h) from frame 27 of overlay 13" for example. Can we do that? |
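As a dummy example of the "remove rectangle (x,y,w,h) from one frame of one overlay" request, here is a NumPy-only sketch for the new-style, bit-packed overlay data. It assumes the caller has already pulled the OverlayData bytes, rows, columns and frame count out of the relevant 60xx group, and that bits are packed least-significant-bit first. The function name, its parameters and the zero-based frame index are all hypothetical.

```python
import numpy as np


def blank_overlay_rect(overlay_bytes, rows, cols, frame, x, y, w, h, n_frames=1):
    """Zero a w-by-h rectangle at (x, y) in one frame of bit-packed overlay data.

    Returns new bytes; the input is not modified. `frame` is zero-based.
    """
    n_bits = n_frames * rows * cols
    packed = np.frombuffer(overlay_bytes, dtype=np.uint8)
    # One array element per overlay pixel, first pixel in the lowest bit.
    bits = np.unpackbits(packed, bitorder="little")[:n_bits]
    planes = bits.reshape(n_frames, rows, cols).copy()
    planes[frame, y:y + h, x:x + w] = 0
    out = np.packbits(planes.ravel(), bitorder="little").tobytes()
    if len(out) % 2:
        out += b"\x00"  # keep the value an even number of bytes
    return out


# Two 4x4 frames with every overlay bit set; blank a 2x2 block in the second frame.
original = b"\xff\xff\xff\xff"
redacted = blank_overlay_rect(original, rows=4, cols=4, frame=1,
                              x=1, y=1, w=2, h=2, n_frames=2)
```

Only the chosen frame of the chosen overlay is touched; everything else in the overlay is returned unchanged.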
We can do almost anything if someone can show me a dummy example in code. ;) |
Basically what happens in the old-school overlays is that the images have "Bits Allocated" and "Bits Stored". This is a pixel-by-pixel setting and Bits Stored can be smaller than Bits Allocated (may not be accurate, I'm just working from memory). So, for example, if you have 16 bits allocated per pixel and 14 bits stored, that leaves two extra bits that are wasted and unused and can be used for storing two overlays. I think where they end up also depends on byte order and the high bit setting. I'm just speaking generally here.

In these old-school overlays you can just apply masks to the allocated bits to select the image versus the overlay bits. To remove the overlays you basically set the extra bits that are allocated but not stored to false (or true if you want to be annoying). And you can store the extracted overlays in the new format. This transformation of overlay styles is what I was suggesting belongs in pydicom more generally. Or not, depending on how much they still exist in the real world.

I suspect the interplay of these extra bit planes with compression/lossy compression is not pretty (particularly if the overlays are modified), which is why this method of overlays was retired. Compressed images don't really have dead space sitting around, and the new overlay format just packs the individual bits together (so they have to be decoded). The "new" style overlays can also have higher resolution than the original image.

Anyway, deleting the old overlays by masking pixel bits is the easiest approach. Translating old-school overlays to new overlays seems like a utility function that belongs in pydicom itself. Just my two cents.

I personally have never seen a multiframe where PHI appears in a rectangle on some frames and that same location (rectangle) contains non-PHI on other frames. They might be blank, but clearing blank bits isn't a problem, i.e. I've not encountered cases where anything would be lost by applying the same mask to the entire multiframe image.
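To make the masking concrete, here is a small sketch of pulling one embedded overlay plane out of the pixel elements and clearing it. It assumes unsigned pixel data in a NumPy array and that the overlay lives at a single known bit position (what OverlayBitPosition would give you). `split_embedded_overlay` is a hypothetical name, not an existing pydicom or deid function.

```python
import numpy as np


def split_embedded_overlay(pixels, bit_position):
    """Separate one embedded overlay plane from the image data.

    Returns (overlay, cleaned): a boolean array that is True where the overlay
    bit was set, and a copy of `pixels` with that bit cleared.
    """
    bit = pixels.dtype.type(1 << bit_position)
    overlay = (pixels & bit) != 0
    cleaned = pixels & ~bit
    return overlay, cleaned


# 16 bits allocated, 14 stored: bits 14 and 15 are free to carry two overlays.
pixels = np.array([0x4123, 0x0123, 0xC005], dtype=np.uint16)
overlay, cleaned = split_embedded_overlay(pixels, bit_position=14)
```

The boolean `overlay` array is what would be packed into new-style overlay data if someone did want the conversion; simply discarding it gives the "destroy old-school overlays" behavior.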
See also the discussion at the end of http://www.dclunie.com/medical-image-faq/html/part1.html |
I think we need four options. The original author of this issue probably wants option 1.
There are some sample images in the gdcm conformance test file collection which may be useful. |
This handles option 1, as the original poster wanted.
Here's a sample program which implements all required options. |
A good way of testing pydicom is to use the sample files in https://sourceforge.net/projects/gdcm/files/gdcmData/ and https://sourceforge.net/projects/gdcm/files/gdcmConformanceTests/ Some filenames from these sets are in the script. You can also get sample files, such as multi-frame images, from https://gdcm.sourceforge.net/wiki/index.php/Sample_DataSet (I used the first link on that page). |
Yes, but we do need the original poster as the source of truth to test and report whether the issue is fixed. |
@moloney Is the code provided above helpful to you? Does it work ok? |
Does this project scrub the high bits in PixelData (above "BitsStored" and below "BitsAllocated") to clear out overlays stored this way? Initially I thought this is what the pixel cleaner code is for, but it looks like this is just for handling "burned in" text overlays where the only option is to blank out a predefined rectangular region.