Errors in input data and legacy conversion #17
This is a tricky question. Generally, I would argue that the library has to assume passed data sets are standard compliant, and I would be cautious about just "fixing" things magically. The problem with this approach is that users will keep thinking that there is no problem with the data sets they provided and will be surprised if things break elsewhere.
On the one hand, I think it would be reasonable to encourage the user to correct the input data set if there are any "errors" in the input single-frame data sets that raise an exception upon conversion. Note that this would not necessarily require the end user to change the original images (files); the application that uses the highdicom library could apply these changes at Python runtime before passing the data sets to the constructor of the Legacy Converted Enhanced MR/CT/PET Image class. However, it may ultimately not be clear how attributes of the output (multi-frame legacy converted enhanced object) relate to attributes of the inputs (single-frame legacy objects).
On the other hand, I see the opportunity for legacy conversion to further enhance data sets by addressing compliance issues, and the Nonconforming Modified Attributes Sequence would be an elegant way to do this in a transparent manner. Personally, I favor the second approach, even though that would mean that the library takes on more responsibility.
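For illustration, such runtime patching could happen entirely in memory before the converter is invoked. A rough sketch, assuming a constructor along the lines of highdicom.legacy.LegacyConvertedEnhancedMRImage (the import path, constructor arguments, file names, and the patch itself are illustrative and should be checked against the released highdicom API):

```python
from pydicom import dcmread
from pydicom.uid import generate_uid

# Assumed import path and constructor arguments; check the released highdicom API.
from highdicom.legacy import LegacyConvertedEnhancedMRImage


def patch_dataset(ds):
    """Apply trivial in-memory fixes; the original files stay untouched."""
    # Illustrative fix: an SH value exceeding the 16-character limit.
    if 'StationName' in ds and len(str(ds.StationName)) > 16:
        ds.StationName = str(ds.StationName)[:16]
    return ds


single_frame = [patch_dataset(dcmread(f)) for f in ('mr000001.dcm', 'mr000002.dcm')]

multi_frame = LegacyConvertedEnhancedMRImage(
    legacy_datasets=single_frame,
    series_instance_uid=generate_uid(),
    series_number=100,
    sop_instance_uid=generate_uid(),
    instance_number=1,
)
```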
@fedorov wrote "if we first patch the input datasets, and then do the conversion, we would need to reference the ephemeral patched datasets, if we want to do things right"
I would keep the same SOP Instance UIDs for the "ephemeral" patched single-frame instances, since they are being discarded and are only a temporary intermediate step; so no special effort would be required to make sure that the references to the source from the legacy enhanced multi-frame images were to the originals.
Further (wrt. "do[ing] things right"), even in a production environment, not every change to a DICOM attribute requires a new SOP Instance UID, but that is a complicated discussion (and common misunderstanding) that we don't need to get into here.
Also, since legacy conversion normally dumps everything not otherwise specifically handled into the "unassigned" sequences, I am not sure how you will determine what is "important for legacy conversion" (all those not "unassigned"?), but it is an interesting question.
Note that for propagating composite context correctly (patient and study information entity attributes in particular, but also common series, frame of reference and instance (esp. SOP Common Module) stuff), I usually handle those separately, rather than dumping (some of them) into unassigned ... this depends on having knowledge of which attributes belong to which "level", or on handling of specific modules (see com.pixelmed.dicom.CompositeInstanceContext as used in com.pixelmed.dicom.MultiFrameImageFactory; these are a bit out of date wrt. the current standard in this respect, so I will update them).
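In Python terms, handling the composite context separately from the unassigned functional groups might look roughly like the following sketch; the attribute list is a small, illustrative subset of the patient and general study modules, not a complete treatment of all levels:

```python
from pydicom.dataset import Dataset

# Small, illustrative subset of patient/study entity attributes that belong at
# the top level of the converted object rather than in the "unassigned" groups.
PATIENT_STUDY_KEYWORDS = (
    'PatientName', 'PatientID', 'PatientBirthDate', 'PatientSex',
    'StudyInstanceUID', 'StudyDate', 'StudyTime', 'StudyID',
    'AccessionNumber', 'ReferringPhysicianName',
)


def propagate_composite_context(source: Dataset, target: Dataset) -> None:
    """Copy patient/study entity attributes from a source single-frame data
    set to the top level of the target multi-frame data set."""
    for keyword in PATIENT_STUDY_KEYWORDS:
        if keyword in source:
            setattr(target, keyword, getattr(source, keyword))
```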
On 2/21/20 11:57 PM, Andrey Fedorov wrote:
While evaluating legacy SOP classes, @afshinmessiah and I ran into the not unexpected issues related to invalid input data. At least in some cases, those errors are rather trivial, such as a mismatch of VR between SH and CS.
This raised the discussion with @dclunie below. Even after this discussion, I personally think it would make more sense and would be more practical to try to fix issues as they come up in the process of legacy conversion:
* this will be easier to understand and use for users of the legacy conversion functionality
* it may be easier to correct only errors that are important for legacy conversion specifically, rather than develop a tool that tries to fix all errors (e.g., not all of the attributes will be used for legacy conversion)
* if we first patch the input datasets, and then do the conversion, we would need to reference the ephemeral patched datasets, if we want to do things right
@hackermd, did you think about this?
------------------------------------------------------------
From: David Clunie
Date: Thu, Feb 20, 2020 at 10:33 AM
Subject: Re: MF conversion and source dataset errors
To: Andrey Fedorov
Cc: Akbarzadeh, Afshin, Steve Pieper
Hi Andrey
I also copied Steve.
In short, probably option 2 (patch the source dataset, and then
do the conversion).
It depends a lot on exactly what the "errors" are, and what
you would do with any intermediate files.
E.g., if there is an error that a value is invalid for the VR
(e.g., a bad character or too long), and the data element is
being copied (either into the top level data set of the new
object or into a functional group, e.g., per-frame unassigned),
then the choice is to "fix" it (remove the bad character, truncate
the too-long string) before copying.
Alternatively, if it is an optional attribute, one could just
drop it (not copy it); but that risks losing something useful.
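As a pydicom sketch, such a "fix before copying" might look as follows; the character set and length rules below are illustrative simplifications, not a prescribed policy, and a real converter would need per-VR rules:

```python
import re

# Maximum lengths for a few short string VRs (illustrative subset).
MAX_LEN = {'SH': 16, 'LO': 64, 'CS': 16}


def coerce_string_element(elem):
    """Make a string element conform to its declared VR by removing
    disallowed characters and truncating values that are too long."""
    value = str(elem.value)
    if elem.VR == 'CS':
        # CS permits only uppercase letters, digits, space and underscore.
        value = re.sub(r'[^A-Z0-9 _]', '', value.upper())
    max_len = MAX_LEN.get(str(elem.VR))
    if max_len is not None and len(value) > max_len:
        value = value[:max_len]
    elem.value = value
```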
I don't always bother fixing these when converting in bulk, and
just propagate the errors, since trying to find and fix each special
case may be more work than I can justify.
But if one can make a fix, it would be nice to.
There is also an update to the standard that allows saving the
original bad values; see CP 1766 and Nonconforming Modified
Attributes Sequence:
ftp://medical.nema.org/medical/dicom/final/cp1766_ft_ExtendOriginalAttributesSequence.pdf
http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.9.2
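A sketch of what recording the original bad value per CP 1766 could look like with pydicom, using explicit tags so it does not depend on the data dictionary providing the new keywords; the Modifying System and Source of Previous Values strings are placeholders:

```python
from datetime import datetime

from pydicom.dataset import Dataset
from pydicom.sequence import Sequence


def record_nonconforming_fix(ds, original_element, reason='CORRECT'):
    """Record a corrected, originally nonconforming element in the Original
    Attributes Sequence (0400,0561) using the Nonconforming Modified
    Attributes Sequence (0400,0551) introduced by CP 1766."""
    nonconforming = Dataset()
    # Keep the original (bad) element so the change remains transparent; if
    # the value cannot even be encoded with its VR, CP 1766 provides the
    # Nonconforming Data Element Value (0400,0552) instead.
    nonconforming.add(original_element)

    item = Dataset()
    item.add_new(0x04000551, 'SQ', Sequence([nonconforming]))
    item.add_new(0x04000562, 'DT', datetime.now().strftime('%Y%m%d%H%M%S'))  # Attribute Modification DateTime
    item.add_new(0x04000563, 'LO', 'highdicom legacy converter')             # Modifying System (placeholder)
    item.add_new(0x04000564, 'LO', 'source single-frame instance')           # Source of Previous Values (placeholder)
    item.add_new(0x04000565, 'CS', reason)                                   # Reason for the Attribute Modification

    # Append to the Original Attributes Sequence, creating it if absent.
    if 0x04000561 not in ds:
        ds.add_new(0x04000561, 'SQ', Sequence())
    ds[0x04000561].value.append(item)
```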
In terms of "when" to do the fix, if you are going to fix things,
I have done it both ways (and sometimes another way, which is
to propagate the errors into the multi-frame, and then fix
them up in yet another separate final cleanup step).
I assume that when you say "patch the source dataset", you mean
fix a temporary copy, not "return a fixed copy to TCIA to replace
their bad stuff".
In which case, either approach (or a combination of both) seems
fine to me, since any intermediate files don't need to be persisted.
In the past, when creating "true" enhanced MF samples for CT and
MR (for the NEMA sponsored project), I actually used my "antodc"
tool in dicom3tools to "fix" and "enhance" the single frame objects,
by converting stuff from private attributes to standard attributes
(even if they weren't in the single frame IOD), and then handled
the "merging" into multi-frame (and factoring out of shared stuff)
in dcmulti.
This worked well because I separated most of the modality-specific
knowledge from the generic single to multi-frame conversion, as well
as providing a useful tool for making single frame images "better",
when I didn't need to make multi-frame ones.
This was long before I added the MultiframeImageFactory to the
PixelMed toolkit, and I have propagated very little if any of the
modality-specific stuff to that tool so far.
When/if I revisit the question of trying to create modality-
specific legacy converted or true enhanced multi-frame images
in PixelMed, I will very likely use the two step process of
first fixing the single frame headers, and then merging them
into a multi-frame, since I find that division of labor more
elegant.
It would also allow me to provide other input sources (e.g.,
NIfTI files with a BIDS metadata file) to feed the same
DICOM enhanced multi-frame pipeline. Though I have to admit
that I usually do that sort of thing with separate classes
with methods applied successively, rather than separate distinct
command line utilities.
BTW. For referential integrity updates (e.g., fixing all SRs
or SEGs that reference single frame images to reference the
new MF objects), I might make that yet another separate
step in the pipeline, especially if I could find other uses
for it.
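For the simplest case, such a separate referential-integrity step could be sketched like this, assuming a mapping from source SOP Instance UIDs to frame numbers in the new multi-frame object; only ReferencedImageSequence is handled here, whereas real SRs and SEGs reference images in several other places:

```python
from pydicom.dataset import Dataset


def remap_image_references(ds: Dataset, frame_map: dict,
                           mf_sop_class_uid: str, mf_sop_instance_uid: str) -> None:
    """Point references to converted single-frame instances at the new
    multi-frame object instead (ReferencedImageSequence items only)."""
    for item in getattr(ds, 'ReferencedImageSequence', []):
        old_uid = item.ReferencedSOPInstanceUID
        if old_uid in frame_map:
            # Reference the specific frame that replaced the old instance.
            item.ReferencedFrameNumber = frame_map[old_uid]
            item.ReferencedSOPClassUID = mf_sop_class_uid
            item.ReferencedSOPInstanceUID = mf_sop_instance_uid
```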
David
PS. I have attached a copy of antodc.cc from dicom3tools ... I
haven't used this tool for more than a decade, but it may give
you some insight into more complicated fixes I sometimes used
to perform, and how I extracted standard information from private
data elements.
PPS. In case they are informative, I have attached archives of the
Makefiles that I used for the CT and MR NEMA project ... these will
not execute without my source images, various tools and out of band
information, but they may give some (outdated) insight into the
process of handcrafting examples versus producing an operational
converter.
On 2/19/20 11:18 PM, Andrey Fedorov wrote:
Hi David,
As Afshin is working on the MF conversion task, we wanted to ask you a
"fundamental" question.
As you know, TCIA datasets may often have errors. What should be the
strategy for addressing those? Should we:
1. carry forward those errors into MF representation, and just ignore
those while validating MF?
2. patch the source dataset, and then do the conversion?
3. patch the errors in the MF representation in the process of
conversion, and keep the originals intact?
I would probably prefer option 3.
AF
For reference, Slicer provides a DICOMPatcher module that handles a number of common issues. This is another set of code (like the plugins) that we could factor out into ...
Thanks for mentioning DICOMPatcher, Steve. This reminded me that we already had this discussion in the context of dcmqi development; there are related issues there. DCMTK has to deal with a similar situation when creating SEGs while carrying forward composite context from the referenced objects; we had quite some discussions about this with @michaelonken.
I need to think about the discussions above!
Another related component to be mentioned here is https://github.com/innolitics/dicom-standard, which can be used to identify errors that should be patched. @afshinmessiah is going to look into this, the Slicer DICOMPatcher, and @dclunie's tools, and we will think about how to proceed. @hackermd, maybe after that we should discuss whether that validator/patcher belongs in highdicom or should be a separate tool.