Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiframe Conversion #34

Open
wants to merge 47 commits into
base: master
Choose a base branch
from

Conversation

afshinmessiah
Copy link
Contributor

Dear Markus,
I've developed a new extensible conversion engine for HighDicom based David Clunie's PixelMed. This piece of code will be our meeting's main topic on July 9th.
Although it is not necessary, it would be wonderful if you take a look at the structure of the new conversion. This file also contains the UML class diagram of the new code to get a broader focus on it.

@afshinmessiah afshinmessiah changed the title Multiframe Conversion - JUST TO INFORM NOT FOR MERGE Multiframe Conversion - JUST TO INFORM NOT TO MERGE Jun 30, 2020
@hackermd hackermd self-assigned this Jul 1, 2020
@hackermd hackermd added the enhancement New feature or request label Jul 1, 2020
@hackermd
Copy link
Collaborator

hackermd commented Jul 1, 2020

Thanks @afshinmessiah! I had a quick look and will review more carefully before we meet.

In the meantime, as a first step I would suggest you look into the coding style section of the developer guide and make sure the code passes flake8 and mypy checks.

@fedorov
Copy link
Member

fedorov commented Jul 1, 2020

@hackermd the PR is not complete, and is known not to pass the checks. I suggested Afshin shares this incomplete PR with you so you can (if you like) look at the overall approach before the meeting.

@fedorov fedorov marked this pull request as draft July 1, 2020 17:11
Copy link
Collaborator

@hackermd hackermd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @afshinmessiah, thanks for your PR and apologies for my delay in responding. The code overall looks great. I have a few major and a couple of minor comments:

Major: related to implementation of Modules and Functional Group Macros (see comment on Abstract_MultiframeModuleAdder)

Minor: styling -> pythonic names and PEP 8 compliance

Please take a look at the individual comments and let me know what you think.

Before merging, we will also need to add unit tests for the legacy package and make sure the source code passes mypy and flake8 checks.

src/highdicom/legacy/sop.py Show resolved Hide resolved
@@ -89,25 +72,18 @@ def _convert_legacy_to_enhanced(
modalities.add(ds.Modality)
if len(series) > 1:
raise ValueError(
'All instances must belong to the same series.'
)
'All instances must belong to the same series.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave the previous style

        raise ValueError(
            'Long Message'
        )

instead of

        raise ValueError(
            'Long Message')

perframe_tags: Sequence[Tag],
shared_tags: Sequence[Tag],
multi_frame_output: Dataset = None):
self.ExcludedFromPerFrameTags = excluded_from_perframe_tags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow PEP 8 for attribute names using lower level characters and underscores instead of upper camel case: excluded_from_per_frame_tags instead of ExcludedFromPerFrameTags. The latter style is reserved for DICOM Attribute keywords.

from pydicom.valuerep import DT, TM, DA
if a.VR == 'DA' and type(a.value)==str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use isinstance(a.value, str) instead of type(a.value) == str

tg = tag_for_keyword(kw)
if kw in src:
a = deepcopy(src[kw])
else:
a = DataElement(tg,
dictionary_VR(tg), default )
a = DataElement(tg, dictionary_VR(tg), default)
from pydicom.valuerep import DT, TM, DA
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place import statements at top of module

# sf_sop_instance_uid = sf_datasets[0]
# mf_sop_instance_uid = LEGACY_ENHANCED_SOP_CLASS_UID_MAP[sf_sop_instance_uid]
# mf_sop_instance_uid = LEGACY_ENHANCED_SOP_CLASS_UID_MAP[
# sf_sop_instance_uid]
# iod_name = _SOP_CLASS_UID_IOD_KEY_MAP[mf_sop_instance_uid]
# modules = IOD_MODULE_MAP[iod_name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines

def _copy_attrib_if_present(self, src_ds:Dataset, dest_ds:Dataset,
src_kw_or_tg:str, dest_kw_or_tg:str=None,
check_not_to_be_perframe=True, check_not_to_be_empty=False):
def _copy_attrib_if_present(self, src_ds: Dataset, dest_ds: Dataset,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring to all methods, also "private" methods whose names start with an underscore.

self._copy_attrib_if_present(
ref_dataset, self.TargetDataset, a['keyword'],
check_not_to_be_perframe=check_not_to_be_perframe,
check_not_to_be_empty=check_not_to_be_empty)

@abstractmethod
def AddModule(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def AddModule(self):
def add_module(self) -> None:

self.ExcludedFromFunctionalGroupsTags = {
tag_for_keyword('SpecificCharacterSet'): False}

# ---------------------------------------------------------------------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use spaces to logically separate blocks of lines of code

from abc import ABC, abstractmethod


class Abstract_MultiframeModuleAdder(ABC):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with your sentiment that logical sets of attributes (modules and functional groups) should be handled in separate units/scopes and abstracted to improve readability of the code and facilitate testing.

However, I would differentiate between Modules and Functional Groups. Both are in principle just sets of attributes, but I would consider Functional Groups more reusable building blocks beyond the scope of a particular IOD when compared to Modules. Functional Groups generally have a narrower scope and it makes more sense in my opinion to instantiate them dynamically at run time and pass the resulting objects to highdicom API functions or methods. Modules on the other hand are conceptual building blocks of IODs and should in my opinion be abstracted via a specific SOPClass implementation. There are a couple of modules that are reusable across IODs. In particular modules for the Patient, Study, Series and Specimen IEs, which are handled on the SOPClass abstract base class. Most other modules differ between instance "types" (Image, Document, etc.) and should thus be handled by concrete classes derived from SOPClass.

Therefore, I would recommend the following:

  • Make units that represent Modules "private" (use names with leading underscore) to indicate to developers that their interface may be subject to change and is not considered part of the highdicom API. In addition, I think modules would be more elegantly handled using composition rather than inheritance (see for example implementation of SOPClass._copy_root_attributes_of_module).

  • Implement Functional Group Macros by inheriting from DataElementSequence. Have a look at the existing implementations in the content module (see for example content.PixelMeasuresSequence). This approach allows developers to instantiate a Functional Group Macro and include the constructed object into the PerFrameFunctionalGroupsSequence or SharedFunctionalGroupsSequence (which is shared by many IODs representing multi-frame images) in form of a list/sequence of pydicom.Dataset instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hackermd, I restructured the codes to apply your recommendations. Now there are a couple of key classes I'm going to use in order to convert the single frame.

  • There are classes FrameSet, FrameSetCollection responsible for detecting all instances that belong to a multi-frame image and then identify the shared/perframe attributes.
  • The class LegacyConvertedEnhanceImage inheriting SOPClass and is responsible for adding all necessary modules to multi-frame image.
  • There are PerframeFunctionalGroup/SharedFunctionalGroup classes inheriting DataElementSequence class which initialize sequences for multiframe images getting populated along the way.

@dclunie
Copy link

dclunie commented May 12, 2021

A "sorted set" is still a "set", not a "list".

Further, the order may need to be determined by sorting based on metadata (even if it is only by InstanceNumber within Series), not the the order in which the instances are supplied in the input, which may depend on how they are listed in the filesystem, which may vary between platforms.

@hackermd
Copy link
Collaborator

A "sorted set" is still a "set", not a "list".

Python doesn't have an ordered set and the interface of the built-in set differs significantly from that of a sequence, for example it cannot be indexed using the [] operator and contained elements must be hashable.
I am not saying the term "set" is wrong from an abstract type theory perspective, but potentially misleading to Python developers. Therefore, I would avoid the term "set" and Union[Sequence[Dataset], FrameSet].

We could instead use "unique" to emphasize that items of the sequence are not duplicated (see also numpy.unique).

Further, the order may need to be determined by sorting based on metadata (even if it is only by InstanceNumber within Series), not the the order in which the instances are supplied in the input, which may depend on how they are listed in the filesystem, which may vary between platforms.

Python provides several mechanisms for sorting sequences (lists), for example using sorted, which accepts a function as an argument to obtain an attribute for comparing items and determining their order. This could be readily applied to Sequence[Dataset].

@fedorov
Copy link
Member

fedorov commented May 12, 2021

I would still prefer the constructor to just accept Sequence[pydicom.dataset.Dataset] and avoid introducing a new type.

@afshinmessiah, can you implement this change, making the facilitating classes available as utility classes?

although they would need to be renamed, because "frame" has a specific meaning in DICOM and is very misleading in this context

Afshin was using those terms since he did not want to change the conventions used in Pixelmed. But I don't think this is worth an argument - can you suggest an alternative name that you like Markus, so Afshin can fix this?

@hackermd
Copy link
Collaborator

Afshin was using those terms since he did not want to change the conventions used in Pixelmed. But I don't think this is worth an argument - can you suggest an alternative name that you like Markus, so Afshin can fix this?

The main logic seems to be implemented in the FrameSet._find_per_frame_and_shared_tags() method, which updates the FrameSet._shared_tags and FrameSet._perframe_tags attributes. Could we just create a utility function (in highdicom.sr.utils) instead with the following signature?

def group_per_frame_and_shared_attributes(
    instances: Sequence[Dataset]
) -> Tuple[Set[BaseTag], Set[BaseTag]]:
    """Assign attributes to per-frame or shared functional groups.

    Parameters
    ----------
    instances: Sequence[pydicom.dataset.Dataset]
        DICOM SOP instances

    Returns
    -------
    Tuple[Set[pydicom.tag.BaseTag], Set[pydicom.tag.BaseTag]]
        Tags of attributes assigned to per-frame and shared functional groups

    """

I would prefer the use of keywords instead of tags, but I assume that this function should be capable of dealing with private attributes as well. Should it?

@afshinmessiah
Copy link
Contributor Author

afshinmessiah commented May 12, 2021

I would prefer the use of keywords instead of tags, but I assume that this function should be capable of dealing with private attributes as well. Should it?

It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code.
@hackermd, just pushed the new changes to code, moving the FrameSet and FrameSetCollection to highdicom.utils :)

@fedorov
Copy link
Member

fedorov commented May 12, 2021

I assume that this function should be capable of dealing with private attributes as well. Should it?

It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code.

@afshinmessiah why should that function NOT be capable to operate on private attributes, can you explain?

@afshinmessiah
Copy link
Contributor Author

I assume that this function should be capable of dealing with private attributes as well. Should it?

It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code.

@afshinmessiah why should that function NOT be capable to operate on private attributes, can you explain?

Well, let me rephrase: 'it is not necessary', according to the current logic of the code, both per-frame and shared attributes are not allowed to be private.

@hackermd
Copy link
Collaborator

In that case, I would prefer the function to have the following interface:

def group_per_frame_and_shared_attributes(
    instances: Sequence[Dataset]
) -> Tuple[Set[str], Set[str]]:
    """Assign attributes to per-frame or shared functional groups.

    Parameters
    ----------
    instances: Sequence[pydicom.dataset.Dataset]
        DICOM SOP instances

    Returns
    -------
    Tuple[Set[str], Set[str]]
        Keywords of attributes assigned to per-frame and shared functional groups

    """

The function be able to replace the FrameSet class.

@afshinmessiah
Copy link
Contributor Author

The main logic seems to be implemented in the FrameSet._find_per_frame_and_shared_tags() method, which updates the FrameSet._shared_tags and FrameSet._perframe_tags attributes. Could we just create a utility function (in highdicom.sr.utils) instead with the following signature?

@hackermd, I am more of a fan of objects and classes than of functions since they are easy to extend and well organized. Of course, we can rewrite any class and object in function form but I don't get this preference of function over the class.
In this specific case, there are other functionalities of FrameSet class that I use throughout the conversion process. I believe this is too much of a change for this single PR. I suggest we settle for the current FrameSet and FrameSetCollection classes since they are optional and users don't have to use them. I the future versions you can refactor the structure to better deal with future usecases.

@hackermd
Copy link
Collaborator

I suggest we settle for the current FrameSet and FrameSetCollection classes since they are optional and users don't have to use them. I the future versions you can refactor the structure to better deal with future use cases.

That will not be that easy if we make them part of the public API. That's why I suggested keeping them private and only using them internally in the constructors for now. We can later revisit if additional use cases come up.

@fedorov
Copy link
Member

fedorov commented May 12, 2021

@afshinmessiah I don't necessarily agree with Markus, but let's just go ahead with this suggestion - can you make those private and use them only internally in the constructors?

@afshinmessiah
Copy link
Contributor Author

@fedorov, I agree. I think it's better not to mess with those two classes and make them private, we can use a duplicate of those classes externally to handle our own needs over the course of conversion.

@afshinmessiah
Copy link
Contributor Author

@hackermd and @CPBridge, I made both FrameSet and FrameSetCollection classes private. Let me know if anything else is left.

Copy link
Collaborator

@CPBridge CPBridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @afshinmessiah ,

I managed to go through the rest of the code, and left a bunch of minor things to be addressed. Also please note my replies to a few of the unresolved previous threads.

Additionally, I notice that the tests occur entirely on synthetic data. Ideally there would be some tests that are on something closer to real world data. Fortunately, pydicom has a bunch of test files that we can use because they will already be available on any machine if pydicom is available.

from pydicom.data import get_testdata_file

# Each of these will return a path to a directory containing an MR series
directory = get_testdata_file('MR700')
directory = get_testdata_file('MR1')
directory = get_testdata_file('MR2')

# Each of these will return a path to a directory containing a CT series
directory = get_testdata_file('CT2')
directory = get_testdata_file('CT5N')

I think it would be a really good idea if we used these more realistic files to test the conversion process, and could be done quite quickly.

In my opinion after these are addressed, we should be ok to merge.

Parameters
----------
single_frame_list: List[pydicom.dataset.Dataset]
list of single frames that have equal distinguising attributes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
list of single frames that have equal distinguising attributes
list of single frames that have equal distinguishing attributes

ـDicomHelper.istag_group_length(ttag) and not
self._istag_excluded_from_perframe(ttag) and
ttag != tag_for_keyword('PixelData')):
elem = ds[ttag]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, please make a note about this in a comment in the code or someone may try to simplify this in the future and get very confused. Also please change the loop to

for ttag in ds.keys():

since you are not using the elem yielded from the loop

@afshinmessiah

rough_shared[ttag] = [elem.value]
to_be_removed_from_shared = []
for ttag, v in rough_shared.items():
v = rough_shared[ttag]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, same as above, please replace the loop and leave a comment to avoid future problems

self._frame_sets: List[FrameSet] = []
frame_counts = []
frameset_counter = 0
while len(self.mixed_frames_copy) != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Please refactor this somehow, since the behaviour is very unclear and this will make the code hard to maintain: neither the name of _find_all_similar_to_first_datasets nor its docstring suggest that it will remove anything from the list. Probably the easiest thing to do would be to have the function find what to remove and then have the removal actually happen in this loop directly

distinguishing_tags_existing.append(tg)
else:
distinguishing_tags_missing.append(tg)
logger_msg = set()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please type hint with the full type: Set[str]

des.extend(src)

def _add_module_to_mf_pixel_data(self) -> None:
"""Copies/add`s a pixel_data` multiframe module to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Copies/add`s a pixel_data` multiframe module to
"""Copies/adds a `pixel_data` multiframe module to

Comment on lines 2182 to 2185
for i in range(0, len(self._legacy_datasets)):
if kw not in self._legacy_datasets[i]:
continue
pixel_data_a = self._legacy_datasets[i][kw]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid indexing in for loops if possible

Suggested change
for i in range(0, len(self._legacy_datasets)):
if kw not in self._legacy_datasets[i]:
continue
pixel_data_a = self._legacy_datasets[i][kw]
for legacy_ds in self._legacy_datasets:
if kw not in legacy_ds:
continue
pixel_data_a = legacy_ds[kw]

"""
out: tuple = tuple()
if 'SeriesNumber' in x:
out += (x['SeriesNumber'].value, )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think you can just use the simpler syntax here

Suggested change
out += (x['SeriesNumber'].value, )
out += (x.SeriesNumber, )

'ORIGINAL', 'PRIMARY', 'RECON', 'EMISSION']
tmp_dataset.PixelSpacing = [
self._pixel_spacing, self._pixel_spacing]
tmp_dataset.PatientName = 'John^Doe'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not go perpetuating misunderstandings of the DICOM name encoding scheme:

Suggested change
tmp_dataset.PatientName = 'John^Doe'
tmp_dataset.PatientName = 'Doe^John'

Comment on lines 315 to 316
assert sop._DicomHelper.isequal(v1.value, v2.value) is True
assert sop._DicomHelper.isequal(v1.value, v3.value) is False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mis-use of the python is keyword.

Suggested change
assert sop._DicomHelper.isequal(v1.value, v2.value) is True
assert sop._DicomHelper.isequal(v1.value, v3.value) is False
assert sop._DicomHelper.isequal(v1.value, v2.value)
assert not sop._DicomHelper.isequal(v1.value, v3.value)

@afshinmessiah
Copy link
Contributor Author

Thanks, @CPBridge for the comments. I tried to address almost all comments.

I think it would be a really good idea if we used these more realistic files to test the conversion process, and could be done quite quickly.

Well, the good thing about synthetic test cases is that I have control over the number of frames, framesets, and geometry while generating the data. I agree with you on the preference for real data over synthetic data. However, it is complicated to design test cases for real data (to address all modules being added through the conversion process) when the expected multi-frame is unknown. Especially when the whole structure of FrameSet, FrameSetCollection, and GeometryOfSlice classes are subject to change. I'd rather stick with current tests and for this PR since I'm short of time cleaning up the code a getting it ready to merge.

@CPBridge
Copy link
Collaborator

Thanks @afshinmessiah for addressing these issues so quickly. Understood regarding the tests - I will try to address this in a future PR

@fedorov
Copy link
Member

fedorov commented May 24, 2021

@hackermd as part of this PR we should add a statement to the README perhaps that this work was supported by IDC. Let me know if you agree, and I can add the corresponding text to the PR.

@hackermd
Copy link
Collaborator

@fedorov I agree. Please note that I recently added a line to the README (see here) to highlight that the development of the library has been supported by the IDC. Please feel free to either rephrase or add an additional statement.

@malaterre
Copy link
Contributor

@afshinmessiah Would it be possible to rebase your branch on git/master ? thanks

@phistep
Copy link

phistep commented Oct 30, 2023

Hello, can someone summarize the state of this issue? I came here from pydicom/pydicom#1705 (comment), since I want to get rid of our dcm4che dependency and use pure python tools for the jobs.

It seems to me that this feature is mostly done, some minor issues were being addressed and then the PR was abandoned. @afshinmessiah, do you have any intentions of continuing your work here? I might be interested in taking over and getting this PR merged. Do you have any estimate of how long it might take?

@CPBridge
Copy link
Collaborator

Hi @phistep. Thanks for reaching out about this. Ashfin contributed this PR some time ago but the core team (@hackermd and myself) had some disagreements about style and API design, and then he changed roles and could no longer dedicate time to it.

Since then it has always been my intention to pick this up again and get it merged because there are a lot of valuable changes here only a few minor sticking point in the way, but then there has always been something more urgent... However, now is as good a time as any I suppose. I will have to take some time to go back through the comments and remind myself of the points that need to be addressed. But from memory they are along these lines:

  • Unpythonic coding style in several places (grouping functions into classes unnecessarily etc)
  • Making the geometry related objects and code more general and putting into a separate module. This has only grown in importance as I plan to make some changes to seg soon that will require these changes. It may be best if I deal with this part.
  • Backwards compatibility with released versions of the library. This PR badly breaks backwards compatibility, which we would like to avoid without very good reason. We were hoping to find a way to enable the new logic without breaking API compatibility but couldn't agree on one. I would have to review things again to remember the specifics.

It is worth noting that the legacy conversion process in the released version of the library does work, but that this pull request contains more sophisticated sorting of frames and a few fixes.

If you are willing to help, that could certainly be very useful. Let me find some time in the next few days to review properly and follow up with some more detail

@fedorov
Copy link
Member

fedorov commented Oct 31, 2023

It would be great to have this integrated - I am not the one who wrote the code, but together with @dclunie we did supervise Afshin, so should be able to address at least some questions. Happy to meet to chat about this Chris!

@phistep
Copy link

phistep commented Nov 1, 2023

Okay, really cool! Then I will start reading the changeset in-depth and maybe get back to you with some questions and a roadmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants