Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IndirectArray(seg) #76

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add IndirectArray(seg) #76

wants to merge 1 commit into from

Conversation

timholy
Copy link
Member

@timholy timholy commented Sep 2, 2021

This makes it easy to visualize the results of segmentation. After this, we can replace the recommendation

imshow(map(i->segment_mean(segments,i), labels_map(segments)))

in the documentation with

imshow(IndirectArray(segments, segment_mean(segments)))

which should be easier to remember.

This makes it easy to visualize the results of segmentation.
Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure how I feel about this. I get your point but perhaps it's better to have a function (e.g., compose, synthesize) to generate an IndirectArray output instead of overriding the IndirectArray constructor.

The more concerning question (at least for me) is that I'm not even sure if the current SegmentedImage is the most appropriate struct; I was actually thinking about making it an array type a long time ago; it's quite intuitive to think of segmented images as pair of values and indices, so is it possible to remove segment_means and segmented_pixel_count from its field? Can these two properties be calculated only from segment_labels and image_indexmap? Or can we assume that the segmented image is always an IndirectArray?

@timholy
Copy link
Member Author

timholy commented Sep 3, 2021

it's quite intuitive to think of segmented images as pair of values and indices

But what do you choose for the values? The label index, an arbitrary color, or the mean color of all pixels assigned the same label? For visualization in particular, the last two are relevant (an arbitrary color emphasizes the identities of the segments, even when two segments have similar mean color). One goal here was to make it easy for the user to choose: IndirectArray(seg) uses the label index (assigning each an arbitrary color), and IndirectArray(seg, segment_means(seg)) assigns each the mean color.

Can these two properties be calculated only from segment_labels and image_indexmap?

Yes, we could do that, and it's not a crazy thought. But it's also worth understanding that they are generated by many algorithms during segmentation, so this is essentially a way of preserving your work. If you are, say, segmenting a 1TB electron microscopy connectomics data set, it might be a bit awkward to make another trip through a 1TB file just to calculate these (given that you calculated them once already).

@johnnychen94
Copy link
Member

johnnychen94 commented Sep 5, 2021

My main concern is that SegmentedImage is not a useful abstraction. Except for the image_indexmap, all other fields can be calculated by walking through the image or indexmap for a second round. Yes, there will be performance concerns, but that isn't a valid argument on why SegmentedImage should be constructed like this. What advantages does it bring to us by making such an abstraction rather than a simple Tuple or NamedTuple? Also, if we see the bool-valued array as a two-value array, then it's also an index map and then we don't have the issue when considering active contour #67 (comment).

I think we should separate the need for visualization from how we should represent a segmented image. A segmented image can also be represented as an array of pixels with coordinates and labels:

struct LabeledPixel{T<:Colorant,N}
    coord::Dims{N}
    value::T
    label::Int
end

# or
struct Pixel{T<:Colorant, N} <: AbstractVector{T}
    coord::Dims{N}
    value::T
end
struct LabeledPixel{T<:Pixel}
    value::T
    label::Int
end

IMHO, segment_labels, segment_means and segment_pixel_count should be considered as optional extra information.

Take LabeledPixel into account, we don't need to use IndirectArray to display it, so if we want to provide a convenient visualization tool for segmented results, we should provide a function instead of type constructor IndirectArray. My design on SLIC is to see segmentation as an analysis process (analyze) that breaks things down into parts and see visualization as a synthesis process (synthesize) that composes parts together.

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

Ah, now I think I understand your point better, and I agree it's a good one. I guess the best argument in favor of the current system is that it bundles together a number of different results. An example of a package that does something similar is Optim (which, I'll agree, is not the most beautifully-designed package...), for which results are returned in a giant structure for which many of the fields could be computed from others (e.g., minimum can be computed from f(minimizer)) and others could be returned as independent arguments.

The best argument against it is that, indeed, we don't really do anything with a SegmentedImage, and indeed we provide lots of accessors whose names are hard to remember as an attempt to avoid having people access the fields directly. This just feels like bloat.

Would you be happier if algorithms just returned the individual fields (those that they compute along the way)? For example, seeded_region_growing computes region_means and region_pix_count along the way (other than for the 0 label). It could return those in addition to image_indexmap.

With regards to how one labels an image, an array of integer labels is pretty common. Is your concern that it's disconnected from the array values? This is basically an array-of-structs vs struct-of-arrays issue. If you might use the image on its own again, there is no reason to push both into the same array, and the cost in memory is potentially significant. I guess I don't see an advantage in your specific proposals for representation/visualization. Some of what you're proposing seems to be aiming at the question of dense or sparse representations, but we can make that decision in the array type: segment_me!(dest, src, args...) would allow the user to pass in either an Array{Int} (for a dense representation of segment labels) or a Dict-backed array as in https://github.com/Jutho/SparseArrayKit.jl (for sparse representation). That's exactly how I designed my recent flood_fill! PR.

@johnnychen94
Copy link
Member

johnnychen94 commented Sep 5, 2021

Would you be happier if algorithms just returned the individual fields (those that they compute along the way)? For example, seeded_region_growing computes region_means and region_pix_count along the way (other than for the 0 label). It could return those in addition to image_indexmap.

Yes, I think this is how it should be; it should be sufficient to return a Tuple or NamedTuple if we want a bundled object. And if we do this then we don't need these two IndirectArray(::SegmentedImage) methods, we can have a convenient function to build IndirectArray (or maybe other array types) instead.

I guess I don't see an advantage in your specific proposals for representation/visualization. Some of what you're proposing seems to be aiming at the question of dense or sparse representations, but we can make that decision in the array type: segment_me!(dest, src, args...) would allow the user to pass in either an Array{Int} (for a dense representation of segment labels) or a Dict-backed array as in https://github.com/Jutho/SparseArrayKit.jl (for sparse representation). That's exactly how I designed my recent flood_fill! PR.

I'm still thinking about what's the "best" way to represent a sparse image (or an colorful object/surface in an euclidean space), the one I proposed in the comment is just one possibility that goes against the current SegmentedImage abstraction, I agree that it takes extra memories to store the data. Maybe a Dict-based array would be enough for the purpose.

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

For a multidmensional array, I am not aware of anything better than a Dict(index=>value). SparseMatrixCSC is a disaster for inserting new values; the only thing it's good for is matrix multiplication.

One good way to figure this out would be to think about what the documentation would look like given different options. I frequently find that if I don't like how the docs come out, it's a sign that there's something wrong with the API. My IndirectArray proposal is part of trying to ameliorate some of my unease, but a bigger change may be called for. With #74 we'll have to make a breaking release anyway.

@johnnychen94
Copy link
Member

johnnychen94 commented Sep 5, 2021

For a multidmensional array, I am not aware of anything better than a Dict(index=>value). SparseMatrixCSC is a disaster for inserting new values; the only thing it's good for is matrix multiplication.

This is off-topic and not in the scope of this package at all: if we relax from typical generic sparse representation and checking some image-specific formats, there might be more ideas. For instance, any graphic objects (balls, spheres, lines), meshes, quadtree, R-tree, or even constructing a SVG from image

One good way to figure this out would be to think about what the documentation would look like given different options. I frequently find that if I don't like how the docs come out, it's a sign that there's something wrong with the API.

How about just extending distinguishable_color? Maybe this is enough to save the typings?

distinguishable_color(labels::Vector) = Dict(zip(labels, distinguishable_colors(length(labels))))

I could imagine usage like the following and it's not hard to intuit or understand.

index_map, labels = some_algorithm(img)
IndirectArray(index_map, distinguishable_color(labels))

index_map, labels, mean_values = another_algorithm(img)
IndirectArray(index_map, mean_values)

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

That works for me too. If you want I can take a stab at this. I propose we get a new non-breaking release out the door, finishing #72, and then perhaps we can tackle the API changes. I'm willing to take the lead here unless you'd rather do it.

@johnnychen94
Copy link
Member

johnnychen94 commented Sep 5, 2021

I'll become super busy this semester so you can take the lead here. I plan to spend more of my free time on the JuliaImages code organization, documentation, test and benchmark systems...

Edit:
FWIW, I'm not sure if you're interested, I have some plans on designing a computational graphic system to replace our old ImageDraw, and the first step is johnnychen94/CoordinateSystems.jl#1, when that one is finished, I plan to attach color information and add AD support for it, and figure there might be something interesting when applied to image reconstruction or registration tasks... If this goes smoothly then it will be part of my Ph.D. thesis so this will definitely take a large amount of my time.

@johnnychen94
Copy link
Member

I know you're reaching for some new better status, I hope I don't block your development workflows by commenting here and there.

I propose we get a new non-breaking release out the door, finishing #72, and then perhaps we can tackle the API changes.

This sounds good to me.

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

No, I really appreciate and benefit from your perspective. It's great to discuss API with someone who has such a good sense of design!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants