Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating CV Ontologies with the DataStore format #844

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Pushkar-Bhuse
Copy link
Collaborator

This PR fixes #840 .

Description of changes

This PR mainly includes fine-tuning CV ontologies to be used correctly with the DataStore setup. Once this was completed, we included test cases (both from the users point of view and also from the DataStore's perspective) to validate the working of these ontologies.

Possible influences of this PR.

After this PR, DataStore can also include CV ontologies.

Test Conducted

Apart from data store specific functions, we tested addition, deletion and updating annotations, compute_iou and check_overlap method. These tests were conducted on the following ontologies

  • ImageAnnotation
  • Region
  • Box
  • BoundingBox

@codecov
Copy link

codecov bot commented Jun 29, 2022

Codecov Report

Merging #844 (db8a5d9) into master (8e03abe) will decrease coverage by 0.09%.
The diff coverage is 69.14%.

@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
- Coverage   80.75%   80.65%   -0.10%     
==========================================
  Files         254      254              
  Lines       19364    19684     +320     
==========================================
+ Hits        15637    15877     +240     
- Misses       3727     3807      +80     
Impacted Files Coverage Δ
forte/data/ontology/core.py 69.69% <32.55%> (-7.23%) ⬇️
forte/data/ontology/top.py 73.31% <53.40%> (-2.45%) ⬇️
tests/forte/image_annotation_test.py 99.22% <99.08%> (-0.78%) ⬇️
forte/common/constants.py 100.00% <100.00%> (ø)
forte/data/entry_converter.py 83.95% <100.00%> (+0.61%) ⬆️
tests/forte/data/data_store_test.py 95.75% <100.00%> (+0.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e03abe...db8a5d9. Read the comment docs.

@hepengfe
Copy link
Collaborator

Thanks for adding more tests for the Box ontology. There are more features and complete design for Box implemented in https://github.com/asyml/forte/pull/827/files
Could you pause this PR for a while and edit it after PR837 is merged?

tests/forte/data/data_store_test.py Outdated Show resolved Hide resolved
forte/data/ontology/top.py Outdated Show resolved Hide resolved
tests/forte/image_annotation_test.py Show resolved Hide resolved
"""
try:
self._image_payload_idx = self.pack.get_entry_raw(self.tid)[
PAYLOAD_INDEX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't use too many special const, we should simply store the payload details as attributes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Payload Index is the only main attribute for ImageAnnotation type entries (just like how Annotation has start and end indices, payload_index is stored in the first two indices of the DataStore entry. That's why I felt that could be the only constant we could add for ImageAnnotation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just try not to add additional consts here. DataStore has a system to add attributes and datastore can manage the attribute position. Can we simply consider Payload_index to be a regular attribute?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Every Entry has two main attributes which occupy the first and second position in the datastore (for Annotation it is begin and end index). Similar, for ImageAnnotation and its subclasses, the two main attributes are payload_idx and None (Implying that it only has one main attribute, the extra None is just for consistency). That is the reason I stored the Payload index value in constants

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the None here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Payload Index is the only main attribute for ImageAnnotation type entries (just like how Annotation has start and end indices, payload_index is stored in the first two indices of the DataStore entry. That's why I felt that could be the only constant we could add for ImageAnnotation

Ok, so I think the reason I have the confusion is that: once Payload is in our system, then all other payload-dependent entries (specifically, Annotation, ImageAnnotation, AudioAnnotation) should have Payload associated with it too. Now we only modify it for ImageAnnotation in this PR, it feels a bit inconsistent to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I understand now. In that case, do you suggest we make the payload index a data class attribute? In this way we can have it in all three.

Also, None here has no special significance. The DataStore entry structure is that the first two elements are the two main attributes, followed by tid, entry type and data class attributes (if any). So in the case of ImageAnnotation, there is only one main attribute, so just to keep the structure consistent we have the second attribute as None. This is the method followed for other entries that have less than two main attributes as well (eg. Generics)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. So we should make all the three have a Payload field. I am not sure of what data class attribute means yet, but we need to make all of the consistent, but with two considerations:
    i. maybe it will make this PR too large?
    ii. we don't want to change any existing behavior, i.e., previously we don't have multiple payloads for each modality, thus all entries are retrieved without any payload index. To keep the same behavior, we should assume any data operation that does not use a payload index default to 0, i.e., get from the first payload or add to the first payload when payload index is not specified.
  2. Can you remind me what are two main attributes? I just wonder why it could be None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1(i) I feel that changing the structure for all entry types to include payload index would be out of the scope of this PR.
1(ii) I believe this behavior is already being displayed atleast in the ImageAnnotation type entries.

2 So this is a structure that is followed by all DataStore entries. The first two elements of the list are the two most common attributes of that entry (eg. begin and end index for Annotation). This is then followed by the remaining attributes like tid, type_name, etc. The structure is such that we need strictly TWO elements of common attributes in the beginning. But in the case of ImageAnnotation, there is only one such element (payload_index). Thus, to keep the structure consistent, we keep the second element as None.

Copy link
Member

@hunterhector hunterhector Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1(i) I feel that changing the structure for all entry types to include payload index would be out of the scope of this PR.

Sure, as long as we can ensure the payload index is handled consistently, and we should create issues to track this for the next step.

1(ii) I believe this behavior is already being displayed at least in the ImageAnnotation type entries.

Yeah this is more like a reminder.

2 So this is a structure that is followed by all DataStore entries. The first two elements of the list are the two most common attributes of that entry (eg. begin and end index for Annotation). This is then followed by the remaining attributes like tid, type_name, etc. The structure is such that we need strictly TWO elements of common attributes in the beginning. But in the case of ImageAnnotation, there is only one such element (payload_index). Thus, to keep the structure consistent, we keep the second element as None.

We actually can have different internal list structures for different entry types, I hope we don't have to strictly need two elements for each item.

forte/data/ontology/top.py Outdated Show resolved Hide resolved
forte/data/ontology/top.py Outdated Show resolved Hide resolved
forte/data/ontology/top.py Outdated Show resolved Hide resolved
forte/data/ontology/top.py Outdated Show resolved Hide resolved
tests/forte/data/data_store_test.py Outdated Show resolved Hide resolved
grid_width: int,
grid_cell_h_idx: int,
grid_cell_w_idx: int,
height: int = 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only grid move to the class level? Note that it is not just about moving things to data store, but need to have a consistent approach too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that was the original implementation of BoundingBox. Do you feel that we should not create the Grid object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean you moved _grid_id to be a class variable. Do you understand how these class variable affect the attribute list?

I don't know what you mean by creeating the Grid object, but that should not be a problem for this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested, I am using the new implementation of CV ontologies (from @hepengfe 's branch) and making my changes on that so that merging is easier. I believe he has handled the Grid implementation there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, probably this PR is a bit intertwined with the Grid which makes me hard to judge the correctness right now.

Copy link
Collaborator

@mylibrar mylibrar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#827 is closed so the implementation should no longer be based on that PR. You can consider reverting or rebasing some of the commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate DataStore with CV ontologies
4 participants