-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrating CV Ontologies with the DataStore format #844
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #844 +/- ##
==========================================
- Coverage 80.75% 80.65% -0.10%
==========================================
Files 254 254
Lines 19364 19684 +320
==========================================
+ Hits 15637 15877 +240
- Misses 3727 3807 +80
Continue to review full report at Codecov.
|
Thanks for adding more tests for the |
8b8d74d
to
9f4e022
Compare
forte/data/ontology/top.py
Outdated
""" | ||
try: | ||
self._image_payload_idx = self.pack.get_entry_raw(self.tid)[ | ||
PAYLOAD_INDEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't use too many special const, we should simply store the payload details as attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Payload Index is the only main attribute for ImageAnnotation
type entries (just like how Annotation
has start
and end
indices, payload_index
is stored in the first two indices of the DataStore
entry. That's why I felt that could be the only constant we could add for ImageAnnotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just try not to add additional consts here. DataStore has a system to add attributes and datastore can manage the attribute position. Can we simply consider Payload_index
to be a regular attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Every Entry
has two main attributes which occupy the first and second position in the datastore (for Annotation it is begin and end index). Similar, for ImageAnnotation
and its subclasses, the two main attributes are payload_idx and None (Implying that it only has one main attribute, the extra None is just for consistency). That is the reason I stored the Payload index value in constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the None
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Payload Index is the only main attribute for
ImageAnnotation
type entries (just like howAnnotation
hasstart
andend
indices,payload_index
is stored in the first two indices of theDataStore
entry. That's why I felt that could be the only constant we could add forImageAnnotation
Ok, so I think the reason I have the confusion is that: once Payload
is in our system, then all other payload-dependent entries (specifically, Annotation, ImageAnnotation, AudioAnnotation) should have Payload
associated with it too. Now we only modify it for ImageAnnotation
in this PR, it feels a bit inconsistent to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I understand now. In that case, do you suggest we make the payload index a data class attribute? In this way we can have it in all three.
Also, None
here has no special significance. The DataStore
entry structure is that the first two elements are the two main attributes, followed by tid
, entry type
and data class attributes (if any). So in the case of ImageAnnotation
, there is only one main attribute, so just to keep the structure consistent we have the second attribute as None
. This is the method followed for other entries that have less than two main attributes as well (eg. Generics
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- So we should make all the three have a
Payload
field. I am not sure of whatdata class attribute
means yet, but we need to make all of the consistent, but with two considerations:
i. maybe it will make this PR too large?
ii. we don't want to change any existing behavior, i.e., previously we don't have multiple payloads for each modality, thus all entries are retrieved without any payload index. To keep the same behavior, we should assume any data operation that does not use apayload index
default to 0, i.e., get from the first payload or add to the first payload when payload index is not specified. - Can you remind me what are
two main attributes
? I just wonder why it could beNone
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1(i) I feel that changing the structure for all entry types to include payload index would be out of the scope of this PR.
1(ii) I believe this behavior is already being displayed atleast in the ImageAnnotation
type entries.
2 So this is a structure that is followed by all DataStore
entries. The first two elements of the list are the two most common attributes of that entry (eg. begin
and end
index for Annotation
). This is then followed by the remaining attributes like tid
, type_name
, etc. The structure is such that we need strictly TWO elements of common attributes in the beginning. But in the case of ImageAnnotation
, there is only one such element (payload_index
). Thus, to keep the structure consistent, we keep the second element as None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1(i) I feel that changing the structure for all entry types to include payload index would be out of the scope of this PR.
Sure, as long as we can ensure the payload
index is handled consistently, and we should create issues to track this for the next step.
1(ii) I believe this behavior is already being displayed at least in the
ImageAnnotation
type entries.
Yeah this is more like a reminder.
2 So this is a structure that is followed by all
DataStore
entries. The first two elements of the list are the two most common attributes of that entry (eg.begin
andend
index forAnnotation
). This is then followed by the remaining attributes liketid
,type_name
, etc. The structure is such that we need strictly TWO elements of common attributes in the beginning. But in the case ofImageAnnotation
, there is only one such element (payload_index
). Thus, to keep the structure consistent, we keep the second element as None.
We actually can have different internal list structures for different entry types, I hope we don't have to strictly need two elements for each item.
forte/data/ontology/top.py
Outdated
grid_width: int, | ||
grid_cell_h_idx: int, | ||
grid_cell_w_idx: int, | ||
height: int = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only grid
move to the class level? Note that it is not just about moving things to data store, but need to have a consistent approach too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that was the original implementation of BoundingBox
. Do you feel that we should not create the Grid
object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean you moved _grid_id
to be a class variable. Do you understand how these class variable affect the attribute list?
I don't know what you mean by creeating the Grid
object, but that should not be a problem for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you suggested, I am using the new implementation of CV ontologies (from @hepengfe 's branch) and making my changes on that so that merging is easier. I believe he has handled the Grid implementation there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, probably this PR is a bit intertwined with the Grid
which makes me hard to judge the correctness right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#827 is closed so the implementation should no longer be based on that PR. You can consider reverting or rebasing some of the commits.
This PR fixes #840 .
Description of changes
This PR mainly includes fine-tuning CV ontologies to be used correctly with the
DataStore
setup. Once this was completed, we included test cases (both from the users point of view and also from theDataStore
's perspective) to validate the working of these ontologies.Possible influences of this PR.
After this PR, DataStore can also include CV ontologies.
Test Conducted
Apart from data store specific functions, we tested
addition
,deletion
andupdating
annotations,compute_iou
andcheck_overlap
method. These tests were conducted on the following ontologiesImageAnnotation
Region
Box
BoundingBox