Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datastore Integration #814

Merged
merged 36 commits into from
May 27, 2022
Merged

Datastore Integration #814

merged 36 commits into from
May 27, 2022

Conversation

mylibrar
Copy link
Collaborator

@mylibrar mylibrar commented May 27, 2022

This PR fixes #573 .

Description of changes

  • Update the _pending_entries mechanism
    • Remove BasePack.regret_creation and its corresponding test cases.
    • For DataPack, _pending_entries will store entry.tid instead of entry itself.
      • Update typing annotations on several related methods including add_entry, record_entry, _add_entry, __add_entry_with_check, etc.
      • TODO: After DataStore is integrated with MultiPack, we should only maintain a mapping from entry's tid to the corresponding component, i.e., Dict[int, Optional[str]].
  • Remove all the SortedList containers of entries in DataPack
    • Entries will be stored in DataPack._data_store
    • All the CRUD operations are directed to DataStore
    • Entry class object will be generated on the fly. EntryConverter is created to facilitate the conversion between entry data from DataStore and the class object.
  • Serialization
    • Add pack version checking for deserialization of DataPack
      • We skipped several unit tests for outdated serialization format, including stave_reader_test and entry_data_structures_test
    • Keep _onto_file_path and _dynamically_add_type in DataStore.__getstate__ to avoid failure in test cases.
  • FList/FDict
    • They will no longer use Pointer to resolve entries
    • Their payload will be list/dict reference from DataStore
  • Top level entries: Annotation, Group, Link, Generics, AudioAnnotation
    • All the attributes (including begin, end, parent, child, members, etc.) are now directed to DataStore
  • Some fixes in DataStore
    • all_entries() should generate entries in order, so now we should use co_iterator_annotation_like() for annotation-like entries.
    • get() should not required type_name to exist in __elements when include_sub_type is enabled if it's not Generics.

Possible influences of this PR.

The current integration is probably going to be unstable, hence it's only for 0.3 pre-release (0.3.0dev1). We will need to fix a lot of follow-up issues in 0.3 stable version and 0.3 interface clearance.

Test Conducted

The integration should successfully pass all the current CI tests. Will need more dedicated test cases in future (#808).

Suqi Sun added 30 commits April 19, 2022 21:25
@mylibrar mylibrar added this to the 0.3 pre-release (0.3.0dev1) milestone May 27, 2022
@mylibrar mylibrar self-assigned this May 27, 2022
@mylibrar mylibrar marked this pull request as draft May 27, 2022 07:16
@codecov
Copy link

codecov bot commented May 27, 2022

Codecov Report

Merging #814 (b62d0f5) into master (e683cc1) will decrease coverage by 0.73%.
The diff coverage is 87.71%.

@@            Coverage Diff             @@
##           master     #814      +/-   ##
==========================================
- Coverage   81.31%   80.58%   -0.74%     
==========================================
  Files         250      250              
  Lines       19067    19153      +86     
==========================================
- Hits        15505    15434      -71     
- Misses       3562     3719     +157     
Impacted Files Coverage Δ
forte/data/container.py 77.50% <ø> (-3.90%) ⬇️
forte/processors/nlp/ner_predictor.py 18.00% <0.00%> (ø)
tests/forte/data/audio_annotation_test.py 99.18% <ø> (-0.01%) ⬇️
forte/data/ontology/core.py 75.61% <75.00%> (-8.33%) ⬇️
forte/data/ontology/top.py 77.58% <84.12%> (-7.90%) ⬇️
tests/forte/data/entry_data_structures_test.py 85.92% <85.71%> (-13.57%) ⬇️
forte/data/data_pack.py 84.49% <87.86%> (-2.67%) ⬇️
forte/data/base_pack.py 75.61% <100.00%> (-1.98%) ⬇️
forte/data/data_store.py 92.03% <100.00%> (+0.49%) ⬆️
forte/data/multi_pack.py 78.78% <100.00%> (-0.34%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e683cc1...b62d0f5. Read the comment docs.

@mylibrar mylibrar marked this pull request as ready for review May 27, 2022 18:44
Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set up the versions (https://github.com/asyml/forte/blob/master/forte/version.py#L23) to make sure Forte will detect that old serialization won't be supported.

@mylibrar mylibrar merged commit c2e6ca5 into master May 27, 2022
@mylibrar mylibrar deleted the datastore-datapack-integration branch June 29, 2022 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement entry_to_tuple and tuple_to_entry functions
2 participants