-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epic/cv2 5050 text vectorization via presto #2030
Epic/cv2 5050 text vectorization via presto #2030
Conversation
* CV2-5087 move Articles side effecting saves to to it via presto * CV2-5082 move article indexing to presto * resolve test errors * updates for broken tests * small tweak * set to sync * more fixes * rename function and revert request * add response suppression and move to specific path for side effecting requests * extend similar media to allow for temporary texts * fix broken test fixture * revert back to async * fix another test * fixes per PR review * fixes per PR review * more fixes after review
* CV2-5080 update request model alegre calls to use presto-based alegre querying * move to sync * update for bypassing async calls in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it all ready to be merged with develop
, Devin? 🤔
Sorry, no, I was opening this to test something - will set to draft! |
* Cv2 5082 article indexing to presto (#1994) * CV2-5087 move Articles side effecting saves to to it via presto * CV2-5082 move article indexing to presto * resolve test errors * updates for broken tests * small tweak * set to sync * more fixes * rename function and revert request * add response suppression and move to specific path for side effecting requests * extend similar media to allow for temporary texts * fix broken test fixture * revert back to async * fix another test * fixes per PR review * fixes per PR review * more fixes after review * CV2-5086 second attempt on clean Smooch NLU to Presto branch * fix broken test stubs * fix typo brought over from previous PR * alias and rename per caio review * fix syntax * Mayyyyybe its the alias? * fix old reference * move another stale function reference * Replace with alias * symbolize aliased method names * Revert to proper function
* Cv2 5082 article indexing to presto (#1994) * CV2-5087 move Articles side effecting saves to to it via presto * CV2-5082 move article indexing to presto * resolve test errors * updates for broken tests * small tweak * set to sync * more fixes * rename function and revert request * add response suppression and move to specific path for side effecting requests * extend similar media to allow for temporary texts * fix broken test fixture * revert back to async * fix another test * fixes per PR review * fixes per PR review * more fixes after review * Cv2 5080 request model to presto (#2015) * CV2-5080 update request model alegre calls to use presto-based alegre querying * move to sync * update for bypassing async calls in tests * CV2-5085 move get_items_from_similar_text calls to use sync endpoint * review and resolve broken tests * update stub
* CV2-5084 update reindexing strategy to use singular requests for now * switch to async * mrege in latest changes on epic branch * update fixture * fix stub path * Update reindex_alegre_workspace.rb
* CV2-5081 switch text to presto-based querying * more test stub updates * fix more stubs * update stub and typo
…zation-via-presto
* Making sure that the way the feed Cluster.last_request_date field is calculated is the same as the ClusterTeam.last_request_date * Make sure that if a parent item is tagged, all child items are also included in the cluster, even if, individually, they are not tagged Reference: CV2-5331.
Context While looking into a tags issue I noticed a few things: - when we made a request with duplicate tags: - we got an error, so the job was retried - the tag was added twice to the FactCheck - the tag is added once to the ProjectMedia - when we made a request with a tag with a # - we got an error, so the job was retried - there seem to have been two errors related to this: - ActiveRecord::RecordInvalid: Tag already exists - ActiveRecord::RecordInvalid: Text has already been taken - the tag with the # is added to the FactCheck - the tag is not added to the ProjectMedia What was happening There are a few things happening at the same time: - Creation of a ProjectMedia with tags - Creation of a FactCheck with tags - Tags are an Object from the Tag Class for ProjectMedia, but are a simple array for FactCheck For the ProjectMedia: - It would create the tag, then it would try to create the same tag again, and then it would fail and retry again, and so on - This happen because we have validations in place for the Tag class For FactCheck: - It would just create the tags twice - Because we have no validations for the tags array TLDR: There were some issues in the tags clean-up before we create them for ProjectMedia and FactCheck, and there was a mismatch between them. How it was fixed - I created a helper to clean up the tags before creating them - We need to make sure tags are: - stripped - unique - don't have a prepending `#` - We use this helper both in FactCheck.rb and Tag.rb References: 5120 PR: 2054
* CV2-5005: limit ES date to updated fields only * CV2-5005: fix tests * CV2-5005: test coverage
This change tries to avoid a single case reported by Sentry: A race condition situation where the related object was still not fully persisted in the database when the job executed. Setting a longer retry interval which should avoid this case. Fixes: CV2-5459.
Adding a log line for the outgoing Smooch requests. This can help debugging some issues. Reference: CV2-5378.
I noticed this regression introduced by CV2-5451. Now that `confirmed_by` and `confirmed_at` are set for all relationships, even the ones created as confirmed matches, we have a regression here. In order to know if a report should be sent for an accepted suggestion, we can't rely solely on the existence of a value for `confirmed_by` or `confirmed_at`. We know that a suggestion will happen after the relationship was created, so, if `confirmed_at` happens before or at the same time as `created_at`, we know that this is not a suggestion that was accepted, but a relationship that was already created as confirmed match. Reference: CV2-5451.
…ct (#2083) * CV2-5348: set retry_on_conflict to zero and pass id instead of the object * CV2-5348: skip blank obj * CV2-5348: apply PR comments
…oth link and long text (#2084) * CV2-5190: create a link and claim from tipline message that contaion link and long text * CV2-5190: handle link and short text * CV2-5190: fix tests * CV2-5190: fix CC
* Add Tagalog hardcoded strings * Bump rails-i18n
* Add Tagalog hardcoded strings * Bump rails-i18n * Bump rails-i18n again. Changed long date format instead of the default one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DGaffney , thanks for all the work here! Can you please make sure that we have tests that cover these lines (or confirm if these lines should be removed, or if the fact that these lines are not covered means that there is something wrong in the code paths):
app/models/concerns/alegre_v2.rb
: 41, 325, 326 and 327
Easy! The first line (41) is a TemporaryProjectMedia function that's just uncovered and I can add, and the 325-327 lines are because we don't check for an error in requests during deletes. Should be easy to mock, stand by! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DGaffney for completing the 100% code coverage - no many changes since my last review, so this time I just left a question for something I didn't think about before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @DGaffney
I left a small refactor suggestions, please check my comments.
Description
Epic branch roll-up of all the changes required to shift to presto-based text vectorization.
References: CV2-5050
How has this been tested?
Tested extensively locally and in integration tests.
Things to pay attention to during code review
Help me make sure we're not accidentally bypassing some important rule-based logic for matches!
Checklist