feat: Add an API to allow for querying events under a subscription with additional filtering #1452
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff            @@
##             main    #1452    +/-   ##
========================================
  Coverage   99.99%   99.99%
========================================
  Files         321      322      +1
  Lines       23175    23270     +95
========================================
+ Hits        23173    23268     +95
  Misses          1        1
  Partials        1        1
WRT
To my knowledge the only way around these problems would be:
This is great @SamMayWork - very helpful addition.
Returned with some thoughts on what I hope are small additions before it merges.
internal/events/event_manager.go (outdated)
@@ -303,3 +305,104 @@ func (em *eventManager) EnrichEvent(ctx context.Context, event *core.Event) (*co
func (em *eventManager) QueueBatchRewind(batchID *fftypes.UUID) {
	em.aggregator.queueBatchRewind(batchID)
}

func (em *eventManager) FilterEventsOnSubscription(events []*core.EnrichedEvent, subscription *core.Subscription) []*core.EnrichedEvent {
I see this function added, but I don't see the old code (which I assume was in a place that wasn't already in a function) removed. Can you help me with where you've moved this from?
Want to make sure we don't have a duplicate copy of the same filtering code.
The code was duplicated; I've pulled it out and pushed it down now. The complexity here is that we've got three variations of a subscription, and the existing filtering code being used here lives on the event-delivery kind of subscription (which does not use the underlying fields).
In commit de6d46b, I've pushed the filtering code down into the internal event-delivery kind of subscription, then parsed the inbound subscription to that type and performed the filtering - which means filtering is now done in a single place.
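A rough sketch of the resulting shape - the types and field names here (enrichedEvent, subscriptionFilter) are simplified stand-ins for illustration, not the actual FireFly identifiers. The point is that both the delivery path and the new query API go through one matching helper, so the filter semantics cannot drift apart:

package events

import "regexp"

// enrichedEvent is a minimal stand-in for core.EnrichedEvent.
type enrichedEvent struct {
	Type  string
	Topic string
}

// subscriptionFilter is a stand-in for the compiled event-delivery filter.
type subscriptionFilter struct {
	eventType *regexp.Regexp // compiled from the subscription's "events" filter
	topic     *regexp.Regexp // compiled from the subscription's "topic" filter
}

// matches is the single place where the filter semantics live - used
// per-event by the delivery path, and over a slice by the new query API.
func (f *subscriptionFilter) matches(ev *enrichedEvent) bool {
	if f.eventType != nil && !f.eventType.MatchString(ev.Type) {
		return false
	}
	if f.topic != nil && !f.topic.MatchString(ev.Topic) {
		return false
	}
	return true
}

// filterEvents is the slice-level wrapper the new API would use.
func filterEvents(f *subscriptionFilter, events []*enrichedEvent) []*enrichedEvent {
	matched := make([]*enrichedEvent, 0, len(events))
	for _, ev := range events {
		if f.matches(ev) {
			matched = append(matched, ev)
		}
	}
	return matched
}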
func (or *orchestrator) GetSubscriptionEventsHistorical(ctx context.Context, subscription *core.Subscription, filter ffapi.AndFilter) ([]*core.EnrichedEvent, *ffapi.FilterResult, error) {

	// Internally we need to know the limit/count from the inbound filter
	inboundFilterOptions, err := filter.Finalize()
Sorry @SamMayWork - I'm missing where we pass this filter into our DB filter. I see ssFilter.And(), but it's not ssFilter.And(filter) - so the pre-filtering the user has supplied on which thing they're looking for in the andFilter seems to be missed.
Addressed in 33294f1, but it will need a quick retest.
Edit: Doing some basic filtering using the options seems to work as expected now! 🚀
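For illustration, the shape of that fix - the builder names (fb, startSeq) are assumptions based on the snippet above, not verbatim from the PR:

// Before - only the internal paging bound reaches the database:
//
//	ssFilter := fb.And(fb.Gt("sequence", startSeq))
//
// After - the caller-supplied filter is ANDed in, so the user's
// pre-filtering (time ranges, specific event types, ...) is applied
// by the DB query itself:
//
//	ssFilter := fb.And(fb.Gt("sequence", startSeq), filter)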
internalSkip := 0
ssFilter.Limit(uint64(internalLimit))

for len(subscriptionFilteredEvents) < int(finalDesiredOffset) {
I think the easiest solution to the O(n) safety issue (which is a significant problem) is to have a server-configurable (not API-configurable) limit on the max scan length for this. I would suggest something like 1000 by default.
If the query reaches this limit before either:
- reaching the end of the table, or
- reaching the full limit specified
then it returns with an error result. For example:
Event scan limit reached after matching X events and skipping Y events. Please restrict your query to a narrower range
It's far from a perfect solution, but I think it gives a solution in the vast majority of cases. Either:
- you have lots of events that match your subscription, so your page size will find 25 (or whatever) events in the last 1000 events, or
- you are hunting for one specific event, so the pre-filter (discussed in my comment above) will sub-select to fewer than 1000 records - either with a time range, or with a spear-fishing approach (like looking for one specific event).
A sketch of such a guard follows below.
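A minimal sketch of that guard, reusing the enrichedEvent stand-in from the sketch above; the function names and paging shape are illustrative, not FireFly's actual code:

package events

import "fmt"

const maxScanLength = 1000 // server-configurable, not exposed on the API

// scanForMatches pages through events until it fills `limit` matches or
// reaches the end of the table - and errors out, rather than returning a
// partial result, once the scan budget is exhausted.
func scanForMatches(fetchPage func(skip, pageSize int) ([]*enrichedEvent, error), match func(*enrichedEvent) bool, limit int) ([]*enrichedEvent, error) {
	matched := []*enrichedEvent{}
	scanned := 0
	for {
		page, err := fetchPage(scanned, 50)
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			return matched, nil // reached the end of the table
		}
		for _, ev := range page {
			scanned++
			if match(ev) {
				matched = append(matched, ev)
				if len(matched) >= limit {
					return matched, nil // filled the requested page
				}
			}
			if scanned >= maxScanLength {
				return nil, fmt.Errorf("event scan limit reached after matching %d events and skipping %d events - please restrict your query to a narrower range",
					len(matched), scanned-len(matched))
			}
		}
	}
}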
The implementation for these changes is split across commits de6d46b and 7e22a2c.
This approach means that:
- Supplying a skip/limit in the initial query now applies specifically to the number of unfiltered events that are processed on the backend, and you cannot control how many results you get from the API *1
- If the number of records being scanned exceeds the total configured on the namespace, no results are returned to the user; instead they get an error indicating that they need to refine their query
Crucially, this approach presents a new problem worth noting: how do I, as the caller of the API, know how to adjust my skip/limit so that I can receive the next sequentially filtered event?
Suppose 1000 unfiltered events, with two matching events at positions [0:1] and one matching event at [999]. My first query might be skip: 0, limit: 100, which would return events 0 and 1 - but how would I get the third matching event, which sits at unfiltered index 999?
Currently the answer to this is 'keep adjusting the skip/limit until you find it', which is fair, but depending on the use-case could be a frustrating answer. It's difficult for us to supply information to make this easier for users, because FireFly doesn't have the intelligence required to know where that index is.
*1 - The skip/limit allows the user to configure how many unfiltered events are scanned, and then returns all of the matching events within that bound; we assume the user will then limit this down on their side to the number of events they want to search.
Spotted a problem during testing that might be a bit tricky to solve. Given there are likely to be X number of events on the chain, where X is a high number, being restricted to skipping 1000 records (as is the default API behaviour) is super restrictive. There's an entirely valid use-case for 'skip 90,000 records and then return me 200 records matching my filter', but this would not work as of right now.
I've pushed changes in commit 52c191e which allow for a custom (and ✨ configurable ✨) number of events to be skipped, buuuuuut this doesn't play nicely with the swagger generation, so the generated API documentation isn't updated to reflect the new value. This means there's a disconnect right now between what our backend supports and what we say it supports in the swagger. Needs more thinking for a workaround...
EDIT: Commit 80ddde2 has a nifty but imperfect workaround...
Nifty: After the swagger is generated by the core libraries, we grab the params and overwrite the description on the skip field with our own description, carrying the actually correct value for the maximum supported skip.
Imperfect: If another API has the same issue as this API, it'll need to duplicate the code I've added to change the values on its API. What would be great would be if the values for skip/limit were configurable per route and that then got read through into the Swagger, but this is likely a larger item of work.
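Roughly, the workaround looks like this - a sketch that treats the generated spec as plain JSON and rewrites one description; the actual hook point in FireFly's route handling, and the route path, may differ:

package events

import (
	"encoding/json"
	"fmt"
)

// patchSkipDescription rewrites the generated description of the "skip"
// query parameter on the new route so it states the configured maximum.
func patchSkipDescription(specJSON []byte, maxSkip int) ([]byte, error) {
	var spec map[string]interface{}
	if err := json.Unmarshal(specJSON, &spec); err != nil {
		return nil, err
	}
	paths, _ := spec["paths"].(map[string]interface{})
	route, _ := paths["/subscriptions/{subid}/events"].(map[string]interface{})
	get, _ := route["get"].(map[string]interface{})
	params, _ := get["parameters"].([]interface{})
	for _, p := range params {
		if param, ok := p.(map[string]interface{}); ok && param["name"] == "skip" {
			param["description"] = fmt.Sprintf("The number of records to skip (maximum: %d)", maxSkip)
		}
	}
	return json.Marshal(spec)
}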
In discussion w/ @nguyer, we've landed on some further changes for this PR, given that we're concerned about the usefulness of this API if we don't solve the N+1 events problem. We've agreed on the following changes:
This means that, assuming you hit the limit of matching events before you hit the end of the sequence range, you can get the next event in the sequence by calling the API again, providing the sequence ID of the last record as the starting sequence ID. The hope is that the DB should be way faster indexing by sequence ID, so performance should become significantly less of an issue. These changes semantically change how we're using some of the values, so in the new world this is what we have:
EDIT: Will also need to make the new configuration options make sense with the changes listed above; probably going to remove both of the new options and have a configurable limit on the difference between the start and end sequence IDs. Changes on the way 🚀
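To make the new semantics concrete, here is a hypothetical client-side paging loop. The route, the query parameter name (startsequence), and the response shape are assumptions for illustration, not the final API:

package apiclient

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// pageAllEvents walks a subscription's matching events by repeatedly
// resuming from the sequence ID of the last record returned.
func pageAllEvents(baseURL, subID string) error {
	startSeq := int64(0)
	for {
		url := fmt.Sprintf("%s/api/v1/subscriptions/%s/events?startsequence=%d&limit=25",
			baseURL, subID, startSeq)
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		var events []struct {
			Sequence int64 `json:"sequence"`
		}
		err = json.NewDecoder(resp.Body).Decode(&events)
		resp.Body.Close()
		if err != nil {
			return err
		}
		if len(events) == 0 {
			return nil // caught up - no more matching events in range
		}
		// Resume one past the sequence ID of the last record we saw.
		startSeq = events[len(events)-1].Sequence + 1
	}
}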
cols := append([]string{}, eventColumns...)
cols = append(cols, s.SequenceColumn())

query := sq.Select(cols...).FromSelect(sq.Select(cols...).From(eventsTable).Where(sq.GtOrEq{
Calling this out as something potentially worthy of discussion: initially I wanted to use the BETWEEN operator referencing the sequence ID, but squirrel (the underlying SQL construction lib we're using) does not support building queries with BETWEEN.
The other option was to use a sub-query and then apply a limit and offset, but this was not good from a performance perspective, so I've gone for a sub-query and a WHERE, which should jump straight to the correct sequence ID. Comparing the use of OFFSET against this approach showed roughly a ~140ms difference in speed:
~440ms for a query using OFFSET, searching for 1000 records between sequence IDs 100,000 and 101,000
~300ms for a query using WHERE, searching for 1000 records between sequence IDs 100,000 and 101,000
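For reference, the two query shapes being compared, built with squirrel - column and table names are simplified here, and the actual PR uses a sub-select over the real event columns:

package events

import sq "github.com/Masterminds/squirrel"

// offsetQuery forces the DB to walk and discard 100,000 rows before it
// can return anything (~440ms in the benchmark above).
func offsetQuery() sq.SelectBuilder {
	return sq.Select("seq", "type", "topic").
		From("events").
		OrderBy("seq").
		Offset(100000).
		Limit(1000)
}

// whereQuery lets an index seek jump straight to the starting sequence ID
// (~300ms). squirrel has no BETWEEN helper, so the range is expressed as
// two ANDed comparisons instead.
func whereQuery() sq.SelectBuilder {
	return sq.Select("seq", "type", "topic").
		From("events").
		Where(sq.GtOrEq{"seq": 100000}).
		Where(sq.LtOrEq{"seq": 101000}).
		OrderBy("seq").
		Limit(1000)
}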
See the comment above about the SQL performance of this query syntax - but I think we're in a good place for a re-review now 🚀
Looks like codecov is just having trouble talking to GitHub. I confirmed by checking out this branch myself that it does not lower test coverage.
Sorry for all the back and forth on this - I think it's ending up in a really good spot though. One more question: as the user of this API, if I don't know the current sequence number (for the most recent event), how do I start using this API?
I get how to use it once we have a sequence number as a reference, and we can go higher or lower from there. I think most of the time, though, users are going to be interested in the most recent n events and won't know where to start from.
Some initial thoughts on this here:
I think option 2 here is just bad, since we'd need to make another query to the DB to find out the current highest index, and then tack that onto a request the user might not want to make. I.e. if I were a user and wanted the most recent event, I'd need to make a request I don't care about first, to get the index, so that I could then make the request I do care about - that doesn't make sense. Option 1 is the way forward, and it's consistent with the
So one little wrinkle is making sure that the Swagger reflects this behaviour, and that the user is informed that if they don't provide a value, they'll get the most recent events. A sketch of the defaulting behaviour is below.
(*1) - Default max range is 1000 records; through configuration this value could be changed.
EDIT: I think this should be addressed now in 0395957
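A sketch of that defaulting behaviour - the names are illustrative, and this assumes the newest sequence ID is already known to the server-side query path: when the caller supplies no sequence ID, anchor the range at the newest event and work backwards over at most the configured max range.

package events

// resolveSequenceRange picks the scan window. If the caller gave no end
// sequence, anchor at the newest event, so that "no value supplied" means
// "give me the most recent events".
func resolveSequenceRange(requestedEnd *int64, newestSeq, maxRange int64) (start, end int64) {
	end = newestSeq
	if requestedEnd != nil {
		end = *requestedEnd
	}
	start = end - maxRange // default max range 1000, configurable
	if start < 0 {
		start = 0
	}
	return start, end
}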
Right, so it looks like there's a flaky e2e at the moment, which I've seen pass and fail on different commits on this branch. I'll dig into it here for a little bit, but I'd be surprised if the changes I have made were causing this test to fail, given the test is failing during multi-party set-up; an initial look suggests that one of the nodes is failing to register its org.
EDIT: Spent some time looking at this, and it looks like the tests are a bit flaky on main too. I think for now we should re-run the failing E2E on this PR, and it should work. Probably worth a separate item to look into the E2E failures.
Note: Could not get the E2Es running on my M1 Mac - looking in the Dockerfile there's a lot of AMD64 stuff. I could spend some cycles looking into this if required, but I'm not sure it's worth the time given we know this is a flaky test.
All requested changes have been addressed
ref: #1443
Adds a new fancy API:
subscriptions/{subid}/events
My current concern is that this query is an expensive operation, because by necessity it is O(n): we have to go through all of the events to find those that match the subscription type. Additionally, the skip provided by the user does not help us here, as we don't know how many of the events actually match our filter. If the provided filter matches no events, we will iterate through all of the events in the DB.
A local benchmark processing ~93,000 events in FireFly took ~30 seconds on this API with a subscription that matches no events (i.e. going through all of the events and running checks on them). An internal limit of <100 takes roughly 30-45 seconds to complete; higher figures tend more towards ~30 seconds. Increasing the internal limit past 200 does not substantially decrease the time required for the query.