🐛 [Stream Firestore to BigQuery] Events stop streaming from firestore to bigquery, but fixed through extension update? #2198

Open
leighajarett opened this issue Oct 21, 2024 · 6 comments
Labels
type: bug Something isn't working

Comments

@leighajarett

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.55

Steps to reproduce:

Several months ago, the extension started to stop streaming records into BigQuery at seemingly random times. Streaming stays almost completely stopped until we upgrade the extension to a new version. We don't see any errors in the logs. We have one instance of the extension that streams into a non-partitioned table and one that streams into a partitioned table. The problem only seems to affect the partitioned table.

Expected result

Records continuously stream into BigQuery without interruption.

Actual result

Records are omitted from the BigQuery table until we upgrade the version.

@leighajarett leighajarett added the type: bug Something isn't working label Oct 21, 2024
@puf

puf commented Oct 21, 2024

Hey folks, I'm working with @leighajarett on this problem. What we see is that the extension works fine for us for a while, and then stops writing most events (our Firestore write volume is pretty constant). When we install a new version of the extension, it works again, until it stops later.

image

Any idea what could be going on to cause this, or even how we can troubleshoot it?

@pr-Mais
Member

pr-Mais commented Oct 22, 2024

@puf does this chart represent exports count in BigQuery?

@leighajarett
Author

It represents the number of events per day; it's a count of the records in the table.

@leighajarett
Author

Just to add some more information here - we pinpointed a specific event that is missing from the BigQuery table.

In the logs, we can see this error
Screenshot 2024-11-14 at 1 48 18 PM

We're wondering if things are timing out somewhere? Maybe from an overload of events?

@puf

puf commented Nov 14, 2024

We (Leigha, myself and our team) have been analyzing a bit further, and these metrics from the Cloud Run task queue associated with one of our extension instances seem pretty conclusive:

CleanShot 2024-11-14 at 11 45 25@2x

In the top chart you can see that:

  • We're adding tasks (green line) at a rate of 4-6 million per time slot of 3 hours, which is about 500 per second.
  • Tasks are being processed (blue line) at a rate of 1.1 million per 3 hours, so about 100 per second.
  • Tasks are initially completed (orange line), but then quickly all start failing (purple line).

In the bottom chart you see the size of the task queue, which grows to 500 million, which is presumably its maximum. So... the queue is just not able to process the tasks that the extension is adding to it.
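As a rough check of the numbers above, the per-second rates and the resulting backlog growth can be worked out from the per-3-hour figures. This is only a back-of-the-envelope sketch: the inputs are the approximate values read off the chart, and the variable and function names are made up for illustration.

```python
# Rough arithmetic on the figures quoted above (values read off the chart).
WINDOW_SECONDS = 3 * 60 * 60  # each time slot on the chart covers 3 hours


def per_second(tasks_per_window: float) -> float:
    """Convert a per-3-hour task count into a per-second rate."""
    return tasks_per_window / WINDOW_SECONDS


added_low = per_second(4_000_000)    # lower bound of tasks added
added_high = per_second(6_000_000)   # upper bound of tasks added
processed = per_second(1_100_000)    # tasks processed

# Net growth of the queue: tasks added minus tasks processed, per second.
growth_low = added_low - processed
growth_high = added_high - processed

print(f"added:     {added_low:.0f} to {added_high:.0f} tasks/s")
print(f"processed: {processed:.0f} tasks/s")
print(f"backlog grows by {growth_low:.0f} to {growth_high:.0f} tasks/s")
```

Under these assumptions the queue takes in several times more tasks per second than it processes, which is consistent with the steadily growing backlog in the bottom chart.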

We've just changed the configuration of this queue to have a Max rate of 500/s (the maximum we can set) to see if that allows it to drain the backlog of tasks, but given the rate at which we're adding tasks that likely won't be enough for long.
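To get a feel for why 500/s may not be enough, here is a minimal sketch of the drain-time arithmetic. It assumes the queue actually dispatches at its configured max rate, that every dispatched task succeeds, and that the backlog is the roughly 500 million tasks seen in the chart; the function name and the arrival rates passed to it are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drain_time_days(backlog: float, dispatch_rate: float, arrival_rate: float) -> float:
    """Days needed to empty the backlog at the given per-second rates.

    Returns infinity when tasks arrive at least as fast as they are dispatched,
    because the backlog then never shrinks.
    """
    net_rate = dispatch_rate - arrival_rate
    if net_rate <= 0:
        return float("inf")
    return backlog / net_rate / SECONDS_PER_DAY


backlog = 500_000_000  # approximate queue size from the chart

# Best case: nothing new is added while the queue drains.
print(drain_time_days(backlog, dispatch_rate=500, arrival_rate=0))
# New tasks keep arriving at about 370/s (the low end of the observed range).
print(drain_time_days(backlog, dispatch_rate=500, arrival_rate=370))
# New tasks arrive at about 500/s: the backlog never drains.
print(drain_time_days(backlog, dispatch_rate=500, arrival_rate=500))
```

Even in the best case the existing backlog would take more than a week to clear, and once arrivals approach the dispatch rate it does not clear at all.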

We've also upgraded one of our instances of this extension to the new 0.1.56 version, and no longer see the same errors in our logs for that instance.

@puf

puf commented Nov 19, 2024

Five days in, we're still seeing the events being streamed into BigQuery, so 🎉
