pipeline 🤝 replay ingestion changes #23395

Open
3 of 11 tasks
pauldambra opened this issue Jul 2, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@pauldambra
Member

pauldambra commented Jul 2, 2024

Feature request

pipeline and replay teams are going to trade time to improve heatmap_data and $exception event ingestion

## why change heatmap data ingestion?

  1. we get many heatmap_data items per event that carries them, so under heavy load heatmap processing multiplies that load, and it's hard to scale or react because the magnification happens inside main event processing
  2. we want to make these changes without breaking main event ingestion. This also improves development speed by proxy.
  3. failure isolation, e.g. during an incident we have more, easier levers we can pull
  4. not slowing down analytics ingestion
  5. cost: we can optimize this independent, clearly different work separately

TODO

  • move $heatmap_data from being a passenger on other events to its own $$heatmap event @pauldambra
  • update ingestion runner to make sure heatmap data keeps flowing (if necessary) @pauldambra
  • create kafka topic for heatmap raw topic - team pipeline
  • add a new plugin-server role and deployment for heatmap (running dupe of historical ingestion code, i.e. analytics without overflow) - team pipeline
  • update capture-rs so that $$heatmap events are written to a dedicated kafka topic (we won't be changing capture-py as it's going to die soon) @xvello
  • update plugin-server code to optimize the heatmap role so it only does validation and writing to the heatmap ingestion topic. It can do no other processing: no $set, no exports, etc. (heatmaps are free so we'll keep processing cheap) - team pipeline
    - team look-up for token resolution & whether heatmaps are enabled
    - no PG (no persons, groups, ...)
    - no processEvent plugins
    - event written to a dedicated table
  • investigate writing one message to CH kafka topic which is exploded in the materialized view that ingests them instead of sending one kafka message per heatmap data item @pauldambra ??
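
A minimal sketch of the first and fifth TODO items, to make the intended flow concrete: heatmap data is lifted off its carrier event into a standalone `$$heatmap` event, which is then routed to its own topic. The `CapturedEvent` shape, the function names, and both topic names are assumptions made up for illustration. They are not taken from posthog-js, capture-rs, or plugin-server.

```typescript
// Hypothetical event shape; real events carry more fields.
interface CapturedEvent {
    event: string
    distinct_id: string
    properties: Record<string, unknown>
}

// Split a carrier event into the original event (minus $heatmap_data)
// and, when heatmap data was present, a standalone $$heatmap event.
function splitHeatmapData(input: CapturedEvent): CapturedEvent[] {
    const { $heatmap_data: heatmapData, ...rest } = input.properties
    const stripped: CapturedEvent = { ...input, properties: rest }
    if (heatmapData === undefined || heatmapData === null) {
        return [stripped]
    }
    const heatmapEvent: CapturedEvent = {
        event: '$$heatmap',
        distinct_id: input.distinct_id,
        properties: { $heatmap_data: heatmapData },
    }
    return [stripped, heatmapEvent]
}

// Topic names are placeholders, not real kafka topic names.
function topicFor(event: CapturedEvent): string {
    return event.event === '$$heatmap' ? 'heatmap_raw_topic_placeholder' : 'main_events_topic_placeholder'
}
```

With this split, the main ingestion path never sees heatmap payloads, so the per-event magnification described above would happen only in the dedicated heatmap role.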

## why change $exception data ingestion?

  1. we want to add more processing to these events, which will require changes to processing speed, infra requirements, etc. We want to make these changes without breaking main event ingestion. This also improves development speed by proxy.
  2. failure isolation, e.g. during an incident we have more, easier levers we can pull
  3. not slowing down analytics ingestion
  4. cost: we can optimize this independent, clearly different work separately

TODO

  • set up /i/v0/x ingestion route
  • make sure any $exception event can be configured to be sent to the new route @pauldambra
  • add topic, capture-rs, new plugin-server role & deployment TODOs as above - team pipeline
  • Update plugin-server code to optimize exception role so it only does things it needs - team pipeline
    - only person lookup (no writes)
    - keep processEvent plugins
    - keep groups resolution and PoE
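
The per-role restrictions in the two TODO lists could be expressed as an allow-list of processing steps per role, roughly as sketched below. The step names and the `canRun` helper are hypothetical labels for this sketch, not plugin-server identifiers, and only steps named in the lists above are included.

```typescript
// Step names are hypothetical labels for this sketch.
type Step =
    | 'validate'
    | 'teamLookup'
    | 'personLookup'
    | 'personWrite'
    | 'groupsResolution'
    | 'processEventPlugins'
    | 'exports'

type Role = 'heatmap' | 'exception'

// heatmap: validation and team look-up only (no PG, no plugins, no exports).
// exception: person lookup without writes, plus plugins and groups resolution.
const allowedSteps: Record<Role, ReadonlySet<Step>> = {
    heatmap: new Set<Step>(['validate', 'teamLookup']),
    exception: new Set<Step>(['personLookup', 'processEventPlugins', 'groupsResolution']),
}

// Returns true when the given role is allowed to run the given step.
function canRun(role: Role, step: Step): boolean {
    return allowedSteps[role].has(step)
}
```

An explicit allow-list like this keeps each role cheap by default: a new step has to be opted into per role rather than inherited from main event processing.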


@pauldambra pauldambra added the enhancement New feature or request label Jul 2, 2024
@pauldambra
Member Author

@tiina303 dumped my thoughts here since we've probably gone beyond slack

i'd be happy to discover i'm wrong so feel free to say what tasks i'm missing / or can delete / or shouldn't be trying to do etc etc
