Commit e3a21d1

Add some docs (#93)
* Add some content
* Link
1 parent b166a9d commit e3a21d1

10 files changed: +213 additions, -42 deletions
README.md

Lines changed: 22 additions & 23 deletions

@@ -2,37 +2,36 @@
 
 The Sentry Streaming Platform
 
-This repo contains two libraries: `sentry_streams` and `sentry_flink`.
+Sentry Streams is a distributed platform that, like most streaming platforms,
+is designed to handle real-time unbounded data streams.
 
-The first contain all the streaming api and an Arroyo based adapter to run
-the streaming applications on top of Arroyo.
+This is built primarily to allow the creation of Sentry ingestion pipelines,
+though the API provided is fully independent from the Sentry product and can
+be used to build any streaming application.
 
-The second contains the Flink adapter to run streaming applications on
-Apache Flink. This part is in a separate library because, until we will not
-be able to make it run on python 3.13 and produce wheels for python 3.13,
-it will require Java to run even in the dev environment.
+The main features are:
 
-## Quickstart
+- Kafka sources and multiple sinks. Ingestion pipelines take data from Kafka
+  and write enriched data into multiple data stores.
 
-We are going to run a streaming application on top of Arroyo.
+- Dataflow API support. This allows the creation of streaming applications
+  focusing on the application logic and pipeline topology rather than on
+  the underlying dataflow engine.
 
-1. Run `make install-dev`
+- Support for stateful and stateless transformations. The state storage is
+  provided by the platform rather than being part of the application.
 
-2. Go to the sentry_streams directory
+- Distributed execution. The primitives used to build the application can
+  be distributed on multiple nodes by configuration.
 
-3. Activate the virtual environment: `source .venv/bin/activate`
+- Hides the Kafka details, like commit policy and topic partitioning, from
+  the application.
 
-4. Run one of the examples
+- Out-of-the-box support for streaming application best practices:
+  DLQ, monitoring, health checks, etc.
 
-```
-python sentry_streams/runner.py \
-  -n test \
-  -b localhost:9092 \
-  -a arroyo \
-  sentry_streams/examples/transformer.py
-```
+- Support for Rust and Python applications.
 
-This will start an Arroyo consumer that runs the streaming application defined
-in `sentry_streams/examples/transformer.py`.
+- Support for multiple runtimes.
 
-there is a number of examples in the `sentry_streams/examples` directory.
+[Streams Documentation](https://getsentry.github.io/streams/)
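
For a sense of the Dataflow API the new README describes, here is a minimal sketch of a pipeline. It is assembled only from primitives shown in the Getting Started guide below (`streaming_source`, `Map`, `sink`); the step and topic names are illustrative:

```python
from json import dumps, loads

from sentry_streams.pipeline import Map, streaming_source

# Read JSON from the "events" topic, parse and re-serialize each message,
# and produce the result to the "transformed-events" topic.
pipeline = (
    streaming_source(name="myinput", stream_name="events")
    .apply("parser", Map(function=loads))
    .apply("serializer", Map(function=dumps))
    .sink("myoutput", stream_name="transformed-events")
)
```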
sentry_streams/docs/source/build_pipeline.rst

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
Building a Pipeline
===================

Pipelines are defined through a Python DSL (more options will be provided) by
chaining dataflow primitives.

Chaining primitives means sending a message from one operator to the following
one.

Pipelines start with a `StreamingSource`, which represents a Kafka consumer.
They can fork and broadcast messages to multiple branches. Each branch
terminates with a sink (see the sketch below).

As of now only Python operations can be used. Rust operations will be
supported soon.

Distribution is not visible at this level, as the DSL only defines the
topology of the application, which is basically its business logic. The
distribution is defined via the deployment descriptor, so the operators can
be distributed differently in different environments.
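As an illustration of forking, the sketch below routes each message to one of
two branches. It mirrors the example in the ``ExtensibleChain`` docstring in
``chain.py``; the import path of ``segment`` and the ``Routes`` and
``routing_func`` helpers are assumptions made for the example:

.. code-block:: python

   from enum import Enum

   from sentry_streams.pipeline import Map, streaming_source
   from sentry_streams.pipeline.chain import segment  # assumed import path


   class Routes(Enum):
       ROUTE1 = "route1"
       ROUTE2 = "route2"


   def routing_func(msg: str) -> Routes:
       # Hypothetical routing logic: pick a branch per message.
       return Routes.ROUTE1 if "event" in msg else Routes.ROUTE2


   pipeline = (
       streaming_source("myinput", "events")
       .route(
           "route_to_one",
           routing_function=routing_func,
           routes={
               Routes.ROUTE1: segment(name="route1")
               .apply("transform1", Map(lambda msg: msg))
               .sink("myoutput1", "transformed-events-1"),
               Routes.ROUTE2: segment(name="route2")
               .apply("transform2", Map(lambda msg: msg))
               .sink("myoutput2", "transformed-events-2"),
           },
       )
   )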
The DSL operators are in the `chain.py` module.

.. automodule:: sentry_streams.pipeline.chain
   :members:
   :undoc-members:
   :show-inheritance:

sentry_streams/docs/source/conf.py

Lines changed: 11 additions & 1 deletion

@@ -1,3 +1,6 @@
+import os
+import sys
+
 # Configuration file for the Sphinx documentation builder.
 #
 # For the full list of built-in configuration values, see the documentation:
@@ -11,10 +14,17 @@
 author = "blank"
 release = "0.1"
 
+sys.path.insert(0, os.path.abspath("../.."))
+
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
-extensions = ["sphinxcontrib.mermaid"]
+extensions = [
+    "sphinxcontrib.mermaid",
+    "sphinx.ext.autodoc",
+]
+
+always_document_param_types = True
 
 templates_path = ["_templates"]
 exclude_patterns = ["build"]
sentry_streams/docs/source/configure_pipeline.rst

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
Runner Configuration
====================
sentry_streams/docs/source/deployment.rst

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
Deploying on Kubernetes
=======================

sentry_streams/docs/source/index.rst

Lines changed: 5 additions & 0 deletions

@@ -6,4 +6,9 @@
 .. toctree::
    :maxdepth: 2
 
+   what_for
    architecture
+   build_pipeline
+   configure_pipeline
+   runtime/arroyo
+   deployment

sentry_streams/docs/source/intro.rst

Lines changed: 123 additions & 1 deletion

@@ -1 +1,123 @@
-This is Sentry sterams

The new content of the file:

Sentry Streams is a distributed platform that, like most streaming platforms,
is designed to handle real-time unbounded data streams.

This is built primarily to allow the creation of Sentry ingestion pipelines,
though the API provided is fully independent from the Sentry product and can
be used to build any streaming application.

The main features are:

* Kafka sources and multiple sinks. Ingestion pipelines take data from Kafka
  and write enriched data into multiple data stores.

* Dataflow API support. This allows the creation of streaming applications
  focusing on the application logic and pipeline topology rather than on
  the underlying dataflow engine.

* Support for stateful and stateless transformations. The state storage is
  provided by the platform rather than being part of the application.

* Distributed execution. The primitives used to build the application can
  be distributed on multiple nodes by configuration.

* Hides the Kafka details, like commit policy and topic partitioning, from
  the application.

* Out-of-the-box support for streaming application best practices:
  DLQ, monitoring, health checks, etc.

* Support for Rust and Python applications.

* Support for multiple runtimes.

Design principles
=================

This streaming platform, in the context of Sentry ingestion, is designed
with a few principles in mind:

* Fully self-service, to speed up the time to reach production when building
  pipelines.
* Abstract the infrastructure aspects away (Kafka, delivery guarantees,
  schemas, scale, etc.) to improve stability and scale.
* Opinionated in the abstractions provided to build ingestion, to push for
  best practices and to hide the inner workings of streaming applications.
* Pipeline as a system for tuning, capacity management and architecture
  understanding.

Getting Started
===============

In order to build a streaming application and run it on top of the Sentry
Arroyo runtime, follow these steps:

1. Run a Kafka broker locally.

2. Create a new Python project and a dev environment.

3. Install `sentry_streams`:

.. code-block::

   pip install sentry_streams

4. Create a new Python module for your streaming application:

.. code-block:: python
   :linenos:

   from json import JSONDecodeError, dumps, loads
   from typing import Any, Mapping, cast

   from sentry_streams.pipeline import Filter, Map, streaming_source


   def parse(msg: str) -> Mapping[str, Any]:
       try:
           parsed = loads(msg)
       except JSONDecodeError:
           return {"type": "invalid"}

       return cast(Mapping[str, Any], parsed)


   def filter_not_event(msg: Mapping[str, Any]) -> bool:
       return bool(msg["type"] == "event")


   pipeline = (
       streaming_source(
           name="myinput",
           stream_name="events",
       )
       .apply("mymap", Map(function=parse))
       .apply("myfilter", Filter(function=filter_not_event))
       .apply("serializer", Map(function=lambda msg: dumps(msg)))
       .sink(
           "myoutput",
           stream_name="transformed-events",
       )
   )

This is a simple pipeline that takes a stream of JSON messages, parses them,
filters out the ones that are not events, serializes them back to JSON, and
produces the result to another topic.

5. Run the pipeline:

.. code-block::

   python -m sentry_streams.runner \
       -n Batch \
       --broker localhost:9092 \
       --adapter arroyo \
       <YOUR PIPELINE FILE>

6. Produce events on the `events` topic and consume them from the
   `transformed-events` topic:

.. code-block::

   echo '{"type": "event", "data": {"foo": "bar"}}' | kcat -b localhost:9092 -P -t events

.. code-block::

   kcat -b localhost:9092 -G test transformed-events
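
If everything is wired correctly, the consumer should print the re-serialized
event. For the sample message above, the expected output (assuming Python's
default `json.dumps` formatting, which the example pipeline uses) would be:

.. code-block::

   {"type": "event", "data": {"foo": "bar"}}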

7. Look for more examples in the `sentry_streams/examples` folder of the
   repository.
sentry_streams/docs/source/runtime/arroyo.rst

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
Arroyo Runtime
==============
sentry_streams/docs/source/what_for.rst

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
The rationale
=============

sentry_streams/sentry_streams/pipeline/chain.py

Lines changed: 18 additions & 17 deletions

@@ -151,24 +151,25 @@ class ExtensibleChain(Chain):
     Other steps manage the pipeline topology: sink, broadcast, route.
 
     Example:
-    ```
-    pipeline = (
-        streaming_source("myinput", "events") # Starts the pipeline
-        .apply("transform1", Map(lambda msg: msg)) # Performs an operation
-        .route( # Branches the pipeline
-            "route_to_one",
-            routing_function=routing_func,
-            routes={
-                Routes.ROUTE1: segment(name="route1") # Creates a branch
-                .apply("transform2", Map(lambda msg: msg))
-                .sink("myoutput1", "transformed-events-2"),
-                Routes.ROUTE2: segment(name="route2")
-                .apply("transform3", Map(lambda msg: msg))
-                .sink("myoutput2", "transformed-events3"),
-            },
-        )
-    )
-    ```
+
+    .. code-block:: python
+
+        pipeline = (
+            streaming_source("myinput", "events")  # Starts the pipeline
+            .apply("transform1", Map(lambda msg: msg))  # Performs an operation
+            .route(  # Branches the pipeline
+                "route_to_one",
+                routing_function=routing_func,
+                routes={
+                    Routes.ROUTE1: segment(name="route1")  # Creates a branch
+                    .apply("transform2", Map(lambda msg: msg))
+                    .sink("myoutput1", "transformed-events-2"),
+                    Routes.ROUTE2: segment(name="route2")
+                    .apply("transform3", Map(lambda msg: msg))
+                    .sink("myoutput2", "transformed-events3"),
+                },
+            )
+        )
+
     """
 
     def __init__(self, name: str) -> None:
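
The docstring example references `routing_func` and `Routes`, which are
user-defined rather than part of the library. A minimal sketch of what they
could look like (hypothetical names and logic, matching the assumptions in the
Building a Pipeline sketch above):

```python
from enum import Enum


class Routes(Enum):
    ROUTE1 = "route1"
    ROUTE2 = "route2"


def routing_func(msg: str) -> Routes:
    # Hypothetical: any per-message predicate can drive the routing.
    return Routes.ROUTE1 if "event" in msg else Routes.ROUTE2
```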
