Word Count is a dataflow for window processing that analyzes sentences and computes the three most frequently used words in 20-second intervals.
The dataflow uses the following primitives:
- window
- assign-timestamp
- assign-key
- update-state
- flush
- flat-map
- states
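To get a feel for what these primitives accomplish together, here is a small Python analogy (not SDF code; `flat_map` and `top_words` are hypothetical names) of splitting sentences into words, counting per key, and emitting the top three when a window flushes:

```python
# Illustrative sketch only -- SDF runs these steps as dataflow operators,
# not as Python functions.
from collections import Counter

def flat_map(sentence):
    # flat-map: split one sentence record into many word records
    return sentence.split()

def top_words(sentences, n=3):
    state = Counter()           # update-state: running count per key (word)
    for sentence in sentences:
        for word in flat_map(sentence):
            state[word] += 1    # assign-key groups records by the word itself
    # flush: when the window closes, emit the n most frequent words
    return [{"word": w, "count": c} for w, c in state.most_common(n)]

print(top_words([
    "behind every great man is a woman rolling her eyes",
    "the eyes reflect what is in the heart and soul",
]))
```

In the real dataflow, `assign-timestamp` and `window` additionally scope the counts to 20-second intervals.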
Take a look at the `dataflow.yaml` to get an idea of what we're doing.
Make sure to install SDF and start a Fluvio cluster.
Use the `sdf` command-line tool to run the dataflow:

```bash
sdf run --ui
```

The `--ui` flag opens the Studio.
Produce sentences to the `sentence` topic:

```bash
echo "behind every great man is a woman rolling her eyes" | fluvio produce sentence
echo "the eyes reflect what is in the heart and soul" | fluvio produce sentence
echo "keep your eyes on the stars and your feet on the ground" | fluvio produce sentence
echo "obstacles are those frightful things you see when you take your eyes off your goal" | fluvio produce sentence
```
Note:
The watermark closes the window on an incoming event. Hence, if you stop producing events before the watermark triggers, the window stays open and nothing gets produced. This is the expected behavior: in most cases, external entities continuously produce data, so you won't be left with an open window. To force the window to flush, produce one more record after the 20-second mark.
In subsequent releases, we'll add an idle watermark trigger to cover manual testing.
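The note above can be sketched in Python (illustrative only; `process` and `WINDOW` are hypothetical names, not SDF's implementation): a 20-second tumbling window closes only when a record's timestamp passes the window end, not on wall-clock time.

```python
# Hypothetical event-time windowing sketch -- not SDF code.
WINDOW = 20  # window length in seconds

def process(events):
    """events: list of (timestamp, word). Returns the flushed windows."""
    flushed = []
    window_start = None
    counts = {}
    for ts, word in events:
        if window_start is None:
            # Align the first window to the first event's timestamp.
            window_start = ts - (ts % WINDOW)
        # An event past the window end acts as the watermark trigger.
        while ts >= window_start + WINDOW:
            flushed.append((window_start, counts))
            counts = {}
            window_start += WINDOW
        counts[word] = counts.get(word, 0) + 1
    # Any remaining counts stay in an open window: nothing is produced.
    return flushed

# Events at t=1..5 never close the window; the event at t=25 does.
print(process([(1, "eyes"), (2, "the"), (5, "eyes"), (25, "stars")]))
```

The final event at `t=25` is what flushes the `[0, 20)` window, mirroring the advice to produce one more record after the 20-second mark.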
Consume from the `most-used-words` topic:

```bash
fluvio consume most-used-words -B -O json
```

The `-B` flag reads from the beginning of the topic, and `-O json` formats the output as JSON.
Depending on when the 20-second window triggered, you'll see the top three words:

```json
[
  {
    "count": 2,
    "word": "is"
  },
  {
    "count": 2,
    "word": "the"
  },
  {
    "count": 2,
    "word": "eyes"
  }
]
```
To watch how the window is gradually populated, list the available states:

```bash
show state
```

Then show the window state:

```bash
show state word-processing-window/count-per-word/state
```
Note: The dataflow stops processing records when you close the interactive editor. To resume processing, run `sdf run` again.
Congratulations! You've successfully built and run a dataflow!
Exit the `sdf` terminal and clean up. The `--force` flag also removes the topics:

```bash
sdf clean --force
```
You may also start an http-source connector that continuously feeds data to the `sentence` topic. For instructions, check out connectors.