Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

SQL (join)

In this example, we'll count the occurrences of words in two topics. We will use the CLI interface to run some SQL queries and then use a JOIN query on a service to find the number of occurrences of a word in both topics.

Run DataFlow

With the dataflow.yaml file in the current directory, run the following commands:

sdf run

Test DataFlow

The sample data files used to run this test are in ./sample-data.

Produce the data to the source-a topic:

fluvio produce source-a -f ./sample-data/source-a.txt

Checkout the data in source-a topic:

fluvio consume source-a -Bd

Produce the data to the source-b topic:

fluvio produce source-b -f ./sample-data/source-b.txt

Checkout the data in source-b topic:

fluvio consume source-b -Bd

Run SQL commands

On the sdf runtime terminal, first enter the SQL mode:

>> sql

List the tables using show tables command:

sql >> show tables
shape: (2, 1)
┌──────────────────┐
│ name             │
│ ---              │
│ str              │
╞══════════════════╡
│ count_per_word_a │
│ count_per_word_b │
└──────────────────┘

Query the count_per_word_a table:

sql >> select * from count_per_word_a order by occurrences desc limit 6
shape: (6, 2)
┌──────┬─────────────┐
│ _key ┆ occurrences │
│ ---  ┆ ---         │
│ str  ┆ i32         │
╞══════╪═════════════╡
│ the  ┆ 16          │
│ of   ┆ 14          │
│ was  ┆ 13          │
│ it   ┆ 11          │
│ his  ┆ 7           │
│ in4           │
└──────┴─────────────┘

Perform a JOIN query using the count_per_word_a and count_per_word_b tables:

select a._key, a.occurrences as count_a, b.occurrences as count_b from count_per_word_a a join count_per_word_b b on a._key = b._key order by count_a desc limit 6
shape: (6, 3)
┌──────┬─────────┬─────────┐
│ _key ┆ count_a ┆ count_b │
│ ---  ┆ ---     ┆ ---     │
│ str  ┆ i32     ┆ i32     │
╞══════╪═════════╪═════════╡
│ the  ┆ 1613      │
│ of   ┆ 148       │
│ it   ┆ 112       │
│ his  ┆ 74       │
│ in42       │
│ to   ┆ 34       │
└──────┴─────────┴─────────┘

Exit the SQL mode:

sql >> .exit

After running the SQL commands, you can see the results of the JOIN query in the look-up-word service. The count_per_word_a and count_per_word_b tables are joined on the _key column to find the number of occurrences of a word in both topics.

Produce to probe-word topic to see the result:

fluvio produce probe-word -f ./sample-data/probe-word.txt

Checkout the data in probe-word topic:

fluvio consume probe-word -Bd

Check the output in the word-count topic:

$ fluvio consume word-count -Bd
"key: `the` count: 29"
"key: `in` count: 6"
"key: `into` count: 3"
"key: `love` count: 2"
"key: `fluvio` count: 0"

Clean-up

Exit sdf terminal and clean-up. The --force flag removes the topics:

sdf clean --force