Document the clustering #302

hanahmily · 2023-07-18T03:53:37Z

tldr

Meta nodes hold active nodes and shard mapping info.
Liaison nodes shard data based on real-time shard mapping.
Query nodes retrieve data from all active nodes without shard mapping info.
Update the CHANGES log.
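
For reviewers who want a concrete picture of the write path described above, here is a minimal sketch in Go. It assumes a Liaison Node holds a snapshot of the shard mapping published by the Meta Nodes and hashes an entity key onto a shard. `ShardMapping`, `shardOf`, `route`, the node names, and the choice of FNV-1a are all hypothetical and used only for illustration; they are not taken from the BanyanDB code base.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// ShardMapping is a hypothetical snapshot of what a Meta Node publishes:
// shard ID -> name of the active Data Node that owns the shard.
type ShardMapping map[uint32]string

// shardOf hashes an entity key onto one of shardNum shards.
// FNV-1a is an illustrative choice only.
func shardOf(entity string, shardNum uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(entity))
	return h.Sum32() % shardNum
}

// route picks the Data Node for a write, the way a Liaison Node might.
func route(m ShardMapping, entity string, shardNum uint32) (string, error) {
	if shardNum == 0 {
		return "", errors.New("shardNum must be positive")
	}
	node, ok := m[shardOf(entity, shardNum)]
	if !ok {
		return "", fmt.Errorf("no active data node for entity %q", entity)
	}
	return node, nil
}

func main() {
	m := ShardMapping{0: "data-node-1", 1: "data-node-2", 2: "data-node-3"}
	node, err := route(m, "service-a", 3)
	fmt.Println(node, err)
}
```

The same entity always lands on the same shard as long as the shard count is unchanged, which is why only the write path needs the real-time mapping.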

Signed-off-by: Gao Hongtao <[email protected]>

codecov-commenter · 2023-07-18T03:58:41Z

Codecov Report

Merging #302 (e88d47a) into main (c06b5e1) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #302      +/-   ##
==========================================
- Coverage   39.96%   39.94%   -0.02%     
==========================================
  Files         100      100              
  Lines       10868    10868              
==========================================
- Hits         4343     4341       -2     
- Misses       6107     6109       +2     
  Partials      418      418

see 1 file with indirect coverage changes


docs/concept/clustering.md

lujiajing1126

lgtm

docs/concept/clustering.md

sollhui · 2023-07-18T05:12:27Z

docs/concept/clustering.md

+-----------------  -----------------  -----------------
+|  Data Node 1  |  |  Data Node 2  |  |  Data Node 3  |
+|  (Shard 1)    |  |  (Shard 2)    |  |  (Shard 3)    |
+-----------------  -----------------  -----------------


Instead, it delegates the task of replication to these underlying storage systems.

I think the diagram is incomplete. Should we add a more detailed diagram showing how the task of replication is delegated to these underlying storage systems?

~~Make sense. You could push a "suggestion" to update the text diagram.~~

Sorry. The data node doesn't necessarily have to be built on "shared" storage. It just requires robust storage to guard against a potential single point of failure. Theoretically, the whole architecture is shared-nothing rather than shared-storage.

docs/concept/clustering.md

wu-sheng · 2023-07-18T07:09:57Z

The design is good enough for now.

I recommend adding the principles/philosophy behind why the architecture looks like this.

Such as

Query traffic is much lower than write traffic, so querying all nodes works.
With the widely adopted cloud-native and public cloud vendor tech stacks, network-attached storage is reliable and popular, so replication relies on it.
.... More that you have discussed and confirmed

These are the fundamentals behind our choices, and they matter more. We could make countless optimizations with time and experience, but these fundamentals should rarely change, as they are tied closely to our use case, which is SkyWalking itself.

Signed-off-by: Gao Hongtao <[email protected]>

hanahmily · 2023-07-18T14:10:45Z

done

hailin0 · 2023-07-18T14:36:36Z

LGTM

hailin0 · 2023-07-18T14:46:27Z

docs/concept/clustering.md

+
+### 6.1 Query Routing
+
+Query Nodes differ from Liaison Nodes in that they do not store shard mapping information from Meta Nodes. Instead, they access all Data Nodes to retrieve the necessary data for queries. As the query load is lower, it is practical for query nodes to access all data nodes for this purpose. It may increase network traffic, but simplifies scaling out of the cluster.


It may be necessary to explain how the number of data nodes affects the long-tail latency of queries.

It could. This solution certainly has side effects, whether we write them down or not.

This pattern is good at batching and aggregation. Some scenarios can leverage it:

Ad-hoc "TopN" is a classic aggregation operation. OAP's service and instance TopN metric query will benefit from this pattern.

"MultiGet" is a batching operation on several entities, so it naturally retrieves data from all data nodes.

A classic SkyWalking UI dashboard fetches several metrics belonging to an entity. The OAP generates several query operations to fetch the data from the DB. If it can combine them into a single batch operation, BanyanDB will perform better than it would with several separate queries.

The scenarios serve as examples to illustrate the potential of "stateless query". As core contributors, we must possess a thorough understanding of this concept.
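
To make the "TopN" case concrete, here is a minimal scatter-gather sketch in Go. It assumes each entity lives on exactly one data node (which follows from sharding by entity), so the global TopN is always contained in the union of the per-node TopN lists. `Item`, `DataNode`, `localTopN`, and `topN` are hypothetical names for illustration; in-memory slices stand in for real data nodes, and this is not BanyanDB's query implementation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Item is one (entity, value) pair of a metric.
type Item struct {
	Entity string
	Value  int64
}

// DataNode is a hypothetical in-memory stand-in for a Data Node.
type DataNode []Item

// localTopN returns the n largest items held by this node.
func (d DataNode) localTopN(n int) []Item {
	if n < 0 {
		n = 0
	}
	items := append([]Item(nil), d...) // copy so the node's data is not reordered
	sort.Slice(items, func(i, j int) bool { return items[i].Value > items[j].Value })
	if len(items) > n {
		items = items[:n]
	}
	return items
}

// topN asks every Data Node concurrently, without any shard mapping,
// then merges the partial results into the global TopN.
func topN(nodes []DataNode, n int) []Item {
	partials := make([][]Item, len(nodes))
	var wg sync.WaitGroup
	for i, node := range nodes {
		wg.Add(1)
		go func(i int, node DataNode) {
			defer wg.Done()
			partials[i] = node.localTopN(n) // each goroutine writes its own slot
		}(i, node)
	}
	wg.Wait()

	var merged DataNode
	for _, p := range partials {
		merged = append(merged, p...)
	}
	return merged.localTopN(n)
}

func main() {
	nodes := []DataNode{
		{{"svc-a", 90}, {"svc-b", 10}},
		{{"svc-c", 70}},
		{{"svc-d", 80}, {"svc-e", 20}},
	}
	fmt.Println(topN(nodes, 2)) // prints [{svc-a 90} {svc-d 80}]
}
```

Because `topN` waits for every node before merging, its latency is bounded by the slowest data node, which is the long-tail concern raised earlier in this thread.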

Document the clustering

7eb7f7a

Signed-off-by: Gao Hongtao <[email protected]>

hanahmily added the documentation Improvements or additions to documentation label Jul 18, 2023

hanahmily added this to the 0.5.0 milestone Jul 18, 2023

hanahmily requested review from lujiajing1126, wu-sheng and hailin0 July 18, 2023 03:53

lujiajing1126 requested changes Jul 18, 2023


hanahmily commented Jul 18, 2023


Update docs/concept/clustering.md

ad95310

lujiajing1126 previously approved these changes Jul 18, 2023


sollhui reviewed Jul 18, 2023


sollhui reviewed Jul 18, 2023


hanahmily commented Jul 18, 2023


hanahmily commented Jul 18, 2023


hanahmily dismissed lujiajing1126’s stale review via 4ae89a9 July 18, 2023 13:08

hanahmily added 3 commits July 18, 2023 21:08

Update docs/concept/clustering.md

4ae89a9

Merge branch 'main' into doc

f222f9e

Explain the rationale behind the chosen architectural strategies.

e88d47a

Signed-off-by: Gao Hongtao <[email protected]>

wu-sheng approved these changes Jul 18, 2023


wu-sheng merged commit 056624a into main Jul 18, 2023
13 checks passed

hailin0 reviewed Jul 18, 2023


hanahmily deleted the doc branch July 19, 2023 03:25

sollhui mentioned this pull request Jul 29, 2023

Add file system architecture doc #309

Merged
