Commit 65ccb8f

blog: synching-data-between-two-clusters-using-gateway (#3)
1 parent b9c5a3d commit 65ccb8f

8 files changed

+116
-0
lines changed

assets/images/avatar/Frank.jpg

23.8 KB

content/english/authors/Frank.md

+13
@@ -0,0 +1,13 @@
1+
---
2+
title: Frank
3+
4+
image: "/images/avatar/Frank.jpg"
5+
description: Solutions Architect
6+
social:
7+
- name: twitter
8+
icon: fa-brands fa-twitter
9+
link: https://x.com/yangfan_infini
10+
11+
---
12+
13+
With over a decade of experience in the financial industry, I am familiar with Linux, databases, and networking. Currently, I'm mainly engaged in Elasticsearch technical support.
@@ -0,0 +1,103 @@
1+
---
2+
title: "Synching data between two Elasticsearch clusters using a gateway"
3+
meta_title: "Synching data between two ES clusters"
4+
description: "Introducing the request replication feature of the INFINI Gateway."
5+
date: 2024-12-21T09:00:00Z
6+
image: "/images/posts/2024/synching-data-between-two-clusters-using-gateway/cover.jpg"
7+
categories: ["Elasticsearch", "Gateway"]
8+
author: "Frank"
9+
tags: ["Elasticsearch", "Gateway", "Search"]
10+
draft: false
11+
---
12+
How would you keep two Elasticsearch clusters in sync, regardless of their versions? You could have the application write to both clusters separately. Or you could use a message queue: the application writes to the queue, and a consumer for each cluster indexes the data into its own Elasticsearch instance.
13+
14+
Each approach has its own considerations in terms of data consistency, latency, and complexity. The choice of method would depend on your specific use case requirements and the trade-offs you are willing to make.
15+
16+
## Introduction
17+
The INFINI Gateway serves as a reverse proxy for Elasticsearch clusters, offering a range of functionalities including traffic control, query result caching, request logging and analysis, as well as traffic replication.
18+
19+
With traffic replication, INFINI Gateway duplicates incoming requests to multiple clusters, which may run different versions of Elasticsearch or even OpenSearch.
20+
21+
## Download & Deployment
22+
23+
Choose the installation package that matches your operating system and platform.
24+
- [Download](https://release.infinilabs.com/gateway/stable/)
25+
26+
Extract the tarball to the specified directory:
27+
```shell
28+
mkdir gateway
29+
tar -zxf xxx.gz -C gateway
30+
```
31+
### Modify the configuration file
32+
Download the gateway configuration [here](https://github.com/infinilabs/testing/blob/main/setup/gateway/cases/replication/replication_via-disk.yml). By default, the gateway loads the configuration file `gateway.yml`. To use a different configuration file, pass the `-config` option.
33+
34+
The gateway configuration file contains a wealth of information; here, we present the key sections.
35+
```yaml
36+
#primary
37+
PRIMARY_ENDPOINT: http://192.168.56.3:7171
38+
PRIMARY_USERNAME: elastic
39+
PRIMARY_PASSWORD: password
40+
PRIMARY_MAX_QPS_PER_NODE: 10000
41+
PRIMARY_MAX_BYTES_PER_NODE: 104857600 #100MB/s
42+
PRIMARY_MAX_CONNECTION_PER_NODE: 200
43+
PRIMARY_DISCOVERY_ENABLED: false
44+
PRIMARY_DISCOVERY_REFRESH_ENABLED: false
45+
#backup
46+
BACKUP_ENDPOINT: http://192.168.56.3:9200
47+
BACKUP_USERNAME: admin
48+
BACKUP_PASSWORD: admin
49+
BACKUP_MAX_QPS_PER_NODE: 10000
50+
BACKUP_MAX_BYTES_PER_NODE: 104857600 #100MB/s
51+
BACKUP_MAX_CONNECTION_PER_NODE: 200
52+
BACKUP_DISCOVERY_ENABLED: false
53+
BACKUP_DISCOVERY_REFRESH_ENABLED: false
54+
```
55+
`PRIMARY_ENDPOINT`: the endpoint of the PROD (primary) cluster.
56+
57+
`BACKUP_ENDPOINT`: the endpoint of the BACKUP cluster.
58+
59+
`PRIMARY_USERNAME`, `PRIMARY_PASSWORD`: the credentials used to access the PROD cluster.
60+
61+
`BACKUP_USERNAME`, `BACKUP_PASSWORD`: the credentials used to access the BACKUP cluster.
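
Before starting the gateway, it can help to confirm that both endpoints accept the configured credentials. The following is a minimal sketch, not part of INFINI Gateway: `check_cluster` is a hypothetical helper that requests the root path of a cluster and reports the HTTP status. It assumes `curl` is installed and that an authenticated request to the root path returns 200 on a healthy cluster.

```shell
# Hypothetical helper (not part of INFINI Gateway): report whether a cluster
# endpoint answers an authenticated request to its root path.
check_cluster() {
  name=$1; endpoint=$2; user=$3; pass=$4
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' -u "$user:$pass" "$endpoint")
  if [ "$code" = "200" ]; then
    echo "$name: reachable (HTTP $code)"
  else
    echo "$name: check failed (HTTP $code)" >&2
    return 1
  fi
}

# Example calls with the values from the configuration above:
# check_cluster PROD   http://192.168.56.3:7171 elastic password
# check_cluster BACKUP http://192.168.56.3:9200 admin admin
```

A status of 401 usually points at wrong credentials, while 000 means curl could not connect at all.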
62+
63+
To start INFINI Gateway, run the gateway binary:
64+
```shell
65+
./gateway-linux-amd64
66+
```
67+
INFINI Gateway can also run as a service; I ran it in the foreground here because that makes the logs easier to watch. If you saved the configuration file under a name other than the default `gateway.yml`, add the `-config` flag to point the gateway at it.
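
As a sketch of the non-default case, the snippet below wraps the start command in a small function. It assumes the downloaded configuration was saved next to the binary under its original name, `replication_via-disk.yml`; `start_gateway` and `CONFIG_FILE` are illustrative names, not part of the gateway.

```shell
# Assumption: the downloaded configuration was saved next to the binary
# under its original name, replication_via-disk.yml.
CONFIG_FILE="${CONFIG_FILE:-replication_via-disk.yml}"

# Illustrative wrapper: refuse to start if the configuration file is missing.
start_gateway() {
  if [ ! -f "$CONFIG_FILE" ]; then
    echo "configuration file not found: $CONFIG_FILE" >&2
    return 1
  fi
  # -config tells the gateway to load this file instead of gateway.yml
  ./gateway-linux-amd64 -config "$CONFIG_FILE"
}

# Run in the foreground:
# start_gateway
```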
68+
69+
## Functional Testing
70+
Send a bulk request to the gateway, which then writes the data to both clusters.
71+
```shell
72+
# INFINI Gateway's endpoint & credentials of PROD cluster
73+
curl -X POST "localhost:18000/_bulk?pretty" -H 'Content-Type: application/json' -uelastic:password -d'
74+
{ "index" : { "_index" : "test", "_id" : "1" } }
75+
{ "field1" : "value1" }
76+
{ "create" : { "_index" : "test", "_id" : "2" } }
77+
{ "field2" : "value2" }
78+
'
79+
```
80+
![synching-data-between-two-clusters](/images/posts/2024/synching-data-between-two-clusters-using-gateway/pic-1.jpg)
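
A bulk response carries a top-level `errors` field that is `false` only when every item succeeded. The sketch below checks that field from a script; `bulk_ok` is a hypothetical helper and `bulk.ndjson` is a placeholder file name for a saved request body, neither comes from the gateway itself.

```shell
# Hypothetical helper: succeed only if a bulk response reports no item errors.
# Reads the response JSON on stdin; matches output with or without ?pretty.
bulk_ok() {
  grep -q '"errors" *: *false'
}

# Example (same gateway endpoint and credentials as above; bulk.ndjson is a
# placeholder for a file holding the bulk body):
# curl -s -X POST "localhost:18000/_bulk" -H 'Content-Type: application/json' \
#   -uelastic:password --data-binary @bulk.ndjson | bulk_ok && echo "bulk succeeded"
```

Note that sending the request above a second time is expected to report `errors` as `true`, because the `create` action fails when a document with the same `_id` already exists.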
81+
Query data from the PROD cluster.
82+
```shell
83+
# Endpoint and credentials of the PROD cluster
84+
curl "192.168.56.3:7171/test/_search?pretty" -uelastic:password
85+
```
86+
![synching-data-between-two-clusters](/images/posts/2024/synching-data-between-two-clusters-using-gateway/pic-2.jpg)
87+
Query data from the BACKUP cluster.
88+
```shell
89+
# Endpoint and credentials of the BACKUP cluster
90+
curl "192.168.56.3:9200/test/_search?pretty" -uadmin:admin
91+
```
92+
![synching-data-between-two-clusters](/images/posts/2024/synching-data-between-two-clusters-using-gateway/pic-3.jpg)
93+
Query data from the INFINI Gateway.
94+
```shell
95+
# INFINI Gateway's endpoint & credentials of PROD cluster
96+
curl "192.168.56.3:18000/test/_search?pretty" -uelastic:password
97+
```
98+
![synching-data-between-two-clusters](/images/posts/2024/synching-data-between-two-clusters-using-gateway/pic-4.jpg)
99+
Whether you are querying data from the PROD cluster or the BACKUP cluster, you will receive the same data. You also have the option to query data directly from the INFINI Gateway, which by default forwards search requests to the PROD cluster.
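
One quick way to compare the two clusters from a script is to look at document counts. This is a rough sketch that uses the `_count` API with the endpoints and credentials from this post; `doc_count` and `compare_counts` are hypothetical helpers. Matching counts do not prove that the documents are identical, and the counts may differ briefly if the check runs before the replicated writes have been applied and refreshed.

```shell
# Hypothetical helper: print the document count of an index via the _count API.
doc_count() {
  endpoint=$1; creds=$2; index=$3
  curl -s -u "$creds" "$endpoint/$index/_count" |
    grep -o '"count" *: *[0-9]*' | tr -dc '0-9'
}

# Compare the "test" index on the PROD and BACKUP clusters.
compare_counts() {
  prod=$(doc_count http://192.168.56.3:7171 elastic:password test)
  backup=$(doc_count http://192.168.56.3:9200 admin:admin test)
  if [ -n "$prod" ] && [ "$prod" = "$backup" ]; then
    echo "in sync: $prod documents"
  else
    echo "mismatch: PROD=$prod BACKUP=$backup" >&2
    return 1
  fi
}

# compare_counts
```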
100+
101+
If the PROD cluster is not accessible, the gateway automatically redirects search requests to the BACKUP cluster. Replication is not limited to index requests: the gateway also replicates update and delete requests.
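
To try update and delete replication, a bulk request along the following lines could be sent through the gateway. This is a sketch that reuses the documents, endpoint, and credentials from the functional test above; `BULK_BODY` and `send_bulk` are illustrative names, and the updated field value is made up for the example.

```shell
# Bulk body: update document 1 and delete document 2 in the "test" index.
BULK_BODY='{ "update" : { "_index" : "test", "_id" : "1" } }
{ "doc" : { "field1" : "value1-updated" } }
{ "delete" : { "_index" : "test", "_id" : "2" } }'

# Illustrative wrapper around the request to the gateway.
send_bulk() {
  # printf appends the trailing newline that the bulk API requires;
  # --data-binary @- sends stdin to the server unmodified.
  printf '%s\n' "$BULK_BODY" |
    curl -s -X POST "localhost:18000/_bulk?pretty" \
      -H 'Content-Type: application/json' \
      -uelastic:password --data-binary @-
}

# send_bulk
```

Running the three search commands again afterwards should show whether both clusters reflect the change.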
102+
103+
To replicate traffic in the other direction as well, from the BACKUP cluster to the PROD cluster, deploy a second gateway that acts as a proxy for the BACKUP cluster.
