|
| 1 | +--- |
| 2 | +title: "Synching data between two Elasticsearch clusters using a gateway" |
| 3 | +meta_title: "Synching data between two ES clusters" |
| 4 | +description: "Introducing the request replication feature of the INFINI Gateway." |
| 5 | +date: 2024-12-21T09:00:00Z |
| 6 | +image: "/images/posts/2024/synching-data-between-two-clusters-using-gateway/cover.jpg" |
| 7 | +categories: ["Elasticsearch", "Gateway"] |
| 8 | +author: "Frank" |
| 9 | +tags: ["Elasticsearch", "Gateway", "Search"] |
| 10 | +draft: false |
| 11 | +--- |
| 12 | +How would you sync data between two Elasticsearch clusters, regardless of their versions? Would you have the application write to both clusters separately? Or would you use a message queue, where the application writes data to the queue and a consumer for each cluster indexes that data into its own Elasticsearch instance? |
| 13 | + |
| 14 | +Each approach has its own considerations in terms of data consistency, latency, and complexity. The choice of method would depend on your specific use case requirements and the trade-offs you are willing to make. |
| 15 | + |
| 16 | +## Introduction |
| 17 | +The INFINI Gateway serves as a reverse proxy for Elasticsearch clusters, offering a range of functionalities including traffic control, query result caching, request logging and analysis, as well as traffic replication. |
| 18 | + |
| 19 | +With traffic replication, INFINI Gateway duplicates incoming traffic to multiple clusters. The clusters can run different versions of Elasticsearch, or even OpenSearch. |
| 20 | + |
| 21 | +## Download & Deployment |
| 22 | + |
| 23 | +Choose the suitable installation package according to your operating system and platform. |
| 24 | +- [Download](https://release.infinilabs.com/gateway/stable/) |
| 25 | + |
| 26 | +Extract the tarball to the specified directory: |
| 27 | +```shell |
| 28 | +mkdir gateway |
| 29 | +tar -zxf xxx.gz -C gateway |
| 30 | +``` |
| 31 | +### Modify the configuration file |
| 32 | +Download the gateway configuration [here](https://github.com/infinilabs/testing/blob/main/setup/gateway/cases/replication/replication_via-disk.yml). By default, the gateway will load the configuration file gateway.yml. If you want to specify another configuration file, use the -config option. |
| 33 | + |
| 34 | +The gateway configuration file contains a wealth of information; here, we present the key sections. |
| 35 | +```yaml |
| 36 | + #primary |
| 37 | + PRIMARY_ENDPOINT: http://192.168.56.3:7171 |
| 38 | + PRIMARY_USERNAME: elastic |
| 39 | + PRIMARY_PASSWORD: password |
| 40 | + PRIMARY_MAX_QPS_PER_NODE: 10000 |
| 41 | + PRIMARY_MAX_BYTES_PER_NODE: 104857600 #100MB/s |
| 42 | + PRIMARY_MAX_CONNECTION_PER_NODE: 200 |
| 43 | + PRIMARY_DISCOVERY_ENABLED: false |
| 44 | + PRIMARY_DISCOVERY_REFRESH_ENABLED: false |
| 45 | + #backup |
| 46 | + BACKUP_ENDPOINT: http://192.168.56.3:9200 |
| 47 | + BACKUP_USERNAME: admin |
| 48 | + BACKUP_PASSWORD: admin |
| 49 | + BACKUP_MAX_QPS_PER_NODE: 10000 |
| 50 | + BACKUP_MAX_BYTES_PER_NODE: 104857600 #100MB/s |
| 51 | + BACKUP_MAX_CONNECTION_PER_NODE: 200 |
| 52 | + BACKUP_DISCOVERY_ENABLED: false |
| 53 | + BACKUP_DISCOVERY_REFRESH_ENABLED: false |
| 54 | +``` |
| 55 | +`PRIMARY_ENDPOINT`: The endpoint of the PROD (primary) cluster. |
| 56 | + |
| 57 | +`BACKUP_ENDPOINT`: The endpoint of the BACKUP cluster. |
| 58 | + |
| 59 | +`PRIMARY_USERNAME`, `PRIMARY_PASSWORD`: The credentials required to access the PROD cluster. |
| 60 | + |
| 61 | +`BACKUP_USERNAME`, `BACKUP_PASSWORD`: The credentials required to access the BACKUP cluster. |
| 62 | + |
| 63 | +To start the INFINI Gateway, run the gateway binary as shown below. |
| 64 | +```shell |
| 65 | +./gateway-linux-amd64 |
| 66 | +``` |
| 67 | +The INFINI Gateway can operate in service mode; however, for simpler log monitoring, I ran it in the foreground. If your saved configuration file uses a different filename than the default, make sure to add the -config flag to indicate the configuration file. |
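If the downloaded configuration is kept under its original name instead of `gateway.yml`, the start command needs the `-config` option mentioned above. The following is a minimal sketch of a wrapper that checks for the file before launching. `CONFIG_FILE`, `GATEWAY_BIN`, and `start_gateway` are illustrative names, not gateway settings, and the file name `replication_via-disk.yml` assumes the linked file was saved unchanged.

```shell
# Hypothetical wrapper: start the gateway with an explicit configuration file.
# CONFIG_FILE and GATEWAY_BIN are illustrative variable names (assumptions).
CONFIG_FILE="${CONFIG_FILE:-replication_via-disk.yml}"
GATEWAY_BIN="${GATEWAY_BIN:-./gateway-linux-amd64}"

start_gateway() {
  # Fail early with a clear message if the configuration file is missing.
  if [ ! -f "$CONFIG_FILE" ]; then
    echo "configuration file not found: $CONFIG_FILE" >&2
    return 1
  fi
  # -config selects a configuration file other than the default gateway.yml.
  "$GATEWAY_BIN" -config "$CONFIG_FILE"
}

# start_gateway   # uncomment to launch the gateway in the foreground
```

Without the wrapper, the equivalent direct call is `./gateway-linux-amd64 -config replication_via-disk.yml`.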
| 68 | + |
| 69 | +## Functional Testing |
| 70 | +Submit bulk requests to the gateway, which will subsequently write the data to two Elasticsearch clusters. |
| 71 | +```shell |
| 72 | +# INFINI Gateway's endpoint & credentials of PROD cluster |
| 73 | +curl -X POST "localhost:18000/_bulk?pretty" -H 'Content-Type: application/json' -uelastic:password -d' |
| 74 | +{ "index" : { "_index" : "test", "_id" : "1" } } |
| 75 | +{ "field1" : "value1" } |
| 76 | +{ "create" : { "_index" : "test", "_id" : "2" } } |
| 77 | +{ "field2" : "value2" } |
| 78 | +' |
| 79 | +``` |
| 80 | + |
| 81 | +Query data from the PROD cluster. |
| 82 | +```shell |
| 83 | +# Endpoint and credentials of the PROD cluster |
| 84 | +curl "192.168.56.3:7171/test/_search?pretty" -uelastic:password |
| 85 | +``` |
| 86 | + |
| 87 | +Query data from the BACKUP cluster. |
| 88 | +```shell |
| 89 | +# Endpoint and credentials of the BACKUP cluster |
| 90 | +curl "192.168.56.3:9200/test/_search?pretty" -uadmin:admin |
| 91 | +``` |
| 92 | + |
| 93 | +Query data from the INFINI Gateway. |
| 94 | +```shell |
| 95 | +# INFINI Gateway's endpoint & credentials of PROD cluster |
| 96 | +curl "192.168.56.3:18000/test/_search?pretty" -uelastic:password |
| 97 | +``` |
| 98 | + |
| 99 | +Whether you are querying data from the PROD cluster or the BACKUP cluster, you will receive the same data. You also have the option to query data directly from the INFINI Gateway, which by default forwards search requests to the PROD cluster. |
| 100 | + |
| 101 | +If the PROD cluster is unreachable, the gateway automatically redirects search requests to the BACKUP cluster. Besides index (write) requests, the gateway also replicates delete and update requests. |
| 102 | + |
| 103 | +To replicate traffic in the opposite direction, from the BACKUP cluster to the PROD cluster, deploy another gateway that acts as a proxy for the BACKUP cluster. |
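To check that both clusters hold the same data after the bulk request, comparing document counts is a quick first test. The sketch below uses the Elasticsearch `_count` API with the sample endpoints and credentials from this post. The function names `extract_count`, `doc_count`, and `compare_counts` are illustrative helpers, not part of the gateway. Matching counts are only a rough signal: they do not prove the documents are identical, and the counts may differ briefly while replication is still in flight.

```shell
# Sketch: compare the document count of one index on both clusters.
# Endpoints and credentials are the sample values used in this post.
PRIMARY="http://192.168.56.3:7171"
BACKUP="http://192.168.56.3:9200"

# Extract the numeric "count" field from a _count API response read on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Fetch the document count of an index from one cluster.
# Usage: doc_count <endpoint> <user:password> <index>
doc_count() {
  curl -s -u "$2" "$1/$3/_count" | extract_count
}

# Compare two counts, print the result, and return non-zero on a mismatch.
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo "in sync: $1 documents"
  else
    echo "mismatch: primary=$1 backup=$2"
    return 1
  fi
}

# Example (requires both clusters to be reachable):
# compare_counts "$(doc_count "$PRIMARY" elastic:password test)" \
#                "$(doc_count "$BACKUP" admin:admin test)"
```

Keeping the parsing and comparison in separate functions means each can be tried on a saved response without contacting either cluster.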