Docker Swarm support for Elasticsearch

Automatically configures Elasticsearch to connect to the other nodes inside a Docker Swarm cluster.

Prerequisites

  • Vagrant with a VM provider (for example VirtualBox) to bring up the test cluster below

Test and run

  • git clone this repo and cd into the cloned folder
  • run vagrant up to get a Docker Swarm environment
  • run vagrant ssh elastic0
    • sudo -u docker docker stack deploy --compose-file /vagrant/docker-prod-stack.yml elasticsearch
  • check the service: sudo -u docker docker service ls (or query Elasticsearch directly, see the health check below)
  • go to http://192.168.100.20:5601/ to see Kibana. The user/pass is elastic/changeme.
  • remove the stack: sudo -u docker docker stack rm elasticsearch
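
To verify the cluster came up, you can also hit the Elasticsearch health endpoint directly with the same default credentials:

# any swarm node answers on the published port 9200
curl --user elastic:changeme "http://192.168.100.20:9200/_cluster/health?pretty"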

Run NFS service version

  • git clone this repo and cd into the cloned folder
  • run vagrant up to get a Docker Swarm environment
  • run vagrant ssh elastic0
    • sudo -u docker docker stack deploy --compose-file /vagrant/docker-prod-nfs-stack.yml elasticsearch
  • check the service: sudo -u docker docker service ls
  • go to http://192.168.100.20:5601/ to see Kibana. The user/pass is elastic/changeme.
  • remove the stack: sudo -u docker docker stack rm elasticsearch
  • the NFS server exports 192.168.100.20:/var/esdata (see the volume sketch below)
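
The stack file wires up the export for you; as a rough manual equivalent (a sketch only, the volume name esdata is hypothetical), a node could mount the same export as a Docker volume:

# create a local volume backed by the NFS export above
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.100.20,rw \
  --opt device=:/var/esdata \
  esdata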

Affected Elasticsearch parameters:

  • network.host - the IP address of the container
  • network.publish_host - the IP address of the container
  • discovery.zen.ping.unicast.hosts - a list of IP addresses of the other nodes inside the Docker Swarm service

In order to run a Docker Swarm service from this image, it is REQUIRED to set the environment variable SERVICE_NAME to the name of the service in Docker Swarm. Please avoid configuring the parameters listed above manually.

Example:

# create an encrypted overlay network for the cluster
docker network create --driver overlay --subnet 10.0.10.0/24 \
  --opt encrypted elastic_cluster

# start a 3-replica Elasticsearch service; SERVICE_NAME is required
docker service create --name elasticsearch --network=elastic_cluster \
  --replicas 3 \
  --env SERVICE_NAME=elasticsearch \
  --env bootstrap.memory_lock=true \
  --env "ES_JAVA_OPTS=-Xms512m -Xmx512m -XX:-AssumeMP" \
  --publish 9200:9200 \
  --publish 9300:9300 \
  youngbe/docker-swarm-elasticsearch:5.5.0

# start Kibana pointed at the published Elasticsearch endpoint
docker service create --name kibana --network=elastic_cluster \
  --replicas 1 \
  --env ELASTICSEARCH_URL="http://192.168.100.20:9200" \
  --publish 5601:5601 \
  docker.elastic.co/kibana/kibana:5.5.0

Once the services have started, you can go to http://192.168.100.20:5601/ to see Kibana connected to the Elasticsearch cluster at http://192.168.100.20:9200.

Parameters

  • "-XX:-AssumeMP" : If you encountered "-XX:ParallelGCThreads=N" error and stop elasticsearch service, this is because some JavaSDK with -XX:+AssumeMP enabled by default. So, you should turn it off. Reference issue

  • "bootstrap.memory_lock=true" : Production mode need to lock memory to avoid elasticsearch swap to file. It's will cause performance issue. If you encountered memory lock issue in developing, set "bootstrap.memory_lock=false".

  • production mode: max_map_count and ulimit elasticsearch docker production mode requires:

    • vm.max_map_count=262144
    • elasticsearch using uid:gid 1000:1000
    • ulimit for /etc/security/limits.conf
      • nofile 65536 (open file)
      • nproc 65535 (process thread)
      • memlock unlimited (max memory lock)

You need to setup it BEFORE Docker service up. On CentOS7.0, you can reference the script: es-require-on-host.sh.

Since Elasticsearch requires vm.max_map_count to be at least 262144, but docker service create does not support sysctl management, you have to set vm.max_map_count to the proper value on all your nodes BEFORE starting the service. On Ubuntu Linux: sysctl -w "vm.max_map_count=262144", or echo "vm.max_map_count=262144" >> /etc/sysctl.conf to set it permanently.
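
Putting the host requirements above together, a per-node preparation sketch might look like this (values taken from the list above; adjust for your distro):

# run on every swarm node before deploying the stack
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

# ulimits for /etc/security/limits.conf
cat <<'EOF' | sudo tee -a /etc/security/limits.conf
*  soft  nofile   65536
*  hard  nofile   65536
*  soft  nproc    65535
*  hard  nproc    65535
*  soft  memlock  unlimited
*  hard  memlock  unlimited
EOF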

To access the Elasticsearch cluster, connect to port 9200 on any Docker Swarm node using the default credentials: curl http://elastic:changeme@192.168.100.20:9200.

To change default Elasticsearch parameters, use environment variables. See https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html for more details.

Search Chinese

  • run curl --user elastic:changeme -XPUT "http://192.168.100.20:9200/product" -H 'Content-Type: application/json' -d @./product_mapping.json;echo

  • run curl --user elastic:changeme -XPOST "http://192.168.100.20:9200/_bulk" -H 'Content-Type: application/json' --data-binary @./productsData.json;echo

  • in the Kibana Dev Tools console, run

GET product/goods/_search
{
  "query": {
    "match": {
      "GOOD_NM": "單機身"
    }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
        "GOOD_NM" : {}
    }
  }
}
  • credit: the IK Chinese search plugin is by medcl
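
To see how the IK analyzer tokenizes a phrase, you can hit the _analyze API (a sketch; ik_max_word is one of the analyzers the IK plugin provides):

curl --user elastic:changeme -XGET "http://192.168.100.20:9200/product/_analyze" -H 'Content-Type: application/json' -d '
{
  "analyzer": "ik_max_word",
  "text": "單機身"
}'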
Update dictionary

After updating the dictionary in the folder plugins/ik/config/custom/zhTW, you need to restart the plugin and rebuild the index for the change to take effect. We suggest using index aliases to switch over smoothly after reindexing (see the sketch after the commands below).

sudo bin/elasticsearch-plugin remove ik
sudo bin/elasticsearch-plugin install ik
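
For the alias switch, assuming you reindex into versioned indices behind a single alias (product_v1 and product_v2 are hypothetical names here), one atomic _aliases call flips readers to the new index:

curl --user elastic:changeme -XPOST "http://192.168.100.20:9200/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "remove": { "index": "product_v1", "alias": "product" } },
    { "add":    { "index": "product_v2", "alias": "product" } }
  ]
}'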

Elasticsearch Design Principles

It's a search engine, so keep "approximation" in mind.

It's a search engine, not a SQL engine. When you "select count(*)", you MUST keep in mind that the count is approximate. You MUST read the article Document counts are approximate to avoid silly errors.
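
For example, a terms aggregation reports doc_count_error_upper_bound alongside its buckets, a reminder that per-bucket counts can be off (GOOD_NM.keyword is an assumed keyword sub-field of the sample product index):

curl --user elastic:changeme -XGET "http://192.168.100.20:9200/product/_search" -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "top_goods": {
      "terms": { "field": "GOOD_NM.keyword" }
    }
  }
}'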

In the SQL world, it is very easy to join tables with a join statement. In the Elasticsearch NoSQL world, you instead have to use the following four common techniques to handle relational data:

  • application-side joins
  • data denormalization
  • nested objects
  • parent-child relationships

You MUST read Handle relations and Designing for Scale.
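
As a minimal sketch of the nested-objects technique (the index and field names here are hypothetical):

# comments are indexed as nested documents, so each comment's
# fields are matched together instead of being flattened
curl --user elastic:changeme -XPUT "http://192.168.100.20:9200/blog" -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "post": {
      "properties": {
        "title":    { "type": "text" },
        "comments": {
          "type": "nested",
          "properties": {
            "author": { "type": "keyword" },
            "body":   { "type": "text" }
          }
        }
      }
    }
  }
}'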

Date search in Elasticsearch

In Elasticsearch, the date format is ISO 8601, i.e. a datetime with a time zone (yyyy-mm-ddThh:mm:ss.nnnnnn+|-hh:mm), e.g. 2017-08-04T10:30:00+08:00. Elasticsearch provides a very useful range search for dates. Make sure the field's mapping type is "date".

For example:

PUT test/campaign/1
{
  "campaignID": 1,
  "startTime": "2017-08-04T10:30:00+08:00",
  "endTime": "2017-08-05T10:30:00+08:00"
}

When you check the mapping with GET test/campaign/_mapping, you will get:

{ 
  "test": {
    "mappings": {
      "campaign": {
        "properties": {
          "campaignID": {
            "type": "long"
          },
          "endTime": {
            "type": "date"
          },
          "startTime": {
            "type": "date"
          }
        }
      }
    }
  }
}

You can use "range" and "bool" queries to get the currently running campaigns. For example:

GET test/campaign/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "startTime": {
              "lte": "now" 
            }
          }
        },
        {
          "range": {
            "endTime": {
              "gte": "now"
            }
          }
        }
      ]
    }
  }
}
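
"now" in the query above is date math; you can also offset and round it, e.g. to find campaigns that started within the last seven days (a sketch against the same test index):

curl --user elastic:changeme -XGET "http://192.168.100.20:9200/test/campaign/_search" -H 'Content-Type: application/json' -d '
{
  "query": {
    "range": {
      "startTime": { "gte": "now-7d/d", "lte": "now" }
    }
  }
}'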

For detailed information, please reference the