Adding cluster nodes, expanding storage - Elasticsearch

This is a copy/paste/censor of our wiki doc (which I wrote and have permission to publish; the censoring might make some things look off, sorry). I left in the generic stuff because maybe we're doing something horribly wrong and a nice person will point it out. Or maybe someone else will find inspiration in what we're doing. Who knows! It gives some context anyway. Hope this helps someone...

Overview of adding a new node to the ES cluster.
1. Deploy the current CentOS template
2. Assign an IP address from IPadmin
3. Create the A record in AD DNS now
4. Add a 500GB disk, located on one of the VMFS-ES-x datastores
5. Change the networking:
   - /etc/sysconfig/network-scripts/ifcfg-eth0 (change IP)
   - /etc/sysconfig/network (change hostname)
6. Run updates: yum update
7. Reboot
ES Big Disk config
If the VM is already in place
Rescanning the SCSI hosts lets the guest see the new disk without rebooting:
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host2/scan
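The three echo lines above assume the VM has exactly host0 through host2. A minimal sketch of the same rescan as a loop over whatever hosts exist follows. The function name and the optional directory argument are my own additions (the argument only exists so the loop can be tried against a scratch directory); run it as root with no argument to hit the real sysfs path.

```shell
#!/bin/sh
# Rescan every SCSI host found under a sysfs-style directory.
# $1 is optional and hypothetical: it defaults to the real /sys/class/scsi_host.
rescan_scsi_hosts() {
    base="${1:-/sys/class/scsi_host}"
    for host in "$base"/host*; do
        # Skip the literal, unexpanded glob when no hosts are present
        [ -e "$host/scan" ] || continue
        # "- - -" means: all channels, all targets, all LUNs
        echo "- - -" > "$host/scan"
        echo "rescanned ${host##*/}"
    done
}
```

Calling `rescan_scsi_hosts` and then `fdisk -l` should show the new disk, the same as the manual version.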
Continue the disk setup
fdisk -l
fdisk /dev/sdb
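*** in fdisk: n (new), p (primary), 1 (partition number), Enter twice to accept the default start and end, w (write)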
pvcreate /dev/sdb1
vgextend vg_name /dev/sdb1
lvcreate -L 490G -n lv_elasticsearch vg_name
vi /etc/fstab
*** copy the root (/) line, then change the logical volume to lv_elasticsearch and the mount point to /elasticsearch
mkdir /elasticsearch
mkfs.ext4 -m 0 /dev/vg_name/lv_elasticsearch
mount -a
chown -R elasticsearch:elasticsearch /elasticsearch/
# If the node was already active, copy the existing data over to the new disk
rsync -va --progress /srv/elasticsearch/ /elasticsearch/
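For the fstab step, here is a rough sketch of what the new line could look like, built from the same names used in the commands above. The helper name is made up, and the options (ext4, defaults, dump 1, fsck pass 2) are my assumption of a typical non-root data volume, not a copy of our actual file. Check it against your own root line before appending.

```shell
#!/bin/sh
# Print an fstab entry for a logical volume.
# Hypothetical helper: fstab_line VG LV MOUNTPOINT
fstab_line() {
    vg="$1"
    lv="$2"
    mnt="$3"
    # device  mountpoint  fstype  options  dump  fsck-pass
    printf '/dev/%s/%s %s ext4 defaults 1 2\n' "$vg" "$lv" "$mnt"
}
```

For example, `fstab_line vg_name lv_elasticsearch /elasticsearch >> /etc/fstab` would append the entry, after which `mount -a` picks it up.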

Elasticsearch setup
# Add the repo and install. Note: all cluster nodes should run similar versions.
vi /etc/elasticsearch/elasticsearch.yml
*** change path.data from /srv/elasticsearch/data
*** to /elasticsearch/data
service elasticsearch start
tail -f /var/log/elasticsearch/site.elk.elasticsearch.log
# ES config for this node: cluster.name site.elk.elasticsearch, node.name "ESNODE04", path.data /elasticsearch/data
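If you would rather not hand-edit the yml, the data path change can be scripted. This is only a sketch: the function name is invented, and it assumes the setting is either an uncommented `path.data:` line or absent altogether. Anything fancier (multiple data paths, commented templates) is better done in vi as above.

```shell
#!/bin/sh
# Point path.data at a new directory in an elasticsearch.yml-style file.
# Hypothetical helper: set_path_data CONFIG_FILE NEW_DATA_DIR
set_path_data() {
    conf="$1"
    new="$2"
    if grep -q '^path\.data:' "$conf"; then
        # Rewrite the existing line; go via a temp file so it works
        # without GNU-only "sed -i" and keeps the file's ownership
        sed "s|^path\.data:.*|path.data: $new|" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        # No uncommented setting yet: append one
        printf 'path.data: %s\n' "$new" >> "$conf"
    fi
}
```

Usage would be something like `set_path_data /etc/elasticsearch/elasticsearch.yml /elasticsearch/data`, followed by a quick `grep path.data` to confirm.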
New node - add plugins
/usr/share/elasticsearch/bin/plugin -install karmi/elasticsearch-paramedic
/usr/share/elasticsearch/bin/plugin -install royrusso/elasticsearch-HQ
/usr/share/elasticsearch/bin/plugin -install lmenezes/elasticsearch-kopf
/usr/share/elasticsearch/bin/plugin -install lukas-vlcek/bigdesk
/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head
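The five installs above can also be wrapped in a loop so one failed download does not stop the rest. A small sketch, using the same plugin names and the same `-install` syntax as the commands above; the function name and the optional path argument are assumptions added for illustration.

```shell
#!/bin/sh
# Install the site plugins listed in this doc, one at a time.
# $1 is optional and hypothetical: path to the plugin script.
install_site_plugins() {
    plugin_bin="${1:-/usr/share/elasticsearch/bin/plugin}"
    for p in karmi/elasticsearch-paramedic \
             royrusso/elasticsearch-HQ \
             lmenezes/elasticsearch-kopf \
             lukas-vlcek/bigdesk \
             mobz/elasticsearch-head; do
        # Keep going on failure, but say which plugin failed
        "$plugin_bin" -install "$p" || echo "failed: $p" >&2
    done
}
```

Run `install_site_plugins` as root on the new node.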

Set the heap size
vi /etc/sysconfig/elasticsearch
*** set the ES_HEAP_SIZE here
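A common rule of thumb is to give the heap about half of the machine's RAM and leave the rest to the filesystem cache. The sketch below just does that arithmetic; the function name is made up, and the 50% figure is the rule of thumb, not something this doc mandates.

```shell
#!/bin/sh
# Suggest a heap size equal to half of total RAM.
# Hypothetical helper: half_ram_heap MEM_TOTAL_KB
half_ram_heap() {
    mem_kb="$1"
    # kB -> MB, then halve (integer arithmetic)
    heap_mb=$(( mem_kb / 1024 / 2 ))
    echo "${heap_mb}m"
}
```

On the node itself you could feed it the real figure with `half_ram_heap "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"` and put the result in /etc/sysconfig/elasticsearch as `ES_HEAP_SIZE=...`.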

Almost there...
chkconfig elasticsearch on
service elasticsearch start
tail -f /var/log/elasticsearch/site.elk.elasticsearch.log
At this point, just watch for errors; the node should join the cluster and be happy.
[2014-12-19 07:29:17,353][INFO ][node                     ] [ESNODE04] version[1.3.7], pid[2232], build[3042293/2014-12-16T13:59:32Z]
[2014-12-19 07:29:17,353][INFO ][node                     ] [ESNODE04] initializing ...
[2014-12-19 07:29:17,361][INFO ][plugins                  ] [ESNODE04] loaded [], sites [head, bigdesk, HQ, kopf, paramedic]
[2014-12-19 07:29:21,058][INFO ][node                     ] [ESNODE04] initialized
[2014-12-19 07:29:21,058][INFO ][node                     ] [ESNODE04] starting ...
[2014-12-19 07:29:21,232][INFO ][transport                ] [ESNODE04] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
[2014-12-19 07:29:21,268][INFO ][discovery                ] [ESNODE04] site.elk.elasticsearch/gz8pDlu9TTymibM1LToetA
[2014-12-19 07:29:24,590][INFO ][cluster.service          ] [ESNODE04] detected_master [ESNODE02][YDtom9MJSGiveG_MDS4QqQ][][inet[/]], added {[ESNODE03][1xEs3izRRxinKwCqMKjO8A][][inet[/]],[][0P3kPIyyR42pH-jypmaViw][][inet[/]]{client=true, data=false},[ESNODE02][YDtom9MJSGiveG_MDS4QqQ][][inet[/]],[ESNODE01][S9nv_0MEQMaHH2Jbw5jEGA][][inet[/]],}, reason: zen-disco-receive(from master [[ESNODE02][YDtom9MJSGiveG_MDS4QqQ][][inet[/]]])
[2014-12-19 07:29:24,931][INFO ][http                     ] [ESNODE04] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
[2014-12-19 07:29:24,931][INFO ][node                     ] [ESNODE04] started

Issues with plugins loading
If you get timeouts loading plugins but the cluster health is OK, it's most likely a client-side thing (e.g. browser cache). We had some problems with Chrome that magically cleared up...
# Verify cluster health...
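One way to check health from the command line is the cluster health API. The sketch below assumes ES is listening on localhost:9200 (the HTTP port shown in the log above) and uses a made-up helper that pulls the status field out of the JSON with sed, so it does not need jq.

```shell
#!/bin/sh
# Extract the "status" value (green/yellow/red) from cluster health JSON on stdin.
# Hypothetical helper; works on both compact and ?pretty output.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (assumes the node answers on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health' | health_status
```

Green or yellow with the expected node count means the plugin timeouts are probably on the browser side.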


As an update to this, we've just added our fifth ES node. If anyone is curious, our nodes are virtual machines: 8GB RAM (4g heap), 2 vCPU, 500GB ES data disk.

On my new set of goals:
1. What is the best practice for index creation/data separation?
2. What are the recommended optimization/cleanup scripts/tools that should be running? (beyond curator)

