- Is the Elasticsearch server remotely accessible from the frontend servers?
  - If it is not, the first thing to do is to restore the connection between the two.
- Is the Elasticsearch request valid?
  - If it is not, maybe:
    - we are running the request against the wrong index/alias?
    - the query body is malformed?
- Does it yield results?
  - If it does not,
    - but we should get some, then either the query is invalid or Elasticsearch has an issue.
  - If it does,
    - but from the app's point of view we don't get anything,
    - then it's related to the app itself and we will have to look at the code.
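The checklist above can be sketched as a small shell script. This is a dry-run version that only prints the curl commands it would execute; the host, port, alias name and query body are placeholders taken from elsewhere in this post, so adapt them to your setup:

```shell
#!/bin/sh
# Dry-run sketch of the three checks. Swap the body of run() for "$@"
# to actually execute the commands against a live cluster.
ES="http://elasticsearch.domain.com:9200"
ALIAS="plugin-xxx-log"
QUERY='{"query":{"match_all":{}}}'

run() { echo "would run: $*"; }

# 1. Is the server reachable from this frontend?
run curl -s "$ES/_cluster/health"

# 2. Is the request valid? _validate/query checks the query
#    without actually running the search.
run curl -s "$ES/$ALIAS/_validate/query?explain" -d "$QUERY"

# 3. Does it yield results? Inspect hits.total in the response.
run curl -s "$ES/$ALIAS/_search" -d "$QUERY"
```

If any of the three commands fails, you know which branch of the checklist you are in before ever touching tcpdump.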
To verify the first 3 points in one shot, I started tcpdump on the ElasticSearch server using the command below:
tcpdump -A -nn -s 0 'tcp dst port 9200 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -i eth12
Note: don't forget to change the interface you want to listen on.
Looking at the output, I discovered that the app was running its Elasticsearch query against a missing alias. That explained it! Once the alias was created, the application feature was working again in production.
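For reference, an alias can be created with a single call to the `_aliases` endpoint. The index name `plugin-xxx-log-2016.01` below is a hypothetical example of one monthly index matching the prefixes used further down; point it at your real indices:

```shell
# Build the _aliases payload. The index name is a made-up example of a
# monthly index -- replace it with one that actually exists.
PAYLOAD='{
  "actions": [
    { "add": { "index": "plugin-xxx-log-2016.01", "alias": "plugin-xxx-log" } }
  ]
}'
echo "$PAYLOAD"
# Apply it against a live cluster (commented out here):
# curl -XPOST "http://elasticsearch.domain.com:80/_aliases" -d "$PAYLOAD"
```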
The final step was to set up Jenkins (or Rundeck) to run elastic/curator daily in order to refresh the alias, otherwise it would stop covering newly created monthly indices:
docker run -it --rm bobrik/curator:3.5.1 --host "elasticsearch.domain.com" --port 80 alias --name plugin-xxx-log indices --prefix plugin-xxx-log- --prefix plugin-salesforce-log- --timestring %Y.%m --time-unit months
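If you don't want a full Jenkins or Rundeck job, a plain cron entry is enough. This is a sketch, assuming `docker` is on cron's PATH; note that the `-t` flag from the interactive command must be dropped since cron provides no TTY:

```shell
# /etc/cron.d/curator-refresh-alias (sketch) -- refresh the alias daily at 03:00
0 3 * * * root docker run --rm bobrik/curator:3.5.1 --host "elasticsearch.domain.com" --port 80 alias --name plugin-xxx-log indices --prefix plugin-xxx-log- --prefix plugin-salesforce-log- --timestring %Y.%m --time-unit months
```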
Now we will be proactively alerted if anything goes wrong. One less thing to worry about!
[Update] I now also use tcpflow to better display content (too bad it's not maintained anymore):
tcpdump -A -l -nn -s 0 'tcp port 9200' -i eth0 | tcpflow -c -e