Cluster does not initialize, javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment

Hi all,

At the moment, I am trying to create an OpenDistro 1.7 Elasticsearch cluster with 3 nodes. After testing with the demo certificates on a single node, I am now using my own PKI to manage the node and client certificates.

On a single-node server, everything runs fine.
In cluster mode, all nodes come up but log the following error at high frequency:

[2020-05-19T14:48:15,794][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [xxxx.yyy.zzz.net] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

Note: I have read here that there is a known Java issue and that this message does not affect operation… but for me, it does… (reference: https://opendistro.github.io/for-elasticsearch-docs/docs/troubleshoot/ )

In between these errors, the log shows that no master node has been elected yet:

[2020-05-19T15:09:00,369][WARN ][o.e.c.c.ClusterFormationFailureHelper] [xxxx.yyy.zzz.net] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [xxxx.yyy.zzz.net, yyyy.yyy.zzz.net, zzzz.yyy.zzz.net] to bootstrap a cluster: have discovered [{xxxx.yyy.zzz.net}{vpOXuYYNRkeqoMQ8kbv8cw}{F3vzvzkgSrqlMj_qhklEtw}{172.17.0.6}{172.17.0.6:29300}{dim}]; discovery will continue using [xx.aa.b.54:29300, xx.aa.b.55:29300, xx.aa.b.56:29300] from hosts providers and [{xxxx.yyy.zzz.net}{vpOXuYYNRkeqoMQ8kbv8cw}{F3vzvzkgSrqlMj_qhklEtw}{172.17.0.6}{172.17.0.6:29300}{dim}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

I am running 3 docker containers on 3 different VMs.

  • discovery.seed_hosts and cluster.initial_master_nodes are set to the 3 host names.
  • node.name is the FQDN of each server
  • transport.profiles.default.port is set to 29300

The certificate chain seems to be fine, since I can use securityadmin.sh with my client certificate.

When I do a TLS test connection with my node certificate, everything also seems to be fine:

openssl s_client -connect xxx.yyy.zzz.net:29300 -cert ./xxx.yyy.zzz.net.crt.pem -key ./xxx.yyy.zzz.key.pem
CONNECTED(00000003)

=> works
Leaving out the client cert/key:
139939766322832:error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate:s3_pkt.c:1498:SSL alert number 42
=> fails (as expected)

I am using a JKS keystore and JKS truststore for OpenDistro.
Checking the stores with keytool, everything seems to be fine.
The PKI has been created using Search Guard's PKI scripts.

opendistro_security.nodes_dn is also configured to the DNs of the node certs.

My “feeling” is that OpenDistro does not use the node certificate as a client certificate when trying to negotiate with the other nodes?

Any help would be highly appreciated!

Thanks
Chris

Finally got that working now. It must have been one of the following parameters that was not set right.

discovery.seed_hosts: "{{ ansible_play_hosts_all|join(',') }}"
cluster.initial_master_nodes: "{{ ansible_play_hosts_all[0] }}"
transport.profiles.default.port: "{{ group.elasticsearch.transport_port }}"
transport.port: "{{ group.elasticsearch.transport_port }}"
http.port: "{{ group.elasticsearch.rest_port }}"
network.publish_host: "{{ ansible_eth0.ipv4.address }}"
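
For illustration, this is roughly what the template might render to on the first node. It is only a sketch: the host names are the anonymized ones from the log above, the transport port is the 29300 mentioned earlier, and the REST port and the publish address are assumptions rather than my real values. My guess is that network.publish_host was the setting that mattered, because the log above shows the node advertising 172.17.0.6:29300, which looks like a Docker bridge address that the other VMs probably cannot reach.

```yaml
# Hypothetical rendered elasticsearch.yml fragment for the first node.
# Values are placeholders taken from the anonymized log output above.
discovery.seed_hosts: "xxxx.yyy.zzz.net,yyyy.yyy.zzz.net,zzzz.yyy.zzz.net"
cluster.initial_master_nodes: "xxxx.yyy.zzz.net"   # only the first play host
transport.profiles.default.port: "29300"
transport.port: "29300"
http.port: "9200"                                  # assumption: real REST port not shown here
network.publish_host: "xx.aa.b.54"                 # assumption: address of this VM's eth0, not the container IP
```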

Troubleshooting is really hard if the only error logged is the following:

javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

Is there any way to better debug/trace the Elasticsearch node-to-node communication?
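
One thing that may help, assuming the standard Elasticsearch 7.x logging settings also apply to the OpenDistro image, is raising the log level for the discovery and transport packages in elasticsearch.yml. The first two logger names below are stock Elasticsearch packages. The third is an assumption: it is the security plugin package as I would expand it from the abbreviated class name in the error above (c.a.o.s.s.t...). For the TLS handshake itself, the generic JVM option -Djavax.net.debug=ssl:handshake (for example via ES_JAVA_OPTS) should print handshake details, although the output is very verbose.

```yaml
# Sketch only: more verbose logging for node-to-node communication.
# Remove these lines again after debugging, TRACE output is large.
logger.org.elasticsearch.discovery: DEBUG
logger.org.elasticsearch.transport: TRACE
# Assumption: full package name of the Open Distro security plugin.
logger.com.amazon.opendistroforelasticsearch.security: DEBUG
```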