We had the same situation. The ES version is 7.10.2.
The master node shows "node-left" events for the data nodes, but the data nodes show "master not discovered yet".
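To illustrate the split view, here is a minimal sketch of how the node list can be compared from both sides (hostnames are from the layout below; HTTP on port 9200 without authentication is an assumption, adjust if security is enabled; _cat/nodes itself is a standard endpoint):

# Compare which nodes the master (esnode1) and an affected data node (esnode5)
# currently see. On the data node this may return a master_not_discovered_exception,
# which is itself useful information.
import requests

for host in ("172.16.22.153", "172.16.22.191"):
    resp = requests.get(
        f"http://{host}:9200/_cat/nodes?v&h=name,ip,node.role,master",
        timeout=10,
    )
    print(f"--- view from {host} ---")
    print(resp.text)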
Server layout:
172.16.22.153 esnode1
172.16.22.154 esnode2
172.16.22.155 esnode3
172.16.22.190 esnode4
172.16.22.191 esnode5
172.16.22.192 esnode6
172.16.22.193 esnode7
172.16.22.194 esnode8
172.16.22.195 esnode9
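Because this kind of node-left / master-not-discovered loop is often plain connectivity or timeout trouble between nodes, here is a minimal reachability sketch over the layout above (assumption: default transport port 9300 on every node):

# Check that each node's transport port is reachable from where this runs.
import socket

NODES = {
    "esnode1": "172.16.22.153", "esnode2": "172.16.22.154", "esnode3": "172.16.22.155",
    "esnode4": "172.16.22.190", "esnode5": "172.16.22.191", "esnode6": "172.16.22.192",
    "esnode7": "172.16.22.193", "esnode8": "172.16.22.194", "esnode9": "172.16.22.195",
}

for name, ip in NODES.items():
    try:
        with socket.create_connection((ip, 9300), timeout=5):
            print(f"{name} ({ip}:9300) reachable")
    except OSError as exc:
        print(f"{name} ({ip}:9300) FAILED: {exc}")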
Related logs:
Master node (esnode1):
[2021-03-25T13:58:31,547][INFO ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [10s] publication of cluster state version [4502] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:58:51,549][WARN ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [30s] publication of cluster state version [4502] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:58:51,552][INFO ][o.e.c.r.a.AllocationService] [esnode1] updating number_of_replicas to [4] for indices [.opendistro_security]
[2021-03-25T13:58:51,556][INFO ][o.e.c.s.MasterService ] [esnode1] node-left[{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir} reason: followers check retry count exceeded], term: 316, version: 4503, delta: removed {{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir}}
[2021-03-25T13:59:01,558][INFO ][o.e.c.c.C.CoordinatorPublication] [esnode1] after [10s] publication of cluster state version [4503] is still waiting for {esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode6}{MsHrFuhtR2yp0JGSRsqS5w}{lA8_OIWuQm2gZxdXsEDEdA}{172.16.22.192}{172.16.22.192:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode7}{Yj-61cgOQ--50f8cila67Q}{BSFsIQiSSKSPFUCQ5g65dQ}{172.16.22.193}{172.16.22.193:9300}{dir} [SENT_PUBLISH_REQUEST], {esnode4}{1jH5VI7PQbuLNVWtxhvw8Q}{RpBC6FawS82nBTCjvkh9LQ}{172.16.22.190}{172.16.22.190:9300}{dir} [SENT_PUBLISH_REQUEST]
[2021-03-25T13:59:21,558][INFO ][o.e.c.s.ClusterApplierService] [esnode1] removed {{esnode8}{HeEjBS5JSCSYeP2zr2MPWA}{nHAb6LFYT6-YjtXO72CO2g}{172.16.22.194}{172.16.22.194:9300}{dir}}, term: 316, version: 4503, reason: Publication{term=316, version=4503}
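The [30s] warnings above line up with the default cluster.publish.timeout of 30s, and the node-left reason "followers check retry count exceeded" means three consecutive follower checks to esnode8 timed out (defaults: 1s interval, 10s timeout, 3 retries). Here is a minimal sketch for checking, on the master, whether cluster-state updates are queueing up behind slow publications (same assumption as above: HTTP on 9200, no auth):

# Query the elected master for queued cluster-state tasks and the local state version.
import requests

MASTER = "http://172.16.22.153:9200"

pending = requests.get(f"{MASTER}/_cat/pending_tasks?v", timeout=10)
print(pending.text)

state = requests.get(f"{MASTER}/_cluster/state/version,master_node?local=true", timeout=10)
print(state.json())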
Data node (esnode5):
[2021-03-25T14:00:24,436][INFO ][o.e.c.c.Coordinator ] [esnode5] master node [{esnode1}{raDLHjOiTYaY_5ckIjnLVA}{VlAg-gG5Q72y0KTORWm-uQ}{172.16.22.153}{172.16.22.153:9300}{imr}] failed, restarting discovery
org.elasticsearch.ElasticsearchException: node [{esnode1}{raDLHjOiTYaY_5ckIjnLVA}{VlAg-gG5Q72y0KTORWm-uQ}{172.16.22.153}{172.16.22.153:9300}{imr}] failed [3] consecutive checks
at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:293) ~[elasticsearch-7.10.2.jar:7.10.2]
Caused by: org.elasticsearch.transport.RemoteTransportException: [esnode1][172.16.22.153:9300][internal:coordination/fault_detection/leader_check]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: rejecting leader check since [{esnode5}{RPzC_iENSOiSynpEvT0zag}{T4D5QV6jRvu43I_puZ2iXA}{172.16.22.191}{172.16.22.191:9300}{dir}] has been removed from the cluster
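The leader-check rejection on esnode5 just means the master had already removed it from the cluster state, so the real question is why the follower checks and publications timed out in the first place: network drops, long GC pauses, or node overload. A minimal sketch for capturing hot threads on the master and on one of the dropped data nodes while this is happening (same HTTP/no-auth assumption):

# Dump hot threads locally on the master and on esnode5 to rule out CPU
# saturation or blocked threads around the follower/leader check timeouts
# (GC pauses would show up in the GC logs instead).
import requests

for host in ("172.16.22.153", "172.16.22.191"):
    resp = requests.get(f"http://{host}:9200/_nodes/_local/hot_threads", timeout=30)
    print(f"=== hot threads on {host} ===")
    print(resp.text)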
Please help us identify the root cause. Thanks,
TM