[ERROR] Can't start cross cluster replication

I have installed the cross-cluster replication plugin on both clusters. My ES version is 7.10.2 and my Open Distro version is 1.13.2.
I am following the cross-cluster connectivity steps in the handbook (cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub).

When I try the “Start replication” step, I get the following error:
curl -k -u testuser:testuser -XPUT "https://${FOLLOWER}/_opendistro/_replication/follower-01/_start?pretty" -H 'Content-type: application/json' -d'{"remote_cluster": "leader-cluster", "remote_index": "leader-01"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_not_found_transport_exception",
        "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
      }
    ],
    "type" : "action_not_found_transport_exception",
    "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
  },
  "status" : 500
}

Please help me with this issue. Thanks,
Cindy

@ccr-devs Any thoughts here?

Hi Cindy,

Apologies for the delay. It looks like you are missing the cross-cluster-replication plugin. You can confirm this by running the following commands:

curl -k -u testuser:testuser -XGET "https://${FOLLOWER}/_cat/plugins"

curl -k -u testuser:testuser -XGET "https://${LEADER}/_cat/plugins"

The current CCR plugin is experimental and needs to be installed explicitly on every node. Can you try the instructions if you haven’t already?

Please let us know if these steps don’t help.
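
Because the plugin has to be present on every node, it can help to list the nodes that do not report it. The following is a minimal sketch, assuming the default three-column `_cat/plugins` output (node name, component, version) and assuming the CCR plugin's component name contains the substring `replication`; check the actual name in your own output. The function name `nodes_missing_plugin` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `_cat/plugins` output on stdin and print every node name that does NOT
# list a plugin whose component name contains the given substring.
# Assumption: columns are "name component version" (the _cat/plugins default).
nodes_missing_plugin() {
  local pattern="$1"
  awk -v pat="$pattern" '
    NF >= 2 { seen[$1] = 1 }                     # every node that reports any plugin
    NF >= 2 && index($2, pat) > 0 { has[$1] = 1 } # nodes that report a matching plugin
    END { for (n in seen) if (!(n in has)) print n }
  ' | sort
}

# Example usage (needs a user allowed to call cluster:monitor/nodes/info):
# curl -sk -u admin:admin "https://${FOLLOWER}/_cat/plugins" | nodes_missing_plugin replication
```

An empty result means every node that reported plugins also reported a matching one. A node with no plugins at all does not appear in `_cat/plugins`, so it would not be listed here either.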

Dear @krishna_ggk

When I use the internal user testuser, the request is rejected for lack of permissions. I get the following error:
curl -k -u testuser:testuser -XGET "https://${FOLLOWER}/_cat/plugins"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
  },
  "status" : 403
}

So I ran the following command as admin and got the following response:
curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty"
[screenshot: LEADER]

I found that the CCR plugin isn’t explicitly installed on LEADER node1 and FOLLOWER node3.
So I checked with /usr/share/elasticsearch/bin/elasticsearch-plugin, and it shows the plugin as already installed.
[screenshot: q01-list]

Thanks for your response!
Cindy

Hi @krishna_ggk

An update on my experiment!
Using this command, I found that the CCR plugin isn’t explicitly installed on LEADER node1 and FOLLOWER node3: curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty"
I stopped LEADER node1 and FOLLOWER node3, and then starting CCR was successful.

I added a doc to the leader-01 index, but I don’t know why follower-03 doesn’t replicate the doc from leader-01.
(The leader and follower cluster doc counts are shown below.)
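
The doc-count comparison can also be scripted instead of read off screenshots. This is a minimal sketch, assuming the standard `_cat/count/<index>?h=count` endpoint and admin credentials supplied through the hypothetical `ES_USER` and `ES_PASS` variables; the function names `doc_count` and `replication_lag` are illustrative only.

```shell
#!/usr/bin/env bash
# Print the document count of one index on one cluster.
# Assumption: credentials come from ES_USER / ES_PASS (hypothetical variables).
doc_count() {
  local host="$1" index="$2"
  curl -sk -u "${ES_USER:-admin}:${ES_PASS:-admin}" \
    "https://${host}/_cat/count/${index}?h=count" | tr -d '[:space:]'
}

# Print how many documents the follower index is behind the leader index.
replication_lag() {
  local leader_host="$1" leader_index="$2" follower_host="$3" follower_index="$4"
  local leader_count follower_count
  leader_count=$(doc_count "$leader_host" "$leader_index")
  follower_count=$(doc_count "$follower_host" "$follower_index")
  echo $(( leader_count - follower_count ))
}

# Example usage:
# replication_lag "${LEADER}" leader-01 "${FOLLOWER}" follower-01
```

A difference that stays above zero after new documents are indexed and refreshed on the leader suggests that ongoing replication is not running. A momentary difference is expected, since counts only reflect refreshed documents.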


I guess we call it a feature, not a bug :joy:

BTW, I tested CCR in Dev Tools on 2 clusters, each with 3 nodes. (I used the admin user, so I guess it already has all the required permissions.)

[2021-07-15T11:59:08,719][WARN ][o.e.s.InternalSnapshotsInfoService] [adt-sys-kienlt-dev-92-67] failed to retrieve shard size for [snapshot=opendistro-remote-repo-leader-cluster:opendistro-remote-snapshot/262b3eb7-92a2-3e1c-b326-4b314730ed32, index=[leader-01/Zl5NOk4RRMuIpzWDrZ5hYw], shard=[follower-01][0]]
org.elasticsearch.ElasticsearchSecurityException: No user found for indices:monitor/stats

[2021-07-15T11:59:09,242][WARN ][o.e.p.PersistentTasksClusterService] [adt-sys-kienlt-dev-92-67] persistent task replication:index:follower-01 failed
org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: replication_exception: Remote restore failed: shard could not be allocated to any of the nodes
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask.waitForRestore(IndexReplicationTask.kt:277) ~[?:?]
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask$waitForRestore$1.invokeSuspend(IndexReplicationTask.kt) ~[?:?]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56) ~[?:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-07-15T11:59:09,262][INFO ][o.e.c.m.MetadataDeleteIndexService] [adt-sys-kienlt-dev-92-67] [follower-01/lxS3ZMJkQLGAUulx6lkMag] deleting index

Since the log above appeared, no index appears in the follower cluster.
Source guide: cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub

Edit: nvm. Adding these lines to elasticsearch.yml solved it :smiley:

opendistro_security.unsupported.inject_user.enabled: true
opendistro_security.nodes_dn_dynamic_config_enabled: true
node.remote_cluster_client: true

Yes, it doesn’t replicate new data to the replica (follower) cluster.
My example:
Create the test_ccr index in the main (leader) cluster.
Start replication in the replica cluster like this:

PUT _opendistro/_replication/test_ccr/_start?pretty
{
  "remote_cluster": "leader-cluster",
  "remote_index": "test_ccr"
}

It does replicate all existing documents from the main cluster. But when I keep inserting more documents, the replica cluster doesn’t change.

I tried to stop and start replication again, but it says:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
  },
  "status" : 400
}
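
The error message itself names two ways out: close or delete the follower index before starting replication again. Below is a minimal sketch of the delete route, assuming the `_stop` endpoint described in the handbook (`POST _opendistro/_replication/<index>/_stop` with an empty JSON body) and the hypothetical `ES_USER` / `ES_PASS` variables for credentials. The function name `restart_replication` is illustrative. This only addresses the "Cant use same index again" error; whether it also makes new documents replicate afterwards is not confirmed in this thread.

```shell
#!/usr/bin/env bash
# Stop replication, delete the follower index, and start replication again.
# WARNING: step 2 deletes the follower copy of the index; the initial restore
# in step 3 then copies the data from the leader again.
restart_replication() {
  local host="$1" follower_index="$2" remote_cluster="$3" remote_index="$4"
  local auth="${ES_USER:-admin}:${ES_PASS:-admin}"

  # 1. Stop replication on the follower index (endpoint assumed from the handbook).
  #    An error response here does not abort the script.
  curl -sk -u "$auth" -XPOST \
    "https://${host}/_opendistro/_replication/${follower_index}/_stop?pretty" \
    -H 'Content-type: application/json' -d '{}'

  # 2. DESTRUCTIVE: delete the follower index so its name can be reused.
  curl -sk -u "$auth" -XDELETE "https://${host}/${follower_index}?pretty"

  # 3. Start replication again with the same request body as before.
  curl -sk -u "$auth" -XPUT \
    "https://${host}/_opendistro/_replication/${follower_index}/_start?pretty" \
    -H 'Content-type: application/json' \
    -d "{\"remote_cluster\": \"${remote_cluster}\", \"remote_index\": \"${remote_index}\"}"
}

# Example usage, run against the follower cluster:
# restart_replication "${FOLLOWER}" test_ccr leader-cluster test_ccr
```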