[ERROR] Can't start cross cluster replication

Cindy · July 8, 2021, 6:54am

I have installed the cross-cluster replication plugin on both clusters. My ES version is 7.10.2 and my opendistro version is 1.13.2.
I am following the cross-cluster connectivity steps described in the handbook (cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub).

When I try the steps to “Start replication”, I get the following error
curl -k -u testuser:testuser -XPUT "https://${FOLLOWER}/_opendistro/_replication/follower-01/_start?pretty" -H 'Content-type: application/json' -d'{"remote_cluster": "leader-cluster", "remote_index": "leader-01"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_not_found_transport_exception",
        "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
      }
    ],
    "type" : "action_not_found_transport_exception",
    "reason" : "No handler for action [internal:indices/admin/opendistro/replication/index/start]"
  },
  "status" : 500
}

Please help me with this issue. Thanks,
Cindy
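
For readers who prefer to script the call, the start request in the post above can be expressed in Python. This is only a minimal sketch that builds the request without sending it; the function name and the host `follower.example:9200` are placeholders invented for illustration, while the endpoint path and body fields are taken from the curl command in the post. Authentication and TLS handling (the `-u` and `-k` flags) are left out.

```python
import json
import urllib.request


def build_start_request(follower_host, follower_index, remote_cluster, remote_index):
    """Build (but do not send) the PUT request that starts replication."""
    url = f"https://{follower_host}/_opendistro/_replication/{follower_index}/_start"
    body = json.dumps(
        {"remote_cluster": remote_cluster, "remote_index": remote_index}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host; index and cluster names match the post above.
req = build_start_request("follower.example:9200", "follower-01", "leader-cluster", "leader-01")
print(req.get_method(), req.full_url)
```

Sending the request would additionally need credentials and an SSL context that matches how the cluster's certificates are set up.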

searchymcsearchface · July 8, 2021, 4:59pm

@ccr-devs Any thoughts here?

krishna_ggk · July 13, 2021, 8:32am

Hi Cindy,

Apologies for the delay. It looks like you are missing the cross-cluster-replication plugin. You can confirm this by running the following command.

curl -k -u testuser:testuser -XGET "https://${FOLLOWER}/_cat/plugins"

curl -k -u testuser:testuser -XGET "https://${LEADER}/_cat/plugins"

The current CCR plugin is experimental and needs to be installed explicitly. Can you try the instructions if you haven’t already?

Please let us know if these steps didn’t help.
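
Since the plugin has to be present on every node, it can help to check the `_cat/plugins` output node by node. Below is a minimal Python sketch, assuming the default text format of `_cat/plugins` (node name, component, version on each line). The plugin name `opendistro-cross-cluster-replication`, the node names, the version strings, and the function name are all assumptions made up for the example; substitute whatever your own cluster reports.

```python
def nodes_missing_plugin(cat_plugins_output, expected_nodes, plugin_name):
    """Return the expected nodes that do not list plugin_name in _cat/plugins output."""
    nodes_with_plugin = set()
    for line in cat_plugins_output.splitlines():
        parts = line.split()
        # Default columns are: node name, component (plugin), version.
        if len(parts) >= 2 and parts[1] == plugin_name:
            nodes_with_plugin.add(parts[0])
    return sorted(set(expected_nodes) - nodes_with_plugin)


# Hypothetical sample output: node1 lacks the replication plugin.
sample = """\
node1 opendistro_security 1.13.1.0
node2 opendistro-cross-cluster-replication 1.13.0.0
node2 opendistro_security 1.13.1.0
"""
missing = nodes_missing_plugin(
    sample, ["node1", "node2"], "opendistro-cross-cluster-replication"
)
print(missing)  # -> ['node1']
```

Passing the list of expected nodes explicitly matters because a node with no plugins at all would otherwise not appear in the output and be overlooked.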

Cindy · July 14, 2021, 10:09am

Dear @krishna_ggk

The internal user testuser has no permission to run this command. I get the following error:
curl -k -u testuser:testuser -XGET "https://${FOLLOWER}/_cat/plugins"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [cluster:monitor/nodes/info] and User [name=testuser, backend_roles=, requestedTenant=null]"
  },
  "status" : 403
}

So I ran the command as admin and got the following response:
curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty"
[Screenshot: LEADER]

I found that the CCR plugin isn’t listed as installed on LEADER node1 and FOLLOWER node3.
So I checked with /usr/share/elasticsearch/bin/elasticsearch-plugin, and it shows the plugin as already installed.
[Screenshot: q01-list]

Thanks for your response!
Cindy

Cindy · July 15, 2021, 2:41am

Hi @krishna_ggk

An update on my experiment!
Using curl -k -u admin:admin -XGET "https://${LEADER}/_cat/plugins?pretty", I found that the CCR plugin isn’t listed as installed on LEADER node1 and FOLLOWER node3.
I stopped LEADER node1 and FOLLOWER node3, and then CCR started successfully.

I added documents to the leader-01 index, but I don’t know why follower-03 doesn’t replicate them from leader-01.
(The leader and follower cluster doc counts are shown below.)

BlackMetalz · July 15, 2021, 4:29am

I guess we call it a feature, not a bug.

BlackMetalz · July 15, 2021, 5:03am

BTW, I tested CCR in Dev Tools on 2 clusters, each with 3 nodes. (I used the admin user, so I guess it already has all the required permissions.)

[2021-07-15T11:59:08,719][WARN ][o.e.s.InternalSnapshotsInfoService] [adt-sys-kienlt-dev-92-67] failed to retrieve shard size for [snapshot=opendistro-remote-repo-leader-cluster:opendistro-remote-snapshot/262b3eb7-92a2-3e1c-b326-4b314730ed32, index=[leader-01/Zl5NOk4RRMuIpzWDrZ5hYw], shard=[follower-01][0]]
org.elasticsearch.ElasticsearchSecurityException: No user found for indices:monitor/stats

[2021-07-15T11:59:09,242][WARN ][o.e.p.PersistentTasksClusterService] [adt-sys-kienlt-dev-92-67] persistent task replication:index:follower-01 failed
org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: replication_exception: Remote restore failed: shard could not be allocated to any of the nodes
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask.waitForRestore(IndexReplicationTask.kt:277) ~[?:?]
	at com.amazon.elasticsearch.replication.task.index.IndexReplicationTask$waitForRestore$1.invokeSuspend(IndexReplicationTask.kt) ~[?:?]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56) ~[?:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-07-15T11:59:09,262][INFO ][o.e.c.m.MetadataDeleteIndexService] [adt-sys-kienlt-dev-92-67] [follower-01/lxS3ZMJkQLGAUulx6lkMag] deleting index

Since the log above appeared, no index shows up in the follower cluster.
Source guide: cross-cluster-replication/HANDBOOK.md at main · opendistro-for-elasticsearch/cross-cluster-replication · GitHub

Edit: nvm. Adding these lines to elasticsearch.yml fixed it:

opendistro_security.unsupported.inject_user.enabled: true
opendistro_security.nodes_dn_dynamic_config_enabled: true
node.remote_cluster_client: true
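
To verify that every node's elasticsearch.yml carries these three lines, something like the following Python sketch could be used. It assumes the settings are written as flat dotted keys exactly as above (not nested YAML), and it naively treats everything after a `#` as a comment; the function name and the sample file content are hypothetical.

```python
# The three settings quoted in the post above, with the values given there.
REQUIRED = {
    "opendistro_security.unsupported.inject_user.enabled": "true",
    "opendistro_security.nodes_dn_dynamic_config_enabled": "true",
    "node.remote_cluster_client": "true",
}


def missing_settings(yml_text, required=REQUIRED):
    """Return required keys that are absent or set to a different value."""
    found = {}
    for raw in yml_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        found[key.strip()] = value.strip()
    return sorted(k for k, v in required.items() if found.get(k) != v)


# Hypothetical file content with one of the three settings missing.
sample = """\
cluster.name: follower   # hypothetical
opendistro_security.unsupported.inject_user.enabled: true
node.remote_cluster_client: true
"""
missing = missing_settings(sample)
print(missing)
```

A real YAML parser would be more robust if the file uses nested keys; the line-based approach is kept here only to avoid a third-party dependency.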

BlackMetalz · July 15, 2021, 10:06am

Yes, it doesn’t replicate new data to the replica cluster.
My example:
Create the test_ccr index in the main cluster.
Start replication in the replica cluster like this:

PUT _opendistro/_replication/test_ccr/_start?pretty
{
  "remote_cluster": "leader-cluster",
  "remote_index": "test_ccr"
}

It does replicate all documents from the main cluster. But when I keep inserting more documents, the replica cluster doesn’t change.

I tried to stop and start again, but it says:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cant use same index again for replication. Either close or delete the index:test_ccr"
  },
  "status" : 400
}

rivanshu · April 19, 2022, 11:27am

I have tried restarting replication after closing the index, but I still get the same error as above:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Cant use same index again for replication. Either close or delete the index:follower-test"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Cant use same index again for replication. Either close or delete the index:follower-test"
  },
  "status" : 400
}

But if I delete the index, it works fine. Can somebody help with why closing the index doesn’t work?

soosinha · April 25, 2022, 6:24am

@rivanshu I believe you are using the old opendistro version of the replication plugin.
In the opensearch replication plugin this has been changed and you need to delete the index (closing the index will not work).
Code reference: cross-cluster-replication/TransportReplicateIndexClusterManagerNodeAction.kt at main · opensearch-project/cross-cluster-replication · GitHub

rivanshu · April 25, 2022, 10:36am

@soosinha Yes, I am using the old opendistro version, which I guess supports replication on closed indices, but it is not working for me. Do you have any leads on what the issue could be?

soosinha · April 25, 2022, 11:15am

As per the code, it checks the cluster state for the presence of the index before starting replication. But the index is still present in the cluster state when it is closed, so it needs to be deleted before starting replication. The validation message may be incorrect, though.
So I guess you will need to delete the index if you want to use the same index name.
Note that the opendistro CCR plugin was experimental and there was no actual release of the plugin. I would recommend using the OpenSearch CCR plugin.
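
The behaviour described here can be illustrated with a tiny Python model. This is not the plugin's code, only a sketch of the check as described in this post: the cluster state is modelled as a plain dict from index name to state, and the function name is invented for the example.

```python
def can_start_replication(cluster_state_indices, follower_index):
    """Model of the check: any index with this name in the cluster state blocks the start.

    cluster_state_indices maps index name -> state ("open" or "close").
    The state is deliberately ignored, which is why closing does not help.
    """
    return follower_index not in cluster_state_indices


# A closed index is still part of the cluster state, so the start is rejected.
state = {"follower-test": "close"}
closed_result = can_start_replication(state, "follower-test")

# Only deleting the index removes it from the cluster state.
del state["follower-test"]
deleted_result = can_start_replication(state, "follower-test")

print(closed_result, deleted_result)  # -> False True
```

This matches what was reported above: the error persists after closing the index and goes away after deleting it.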

rivanshu · April 25, 2022, 12:44pm

Is this new plugin compatible with regular Elasticsearch, or only with OpenSearch?

soosinha · April 25, 2022, 12:46pm

It is compatible with OpenSearch only
