Encountered a failure while executing in org.opensearch.replication.action.changes.GetChangesRequest

Hi,

Using OpenSearch 1.2.0, I see several warnings in the logs like the following:

[2021-12-10T08:32:44,016][WARN ][o.o.r.t.s.ShardReplicationTask] [opensearch-replica-master-1] [cadence-visibility][4] Encountered a failure while executing in org.opensearch.replication.action.changes.GetChangesRequest@78ef1be8. Retrying in 10 seconds.
org.opensearch.OpenSearchTimeoutException: global checkpoint not synced. Retry after a few miliseconds...
	at org.opensearch.replication.action.changes.TransportGetChangesAction$asyncShardOperation$1.invokeSuspend(TransportGetChangesAction.kt:93) ~[opensearch-cross-cluster-replication-1.2.0.0.jar:1.2.0.0]
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.3.72.jar:1.3.72-release-468 (1.3.72)]
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56) [kotlinx-coroutines-core-1.3.5.jar:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:733) [opensearch-1.2.0.jar:1.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]

Replication itself appears to be working fine, according to the status API:

{
  "status" : "SYNCING",
  "reason" : "User initiated",
  "leader_alias" : "master",
  "leader_index" : "cadence-visibility",
  "follower_index" : "cadence-visibility",
  "syncing_details" : {
    "leader_checkpoint" : 307987,
    "follower_checkpoint" : 307984,
    "seq_no" : 307986
  }
}
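As a sanity check, the `syncing_details` above can be read as the follower trailing the leader by only a few operations, which is normal while writes are in flight. A minimal sketch of that interpretation (the class and method names here are illustrative, not part of the plugin):

```java
// Hypothetical helper for reading the replication status above.
// A small, shrinking gap between leader_checkpoint and
// follower_checkpoint indicates healthy, in-progress syncing.
public class ReplicationLag {
    public static long lag(long leaderCheckpoint, long followerCheckpoint) {
        return leaderCheckpoint - followerCheckpoint;
    }

    public static void main(String[] args) {
        long leader = 307987;   // "leader_checkpoint" from the status API
        long follower = 307984; // "follower_checkpoint" from the status API
        // prints: ops behind: 3
        System.out.println("ops behind: " + lag(leader, follower));
    }
}
```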

Any suggestions about what is happening here?

Thanks in advance

@stdmje thanks for flagging the issue.
Tasks at the follower cluster wait for a certain duration for operations to sync at the leader cluster. If those operations are not synced within that duration, this exception is logged and the task retries.
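The wait-then-retry behaviour can be sketched roughly as below. This is a simplified illustration, not the plugin's actual code: the names `waitForCheckpoint` and `CheckpointTimeoutException` are assumptions, and the real implementation uses Kotlin coroutines rather than a polling loop.

```java
import java.util.function.LongSupplier;

// Sketch: the follower asks for changes past a sequence number; if the
// leader's global checkpoint has not advanced far enough within the
// timeout, a timeout exception is raised and the caller retries later
// (the "Retrying in 10 seconds" WARN in the log).
public class CheckpointWait {
    static class CheckpointTimeoutException extends RuntimeException {
        CheckpointTimeoutException(String msg) { super(msg); }
    }

    // Polls the leader's global checkpoint until it reaches fromSeqNo
    // or the timeout elapses.
    static long waitForCheckpoint(LongSupplier globalCheckpoint,
                                  long fromSeqNo,
                                  long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long cp = globalCheckpoint.getAsLong();
            if (cp >= fromSeqNo) {
                return cp; // enough operations are synced; fetch changes
            }
            Thread.sleep(50); // back off briefly before polling again
        }
        // Not a data-loss failure: the task simply retries later.
        throw new CheckpointTimeoutException("global checkpoint not synced");
    }

    public static void main(String[] args) throws InterruptedException {
        final long[] checkpoint = {307984};
        // Simulate the leader advancing its checkpoint in the background.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            checkpoint[0] = 307987;
        }).start();
        System.out.println("synced up to "
                + waitForCheckpoint(() -> checkpoint[0], 307986, 2000));
    }
}
```

If the checkpoint never advances within the timeout, the exception is thrown and logged, which matches the benign WARN messages in the original report.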

These are not replication failures. Logging should be more minimal here to avoid the noise; I have opened an issue to track this.


Great, thank you for the information.