Security plugin performance

I have a strange issue that I’m struggling to explain, and I’m hoping someone can help.

I’ve just built a few OpenDistro clusters to replace a legacy Elasticsearch cluster. The old cluster had a dozen dedicated coordinating nodes, and for the new clusters I started with a similar number of coordinating nodes, split across the clusters. As I ramped up load on the clusters, some of the coordinating nodes (which receive bulk write requests from a few thousand forwarders) started maxing out their CPU while others stayed relatively idle.

The hot_threads API shows that these busy nodes spend most of their time in the following stack:

   java.base@13.0.1/java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntrySetSpliterator.forEachRemaining(Collections.java:1601)
   java.base@13.0.1/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
   java.base@13.0.1/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
   java.base@13.0.1/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
   java.base@13.0.1/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   java.base@13.0.1/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer.resolveIndexPatterns(IndexResolverReplacer.java:236)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer.access$300(IndexResolverReplacer.java:110)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer$2.provide(IndexResolverReplacer.java:331)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer.getOrReplaceAllIndices(IndexResolverReplacer.java:775)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer.getOrReplaceAllIndices(IndexResolverReplacer.java:668)
   com.amazon.opendistroforelasticsearch.security.resolver.IndexResolverReplacer.resolveRequest(IndexResolverReplacer.java:326)
   com.amazon.opendistroforelasticsearch.security.privileges.PrivilegesEvaluator.evaluate(PrivilegesEvaluator.java:186)
   com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:252)
   com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119)
   app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:151)
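To compare the busy and the idle coordinating nodes side by side, the hot_threads API can be scoped to a single node with GET /_nodes/<node_id>/hot_threads, using the threads, interval and ignore_idle_threads query parameters. The Java sketch below only builds those per-node URLs. The base address and the node names coord-1 and coord-2 are hypothetical placeholders, and actually sending the requests (with whatever credentials the security plugin requires) is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HotThreadsUris {
    // Builds a node-scoped hot_threads URL so each coordinating node's sample
    // can be captured separately and compared.
    static URI hotThreadsUri(String baseUrl, String nodeId, int threads, String interval) {
        String node = URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/_nodes/" + node + "/hot_threads"
                + "?threads=" + threads          // how many hot threads to report
                + "&interval=" + interval        // sampling interval, e.g. "1s"
                + "&ignore_idle_threads=true");  // skip threads that are just waiting
    }

    public static void main(String[] args) {
        // Hypothetical cluster address and node names; substitute your own.
        String base = "https://localhost:9200";
        for (String node : List.of("coord-1", "coord-2")) {
            System.out.println(hotThreadsUri(base, node, 5, "1s"));
        }
    }
}
```

Capturing the same sample from a maxed-out node and a quiet one makes it easier to see whether the quiet nodes spend any time in IndexResolverReplacer at all.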

I have scanned through the code and suspect that the loop in resolveIndexPatterns (IndexResolverReplacer.java:236, the topmost plugin frame in the trace above) performs poorly when there is a sizeable number of indices and/or aliases. Each cluster has ~1.5k aliases, each pointing to the head index of an index series.
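The suspicion above is that each request pays for a pass over every alias. A minimal Java sketch of that cost model follows; it is not the plugin’s code. The alias names (logs-N), the prefix match standing in for the plugin’s real wildcard matching, and the request count are all hypothetical; only the ~1.5k alias count comes from this post. The map is wrapped with Collections.unmodifiableMap to mirror the UnmodifiableEntrySet frames in the trace.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

public class AliasScanSketch {
    // Simplified, hypothetical model of per-request pattern resolution: every
    // call streams over the entire alias map, so its cost grows with alias count.
    // A prefix match stands in for the plugin's real wildcard matching.
    static Set<String> resolve(Map<String, String> aliasToIndex, String prefix, LongAdder visited) {
        return aliasToIndex.entrySet().stream()
                .filter(e -> {
                    visited.increment(); // count every alias examined
                    return e.getKey().startsWith(prefix);
                })
                .map(Map.Entry::getValue) // alias -> current head index
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Hypothetical alias set sized like the ~1.5k aliases per cluster.
        Map<String, String> aliases = new HashMap<>();
        for (int i = 0; i < 1500; i++) {
            aliases.put("logs-" + i, "logs-" + i + "-000001");
        }
        // Unmodifiable view, mirroring the UnmodifiableEntrySet frames in the trace.
        Map<String, String> view = Collections.unmodifiableMap(aliases);

        LongAdder visited = new LongAdder();
        int requests = 1000; // hypothetical number of requests one node resolves
        for (int r = 0; r < requests; r++) {
            resolve(view, "logs-42", visited);
        }
        System.out.println(visited.sum()); // prints 1500000
    }
}
```

Under this simplified model, the scan work on a node is the number of requests it resolves multiplied by the alias count, so a node that resolves more requests does proportionally more of it.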

What confuses me, however, is why the CPU load is so unevenly distributed across the coordinating nodes. They all report the above as the hot path, yet some are maxed out on CPU while others are far from it. Can anyone explain this? I could probably live with it if the load were split evenly across the nodes, but the imbalance is making things difficult to scale.