Performance impact using client certificate authentication

Hello,

I’m posting this question here because I suppose it also concerns OpenSearch.

I noticed some strange behavior when multiple clients (such as a service scaled up by the autoscaler in Kubernetes) connect to an Elasticsearch/Opendistro cluster. The connections were accepted slowly and were eventually rejected because the TCP backlog was full. After some investigation I noticed that connections using client certificates are established more slowly than ones without a client certificate.

A small test script visualizes the difference. It tries to establish 5000 connections and sends /_cluster/health on each of them every 15 seconds; when a request times out, it retries after 5 seconds. (‘ok’ means a /_cluster/health request was successful; ‘connections’ is the number of established connections.)
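The original test script was not posted. As a rough idea of what such a test could look like, here is a minimal sketch using only the Python standard library. All function names, the 15-second request timeout, the `max_failures` cap and the host name in the usage comment are assumptions made for illustration, and the response parser only handles responses framed with Content-Length.

```python
import asyncio
import ssl
from collections import Counter


def make_ssl_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context; a client certificate is presented only if cert_file is given."""
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def build_request(host):
    """Raw HTTP/1.1 request for the cluster health endpoint."""
    return f"GET /_cluster/health HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


async def read_status(reader):
    """Read one HTTP response (Content-Length framing only), return the status code."""
    status = int((await reader.readline()).split()[1])
    length = 0
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("latin-1").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value)
    await reader.readexactly(length)
    return status


async def client(host, port, ssl_ctx, stats, requests=4, interval=15.0,
                 timeout=15.0, retry_delay=5.0, max_failures=3):
    """One simulated client: hold a connection open and poll the health endpoint."""
    sent = failures = 0
    while sent < requests and failures < max_failures:
        writer = None
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port, ssl=ssl_ctx), timeout)
            stats["connections"] += 1
            while sent < requests:
                sent += 1
                writer.write(build_request(host))
                await writer.drain()
                status = await asyncio.wait_for(read_status(reader), timeout)
                stats["ok" if status == 200 else "error"] += 1
                if sent < requests:
                    await asyncio.sleep(interval)
        except (OSError, EOFError, asyncio.TimeoutError, ValueError, IndexError):
            # Connect/handshake failure, timeout or broken response: back off, reconnect.
            failures += 1
            stats["failed"] += 1
            await asyncio.sleep(retry_delay)
        finally:
            if writer is not None:
                writer.close()


async def run_load_test(host, port, ssl_ctx, clients=5000, **client_opts):
    """Start all clients concurrently and return the collected counters."""
    stats = Counter()
    await asyncio.gather(
        *(client(host, port, ssl_ctx, stats, **client_opts) for _ in range(clients)))
    return stats


# Hypothetical usage (host and file names are placeholders):
#   ctx = make_ssl_context("ca.pem", "client.pem", "client.key")  # with client cert
#   ctx = make_ssl_context("ca.pem")                              # without client cert
#   print(asyncio.run(run_load_test("es-node.example", 9200, ctx)))
```

Running this once with and once without `cert_file` against the same node should make the two cases comparable. With several thousand clients, the open-file limit of the test machine probably has to be raised first.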

[Graph: with client certificate]

[Graph: without client certificate]

Is this due to a configuration error? Can the behaviour be improved?

Regards,
Matthias

Hello @sezuan2,

I think this behaviour would be expected as certificate-based connections will need extra time to encrypt and decrypt the packets.

How many ES nodes do you have in the cluster? Have you monitored RAM and Java Heap usage in those nodes?

Hello,

It’s a 6-node cluster; the connections are going to just one of the nodes. RAM and Java heap looked good, and GC times are good, too.

What is the version of ODFE?
Do you use Kibana to connect to ES, or do you have a custom app?

Load test and the real application are custom.

What about the ODFE ES version?

@sezuan2

Could you share your config.yml content?

It’s elasticsearch 7.8.0 and Opendistro 1.9.0.

Here it is, with redacted parts:

  dynamic:
    filtered_alias_mode: "warn"
    disable_rest_auth: false
    disable_intertransport_auth: false
    respect_request_indices_options: false
    license: null
    kibana:
      multitenancy_enabled: true
      server_username: "kibanaserver"
      index: ".kibana"
    http:
      anonymous_auth_enabled: true
      xff:
        enabled: true
        internalProxies: "<redacted:regex>"
        remoteIpHeader: "X-Forwarded-For"
    authc:
      clientcert_auth_domain:
        description: "Authenticate via SSL client certificates"
        http_enabled: true
        transport_enabled: false
        order: 3
        http_authenticator:
          type: clientcert
          config:
            username_attribute: cn  #optional, if omitted DN becomes username
          challenge: false
        authentication_backend:
          type: "noop"
      ldap:
        http_enabled: true
        transport_enabled: false
        order: 1
        http_authenticator:
          challenge: false
          type: "basic"
          config: {}
        authentication_backend:
          type: "ldap"
          config:
            enable_ssl: true
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - "<redacted:ldap-server>"
            bind_dn: "<redacted:bind_dn>"
            password: "<redacted:password>"
            userbase: "<redacted:userbase>"
            usersearch: "(uid={0})"
            username_attribute: "uid"
        description: "Migrated from v6"
      basic_internal_auth_domain:
        http_enabled: true
        transport_enabled: true
        order: 2
        http_authenticator:
          challenge: false
          type: "basic"
          config: {}
        authentication_backend:
          type: "intern"
          config: {}
        description: "Migrated from v6"
    authz:
      roles_from_myldap:
        http_enabled: true
        transport_enabled: false
        authorization_backend:
          type: "ldap"
          config:
            enable_ssl: true
            enable_start_tls: false
            enable_ssl_client_auth: false
            verify_hostnames: true
            hosts:
            - "<redacted:ldap-server>"
            bind_dn: "<redacted:bind_dn>"
            password: "<redacted:password>"
            rolesearch: "(member={0})"
            userroleattribute: null
            userrolename: "disabled"
            rolename: "cn"
            resolve_nested_roles: true
            rolebase: "<redacted:rolebase>"
            usersearch: "(uid={0})"
            skip_users:
            - <redacted:various internal_users>
            - <redacted:*.domain which matches the client certs>
            - "opendistro_security_anonymous"
        description: "Migrated from v6"
    auth_failure_listeners: {}
    do_not_fail_on_forbidden: false
    multi_rolespan_enabled: false
    hosts_resolver_mode: "ip-only"
    transport_userrname_attribute: null
    do_not_fail_on_forbidden_empty: false

@sezuan2

According to that config, you’re using LDAP over SSL. As far as I understand, you were testing LDAP with and without a secured connection (SSL cert): without a secured connection (LDAP, port 389) you have no performance issues (no timeouts), and with SSL enabled (LDAPS, port 636) some requests time out.

Could you tell me what is your LDAP solution?

I’m testing SSL-encrypted connections to Elasticsearch, with and without a client cert. I assume the LDAP server should never be queried, because the client cert names and the anonymous user are in the skip_users list.

            skip_users:
            - <redacted:various internal_users>
            - <redacted:*.domain which matches the client certs>
            - "opendistro_security_anonymous"

@sezuan2

skip_users only applies to authorization. The plugin will still try to authenticate client certs with LDAP and basic authentication. Could you try changing the authentication order as follows:

  1. basic_auth
  2. client_cert
  3. ldap
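To make the proposed change concrete, here is a small sketch that models the `order` values of the three authc domains from the posted config as a plain Python dict and renumbers them. It assumes that domains are tried in ascending `order` and that basic_auth and client_cert in the list above refer to `basic_internal_auth_domain` and `clientcert_auth_domain`. The helper names are made up for this example, and in the real config only the `order:` values need to change.

```python
def evaluation_order(authc):
    """Domain names sorted by ascending `order` (assumed evaluation sequence)."""
    return [name for name, _ in sorted(authc.items(), key=lambda kv: kv[1]["order"])]


def reorder(authc, sequence):
    """Return a copy of authc with `order` set from each name's position in sequence."""
    missing = set(authc) - set(sequence)
    if missing:
        raise ValueError(f"domains without a position: {sorted(missing)}")
    return {name: {**authc[name], "order": pos}
            for pos, name in enumerate(sequence, start=1)}


# `order` values as posted in the config above.
current = {
    "clientcert_auth_domain": {"order": 3},
    "ldap": {"order": 1},
    "basic_internal_auth_domain": {"order": 2},
}

# Proposed sequence: basic auth first, client certificate second, LDAP last.
proposed = reorder(current, [
    "basic_internal_auth_domain",
    "clientcert_auth_domain",
    "ldap",
])

print(evaluation_order(current))
print(evaluation_order(proposed))
```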

I’ll test the new order. However, in my tests I only compared client-cert vs. without-client-cert. In none of these tests was a basic authentication header sent, so I would expect the authc ldap part to be ignored in this case.

Hi Pablo!

This was a hint in the right direction. Removing the authz->ldap section made the client certificate requests fast. This is still confusing, as no significant number of LDAP requests is visible with tcpdump.

Do you have any idea how to limit the LDAP role lookup to LDAP users?

@sezuan2 if you change the authentication order so that ldap is last, the lookup should only be done if ldap is used, meaning basic_auth and client_cert have failed.

Have you tried changing the order?

Yes, but it didn’t help. I also removed the ldap section from authentication, but that didn’t help either. For some unknown reason, it seems to do an LDAP role lookup for client certificate users but not for basic authenticated users.

I did some more investigation. I observed that when the authz.ldap section is configured, Elasticsearch spends a lot of time accessing the cache:

[Flame graph: without the ldap role section]

[Flame graph: with the ldap role section]

@sezuan2 after looking into this further, I can see that the call to LDAP is performed by design even for cert users. However, I am not able to reproduce the delay that you are experiencing. You should be able to skip users using a wildcard (like you have with *.domain); this needs to match the full cn. Can you try using "*" as a starting point to see if this skips the ldap section altogether, and work backwards from there?
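As an illustration of what matching the full cn means for a skip_users entry, here is a rough approximation in Python. It is not the plugin's matcher: it assumes that a pattern wrapped in slashes is treated as a regular expression and that anything else is a simple wildcard, and it uses `fnmatch` and `re` as stand-ins. The function names and the CN `client01.example.domain` are made up for the example.

```python
import re
from fnmatch import fnmatchcase


def matches(pattern, name):
    """Approximate matcher: "/regex/" is a full-match regex, anything else a glob."""
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], name) is not None
    return fnmatchcase(name, pattern)


def is_skipped(username, skip_users):
    """True if any skip_users pattern matches the whole username."""
    return any(matches(pattern, username) for pattern in skip_users)


cn = "client01.example.domain"  # hypothetical CN taken from a client certificate

for patterns in (["*.domain"], ["example.domain"], ["*"], ["/.*/"], [""]):
    print(patterns, is_skipped(cn, patterns))
```

In this approximation a pattern has to cover the whole name, so `example.domain` alone does not match, and an empty pattern matches nothing.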

@Anthony
I tried:

- ""
- "/.*/"

but to no avail.

I think it’s not caused by the LDAP lookup itself. If you check the flame graphs above, you see a suspiciously large amount of time spent in getEntry and lockedGetOrLoad. It’s strange that this doesn’t happen with basic auth. I also removed all skip_users and retested with basic auth, but the request time was still good.

If it’s really caused by lock contention, a high number of threads is probably required to reproduce it. I’m testing this on a cluster whose nodes have 52 cores/104 threads.

The issue seems to be with caching on the authz side; caching on the authc side works as expected. Advised to raise a bug ticket.