Docker container intermittently failing to start in CI

Hi There,

We’re running the official OpenSearch Docker image in our CI pipeline for our automated integration tests, however, the container intermittently fails to start (around 50% of the time).

Looking at the Docker logs, it isn’t clear as to why with the limited info returned (is there a way to get further logs from the container?):

          Name                        Command               State 
-------------------------------------------------------------------------------------------
api_opensearch-testing_1   ./opensearch-docker-entryp ...   Exit 0
                                                                                                                     
opensearch-testing_1     | Killing opensearch process 10
opensearch-testing_1     | Killing performance analyzer process 11
opensearch-testing_1     | OpenSearch exited with code 143
opensearch-testing_1     | Performance analyzer exited with code 143

The service entry in the docker-compose.yml file is dead simple:

version: "3"

services:
  # ...

  opensearch-testing:
    image: opensearchproject/opensearch:1.0.1
    environment:
      discovery.type: single-node

By retrying the build several times, it eventually works, but obviously, this isn’t ideal.

For reference, we’re running this in GitHub Actions against their ubuntu-18.04 environment.

Has anyone else run into this problem, or has any suggestions to try?

I haven’t personally seen the startup issues, but you might want to checkout the docs for logging. For the docker container, I don’t think the default is particularly verbrose, so you’d need to alter the opensearch.yml.

Also, make sure your vm.max_map_count is set correctly.

1 Like

Thanks @searchymcsearchface - the issue seems to have resolved itself (for now at least).

If it reoccurs, I’ll look into the verbose logging and update here.

Sure thing - keep us updated if it crops up again - I’m sure everyone here would be interested.

The error has returned, and I now have some logs.

Running ls /usr/share/opensearch/logs returns only a single file:

performance-analyzer.log

This file has the following contents:

OpenJDK 64-Bit Server VM warning: Option InitialBootClassLoaderMetaspaceSize was deprecated in version 15.0 and will likely be removed in a future release.
13:08:35.100 [rca-controller] ERROR org.opensearch.performanceanalyzer.rca.RcaController - Error reading file /usr/share/opensearch/data/rca_enabled.conf
java.nio.file.NoSuchFileException: /usr/share/opensearch/data/rca_enabled.conf
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:375) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:426) ~[?:?]
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:420) ~[?:?]
	at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
	at java.util.Scanner.<init>(Scanner.java:718) ~[?:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.lambda$readRcaEnabledFromConf$1(RcaController.java:361) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.core.Util.lambda$invokePrivileged$1(Util.java:78) [performance-analyzer-rca-1.0.1.0.jar:?]
	at java.security.AccessController.doPrivileged(AccessController.java:312) [?:?]
	at org.opensearch.performanceanalyzer.core.Util.invokePrivileged(Util.java:74) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.readRcaEnabledFromConf(RcaController.java:359) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.run(RcaController.java:319) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.PerformanceAnalyzerApp.lambda$startRcaTopLevelThread$0(PerformanceAnalyzerApp.java:195) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.threads.ThreadProvider.lambda$createThreadForRunnable$0(ThreadProvider.java:67) [performance-analyzer-rca-1.0.1.0.jar:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
13:08:35.464 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
Sep 17, 2021 1:08:35 PM org.jooq.tools.JooqLogger info
INFO: 
                                      
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@  @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@        @@@@@@@@@@
@@@@@@@@@@@@@@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@  @@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@        @@  @  @  @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@  @@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  Thank you for using jOOQ 3.10.8
                                      
13:08:35.929 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
13:08:37.982 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf
13:08:40.076 [rca-controller] ERROR org.opensearch.performanceanalyzer.rca.RcaController - Error reading file /usr/share/opensearch/data/rca_enabled.conf
java.nio.file.NoSuchFileException: /usr/share/opensearch/data/rca_enabled.conf
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:375) ~[?:?]
	at java.nio.file.Files.newByteChannel(Files.java:426) ~[?:?]
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:420) ~[?:?]
	at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
	at java.util.Scanner.<init>(Scanner.java:718) ~[?:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.lambda$readRcaEnabledFromConf$1(RcaController.java:361) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.core.Util.lambda$invokePrivileged$1(Util.java:78) [performance-analyzer-rca-1.0.1.0.jar:?]
	at java.security.AccessController.doPrivileged(AccessController.java:312) [?:?]
	at org.opensearch.performanceanalyzer.core.Util.invokePrivileged(Util.java:74) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.readRcaEnabledFromConf(RcaController.java:359) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.rca.RcaController.run(RcaController.java:319) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.PerformanceAnalyzerApp.lambda$startRcaTopLevelThread$0(PerformanceAnalyzerApp.java:195) [performance-analyzer-rca-1.0.1.0.jar:?]
	at org.opensearch.performanceanalyzer.threads.ThreadProvider.lambda$createThreadForRunnable$0(ThreadProvider.java:67) [performance-analyzer-rca-1.0.1.0.jar:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
Sep 17, 2021 1:08:40 PM org.jooq.tools.JooqLogger info
INFO: Single batch             : No bind variables have been provided with a single statement batch execution. This may be due to accidental API misuse
13:08:40.603 [pa-reader] ERROR org.opensearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/opensearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/opensearch/data/batch_metrics_enabled.conf

I also checked the max map count setting in the CI runner, and it’s set to 262144.

performance-analyzer.log shouldn’t matter at all. It’s also really not needed in a CI setup. Do you get a similar log file with a successful CI run? I suspect this is not a problem but it could be very safely removed.

I’ve got some logs here for a successful run. Due to the large amount of text, I’ve had to upload it as a gist: OpenSearch CI successful startup logs · GitHub

Do you need to run the securityadmin.sh script? I see the line:

Not yet initialized (you may need to run securityadmin)

Maybe you need to docker exec into the running container to run this script.
Some documentation here

1 Like

I there, I’m encountering the same issue while running an opensearch node in a github actions workflow, using opensearch 2.3.0:

Here is my config:

services:
      opensearch_e2e:
        image: opensearchproject/opensearch:2.3.0
        env:
          discovery.type: single-node
          bootstrap.memory_lock: true
          OPENSEARCH_JAVA_OPTS: -Xms512m -Xmx512m
          DISABLE_INSTALL_DEMO_CONFIG: true
          DISABLE_SECURITY_PLUGIN: true
        options: >-
          --health-cmd="curl http://localhost:9200/_cluster/health"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=10

which result with the following error:

I would say it happens 20% of the time.

Any guess on how to investigate this?