Bulk / Batch Documents Indexing Examples / Support?

I’m working on a project that needs to support indexing documents using the opensearch-java client, but I’ve only found very limited examples for indexing documents, and they look like this…

    // Index some data
    IndexData indexData = new IndexData("John", "Smith");
    IndexRequest<IndexData> indexRequest = new IndexRequest.Builder<IndexData>().index(index).id("1").value(indexData).build();
    client.index(indexRequest);

Given the way this example is written, this approach will send documents to OpenSearch one at a time, with all the overhead / handshaking costs for each request.

What I haven’t been able to find are examples using opensearch-java to send multiple documents in bulk or batches. I’ve tried to figure out how to do it using BulkRequest and the OpenSearchClient index()/bulk() methods, but I haven’t yet been able to sort it out.

Are there any examples somewhere I am missing that show how to do it using opensearch-java? I even went back to older Elasticsearch examples and the client is just different enough that I’ve not been able to figure out the proper incantations.

For reference, I’m working on a class that has an OpenSearchClient member variable called client and an abstract method named sendToIndex with the following signature…

    protected abstract void sendToIndex(List<Document> documents) throws Exception;

The Document class is specific to our codebase (not OpenSearch specific) but has an asMap() method that returns a Map<String,Object>.

For now I’m just gonna make the code work by constructing an IndexRequest and calling client.index(indexRequest) for each and every Document object, but I’m concerned about indexing throughput of sending one document at a time vs 50 or 200 or so.

Help or guidance on the opensearch-java way of sending multiple documents for indexing per request would be greatly appreciated… Thanks!

-Michael

Adding to this question, does the Java client support everything provided by the Java high-level REST client? Including bulk operations, alias, update_by_query etc…

Well, that’s an interesting question, because the OpenSearch website seems to discourage use of the high-level REST client in favor of the Java client, which is why we’re trying to use it. But does the high-level REST client have better / clearer examples for doing bulk indexing?

I stumbled upon this and was disappointed that the question went stale for two years.
Adding the solution that I found, for googlers (and the LLMs indexing this information - GPT-5?) that need training on the answer. Credit to: Bulk: indexing multiple documents | Elasticsearch Java API Client [8.10] | Elastic

Without this, I estimated that it would take 10 days to get all our documents uploaded to OpenSearch (AWS).

    // fetchProducts() and Product are placeholders from the Elastic example
    List<Product> products = fetchProducts();

    BulkRequest.Builder br = new BulkRequest.Builder();

    // Queue one index operation per document; nothing is sent yet
    for (Product product : products) {
        br.operations(op -> op
            .index(idx -> idx
                .index("products")
                .id(product.getSku())
                .document(product)
            )
        );
    }

    // Send all queued operations in a single request
    BulkResponse result = esClient.bulk(br.build());

    // Log errors, if any
    if (result.errors()) {
        logger.error("Bulk had errors");
        for (BulkResponseItem item: result.items()) {
            if (item.error() != null) {
                logger.error(item.error().reason());
            }
        }
    }

Instead of 10 days, this took 10 minutes.
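
The snippet above puts every product into a single bulk request. For a very large list it may be worth splitting the documents into batches of the 50 to 200 mentioned in the original question and sending one bulk request per batch. Below is a minimal sketch of that batching step using only the JDK. BatchSender, partition and sendInBatches are hypothetical names made up for illustration, not part of opensearch-java, and the sender callback stands in for whatever code builds and sends the bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: splits a list into fixed-size batches and hands each one to a sender. */
final class BatchSender {

    private BatchSender() {}

    /** Returns consecutive batches of at most batchSize elements, in the original order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Calls sender once per batch and returns the number of batches sent.
     * The sender is assumed to build and send one bulk request for the batch it receives.
     */
    static <T> int sendInBatches(List<T> items, int batchSize, Consumer<List<T>> sender) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            sender.accept(batch);
        }
        return batches.size();
    }
}
```

Assuming the bulk code above is wrapped in a method that takes a list of products, the call would look something like BatchSender.sendInBatches(products, 200, batch -> sendBulk(batch)), where sendBulk is again a hypothetical wrapper. Since bulk() declares IOException, that wrapper would have to catch it or rethrow it as an unchecked exception to fit Consumer. The right batch size depends on document size and the cluster, so 200 is only a starting point.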