Generate CSV creates csv with incorrect rows

Hi,
I’m trying to use Reporting (Generate CSV).
I have a huge dataset that I have indexed into OpenSearch 1.0 (on AWS).
I want to search documents based on some criteria and then download the results (around 253 entries) as CSV.

First try:
In Discover, I add a filter based on my needs, and the number of hits shown is correct (around 253 entries). When I then try to download the results as CSV, the “Generate CSV” option is greyed out and I cannot click it. See attached screenshot.

Second try:
I saved the above as a search and opened it again. This time “Generate CSV” is not greyed out. However, when I click it, the CSV file contains 10,000 entries, which does not match the number of hits (253).

What am I doing wrong?

I’m using:
"version": {
  "number": "7.10.2",
  "build_type": "tar",
  "build_hash": "unknown",
  "build_date": "2021-08-20T12:03:05.728738Z",
  "build_snapshot": false,
  "lucene_version": "8.8.2",
  "minimum_wire_compatibility_version": "6.8.0",
  "minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "The OpenSearch Project: https://opensearch.org/"

Regards
Prasanna

Hmm. That sounds pretty weird. Two thoughts:

  • Was the data continually being appended to, and you only applied a lower time range? (Hard to tell from the screenshot.)
  • Did you somehow change your time range?

When you save a search, the time range is not saved along with it.


To answer your questions:

  • No, data was not being appended.
  • No, the time range did not change.

I have attached another screenshot; I’m looking at data from the last year.
From the screenshot, you can see that there are supposed to be 209 entries in the CSV because there are 209 hits. However, the CSV always has exactly 10,000 entries, irrespective of which filter is applied and how many hits are shown.

I tried generating a report on another index, and there it works fine.
I can’t work out what’s wrong with this index.

Yeah, that’s weird. I tried quite a few ways of recreating your issue. The only time I got close was when the time range wasn’t explicitly selected (which is just user error, or perhaps a small UX problem). If this is happening for you consistently, though, I’m not sure what’s going on.

You might be able to see what’s happening more clearly by looking at what’s being sent to the Reporting plugin from the browser. If you use Chrome, go to View > Developer > Developer Tools before performing the action to generate the report. Click on “Network”. Then generate the report.

You should see a request called “generateReport?timezone=…”. Click on it, open the “Headers” tab, scroll down to “Request Payload”, and expand the tree. This is what it looks like for me:

It would be really interesting to see whether time_from and time_to match what you’re expecting (decode the Unix timestamps), and likewise time_duration.
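
If decoding the timestamps by hand is a nuisance, a few lines of Python can do it. This is only a rough sketch: it assumes time_from and time_to are Unix timestamps in milliseconds (pass in_millis=False if yours turn out to be in seconds). The function names are made up for illustration, and the sample values are placeholders, not numbers from this thread.

```python
from datetime import datetime, timezone


def decode_epoch(value, in_millis=True):
    """Convert a Unix timestamp to a timezone-aware UTC datetime.

    Assumption: the payload values are epoch milliseconds unless
    in_millis=False is passed.
    """
    seconds = int(value) / 1000 if in_millis else int(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def describe_range(time_from, time_to, in_millis=True):
    """Return (start, end, span) for a time_from/time_to pair."""
    start = decode_epoch(time_from, in_millis)
    end = decode_epoch(time_to, in_millis)
    return start, end, end - start


# Placeholder values; paste the ones from your own Request Payload here.
start, end, span = describe_range(1609459200000, 1640995200000)
print("from:", start.isoformat())
print("to:  ", end.isoformat())
print("span:", span.days, "days")
```

If the decoded range is not the one shown in the time picker in Discover, that would point at the time range rather than the filter.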