Support Apache Arrow Protocol

liorp · June 6, 2021, 6:33pm

Add Arrow (Flight) Endpoint

Apache Arrow (https://arrow.apache.org/) is a popular in-memory columnar data format. It is to memory what Parquet and ORC are to disk: a columnar storage format.
Arrow standardizes the in-memory columnar data representation for data processing engines (Spark, Drill, Impala, etc.).
This reduces communication and serialization overhead and lets engines share more of the code that manages data.
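
To make the row-versus-column distinction concrete, here is a minimal sketch using only the Python standard library. It is not the Arrow format or the pyarrow API; the sample records and the `to_columns` helper are made up for illustration. It only shows the general idea of keeping each field in one contiguous typed buffer.

```python
from array import array

# Row-oriented layout: one dict per record (hypothetical sample data).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]

def to_columns(records):
    """Pivot records into one contiguous typed buffer per field."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit signed ints
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
    }

columns = to_columns(rows)

# An aggregate touches only the column it needs, not every record.
total_price = sum(columns["price"])

# The raw buffer can be exposed without copying or per-value parsing.
price_view = memoryview(columns["price"])
```

Because each column is a flat buffer of fixed-width values, it can be handed to another process or written to a socket as-is, which is the property that lets a shared columnar format cut serialization cost.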

Arrow Flight is a new general-purpose client-server framework that simplifies high-performance transport of large datasets over network interfaces.

One of the biggest features that sets Flight apart from other data transport frameworks is parallel transfers, which allow data to be streamed to or from a cluster of servers simultaneously.
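
The parallel-transfer idea can be sketched roughly as follows. This is a standard-library simulation, not the Flight API: the node names, the `PARTITIONS` table, and both functions are hypothetical, and `fetch_partition` stands in for what would be a network call. In Flight itself, as I understand it, the client first asks for a list of endpoints and then pulls a stream from each one.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each endpoint serves one partition of a dataset.
PARTITIONS = {
    "node-a": [1, 2, 3],
    "node-b": [4, 5],
    "node-c": [6, 7, 8, 9],
}

def fetch_partition(endpoint):
    """Stand-in for a network call that streams one partition from one node."""
    return PARTITIONS[endpoint]

def fetch_all(endpoints):
    """Request every partition concurrently and concatenate in endpoint order."""
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        # Executor.map returns results in the order of its inputs.
        parts = list(pool.map(fetch_partition, endpoints))
    return [value for part in parts for value in part]

result = fetch_all(["node-a", "node-b", "node-c"])
```

With real network calls, total transfer time would be bounded by the slowest partition rather than by the sum of all partitions, which is where the horizontal scaling comes from.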

Supporting this in OpenSearch would bring large benefits:

A standard in-memory columnar data format that can be transported across nodes
Interoperability with standard Big Data tools & formats
Up to tenfold better performance than ODBC or JDBC libraries
Better hash join capability for cross-index joins
Horizontal scalability: parallel and partitioned data access

searchymcsearchface · June 7, 2021, 2:19pm

Moving this to OpenSearch category.
