OpenSearch Client Java & Python APIs

Good day everyone,

Is there any plan to provide or support the Elasticsearch Java High Level REST APIs and Python APIs?

Will existing code that uses these APIs break now that the elasticsearch namespaces have changed to reflect OpenSearch?

Will there be such APIs for OpenSearch maintained in repositories such as Maven Central and PyPI?

Regards,
Hasan

@asfoorial I’m working on a client code generator in my spare time.
I started with a fork of the Elasticsearch code generator (GitHub - aparo/opensearch-client-generator: OpenDistro Client code generator to be used with Elasticsearch), and now I’m working on providing an OpenAPI descriptor.
My task list is the following:

  • OpenAPI descriptor
  • Python (typed) client
  • Java/Scala client

My idea is also to provide scripts (Python or similar) to automatically migrate code from Elasticsearch to OpenSearch.
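
A rough sketch of what such a migration script might look like, under stated assumptions: it assumes the OpenSearch Python client is importable as `opensearchpy` and exposes an `OpenSearch` class in place of `Elasticsearch`. The function name `migrate_source` is hypothetical. It is a plain regex rewrite, so it will also touch matching words in comments and strings; a real tool would more likely work on the syntax tree.

```python
import re

# Assumed target names (not confirmed anywhere in this thread):
# module `opensearchpy`, client class `OpenSearch`.
OLD_MODULE, NEW_MODULE = "elasticsearch", "opensearchpy"
OLD_CLASS, NEW_CLASS = "Elasticsearch", "OpenSearch"

# Matches the module name only at the start of an import statement,
# e.g. "import elasticsearch" or "from elasticsearch.helpers import bulk".
IMPORT_RE = re.compile(
    r"^(\s*(?:from|import)\s+)" + re.escape(OLD_MODULE) + r"\b",
    re.MULTILINE,
)
# Matches the client class name as a whole word (case-sensitive).
CLASS_RE = re.compile(r"\b" + re.escape(OLD_CLASS) + r"\b")


def migrate_source(source: str) -> str:
    """Rewrite imports and client class names in a piece of Python source."""
    source = IMPORT_RE.sub(lambda m: m.group(1) + NEW_MODULE, source)
    return CLASS_RE.sub(NEW_CLASS, source)


old_code = (
    "from elasticsearch import Elasticsearch\n"
    "from elasticsearch.helpers import bulk\n"
    "es = Elasticsearch()\n"
)
print(migrate_source(old_code))
```

Names such as `elasticsearch_dsl` are left alone because the pattern requires a word boundary right after the module name.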

@aparo This is good news. However, does it maintain the same Java/Python APIs and signatures? In other words, if we have existing code that uses current elasticsearch Java/Python APIs would it continue to work if we use these new OpenSearch Java/Python clients?

Thanks,
Hasan

@asfoorial I will try to keep the API as it is, but the Python ones are missing types and, in some cases, parameters.

@aparo I tried generating a Python client from the OpenSearch repo, and it was successful. I believe types can be added, but typing depends on which versions the package should support. I would recommend stating the compatibility requirements. IMO, I would drop Python 2 support completely and require 3.7+, since aiohttp supports 3.7+.

The documentation still links to elastic.co. If left unchanged, I believe it will be difficult to keep client users aligned once OpenSearch is released.

Python 2 has been dead since last year: PEP 373 – Python 2.7 Release Schedule | peps.python.org (well, it’s been dead for much longer, really, but last year marked the official EOL for it)

Can I just say how happy I am that the Python 2/3 thing can be history now?

@erickg

Excellent.

Could you please share the code for the generated Python API?

Thanks

@ralph elasticsearch-py still declares support for Python 2 as well as 3.4, 3.5, and 3.6 in its manifest.

@searchymcsearchface Yep, me too.

@asfoorial The code comes from elasticsearch-py. I changed utils/generate-api.py to get the client code.

Niiiice. Looks like a lot of manual work still needs to be done, but it’s a start.

Before discussing the work, I need some help with the license. The README.md states “Copyright 2021 Elasticsearch B.V. Licensed under the Apache License, Version 2.0.” The LICENSE file says Apache License. This is confusing to me.

Let me try and find someone to help you!

Disclaimer: This is not legal advice, I am not a lawyer. I have worked in the open source legal space for 15+ years and worked closely with Red Hat Legal in my previous job.

@erickg, what you see in the README.md is what I would expect to see when you fork another project. They are attributing the copyright holder on the original work, and documenting the license that the work is under.

As you add/modify code to your fork, their copyright statement (and license for their copyrighted changes) still apply, so you need to be sure to retain that attribution (in README.md and wherever it appears in the code files, probably in the comment header). What you can do is append your own copyright statement, like this:

Copyright 2021 Elasticsearch B.V. 
Copyright 2021 ErickG

Licensed under the Apache License, Version 2.0

You can definitely do this in README.md without issue, and you can make this change to any source files you modify. If you create entirely new files (that do not copy content from existing files), you do not need to include the Elasticsearch copyright attribution statement.

I am assuming, for simplicity, that your fork intends to keep the Apache License, Version 2.0, that you inherited from the upstream fork. It is possible for your changes to be under a different license, but it complicates things (including my answer) quite a bit, so my advice to you would be to keep your fork Apache 2.0.

If you have additional Copyright or License questions, please feel free to ask me, and I will do my best to help you.

Thanks, that’s helpful. Of course, I would love to keep the Apache License.

I have a question about the license of the source repo, elasticsearch-py. It says:

Copyright 2021 Elasticsearch B.V. Licensed under the Apache License, Version 2.0.

This doesn’t seem to align with the Apache License declaration in elasticsearch-py/LICENSE at master · elastic/elasticsearch-py (github.com). Does that mean the code is no longer under the Apache License?

I’m not sure I follow 100% - copyright and license are two separate things. The repo you linked looks like Apache to me.

@spotfoss Thoughts?

The file that you linked to is a copy of the Apache License 2.0. I am also unsure where your confusion is coming from, as this matches the statement in README.md.

IMHO, the current Python API sucks because it follows a very legacy, 2.x-era design.
It doesn’t follow the current approach of using typed Python for methods and objects.
It’s OK to maintain it for existing code, but for new code it would be better to move to a more modern Python approach.
The same goes for the Java API: the High-Level client has a poor entity model design.

Thanks. It makes sense to me now. Then the library can be licensed under the Apache License 2.0, with an updated copyright holder as it moves on.

I’d be more than happy to use features from Python 3.6 onwards if possible. To start, I would aim for being able to make API requests to OpenSearch 1.0 without issues.

I spent some time looking at the Python client codebase. There are various hacks to make it work as of today.
Since elasticsearch-py 7.* should be able to work with OpenSearch 1.0, I think it doesn’t matter if I break compatibility between the two client projects. I aim to build the client to talk to OpenSearch 1.x and drop Elasticsearch compatibility where keeping it is not possible. Correct me if I misunderstood.

I see some areas to improve:

  • Use AST to generate functions instead of using templates.
  • See if I can compose API classes after the above point.
  • Then the package can go for native typing hints.
    • Support Python 3.6 at least, to start.
  • Change API function signatures to requests style. Basically, drop query_params support, which extracts kwargs into params. What confused me most was why the client works differently from HTTP requests in the Kibana console.
    • Then I don’t need to rename type to doc_type for URL query parameters.
  • Drop XPack API
  • Annotate network modules
  • A lot of renaming in comments and documentation
  • Type hints.
  • Changes with OpenSearch
    • Bulk ingestion errors are very difficult to diagnose
    • lz4 compression (this needs cluster support)
    • Probably better to have a pipeline with OpenSearch sooner rather than later. There are tests run in the Jenkins pipeline for elasticsearch-py that require a cluster.
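
To make the first few points concrete, here is a minimal sketch of generating one typed, requests-style function directly from AST nodes rather than from a text template. Everything in it is an assumption for illustration: the helper names `annotation` and `build_endpoint`, the endpoint name `search`, and the returned request dictionary are hypothetical, and this is not how the elasticsearch-py generator works. Query parameters are passed as an explicit `params` dict instead of being pulled out of kwargs.

```python
import ast
from typing import Any, Dict, Optional


def annotation(expr: str) -> ast.expr:
    """Parse a type expression such as 'Optional[Dict[str, Any]]' into an AST node."""
    return ast.parse(expr, mode="eval").body


def build_endpoint(name: str, http_method: str, suffix: str) -> ast.FunctionDef:
    """Build `def <name>(index: str, params: Optional[Dict[str, Any]] = None)`.

    The generated function only describes the request (method, path, params);
    sending it over HTTP is out of scope for this sketch.
    """
    arguments = ast.arguments(
        posonlyargs=[],
        args=[
            ast.arg(arg="index", annotation=annotation("str")),
            ast.arg(arg="params", annotation=annotation("Optional[Dict[str, Any]]")),
        ],
        vararg=None,
        kwonlyargs=[],
        kw_defaults=[],
        kwarg=None,
        defaults=[ast.Constant(value=None)],  # default value for `params`
    )
    # Expression: "/" + index + suffix
    path = ast.BinOp(
        left=ast.BinOp(
            left=ast.Constant(value="/"),
            op=ast.Add(),
            right=ast.Name(id="index", ctx=ast.Load()),
        ),
        op=ast.Add(),
        right=ast.Constant(value=suffix),
    )
    request = ast.Dict(
        keys=[ast.Constant(value=key) for key in ("method", "path", "params")],
        values=[
            ast.Constant(value=http_method),
            path,
            ast.Name(id="params", ctx=ast.Load()),
        ],
    )
    return ast.FunctionDef(
        name=name,
        args=arguments,
        body=[ast.Return(value=request)],
        decorator_list=[],
        returns=annotation("Dict[str, Any]"),
        type_params=[],  # field added in Python 3.12; harmless on earlier versions
    )


module = ast.Module(
    body=[build_endpoint("search", "GET", "/_search")],
    type_ignores=[],
)
ast.fix_missing_locations(module)

# Annotations are evaluated when the `def` executes, so the typing names
# must be available in the namespace the generated code runs in.
namespace: Dict[str, Any] = {"Any": Any, "Dict": Dict, "Optional": Optional}
exec(compile(module, filename="<generated>", mode="exec"), namespace)
search = namespace["search"]

print(search("logs", params={"size": 5}))
# {'method': 'GET', 'path': '/logs/_search', 'params': {'size': 5}}
```

Because the functions are built as nodes, type hints come for free in the generated signature, and a generator could attach many such functions to one class body to compose the API classes.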

What do you think is important for you? Any other thoughts?