Alternative to fscrawler in opensearch

I’ve recently moved from Elastic towards Open Distro. However, if I understood correctly, OpenSearch is the way forward instead.
I’ve moved almost all the functionality we currently use to OpenSearch, but I’m left with one gap:
To index SMB/NFS shares in our organisation I’ve been using FSCrawler (Welcome to FSCrawler’s documentation! - FSCrawler 2.8-SNAPSHOT documentation) and its respective Docker image (Docker Hub).
Is there an alternative for indexing files on an SMB/NFS share that is compatible with OpenSearch?
My google-fu doesn’t seem to find anything.

Thanks in advance!

Hey @Scarecrow - interesting. I wasn’t even aware of FSCrawler - looks useful. Have you tried it with OpenSearch yet? Glancing at the site, I see a couple of issues with their 2.8 snapshot:

Tika has explicit support for OpenSearch, but that version of the Java client contains code that blocks OpenSearch. There is an OpenSearch Java client in the works, but in the meantime an older version of FSCrawler should work (one that uses Elasticsearch REST Client 7.13.4 or lower).

Once the OpenSearch Java client is GA, I think we could easily help FSCrawler support OpenSearch - it’s a fairly simple conversion.

@searchymcsearchface I have tried, actually, with the same Docker config I used for Elastic.
It just starts and exits with an error code [0], which says basically nothing :wink:
I’ll try an older version and see what that gives. Thanks for the suggestion!

Wanted to give a final (?) update on this:
When I pull 2.7 from Docker Hub, its default Java REST client version is 14.0 if I understand it correctly, and it ends up refusing the connection:

So I guess I’ll have to wait for the work on the OpenSearch Java client :frowning:

@Scarecrow Actually, you’ll have to go back to a version that uses 7.13.4, as per the documentation (Compatibility - OpenSearch documentation).

Hi @Scarecrow,

I managed to get FSCrawler working with OpenSearch, but I had to build it myself with a few tweaks:

  1. As @searchymcsearchface said, it needs version 7.13.4, or you can check out the last known code that was using 7.13.4 from the git repo… c3d120ea33c3d53fb2182ae72d5634cd15f50593
  2. Now you have to build it, but before that there is a checkVersion() call that needs to be commented out, because it will halt FSCrawler when it detects that its “7” version does not match OpenSearch’s “1” version number.
  3. After building, you can try to run it. It will complain that no default settings were found for version “1”, so just copy the folder ~/.fscrawler/_default/7/ to ~/.fscrawler/_default/1/
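
For step 3, the folder copy could be scripted roughly like this. It is only a minimal sketch: the function name `copy_default_settings` is made up for illustration, and the only thing taken from the steps above is the `~/.fscrawler/_default/7/` to `~/.fscrawler/_default/1/` layout.

```python
import shutil
from pathlib import Path


def copy_default_settings(fscrawler_home: Path,
                          src_version: str = "7",
                          dst_version: str = "1") -> Path:
    """Copy _default/<src_version> to _default/<dst_version> under fscrawler_home.

    Hypothetical helper: it only mirrors the manual folder copy from step 3.
    """
    src = fscrawler_home / "_default" / src_version
    dst = fscrawler_home / "_default" / dst_version
    if not src.is_dir():
        raise FileNotFoundError(f"no default settings found at {src}")
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run without failing.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

You would call it as `copy_default_settings(Path.home() / ".fscrawler")` after FSCrawler has run once and created its default folders.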

Hope this helps.

hi @HelloWorld ,

you have just been promoted to my lifesaver ;). Many thanks for investigating this.

I’m not well versed in git and/or building from a specific version, so I’ll have to investigate. I don’t suppose you have your own repo containing the version you’ve built?
I suppose you run it from the local machine where you did your build, whereas I need a Docker image. If I remember correctly, there are Docker build instructions for FSCrawler somewhere as well, so (again) I guess I’ll have to investigate.

After the current world-ending-work-crisis (what’s in a name :wink: ) has been averted, I’ll report back here on my findings and experiences!

A note on #2: you can actually run OpenSearch in compatibility mode (so you don’t have to alter the version check code).

In opensearch.yml:

compatibility.override_main_response_version: true

OpenSearch will then report its version as 7.10.2.
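
To confirm the override took effect, you could query the cluster root endpoint and look at the version it reports. This is a rough sketch: the function names and the `http://localhost:9200` URL are assumptions (plain HTTP, no authentication), so a cluster with the security plugin enabled would need HTTPS and credentials instead.

```python
import json
import urllib.request


def reported_version(root_response: dict) -> str:
    """Return the version number from the JSON served at the cluster root ("/")."""
    return root_response["version"]["number"]


def looks_like_es7(root_response: dict) -> bool:
    """True if the cluster presents itself with a 7.x version number."""
    return reported_version(root_response).split(".")[0] == "7"


def fetch_root(url: str = "http://localhost:9200") -> dict:
    """Fetch the root endpoint. URL is an assumption: plain HTTP, no auth."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

With the override enabled, `looks_like_es7(fetch_root())` should return True; without it, OpenSearch reports its own version number (for example 1.x) and the check returns False.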