@jplitza jplitza commented Jul 23, 2025

The Elasticsearch platform silently ignores search terms that don't match, as long as there are also enough matching terms.

Thus, the search for 'document is a simple test' returns the license file although it doesn't contain the words "simple" or "test" or any variation thereof.

Since the semantics of full text search aren't clearly defined, it should be up to the platform to decide how to handle non-matching search terms. So not returning the license in this case is just as valid!

Case in point is PostgreSQL, which strictly requires every searched term (except for stopwords) to be contained in the search results (after normalization). These changes make the test harness pass on that platform, too.
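To make the difference between the two interpretations concrete, here is a minimal sketch of both policies. It is not how either platform is implemented: the function names, the toy stopword list, the sample license text, and the `min_matching` threshold are all assumptions made for illustration, and there is no stemming or other normalization.

```python
import re

# Toy stopword list (an assumption; real platforms ship their own lists).
STOPWORDS = {"a", "an", "is", "the", "of"}


def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords. No stemming."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def matches_lenient(query, document, min_matching=1):
    """Lenient policy: enough query terms must occur; the rest are ignored."""
    return len(tokenize(query) & tokenize(document)) >= min_matching


def matches_strict(query, document):
    """Strict policy: every non-stopword query term must occur."""
    return tokenize(query) <= tokenize(document)


# Hypothetical stand-in for the license file used by the test harness.
license_text = "This document is the license of the project."
query = "document is a simple test"

print(matches_lenient(query, license_text))  # True: "document" matches
print(matches_strict(query, license_text))   # False: "simple", "test" missing
```

Under the lenient policy the license is a hit, under the strict one it is not, which is exactly the case where the test expectations diverge.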

Signed-off-by: Jan-Philipp Litza <janphilipp@litza.de>

github-actions bot commented Aug 7, 2025

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smoothly and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)

@ArtificialOwl
Member

Hello,

Thanks for your work, but why would we care about search on PostgreSQL here?

@jplitza
Author

jplitza commented Aug 22, 2025

I do, because I wrote fulltextsearch_sql. The test harness of this (search-platform-independent) app is very useful for testing the platform implementation, too. So I tried to use it, and it's just this interpretation of additional words in the query that seems to differ between platforms. (MariaDB seems to handle them the same way Elasticsearch does.)

But if you say that the test suite is only supposed to be used with fulltextsearch_elasticsearch in order to test this app (not the platform), feel free to close this PR.

@ArtificialOwl
Member

I am quite happy right now with the fact that some search terms can be ignored. That is why you can enforce the presence of a word in the results by prefixing it with a '+'.

I really appreciate your work on this new extension, but can we keep the test as it is and have your parsing of the terms fit it?
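
For readers following along, the '+' convention described above could be sketched roughly like this. This is only an illustration under simple assumptions (whitespace tokenization, no stopwords, no stemming); `parse_query` and `matches` are hypothetical names, not functions of this app or of any platform.

```python
def parse_query(query):
    """Split a query into required terms ('+' prefix) and optional terms."""
    required, optional = set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        else:
            optional.add(token)
    return required, optional


def matches(query, document):
    """Required terms must all occur; optional terms may be missing."""
    words = set(document.lower().split())
    required, optional = parse_query(query)
    if not required <= words:
        return False
    # Without required terms, at least one optional term has to match.
    return bool(required) or bool(optional & words)


doc = "this document is the license"  # hypothetical sample text
print(matches("document simple test", doc))   # True: unmatched terms ignored
print(matches("document +simple test", doc))  # False: "simple" is enforced
```

With this reading, a platform that wants strict behaviour for a given word still gets it through the '+' prefix, while unprefixed words stay optional.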
