@HelloWorld-25 HelloWorld-25 commented Dec 28, 2025

Issue: #230. As highlighted by #229, Searcher's getResults() only yields each result's entry path. While convenient for single-archive search, this prevents implementing multi-ZIM search, because the results would be bare path strings with no indication of which ZIM each one came from.

We should therefore implement multi-ZIM search properly by:

Binding addArchive to Searcher (reference implementation in #229)
Changing the Searcher API so that results can be identified


Our changes:

Allowing multiple archives to be bound to a Searcher

1. Added an _archives list to track all registered archives.

2. Introduced a Searcher.addArchive(archive: Archive) method to register additional archives.

3. Searcher.search(query) now searches across all bound archives.
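
A minimal sketch of the registration behaviour described in this list, written with pure-Python stand-ins rather than the actual Cython bindings. Only _archives, addArchive and search come from this PR's description; FakeArchive, its paths attribute, and the naive substring matching are assumptions made purely so the example runs on its own.

```python
class FakeArchive:
    """Hypothetical stand-in for libzim's Archive (not the real class)."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = list(paths)


class Searcher:
    """Toy model of a Searcher that can be bound to several archives."""

    def __init__(self, archive):
        # The first archive is given at construction time.
        self._archives = [archive]

    def addArchive(self, archive):
        # Register one more archive to be searched.
        self._archives.append(archive)

    def search(self, query):
        # Assumption: naive substring match, visiting archives in
        # registration order. The real search is index-based.
        return [
            (archive, path)
            for archive in self._archives
            for path in archive.paths
            if query in path
        ]


wiki = FakeArchive("wiki", ["A/Python", "A/Rust"])
docs = FakeArchive("docs", ["A/Python_tutorial"])

searcher = Searcher(wiki)
searcher.addArchive(docs)
hits = searcher.search("Python")
```

Each hit here is paired with the archive that produced it, which is the information a bare path string cannot carry.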

Returning identifiable search results

1. Introduced a new class, SearchResult, containing:

       class SearchResult:
           archive: Archive
           path: str

2. Updated SearchResultSet.__iter__() to yield SearchResult objects instead of strings.

3. Results are now unambiguous: each one includes both the archive and the entry path.
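
To illustrate why pairing the archive with the path removes the ambiguity, here is a small self-contained sketch. It models SearchResult as a frozen dataclass, which is an assumption (the PR implements it in Cython), and FakeArchive and the list-backed SearchResultSet constructor are hypothetical helpers, not part of the library.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


class FakeArchive:
    """Hypothetical stand-in for libzim's Archive."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass(frozen=True)
class SearchResult:
    # The two fields named in the PR description.
    archive: FakeArchive
    path: str


class SearchResultSet:
    """Toy result set backed by a list of (archive, path) pairs."""

    def __init__(self, hits: List[Tuple[FakeArchive, str]]) -> None:
        self._hits = hits

    def __iter__(self) -> Iterator[SearchResult]:
        # Yield SearchResult objects instead of bare path strings.
        for archive, path in self._hits:
            yield SearchResult(archive=archive, path=path)


wiki = FakeArchive("wiki")
docs = FakeArchive("docs")

# The same path exists in both archives; only the archive tells them apart.
results = SearchResultSet([(wiki, "A/Home"), (docs, "A/Home")])
labels = [f"{r.archive.name}:{r.path}" for r in results]
```

With plain strings, both hits above would have been the indistinguishable value "A/Home".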

Updated type hints in search.pyi

1. SearchResultSet.__iter__() now returns Iterator[SearchResult].

2. Searcher.addArchive() is added to the type hints.

3. The Python type stubs now match the new Cython implementation.
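
For reference, the stub changes listed here might look roughly like the following. This is a sketch reconstructed from the description, not the actual contents of search.pyi; the Archive placeholder class and the omission of every other method are assumptions.

```python
from typing import Iterator


class Archive:
    """Placeholder so the sketch is self-contained; the real stub imports Archive."""


class SearchResult:
    archive: Archive
    path: str


class SearchResultSet:
    # Previously annotated as Iterator[str], per the PR description.
    def __iter__(self) -> Iterator[SearchResult]: ...


class Searcher:
    # New method; other Searcher members are omitted from this sketch.
    def addArchive(self, archive: Archive) -> None: ...
```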

Benefits:

1. Enables multi-ZIM search.

2. Ensures search results are uniquely identifiable.

3. Clean, maintainable API, consistent with libzim C++ internals.

4. Future-proof for features like ranking, filtering, and deduplication across multiple archives.


Backward compatibility:

1. The API change from Iterator[str] to Iterator[SearchResult] is intentional, to support multi-ZIM search. Code that treats each result as a string must be updated.

2. Users can still access the path via result.path, which keeps migration simple.
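
A hedged sketch of what migration could look like for calling code. SearchResult is modelled here as a NamedTuple whose archive field holds a file name string instead of an Archive object; both choices, and the sample data, are assumptions made only to keep the example runnable.

```python
from collections import defaultdict
from typing import NamedTuple


class SearchResult(NamedTuple):
    archive: str  # stand-in: an archive name rather than an Archive object
    path: str


# Hypothetical results, as if yielded by iterating a SearchResultSet.
results = [
    SearchResult("wiki.zim", "A/Python"),
    SearchResult("docs.zim", "A/Python"),
    SearchResult("wiki.zim", "A/Rust"),
]

# Old code consumed each item as a path string.
# New code reads the same value from result.path.
paths = [result.path for result in results]

# The extra field also allows grouping hits by their source archive.
by_archive = defaultdict(list)
for result in results:
    by_archive[result.archive].append(result.path)
```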


@rgaudin rgaudin left a comment

@HelloWorld-25 thank you for your PR! Very sorry about the delay to review it.

Please:

  • Don't mess with code formatting. We use formatting-only PRs for that to prevent noise.
  • Keep imports at top of file.
  • Add tests for your changes.
  • Since it changes the API, add example usage to the README.

And… of course, make sure it works as intended by testing it. It doesn't compile at the moment because of your non-top imports. I guess you haven't tested.

# Search module #
###############################################################################
from __future__ import annotations

Keep imports at top of file