Add multi-archive search support and identifiable search results #248

HelloWorld-25 · 2025-12-28T14:40:13Z

Issue: #230. As highlighted by #229, Searcher's getResults() only yields each result's entry path. While convenient for single-archive search, this prevents implementing multi-ZIM search, because the results would be bare path strings with no indication of which ZIM each one came from.

We should then implement multi-ZIM search properly by:

Binding addArchive to Searcher (reference implementation in #229)
Changing the Searcher API so that results can be identified

Our changes:

Allowing multiple archives to be bound to a Searcher

1. Added an _archives list to track all registered archives.

2. Introduced a Searcher.addArchive(archive: Archive) method to register additional archives.

3. Searcher.search(query) now searches across all bound archives.
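The registration flow above can be sketched in plain Python. This is a toy model, not the Cython code from the PR: `FakeArchive`, `SketchSearcher`, and the substring matching are hypothetical stand-ins chosen only to show how an `_archives` list and `addArchive()` could fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeArchive:
    """Hypothetical stand-in for an Archive: a name plus its entry paths."""

    name: str
    paths: list[str] = field(default_factory=list)


class SketchSearcher:
    """Toy model of a Searcher bound to several archives (assumption, not libzim)."""

    def __init__(self, archive: FakeArchive) -> None:
        # The first archive is bound at construction time.
        self._archives: list[FakeArchive] = [archive]

    def addArchive(self, archive: FakeArchive) -> None:
        # Register an additional archive to be covered by later searches.
        self._archives.append(archive)

    def search(self, query: str) -> list[tuple[str, str]]:
        # Naive substring match over every bound archive, in registration order.
        # Each hit keeps the archive name so identical paths stay distinguishable.
        return [
            (archive.name, path)
            for archive in self._archives
            for path in archive.paths
            if query in path
        ]
```

Keeping the archive alongside each hit is what makes the next change necessary: two archives can easily contain the same entry path.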

Returning identifiable search results

1. Introduced a new class, SearchResult, containing:

    class SearchResult:
        archive: Archive
        path: str

2. Updated SearchResultSet.__iter__() to yield SearchResult objects instead of strings.

3. Results are now unambiguous and include both the archive and the entry path.
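A minimal sketch of how such a result type and iterator might behave, assuming a frozen dataclass for `SearchResult` and a hypothetical `FakeArchive` in place of the real `Archive`. The actual Cython classes in the PR may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FakeArchive:
    """Hypothetical stand-in for libzim's Archive, identified by filename."""

    filename: str


@dataclass(frozen=True)
class SearchResult:
    """A hit that records which archive it came from, plus the entry path."""

    archive: FakeArchive
    path: str


class SearchResultSet:
    """Toy result set that yields SearchResult objects instead of bare strings."""

    def __init__(self, hits: list[tuple[FakeArchive, str]]) -> None:
        self._hits = hits

    def __iter__(self) -> Iterator[SearchResult]:
        for archive, path in self._hits:
            yield SearchResult(archive=archive, path=path)


wiki_en = FakeArchive("wiki_en.zim")
wiki_fr = FakeArchive("wiki_fr.zim")
results = list(SearchResultSet([(wiki_en, "A/Paris"), (wiki_fr, "A/Paris")]))
```

Both results share the path `A/Paris`, yet they compare as different objects because the archive is part of the result.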

Updated type hints in search.pyi

1. SearchResultSet.__iter__() now returns Iterator[SearchResult].

2. Searcher.addArchive() is added to the type hints.

3. The Python type stubs now match the new Cython implementation.

Benefits:

1. Enables multi-ZIM search.

2. Ensures search results are uniquely identifiable.

3. Keeps the API clean and maintainable, consistent with libzim's C++ internals.

4. Leaves room for future features such as ranking, filtering, and deduplication across multiple archives.

Backward compatibility:

1. The change from Iterator[str] to Iterator[SearchResult] is a deliberate breaking change, needed to support multi-ZIM search.

2. Users can still access the path via result.path, which simplifies migration.
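For callers that previously consumed plain strings, migration could look roughly like the following. This is a sketch under assumptions: `SearchResult` here stores the archive as a filename string rather than an `Archive` object, and `paths_only` and `group_by_archive` are hypothetical helpers, not part of the PR.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SearchResult:
    """Stand-in result: the archive is a filename string for illustration only."""

    archive: str
    path: str


def paths_only(results: Iterable[SearchResult]) -> list[str]:
    """Recover the old Iterator[str] behaviour by reading result.path."""
    return [result.path for result in results]


def group_by_archive(results: Iterable[SearchResult]) -> dict[str, list[str]]:
    """Bucket entry paths per archive, which bare strings could not support."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for result in results:
        grouped[result.archive].append(result.path)
    return dict(grouped)
```

Single-archive code only needs the first helper; multi-archive code can use the archive field to route each path back to the right ZIM.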

rgaudin

@HelloWorld-25 thank you for your PR! Very sorry about the delay to review it.

Please:

Don't mess with code formatting. We use formatting-only PRs for that, to prevent noise.
Keep imports at top of file.
Add tests for your changes.
Since it changes the API, add example usage to the README.

And… of course, make sure it works as intended by testing it. It doesn't compile at the moment because of your non-top imports. I guess you haven't tested.

rgaudin · 2026-01-07T08:38:31Z

libzim/libzim.pyx

+#   Search module                                                             #
 ###############################################################################

+from __future__ import annotations


Keep imports at top of file

Add multi-archive search support and identifiable search results

deda750

rgaudin requested changes Jan 9, 2026
