From 656a74d960eef4ed082346ce2f901f5bc76c9927 Mon Sep 17 00:00:00 2001
From: Cameron Boehmer
Date: Sat, 24 Jan 2026 09:17:34 -0800
Subject: [PATCH 1/2] refine implementation summary

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index fede65240..c8d7762c3 100644
--- a/README.md
+++ b/README.md
@@ -51,8 +51,9 @@ Are you frustrated with vector database retrieval accuracy for long professional
 Inspired by AlphaGo, we propose **[PageIndex](https://vectify.ai/pageindex)** — a **vectorless**, **reasoning-based RAG** system that builds a **hierarchical tree index** from long documents and uses LLMs to **reason** *over that index* for **agentic, context-aware retrieval**. It simulates how *human experts* navigate and extract knowledge from complex documents through *tree search*, enabling LLMs to *think* and *reason* their way to the most relevant document sections.
 
 PageIndex performs retrieval in two steps:
-1. Generate a “Table-of-Contents” **tree structure index** of documents
-2. Perform reasoning-based retrieval through **tree search**
+1. Indexing: LLM generates an outline (tree structure) of the document with a summary for each node
+2. Querying: LLM traverses the tree summaries in the same way agent harnesses unpack skills, by reading summaries descriptions and loading the section
+
From 29b78c0fc9d713b176ee37eab53a069851078d05 Mon Sep 17 00:00:00 2001
From: Cameron Boehmer
Date: Sat, 24 Jan 2026 13:46:44 -0800
Subject: [PATCH 2/2] Revise indexing and querying steps in README

Updated the indexing and querying steps for clarity and detail.
---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c8d7762c3..96044b4a3 100644
--- a/README.md
+++ b/README.md
@@ -51,9 +51,8 @@ Are you frustrated with vector database retrieval accuracy for long professional
 Inspired by AlphaGo, we propose **[PageIndex](https://vectify.ai/pageindex)** — a **vectorless**, **reasoning-based RAG** system that builds a **hierarchical tree index** from long documents and uses LLMs to **reason** *over that index* for **agentic, context-aware retrieval**. It simulates how *human experts* navigate and extract knowledge from complex documents through *tree search*, enabling LLMs to *think* and *reason* their way to the most relevant document sections.
 
 PageIndex performs retrieval in two steps:
-1. Indexing: LLM generates an outline (tree structure) of the document with a summary for each node
-2. Querying: LLM traverses the tree summaries in the same way agent harnesses unpack skills, by reading summaries descriptions and loading the section
-
+1. Indexing: an LLM turns the document into a tree of semantically delineated sections (e.g. document -> chapter -> section), with each node carrying a summary of its subtree
+2. Querying: an LLM traverses the tree of summaries much as an agent harness unpacks skills, reading the TOC into its context window and then loading the children of promising nodes (chapters, then their sections, and so on) as needed
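As a rough illustration of the two steps described in the revised README, here is a minimal Python sketch under stated assumptions: the `Node` layout, the `LLM` callable, and the prompts are invented for this example and are not PageIndex's actual API, and the sketch assumes the section hierarchy has already been extracted, whereas PageIndex has the LLM derive that structure as well.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Placeholder LLM interface: any function mapping a prompt string to a completion
# string. It stands in for whatever client a real implementation would use.
LLM = Callable[[str], str]

@dataclass
class Node:
    title: str
    text: str = ""                         # the section's own text (leaves carry content)
    children: List["Node"] = field(default_factory=list)
    summary: str = ""                      # filled in during indexing

def index(node: Node, ask_llm: LLM) -> None:
    """Indexing step (sketch): summarize the tree bottom-up so every node's
    summary covers its whole subtree, like an annotated table of contents."""
    for child in node.children:
        index(child, ask_llm)
    material = "\n".join([node.text] + [c.summary for c in node.children])
    node.summary = ask_llm(f"Summarize this section in one or two sentences:\n{material}")

def retrieve(root: Node, query: str, ask_llm: LLM) -> List[Node]:
    """Querying step (sketch): walk the tree top-down, letting the LLM read the
    children's titles and summaries (a TOC page) and pick which branches to expand."""
    frontier, selected = [root], []
    while frontier:
        node = frontier.pop()
        if not node.children:              # leaf: a concrete section worth reading in full
            selected.append(node)
            continue
        menu = "\n".join(f"{i}: {c.title} - {c.summary}" for i, c in enumerate(node.children))
        reply = ask_llm(
            f"Question: {query}\n"
            f"Which of these sections could contain the answer?\n{menu}\n"
            "Reply with the relevant indices, comma-separated, or 'none'."
        )
        for token in reply.split(","):
            token = token.strip()
            if token.isdigit() and int(token) < len(node.children):
                frontier.append(node.children[int(token)])
    return selected
```

A real harness would additionally cap how many nodes it expands per query and feed the text of the selected sections back to the LLM to compose the final answer.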