
Commit 94644e6

fix(docs): correct critical API documentation inaccuracies for v2.3.1
BREAKING DOCUMENTATION FIXES:

- Fix API methods: Changed pipeline.query() and pipeline.ingest() to pipeline.run()
- Fix response structure: Updated from { text, sources, metadata } to { success, query, results, error? }
- Add accessibility indicators: Mark features as Public API, CLI Tool, or CLI/Internal
- Add warnings for non-exported features (JWTValidator, InputSanitizer, Federated Learning, etc.)

Changes:

- API-Reference.md: Complete rewrite of pipeline methods to match actual exports
- Introduction.md: Added feature availability legend with ✅📦🔧 indicators
- Usage.md: Updated all code examples to use pipeline.run() with correct signatures
- Added DOCUMENTATION_ACCURACY_AUDIT.md with full audit report

Impact: Users following old docs would get "function not found" errors. These fixes prevent that.

Related: Addresses discrepancies found in codebase audit against documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent e495f37 commit 94644e6


4 files changed (+459, -152 lines)


DOCUMENTATION_ACCURACY_AUDIT.md

Lines changed: 327 additions & 0 deletions

# Documentation Accuracy Audit Report

## RAG Pipeline Utils v2.3.1 Documentation Site

**Audit Date:** 2025-11-09

**Audited Version:** 2.3.1

**Auditor:** Automated codebase verification

---

## Executive Summary

This report checks how accurately the RAG Pipeline Utils documentation site describes the actual codebase. The audit found **critical discrepancies** between the documented APIs and the actual exports. It also found feature claims for functionality that is not publicly accessible.

**Overall Assessment:** ⚠️ **MAJOR ISSUES IDENTIFIED**

---

## Critical Issues

### 1. **API Method Names Mismatch** ❌ CRITICAL

**Documentation Claims:**

- `pipeline.query(query, options)` - Execute queries against pipeline
- `pipeline.ingest(source, options)` - Ingest documents into pipeline

**Actual Implementation (src/core/create-pipeline.js:98):**

```javascript
return { run: runOnce, cleanup() {} };
```

**Reality:**

- The pipeline object returned by `createRagPipeline` exposes only `run()` and `cleanup()`. It has no `query()` or `ingest()` method.
- All tests use `pipeline.run()` (verified in `__tests__/`)
- The `InstrumentedPipeline` wrapper has `query`/`ingest`, but `createRagPipeline` doesn't

**Impact:** Users following the documentation will hit runtime errors. For example, calling `pipeline.query(...)` throws `TypeError: pipeline.query is not a function`.

**Recommendation:** Update all API documentation to use `pipeline.run()`, or export a wrapper that provides `query`/`ingest` methods.
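
The sketch below shows Option B. It assumes only the object shape quoted above (`{ run, cleanup }`). `createStandInPipeline`, `withQueryIngest`, and the `query`/`input` option keys are hypothetical names for illustration. Confirm the real `run()` signature in `src/core/create-pipeline.js` before adopting anything like this. The first part reproduces the error that users of the old docs would see.

```javascript
// Stand-in mirroring `return { run: runOnce, cleanup() {} }` from
// src/core/create-pipeline.js. The body of runOnce and its option names
// are assumptions for illustration.
function createStandInPipeline() {
  async function runOnce(options = {}) {
    return { success: true, query: options.query, results: [] };
  }
  return { run: runOnce, cleanup() {} };
}

// Hypothetical Option B wrapper: adds query()/ingest() on top of run().
// The `query` and `input` option keys are placeholders, not the library's API.
function withQueryIngest(pipeline) {
  return {
    ...pipeline,
    query: (query, options = {}) => pipeline.run({ ...options, query }),
    ingest: (source, options = {}) => pipeline.run({ ...options, input: source }),
  };
}

const pipeline = createStandInPipeline();

// What a user following the old docs hits today:
try {
  pipeline.query("What is RAG?");
} catch (err) {
  console.log(err instanceof TypeError, err.message);
  // true pipeline.query is not a function
}

const wrapped = withQueryIngest(pipeline);
wrapped.query("What is RAG?").then((res) => console.log(res));
// { success: true, query: 'What is RAG?', results: [] }
```

A thin adapter like this keeps `run()` as the single execution path. The older method names become aliases, so existing call sites do not break.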

---

### 2. **Inaccessible Features Advertised as Available** ❌ CRITICAL

**Documentation Claims (Introduction.md):**

#### "Federated Learning"

- **Claimed:** "Distributed model training with privacy-preserving aggregation"
- **Reality:** Code exists in `src/ai/federation/federated-learning-coordinator.js` but is NOT exported from `src/index.js`
- **Accessible:** NO

#### "Model Training Orchestrator"

- **Claimed:** "End-to-end training workflows with hyperparameter tuning"
- **Reality:** Code exists in `src/ai/training/model-training-orchestrator.js` but is NOT exported
- **Accessible:** NO

#### "SLO Monitoring System"

- **Claimed:** "Built-in Service Level Objectives tracking with error budgets and alerting"
- **Reality:** Code exists in `src/observability/slo-monitor.js` but is NOT exported
- **Accessible:** NO

#### "Plugin Marketplace"

- **Claimed:** "Certified plugin ecosystem with discovery and installation workflows"
- **Reality:** Code exists in `src/core/plugin-marketplace/` but is NOT exported
- **Accessible:** NO

**Actual Public Exports (src/index.js:44-86):**

```javascript
module.exports = {
  createRagPipeline,
  loadConfig,
  validateRagrc,
  normalizeConfig,
  pluginRegistry,
  logger,
  errorFormatter,
  createError,
  wrapError,
  ERROR_CODES,
  MultiModalProcessor,
  AdaptiveRetrievalEngine,
  DAGEngine,
  ParallelProcessor,
  eventLogger,
  metrics,
  AuditLogger,
  DataGovernance,
  HotReloadManager,
  createHotReloadManager,
  DevServer,
  createDevServer,
};
```

**Impact:** Users cannot access the advertised enterprise features, so the documentation misrepresents the available functionality.

**Recommendation:** Either export these features OR clearly mark them as "Internal/CLI-only" in documentation.
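
One way to catch this kind of drift is to compare documented feature names against the module's real export keys. The sketch below hard-codes the export list quoted above. In CI, the list would instead come from `Object.keys(require(...))` on the package entry point. The mapping from feature to export name is an assumption for illustration; the codebase is not known to use these names.

```javascript
// Export names copied from the src/index.js listing above.
const ACTUAL_EXPORTS = [
  "createRagPipeline", "loadConfig", "validateRagrc", "normalizeConfig",
  "pluginRegistry", "logger", "errorFormatter", "createError", "wrapError",
  "ERROR_CODES", "MultiModalProcessor", "AdaptiveRetrievalEngine", "DAGEngine",
  "ParallelProcessor", "eventLogger", "metrics", "AuditLogger", "DataGovernance",
  "HotReloadManager", "createHotReloadManager", "DevServer", "createDevServer",
];

// Hypothetical mapping from documented feature to the export name a user
// would look for. The right-hand names are illustrative assumptions.
const DOCUMENTED_FEATURES = {
  "Federated Learning": "FederatedLearningCoordinator",
  "Model Training Orchestrator": "ModelTrainingOrchestrator",
  "SLO Monitoring System": "SLOMonitor",
  "Plugin Marketplace": "PluginMarketplace",
  "DAG Engine": "DAGEngine",
};

// Returns the documented features whose expected export is missing.
function findInaccessibleFeatures(documented, exportedNames) {
  const exported = new Set(exportedNames);
  return Object.entries(documented)
    .filter(([, exportName]) => !exported.has(exportName))
    .map(([feature]) => feature);
}

console.log(findInaccessibleFeatures(DOCUMENTED_FEATURES, ACTUAL_EXPORTS));
```

Run against the list above, this flags the four features in this section and passes `DAGEngine`, which is exported.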

---

## Moderate Issues

### 3. **`pipeline.run()` Response Structure Mismatch** ⚠️ MODERATE

**Documentation (API-Reference.md:35):**

- Claims `pipeline.run()` returns an object with `text`, `sources`, `metadata`

**Actual Implementation (src/core/create-pipeline.js:88):**

```javascript
return { success: true, query, results };
```

**Reality:**

- On success: returns `{ success, query, results }`, NOT `{ text, sources, metadata }`
- On error: returns `{ success: false, error: String(e.message || e) }`

**Impact:** Code written against the documented structure reads `undefined` for `text`, `sources`, and `metadata`.
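
The sketch below mirrors the two return shapes quoted above. It shows a consumer that branches on `success` rather than reading `text` or `sources`. `runOnceSketch`, `describeResult`, and the injected `retrieve` function are hypothetical stand-ins. The real `runOnce` runs the full pipeline rather than a single injected function.

```javascript
// Mirrors the success and error shapes quoted above. `retrieve` is a
// hypothetical stand-in for the real embed/retrieve/generate stages.
async function runOnceSketch(query, retrieve) {
  try {
    const results = await retrieve(query);
    return { success: true, query, results };
  } catch (e) {
    return { success: false, error: String(e.message || e) };
  }
}

// Branch on `success`; `text`, `sources`, and `metadata` do not exist.
function describeResult(res) {
  if (!res.success) return `Pipeline failed: ${res.error}`;
  return `${res.results.length} result(s) for "${res.query}"`;
}

runOnceSketch("vector search", async () => [{ id: 1 }, { id: 2 }])
  .then((res) => console.log(describeResult(res))); // 2 result(s) for "vector search"

runOnceSketch("vector search", async () => {
  throw new Error("index offline");
}).then((res) => console.log(describeResult(res))); // Pipeline failed: index offline
```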

---

### 4. **Missing Required Parameters in createRagPipeline** ⚠️ MODERATE

**Documentation Claims:**

- All parameters optional (API-Reference.md:28-33 shows "No" for Required column)

**Actual Implementation:**

- src/core/pipeline-factory.js:12-18 shows required check:

```javascript
const required = ["loader", "embedder", "retriever", "llm"];
const missing = required.filter((name) => !arguments[0] || !arguments[0][name]);
if (missing.length > 0) {
  throw new Error(`Required components missing: ${missing.join(", ")}`);
}
```

**Reality:**

- Two implementations exist:
  1. `create-pipeline.js` - All optional (the one exported)
  2. `pipeline-factory.js` - Required parameters
- Documentation doesn't specify which one applies

**Impact:** Readers cannot tell whether `loader`, `embedder`, `retriever`, and `llm` are required.
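
If the `pipeline-factory.js` behavior is the intended contract, the documentation could state it explicitly. The sketch below performs the same check but takes the config as a parameter instead of reading `arguments[0]`. `findMissingComponents` and `assertRequiredComponents` are hypothetical helper names. The sketch does not settle which implementation is authoritative.

```javascript
// Component names taken from the pipeline-factory.js excerpt above.
const REQUIRED_COMPONENTS = ["loader", "embedder", "retriever", "llm"];

// Same logic as the excerpt, with the config passed explicitly.
function findMissingComponents(config) {
  return REQUIRED_COMPONENTS.filter((name) => !config || !config[name]);
}

// Throws the same message format as the excerpt when anything is missing.
function assertRequiredComponents(config) {
  const missing = findMissingComponents(config);
  if (missing.length > 0) {
    throw new Error(`Required components missing: ${missing.join(", ")}`);
  }
  return config;
}

console.log(findMissingComponents({ loader: {}, embedder: {} }));
// [ 'retriever', 'llm' ]
```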

---

## Accurate Components

### 5. **Core Exports Verified** ✅ ACCURATE

These exports match documentation:

- `createRagPipeline` - Factory function (exists)
- `pluginRegistry` - Plugin system (exists)
- `DAGEngine` - Workflow engine (exists)
- `MultiModalProcessor` - AI capabilities (exists)
- `AdaptiveRetrievalEngine` - Retrieval engine (exists)
- `AuditLogger` - Enterprise logging (exists)
- `DataGovernance` - Enterprise governance (exists)
- `HotReloadManager` - Development tools (exists)
- `DevServer` - Development server (exists)

---

### 6. **Interactive Tools Code Examples** ✅ MOSTLY ACCURATE

**CodePlayground Examples:**

- Plugin patterns are conceptually correct
- Shows proper plugin contracts (embed, retrieve, generate)
- Example structure matches expected plugin architecture

**Issues:**

- Examples show `pipeline.run()` which is correct ✅
- But API docs show `pipeline.query()` which is wrong ❌
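
Below is a small contract check of the kind the playground examples imply. Only the method names `embed`, `retrieve`, and `generate` come from the examples. The role keys reuse the component names from the `pipeline-factory.js` excerpt. The method arguments and return values are assumptions.

```javascript
// Hypothetical plugin objects; only the method names come from the examples.
const plugins = {
  embedder: { embed: async (text) => [text.length, 0, 1] },
  retriever: { retrieve: async (vector) => [{ id: "doc-1", score: 0.9 }] },
  llm: { generate: async (prompt, docs) => `answer using ${docs.length} doc(s)` },
};

// Role -> method each plugin is expected to implement.
const CONTRACTS = { embedder: "embed", retriever: "retrieve", llm: "generate" };

// Lists every "role.method" pair that is missing or not a function.
function checkContracts(pluginSet) {
  return Object.entries(CONTRACTS)
    .filter(([role, method]) => typeof pluginSet[role]?.[method] !== "function")
    .map(([role, method]) => `${role}.${method}`);
}

console.log(checkContracts(plugins)); // []
console.log(checkContracts({ ...plugins, llm: { complete() {} } })); // [ 'llm.generate' ]
```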

---

### 7. **Performance Calculator Numbers** ✅ REASONABLE ESTIMATES

**Latency Estimates:**

- OpenAI Embedder: ~120ms - Realistic for API calls
- HNSW Retrieval: ~45ms - Reasonable for in-memory search
- GPT-3.5: ~800ms - Typical for generation
- GPT-4: ~1500ms - Reasonable for larger model

**Cost Estimates:**

- OpenAI Embedding: $0.13 per 1M tokens - Matches official pricing
- GPT-3.5: $1.50 per 1M tokens - Approximate combined input/output cost
- GPT-4: $30 per 1M tokens - Approximate pricing

**Verdict:** Estimates are reasonable ballpark figures, properly labeled as "Estimated Performance"
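
This worked example shows how the figures combine per query. It assumes the stages run sequentially and that the caller supplies token counts. The values below (500 embedding tokens, 1,500 LLM tokens) are arbitrary and illustrative. The prices and latencies are the calculator's estimates quoted above, not measured results.

```javascript
// Figures copied from the calculator estimates above.
const EMBED = { latencyMs: 120, usdPerMillion: 0.13 };
const RETRIEVE_LATENCY_MS = 45;
const LLMS = {
  "gpt-3.5": { latencyMs: 800, usdPerMillion: 1.5 },
  "gpt-4": { latencyMs: 1500, usdPerMillion: 30 },
};

// Assumes sequential stages (embed -> retrieve -> generate) and caller-supplied
// token counts; concurrency, retries, and rate limits are ignored.
function estimateQuery(model, { embedTokens, llmTokens }) {
  if (!Object.prototype.hasOwnProperty.call(LLMS, model)) {
    throw new Error(`No estimate for model: ${model}`);
  }
  const llm = LLMS[model];
  const latencyMs = EMBED.latencyMs + RETRIEVE_LATENCY_MS + llm.latencyMs;
  const costUsd =
    (embedTokens * EMBED.usdPerMillion + llmTokens * llm.usdPerMillion) / 1e6;
  return { latencyMs, costUsd };
}

for (const model of Object.keys(LLMS)) {
  const { latencyMs, costUsd } = estimateQuery(model, { embedTokens: 500, llmTokens: 1500 });
  console.log(`${model}: ~${latencyMs} ms, ~$${costUsd.toFixed(6)} per query`);
}
// gpt-3.5: ~965 ms, ~$0.002315 per query
// gpt-4: ~1665 ms, ~$0.045065 per query
```

Under these assumptions, generation dominates both latency and cost. Swapping the LLM changes the per-query cost far more than the embedding choice does.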

---

### 8. **CHANGELOG Accuracy** ✅ HONEST

**Current CHANGELOG (versioned_docs/version-2.3.1/CHANGELOG.md):**

- Lists only verifiable features actually present in v2.3.1
- Removed fabricated version history
- Removed unverifiable performance benchmarks
- Honest about what's available

**Verdict:** CHANGELOG is accurate after recent corrections.

---

## Security Concerns

### 9. **Security Features Exist But Not Exported** ⚠️

**Available in Codebase:**

- `JWTValidator` (src/security/jwt-validator.js) - NOT exported
- `InputSanitizer` (src/utils/input-sanitizer.js) - NOT exported

**Impact:** These security utilities exist, but users cannot access them through the public API.

---

## Node.js Version Claims

### 10. **Node.js Support** ✅ ACCURATE

**Documentation Claims:**

- Node.js 18.x, 20.x, 22.x

**package.json (lines 19-21):**

```json
"engines": {
  "node": ">=18"
}
```

**Verdict:** Accurate. `>=18` covers the documented 18.x, 20.x, and 22.x release lines.
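
npm treats `engines` as advisory: it only warns on a mismatch unless `engine-strict` is enabled. A runtime guard is one way to make the `>=18` floor fail fast. `meetsMinimumNode` below is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: true when a version string such as "20.11.1"
// (the format of process.versions.node) has a major version >= minMajor.
function meetsMinimumNode(version, minMajor) {
  const major = Number.parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Fail fast at startup instead of relying on npm's advisory engines check.
if (!meetsMinimumNode(process.versions.node, 18)) {
  throw new Error(`Node.js >=18 is required; found ${process.versions.node}`);
}
```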

---

## Recommendations

### Immediate Actions Required

1. **Fix API Method Names (CRITICAL)**

   - Option A: Change all `pipeline.query()` / `pipeline.ingest()` references to `pipeline.run()`
   - Option B: Export a wrapper that provides query/ingest methods
   - Timeline: URGENT - This breaks user code

2. **Clarify Feature Availability (CRITICAL)**

   - Add "Accessibility" column to feature lists:
     - ✅ Public API
     - 🔧 CLI Only
     - 📦 Internal
   - Remove claims about unavailable features OR export them
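
The sketch below generates the proposed Accessibility column from a single source of truth. The classification entries are illustrative, drawn from this audit's findings. They are not a complete or authoritative inventory, and `renderAvailabilityTable` is a hypothetical helper.

```javascript
// Legend proposed in the recommendation above.
const LEGEND = { public: "✅ Public API", cli: "🔧 CLI Only", internal: "📦 Internal" };

// Illustrative entries based on this audit's findings, not a full inventory.
const FEATURES = [
  { name: "createRagPipeline", access: "public" },
  { name: "DAGEngine", access: "public" },
  { name: "Federated Learning", access: "internal" },
  { name: "SLO Monitoring System", access: "internal" },
];

// Renders a Markdown table; an unknown access level throws instead of
// silently producing an unlabeled row.
function renderAvailabilityTable(features) {
  const rows = features.map(({ name, access }) => {
    if (!Object.prototype.hasOwnProperty.call(LEGEND, access)) {
      throw new Error(`Unknown access level: ${access}`);
    }
    return `| ${name} | ${LEGEND[access]} |`;
  });
  return ["| Feature | Accessibility |", "| --- | --- |", ...rows].join("\n");
}

console.log(renderAvailabilityTable(FEATURES));
```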

3. **Fix Response Structure Documentation (HIGH)**

   - Update API docs to show actual return values: `{ success, query, results }`
   - Document error response format

4. **Add Disclaimer to Performance Calculator (MEDIUM)**

   - Current label: "Estimate throughput, latency, and costs"
   - Add: "Note: Estimates are approximate. Actual performance varies by workload, network conditions, and API rate limits."

### Long-term Improvements

5. **API Contract Testing**

   - Add tests that verify documentation examples actually work
   - CI/CD check to prevent docs/code drift
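
Below is a minimal sketch of the drift check. It assumes the exported pipeline object exposes only `run` and `cleanup`, per the `create-pipeline.js` excerpt. `findStalePipelineCalls` is a hypothetical helper. It scans fenced JavaScript blocks in a Markdown string and reports `pipeline.<method>(` calls that the object does not provide. A regex scan is only a heuristic; executing the examples against the real package would be a stronger test.

```javascript
// Methods on the object returned by createRagPipeline, per the
// create-pipeline.js excerpt: `{ run: runOnce, cleanup() {} }`.
const PIPELINE_METHODS = new Set(["run", "cleanup"]);

// Built at runtime so this example contains no literal fence markers.
const TICKS = "`".repeat(3);
const FENCE_RE = new RegExp(`${TICKS}(?:js|javascript)\\n([\\s\\S]*?)${TICKS}`, "g");
const CALL_RE = /\bpipeline\.(\w+)\s*\(/g;

// Hypothetical CI helper: lists pipeline.<method>( calls inside fenced
// JavaScript blocks that the real pipeline object does not provide.
function findStalePipelineCalls(markdown) {
  const stale = [];
  for (const [, code] of markdown.matchAll(FENCE_RE)) {
    for (const [, method] of code.matchAll(CALL_RE)) {
      if (!PIPELINE_METHODS.has(method)) stale.push(method);
    }
  }
  return stale;
}

const sampleDoc = [
  "## Querying",
  `${TICKS}javascript`,
  'const result = await pipeline.query("What is RAG?");',
  "await pipeline.run(input);",
  TICKS,
].join("\n");

console.log(findStalePipelineCalls(sampleDoc)); // [ 'query' ]
```

In CI, a non-empty result would fail the build, so a docs page that reintroduces `pipeline.query()` would be caught before publishing.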

6. **Export Missing Features**

   - Decide which internal features should be public
   - Export valuable features like FederatedLearning, SLOMonitor

7. **Versioning Documentation**

   - Document breaking changes between internal implementations
   - Clarify when create-pipeline.js vs pipeline-factory.js should be used

---

## Conclusion

The documentation site contains **significant inaccuracies** that will confuse and frustrate users:

**Critical Problems:**

- ❌ API methods `query()` and `ingest()` don't exist on the public pipeline object
- ❌ Major features are advertised but not accessible via the public API
- ⚠️ Response structure mismatch

**What's Accurate:**

- ✅ Core exports match codebase
- ✅ Performance estimates are reasonable
- ✅ CHANGELOG is honest after recent cleanup
- ✅ Node.js version requirements are accurate

**Overall Grade:** **D+ (60/100)**

- Documentation exists and is well-structured
- But contains critical functional inaccuracies
- Advertises features users cannot access

**Priority:** Fix API method names immediately to prevent broken user code.
