Commit aa67371

Added README.md for vector-search tool
1 parent 998669a commit aa67371

2 files changed: 152 additions and 10 deletions

README.md

Lines changed: 144 additions & 1 deletion
@@ -18,6 +18,7 @@ A Model Context Protocol server for interacting with MongoDB Databases and Mongo
 - [📄 Supported Resources](#supported-resources)
 - [⚙️ Configuration](#configuration)
 - [Configuration Options](#configuration-options)
+- [Vector Search & Embeddings](#vector-search-and-embeddings)
 - [Atlas API Access](#atlas-api-access)
 - [Configuration Methods](#configuration-methods)
 - [Environment Variables](#environment-variables)
@@ -320,6 +321,7 @@ NOTE: atlas tools are only available when you set credentials on [configuration]
 - `collection-storage-size` - Get the size of a collection in MB
 - `db-stats` - Return statistics about a MongoDB database
 - `export` - Export query or aggregation results to EJSON format. Creates a uniquely named export accessible via the `exported-data` resource.
+- `vector-search` - Execute a vector similarity search (`$vectorSearch`) over a collection. See [Vector Search & Embeddings](#vector-search-and-embeddings).
 
 ## 📄 Supported Resources
 
@@ -361,6 +363,13 @@ The MongoDB MCP Server can be configured using multiple methods, with the follow
 | `exportTimeoutMs` | `MDB_MCP_EXPORT_TIMEOUT_MS` | 300000 | Time in milliseconds after which an export is considered expired and eligible for cleanup. |
 | `exportCleanupIntervalMs` | `MDB_MCP_EXPORT_CLEANUP_INTERVAL_MS` | 120000 | Time in milliseconds between export cleanup cycles that remove expired export files. |
 | `atlasTemporaryDatabaseUserLifetimeMs` | `MDB_MCP_ATLAS_TEMPORARY_DATABASE_USER_LIFETIME_MS` | 14400000 | Time in milliseconds that temporary database users created when connecting to MongoDB Atlas clusters will remain active before being automatically deleted. |
+| `vectorSearchPath` | `MDB_MCP_VECTOR_SEARCH_PATH` | <not set> | Default vector field path used by `vector-search` (V2 mode). If set together with `vectorSearchIndex`, the V2 vector search tool variant is enabled. |
+| `vectorSearchIndex` | `MDB_MCP_VECTOR_SEARCH_INDEX` | <not set> | Default vector search index name used by `vector-search` (V2 mode). Must be set with `vectorSearchPath` to enable V2 mode. |
+| `embeddingModelProvider` | `MDB_MCP_EMBEDDING_MODEL_PROVIDER` | azure-ai-inference | Embedding model provider identifier. Currently only `azure-ai-inference` is supported. |
+| `embeddingModelEndpoint` | `MDB_MCP_EMBEDDING_MODEL_ENDPOINT` | <not set> | Endpoint for the embedding model provider. Required for vector search. |
+| `embeddingModelApikey` | `MDB_MCP_EMBEDDING_MODEL_APIKEY` | <not set> | API key/credential for the embedding model provider. Required for vector search. |
+| `embeddingModelDeploymentName` | `MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME` | <not set> | Deployment/model name to use when requesting embeddings. Required for vector search. |
+| `embeddingModelDimension` | `MDB_MCP_EMBEDDING_MODEL_DIMENSION` | <not set> | (Optional) Expected embedding dimension for validation (provider specific). |
 
 #### Logger Options
 
@@ -482,6 +491,140 @@ You can disable telemetry using:
 
 > **💡 Platform Note:** For Windows users, see [Environment Variables](#environment-variables) for platform-specific instructions.
 
+### Vector Search and Embeddings
+
+The `vector-search` tool lets you run semantic similarity queries against a MongoDB collection using the `$vectorSearch` aggregation stage. This capability is disabled unless a valid embedding configuration is supplied (see below).
+
+#### Overview
+
+Two internal variants of the `vector-search` tool may register depending on configuration:
+
+1. V1 (argument-driven): You supply `path` and optionally `index` as tool arguments each call.
+2. V2 (config-driven): You preconfigure both `vectorSearchPath` and `vectorSearchIndex` in server config; the tool omits those arguments and always searches that path/index.
+
+Variant selection rules:
+
+- If BOTH `MDB_MCP_VECTOR_SEARCH_PATH` and `MDB_MCP_VECTOR_SEARCH_INDEX` are set at startup → V2 registers.
+- If NEITHER (or only one) of those is set → V1 registers, and you must provide a `path` argument per invocation (and may provide `index`).
+- If embedding config is incomplete, the tool is not registered (you will see a warning in logs).
+
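The variant selection rules above can be read as a small decision function. The following is a minimal sketch only: `VectorSearchSettings`, `selectVariant`, and the `"disabled"` label are hypothetical names chosen for illustration, not identifiers from the server's source.

```typescript
// Hypothetical types and names; they only illustrate the rules described in the README.
type VectorSearchVariant = "v1" | "v2" | "disabled";

interface VectorSearchSettings {
  // True when endpoint, API key and deployment name are all configured.
  embeddingConfigured: boolean;
  vectorSearchPath?: string; // MDB_MCP_VECTOR_SEARCH_PATH
  vectorSearchIndex?: string; // MDB_MCP_VECTOR_SEARCH_INDEX
}

function selectVariant(settings: VectorSearchSettings): VectorSearchVariant {
  // Incomplete embedding config: the tool is not registered at all.
  if (!settings.embeddingConfigured) {
    return "disabled";
  }
  const hasPath = Boolean(settings.vectorSearchPath);
  const hasIndex = Boolean(settings.vectorSearchIndex);
  // V2 needs BOTH defaults; neither or only one falls back to V1.
  return hasPath && hasIndex ? "v2" : "v1";
}
```

Note that setting only one of the two defaults yields V1, which is why a `path` argument is still required in that case.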
+#### Required MongoDB Setup
+
+1. A collection with a vector field (array of float/number values) containing stored embeddings.
+2. A vector search index on that field (e.g. an Atlas Vector Search index). `$vectorSearch` runs against this index, so it must exist before you query.
+
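For orientation, an Atlas Vector Search index definition for such a field has roughly the shape below. This is a sketch under assumptions: the helper name `buildVectorIndexDefinition` is hypothetical, and the path `embedding`, the dimension 3072, and the `cosine` similarity are example values that must match your own data and embedding model.

```typescript
// Hypothetical helper; it only builds the definition object and does not talk to MongoDB.
type Similarity = "cosine" | "euclidean" | "dotProduct";

interface VectorIndexDefinition {
  fields: Array<{
    type: "vector";
    path: string;
    numDimensions: number;
    similarity: Similarity;
  }>;
}

function buildVectorIndexDefinition(
  path: string,
  numDimensions: number,
  similarity: Similarity = "cosine",
): VectorIndexDefinition {
  if (!Number.isInteger(numDimensions) || numDimensions <= 0) {
    throw new Error("numDimensions must be a positive integer");
  }
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Example values only: field "embedding" holding 3072-dimensional vectors.
const exampleDefinition = buildVectorIndexDefinition("embedding", 3072);
```

With a recent MongoDB Node.js driver, a definition like this would typically be passed to `collection.createSearchIndex({ name, type: "vectorSearch", definition })`; check the driver version you use before relying on that call.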
+#### Embedding Configuration (Required)
+
+You must configure an embedding provider so the server can transform the `queryText` you pass in into a numeric embedding vector. Current provider support:
+
+- `azure-ai-inference` (default if none specified)
+
+Set the following environment variables (or CLI args) for Azure AI Inference:
+
+```bash
+export MDB_MCP_EMBEDDING_MODEL_ENDPOINT="https://your-azure-resource.services.ai.azure.com/models/embeddings?api-version=2024-05-01-preview"
+export MDB_MCP_EMBEDDING_MODEL_APIKEY="<azure-api-key>"
+export MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME="text-embedding-3-large" # or your deployed embedding model
+# (Optional) if you want to assert embedding size
+export MDB_MCP_EMBEDDING_MODEL_DIMENSION=3072
+```
+
+Without these, `vector-search` will not register.
+
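A startup check for these variables might look like the sketch below. It is an illustration under assumptions, not the server's actual validation: `readEmbeddingConfig`, `EmbeddingConfig`, and the result shape are hypothetical, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical validation sketch; names and result shape are illustrative only.
type Env = Record<string, string | undefined>;

interface EmbeddingConfig {
  provider: string;
  endpoint: string;
  apiKey: string;
  deploymentName: string;
  dimension: number | undefined; // optional check on embedding size
}

type EmbeddingConfigResult =
  | { ok: true; config: EmbeddingConfig }
  | { ok: false; problems: string[] };

function readEmbeddingConfig(env: Env): EmbeddingConfigResult {
  const problems: string[] = [];
  const endpoint = env["MDB_MCP_EMBEDDING_MODEL_ENDPOINT"];
  const apiKey = env["MDB_MCP_EMBEDDING_MODEL_APIKEY"];
  const deploymentName = env["MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME"];
  if (!endpoint) problems.push("MDB_MCP_EMBEDDING_MODEL_ENDPOINT is not set");
  if (!apiKey) problems.push("MDB_MCP_EMBEDDING_MODEL_APIKEY is not set");
  if (!deploymentName) problems.push("MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME is not set");

  // The dimension is optional, but if present it must be a positive integer.
  const rawDimension = env["MDB_MCP_EMBEDDING_MODEL_DIMENSION"];
  let dimension: number | undefined;
  if (rawDimension !== undefined) {
    dimension = Number(rawDimension);
    if (!Number.isInteger(dimension) || dimension <= 0) {
      problems.push("MDB_MCP_EMBEDDING_MODEL_DIMENSION must be a positive integer");
    }
  }

  if (!endpoint || !apiKey || !deploymentName || problems.length > 0) {
    return { ok: false, problems };
  }
  // The provider falls back to the documented default when unset.
  const provider = env["MDB_MCP_EMBEDDING_MODEL_PROVIDER"] ?? "azure-ai-inference";
  return { ok: true, config: { provider, endpoint, apiKey, deploymentName, dimension } };
}
```

Returning the list of problems rather than throwing makes it straightforward to log a single warning that names every missing variable.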
+#### Optional Vector Search Defaults (Enable V2 Mode)
+
+To avoid passing `path` (and optionally `index`) on each call, set both:
+
+```bash
+export MDB_MCP_VECTOR_SEARCH_PATH="embedding" # e.g. field path storing embeddings
+export MDB_MCP_VECTOR_SEARCH_INDEX="myVectorIndex" # name of the Atlas Vector Search index
+```
+
+If both are present at startup, the V2 variant is loaded and you no longer pass `path`/`index` arguments at call time. Remove one or both to revert to V1.
+
+#### Usage Examples
+
+##### Example 1: V1 Variant (no defaults configured)
+
+Tool invocation arguments:
+
+```json
+{
+  "name": "vector-search",
+  "arguments": {
+    "database": "mydb",
+    "collection": "articles",
+    "queryText": "vector databases for personalization",
+    "path": "embedding",
+    "limit": 5,
+    "numCandidates": 200,
+    "includeVector": false
+  }
+}
+```
+
+##### Example 2: V2 Variant (defaults configured)
+
+With `MDB_MCP_VECTOR_SEARCH_PATH=embedding` and `MDB_MCP_VECTOR_SEARCH_INDEX=myVectorIndex` set at startup:
+
+```json
+{
+  "name": "vector-search",
+  "arguments": {
+    "database": "mydb",
+    "collection": "articles",
+    "queryText": "vector databases for personalization",
+    "limit": 5,
+    "numCandidates": 200
+  }
+}
+```
+
+#### Returned Data
+
+The tool returns an array of matched documents. By default the raw embedding field is excluded (set `includeVector: true` if you need it). Standard result size safeguards (`maxDocumentsPerQuery`, `maxBytesPerQuery`) still apply.
+
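To make the relationship between the tool arguments and the query concrete, here is one way the arguments could map onto an aggregation pipeline. This is a sketch, not the tool's actual implementation: `buildVectorSearchPipeline` and `VectorSearchArgs` are hypothetical, the index name is treated as already resolved, and excluding the vector with a `$project` stage is an assumption about how `includeVector: false` might be honored.

```typescript
// Hypothetical mapping from tool arguments to a $vectorSearch pipeline.
interface VectorSearchArgs {
  index: string; // resolved index name (argument or configured default)
  path: string; // field that stores the embeddings
  limit: number;
  numCandidates: number;
  includeVector?: boolean;
}

type Stage = Record<string, unknown>;

function buildVectorSearchPipeline(args: VectorSearchArgs, queryVector: number[]): Stage[] {
  // $vectorSearch expects numCandidates to be at least as large as limit.
  if (args.numCandidates < args.limit) {
    throw new Error("numCandidates must be greater than or equal to limit");
  }
  const pipeline: Stage[] = [
    {
      $vectorSearch: {
        index: args.index,
        path: args.path,
        queryVector,
        numCandidates: args.numCandidates,
        limit: args.limit,
      },
    },
  ];
  // Assumption: the raw embedding is dropped with an exclusion projection by default.
  if (!args.includeVector) {
    pipeline.push({ $project: { [args.path]: 0 } });
  }
  return pipeline;
}
```

The `queryVector` here stands for the embedding of `queryText` returned by the configured provider.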
+#### Adding a Custom Embedding Provider
+
+You can extend the server to support additional embedding services (e.g. OpenAI, Hugging Face, Vertex AI) by implementing the `EmbeddingProvider` interface:
+
+`src/embedding/embeddingProvider.ts`:
+
+```ts
+export interface EmbeddingProvider {
+    name: string;
+    embed(input: string[]): Promise<number[][]>;
+}
+```
+
+Steps:
+
+1. Create a new file under `src/embedding/`, e.g. `myProviderEmbeddingProvider.ts`, implementing the interface.
+2. Add a new case in `EmbeddingProviderFactory.create()` & `isEmbeddingConfigValid()` matching a unique `embeddingModelProvider` string (e.g. `my-provider`).
+3. Document required env vars (e.g. `MDB_MCP_EMBEDDING_MODEL_ENDPOINT`, `MDB_MCP_EMBEDDING_MODEL_APIKEY`, etc. or new ones) and update README.
+4. (Optional) Support provider‑specific validation (dimension, model name) in `assertEmbeddingConfigValid`.
+5. Provide tests (unit + integration if vector search depends on it) ensuring your provider returns deterministic dimensionality.
+
+After adding your provider, users enable it by setting:
+
+```bash
+export MDB_MCP_EMBEDDING_MODEL_PROVIDER=my-provider
+# plus any provider-specific variables you defined
+```
+
+If your provider requires different variable names, follow the existing naming convention: prefix with `MDB_MCP_` and document them.
+
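Steps 1, 2 and 5 above can be sketched with a toy provider. Everything beyond the `EmbeddingProvider` interface is an assumption made for illustration: `ToyEmbeddingProvider` produces deterministic pseudo-embeddings locally instead of calling a real service, and `createEmbeddingProvider` is a hypothetical stand-in for the factory case, not the real `EmbeddingProviderFactory.create()` signature.

```typescript
// Interface as shown in the README, redeclared so the sketch is self-contained.
interface EmbeddingProvider {
  name: string;
  embed(input: string[]): Promise<number[][]>;
}

// Toy provider (hypothetical): deterministic vectors of a fixed size, no network calls.
class ToyEmbeddingProvider implements EmbeddingProvider {
  readonly name = "my-provider";
  private readonly dimension: number;

  constructor(dimension: number) {
    if (!Number.isInteger(dimension) || dimension <= 0) {
      throw new Error("dimension must be a positive integer");
    }
    this.dimension = dimension;
  }

  async embed(input: string[]): Promise<number[][]> {
    return input.map((text) => {
      const vector = new Array<number>(this.dimension).fill(0);
      // Spread character codes over the vector so equal texts give equal vectors.
      for (let i = 0; i < text.length; i++) {
        const slot = i % this.dimension;
        vector[slot] = (vector[slot] ?? 0) + text.charCodeAt(i) / 65535;
      }
      return vector;
    });
  }
}

// Hypothetical factory case keyed on the embeddingModelProvider string.
function createEmbeddingProvider(provider: string, dimension: number): EmbeddingProvider {
  switch (provider) {
    case "my-provider":
      return new ToyEmbeddingProvider(dimension);
    default:
      throw new Error(`Unsupported embedding provider: ${provider}`);
  }
}
```

A real provider would replace the body of `embed` with a call to its embedding service, while a test like the one in step 5 can still assert that every returned vector has the configured length.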
+#### Troubleshooting
+
+| Symptom | Likely Cause | Action |
+| ------- | ------------ | ------ |
+| `vector-search` tool missing | Incomplete embedding config | Set the endpoint, API key, and deployment name env vars, then restart the client. |
+| Error: "Embedding provider returned empty embedding" | Provider/network issue | Check credentials & network; verify the model supports embeddings. |
+| Error requiring 'path' even though I set env vars | Only one of PATH/INDEX set | Set BOTH `MDB_MCP_VECTOR_SEARCH_PATH` and `MDB_MCP_VECTOR_SEARCH_INDEX` or remove both. |
+| High latency | Large `numCandidates` or remote model slowness | Lower `numCandidates`; verify model region proximity. |
+
+---
+
 ### Atlas API Access
 
 To use the Atlas API tools, you'll need to create a service account in MongoDB Atlas:
@@ -680,6 +823,6 @@ connecting to the Atlas API, your MongoDB Cluster, or any other external calls
 to third-party services like OID Providers. The behaviour is the same as what
 `mongosh` does, so the same settings will work in the MCP Server.
 
-## 🤝Contributing
+## Contributing
 
 Interested in contributing? Great! Please check our [Contributing Guide](CONTRIBUTING.md) for guidelines on code contributions, standards, adding new tools, and troubleshooting information.

src/common/config.ts

Lines changed: 8 additions & 9 deletions
@@ -22,13 +22,13 @@ const OPTIONS = {
 "notificationTimeoutMs",
 "telemetry",
 "transport",
-"httpAuthMode",
-"azureManagedIdentityTenantId",
-"azureManagedIdentityClientId",
-"azureManagedIdentityAudience",
-"azureManagedIdentityRequiredRoles",
-"azureManagedIdentityAllowedAppIds",
-"azureManagedIdentityRoleMatchMode",
+"httpAuthMode",
+"azureManagedIdentityTenantId",
+"azureManagedIdentityClientId",
+"azureManagedIdentityAudience",
+"azureManagedIdentityRequiredRoles",
+"azureManagedIdentityAllowedAppIds",
+"azureManagedIdentityRoleMatchMode",
 "apiVersion",
 "authenticationDatabase",
 "authenticationMechanism",
@@ -69,7 +69,6 @@ const OPTIONS = {
 "embeddingModelDimension",
 "embeddingModelDeploymentName",
 "embeddingModelProvider",
-// Removed retry tunables (maxRetries & retryInitialDelayMs) now fixed internally
 ],
 boolean: [
 "apiDeprecationErrors",
@@ -219,7 +218,7 @@ export interface UserConfig extends CliOptions {
 embeddingModelEndpoint?: string; // MDB_MCP_EMBEDDING_MODEL_ENDPOINT
 embeddingModelApikey?: string; // MDB_MCP_EMBEDDING_MODEL_APIKEY
 embeddingModelDeploymentName?: string; // MDB_MCP_EMBEDDING_MODEL_DEPLOYMENT_NAME
-embeddingModelDimension?: number; // MDB_MCP_EMBEDDING_MODEL_DIMENSION
+embeddingModelDimension?: number; // [Optional] MDB_MCP_EMBEDDING_MODEL_DIMENSION
 }
 
 export const defaultUserConfig: UserConfig = {
