Before we can index and search data, we need to transform it into a format that can be used by the vector search engine. We will be using [Couchbase Vector Search](https://docs.couchbase.com/server/current/fts/fts-vector-search.html) for this workshop.
There are two options in this workshop to generate vector embeddings from data:
1. Use the `/embed` endpoint provided in this repository to transform the data. *You need an OpenAI API key to use this option.*
2. Import the data with *already generated embeddings* directly into the Couchbase bucket. You can use the data provided in the `./data/individual_items_with_embedding` directory.
Follow the instructions below for the option you choose.
### Option 1: Use the `/embed` Endpoint
Provided in this repository is an Express.js application that will expose a `/embed` endpoint to transform the data.
The Codespace environment already has all the dependencies installed. You can start the Express.js application by running the following command:

```bash
node server.js
```
The repository also has a sample set of data in the `./data/individual_items` directory. You can transform this data by making a POST request to the `/embed` endpoint providing the paths to the data files as an array in the request body.
```bash
curl -X POST http://localhost:3000/embed -H "Content-Type: application/json" -d '["./data/data1.json", "./data/data2.json"]'
```
The data has now been converted into vector embeddings and stored in the Couchbase bucket that you created earlier.
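
For orientation, here is a purely hypothetical sketch of what a single transformed document might look like in the bucket. The field names and the length of the `embedding` array are illustrative assumptions; the actual document shape is determined by the `/embed` endpoint in `server.js` and the OpenAI embedding model it calls, and a real embedding contains hundreds or thousands of floats rather than the three shown here.

```json
{
  "name": "Example item",
  "description": "Short text that was sent to the embedding model",
  "embedding": [0.0123, -0.0456, 0.0789]
}
```

Couchbase Vector Search indexes a numeric array field of this kind; Option 2 below skips the transformation step by importing documents that already contain such a field.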
### Option 2: Import Data with Embeddings
If you choose to import the data directly, you can use the data provided in the `./data/individual_items_with_embedding` directory. The data is already in the format required to enable vector search on it.
Once you have opened this repository in a [GitHub Codespace](https://codespaces.new/hummusonrails/vector-search-nodejs-workshop), you can import the data with the generated embeddings using the [Couchbase Shell](https://couchbase.sh/docs/#_importing_data) from the command line.
#### Edit the Config File
First, edit the `./config_file/config` file with your Couchbase Capella information.
Under the `[[cluster]]` section:
- Replace the empty string value for `identifier` with the name of the cluster you created earlier.
- Replace the empty string value for `connstr` with the connection string to your cluster.

Replace `name_of_your_bucket` with the name of the bucket you created earlier.
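
As a rough illustration only (the values below are made up, and the authoritative list of keys is the config file shipped in this repository), a filled-in `[[cluster]]` entry could look something like this:

```toml
[[cluster]]
# Name you gave the cluster when you created it in Capella (assumed value)
identifier = "my-workshop-cluster"
# Connection string copied from the Capella UI (assumed value)
connstr = "couchbases://cb.xxxxxx.cloud.couchbase.com"
# Any remaining placeholders in the shipped file, such as name_of_your_bucket,
# should be replaced with the bucket you created earlier.
```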
You can perform a sanity check to ensure the index was created by querying for all the indexes; `vector_search_index` should appear in the list.