Commit 7da0e20

Author: Ben Greenberg
Commit message: make conditional openai api key
1 parent 69149bf commit 7da0e20

File tree: 3 files changed, +69 −17 lines


README.md

Lines changed: 38 additions & 13 deletions
@@ -62,6 +62,30 @@ There are two options in this workshop to generate vector embeddings from data:
 1. Use the `/embed` endpoint provided in this repository to transform the data. *You need an OpenAI API key to use this option.*
 2. Import directly the data with *already generated embeddings* into the Couchbase bucket. You can use the data provided in the `./data/individual_items_with_embedding` directory.

+### Using Local Embeddings vs OpenAI API
+
+This workshop gives you the flexibility to choose between generating embeddings locally or using the OpenAI API.
+
+- If you have pre-generated embeddings (provided in the repository), you can use the `useLocalEmbedding` flag to avoid using the OpenAI API.
+- If you want to generate embeddings dynamically from the text, you need to provide your OpenAI API key and set the `useLocalEmbedding` flag to `false`.
+
+#### Setting the `USE_LOCAL_EMBEDDING` Flag
+
+In the `.env` file, set the `USE_LOCAL_EMBEDDING` flag to control the mode:
+
+```bash
+USE_LOCAL_EMBEDDING=true
+```
+
+* `true`: Use pre-generated embeddings (no OpenAI API key required).
+* `false`: Use OpenAI API to generate embeddings (OpenAI API key required).
+
+Make sure to set the `OPENAI_API_KEY` in the `.env` file if you set `USE_LOCAL_EMBEDDING` to `false`.
+
+```bash
+OPENAI_API_KEY=your_openai_api_key
+```
+
 Follow the instructions below for the option you choose.

 ### Option 1: Use the `/embed` Endpoint
@@ -74,27 +98,27 @@ The Codespace environment already has all the dependencies installed. You can st
 node server.js
 ```

-The repository also has a sample set of data in the `./data/individual_items` directory. You can transform this data by making a POST request to the `/embed` endpoint providing the paths to the data files as an array in the request body.
+The repository also has a sample set of data in the `./data/individual_items` directory. You can transform this data by making a `POST` request to the `/embed` endpoint providing the paths to the data files as an array in the request body.

 ```bash
 curl -X POST http://localhost:3000/embed -H "Content-Type: application/json" -d '["./data/data1.json", "./data/data2.json"]'
 ```

 The data has now been converted into vector embeddings and stored in the Couchbase bucket that you created earlier.

-### Option 2: Import Data with Embeddings
+### Option 2: Import Data with Pre-Generated Embeddings

 If you choose to import the data directly, you can use the data provided in the `./data/individual_items_with_embedding` directory. The data is already in the format required to enable vector search on it.

-Once you have opened this repositority in a [GitHub Codespace](https://codespaces.new/hummusonrails/vector-search-nodejs-workshop), you can import the data with the generated embeddings using the [Couchbase shell](https://couchbase.sh/docs/#_importing_data) from the command line.
+Once you have opened this repository in a [GitHub Codespace](https://codespaces.new/hummusonrails/vector-search-nodejs-workshop), you can import the data with the generated embeddings using the [Couchbase shell](https://couchbase.sh/docs/#_importing_data) from the command line.

 #### Edit the Config File

 First, edit the `./config_file/config` file with your Couchbase Capella information.

 You can find a pre-filled config file in the Couchbase Capella dashboard under the "Connect" tab.

-Once you click on the "Connect" tab, you will see a section called "Couchbase Shell" among the options on the left-hand menu. You can choose the access credentials for the shell and copy the config file contet provided and paste it in the `./config_file/config` file.
+Once you click on the "Connect" tab, you will see a section called "Couchbase Shell" among the options on the left-hand menu. You can choose the access credentials for the shell and copy the config file content provided and paste it in the ./config_file/config file.

 <img src="workshop_images/get_cbshell_config.png" alt="Get Couchbase Shell config file data" width="50%">

@@ -109,7 +133,7 @@ cd data/individual_items_with_embedding
 Open up Couchbase shell passing in an argument with the location of the config file defining your Couchbase information:

 ```bash
-cbsh --config-dir ../config_file
+cbsh --config-dir ../config-file
 ```

 Once in the shell, run the `nodes` command to just perform a sanity check that you are connected to the correct cluster.
@@ -131,13 +155,13 @@ This should output something similar to the following:
 Now, import the data into the bucket you created earlier:

 ```bash
-> ls *_with_embedding.json | each { |it| open $it.name | wrap content | insert id $in.content._default.name } | doc upsert
+ls *_with_embedding.json | each { |it| open $it.name | wrap content | insert id $in.content._default.name } | doc upsert
 ```

 Once this is done, you can perform a sanity check to ensure the documents were inserted by running a query to select just one:

 ```bash
-> query "select * from name_of_your_bucket._default._default limit 1"
+query "select * from name_of_your_bucket._default._default limit 1"
 ```

 Replace the `name_of_your_bucket` with the name of your bucket you created.
@@ -151,15 +175,15 @@ You will use Couchbase Shell to perform this action as well.
 Run the following command from inside the shell:

 ```bash
-> vector create-index --bucket name_of_your_bucket --similarity-metric dot_product vector-search-index embedding 1536
+vector create-index --bucket name_of_your_bucket --similarity-metric dot_product vector-search-index embedding 1536
 ```

 Replace the `name_of_your_bucket` with the name of your bucket you created.

 You can perform a santity check to ensure the index was created by querying for all the indexes and you should see the `vector_search_index` in the list:

 ```bash
-> query indexes
+query indexes
 ```

 ## Search Data
@@ -178,9 +202,9 @@ Once the server is running, you can either search using the provided query with

 ### Search with the provided query

-You can search for similar items based on the provided query item by making a POST request to the `/search` endpoint.
+You can search for similar items based on the provided query item by making a `POST` request to the `/search` endpoint.

-Here is an example cURL command to search for similar items based on the provided query item:
+Here is an example `cURL` command to search for similar items based on the provided query item:

 ```bash
 curl -X POST http://localhost:3000/search \
@@ -194,12 +218,13 @@ As you can see, we use the `useLocalEmbedding` flag to indicate that we want to

 If you want to search for similar items based on your own query item, you can provide the query item in the request body.

-The query will be automatically converted into a vector embedding using the OpenAI API. You need to provide your OpenAI API key in the `.env` file before starting the Express.js application.
+The query will be automatically converted into a vector embedding using the OpenAI API. You need to provide your OpenAI API key in the `.env file` before starting the Express.js application.

 Here is an example cURL command to search for similar items based on your own query item:

 ```bash
 curl -X POST http://localhost:3000/search \
 -H "Content-Type: application/json" \
 -d '{"q": "your_query_item"}'
-```
+```
+
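The Couchbase Shell pipeline in the README's Option 2 wraps each parsed JSON file under a `content` key and derives the document id from `content._default.name`. As a rough sketch in Node terms (the `_default.name` shape comes from the shell command itself; the helper name and in-memory sample are illustrative):

```javascript
// Mirrors the shell pipeline:
//   ls *_with_embedding.json | each { ... | wrap content | insert id $in.content._default.name } | doc upsert
function toDocument(parsed) {
  // `wrap content`: nest the parsed JSON under a `content` key
  // `insert id`: derive the document id from content._default.name
  return { id: parsed._default.name, content: parsed };
}

// In-memory stand-in for one *_with_embedding.json file
const sample = { _default: { name: 'item-1', embedding: [0.1, 0.2, 0.3] } };
const doc = toDocument(sample);
console.log(doc.id); // 'item-1'
```

In the real pipeline, each resulting `{ id, content }` pair is what `doc upsert` writes into the bucket.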

helpers.js

Lines changed: 24 additions & 2 deletions
@@ -1,9 +1,24 @@
 const openai = require('openai');
 const couchbase = require('couchbase');
+require('dotenv').config();

-const openaiclient = new openai.OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+const useLocalEmbedding = process.env.USE_LOCAL_EMBEDDING === 'true';
+
+let openaiclient = null;
+if (!useLocalEmbedding) {
+  // Initialize OpenAI client only if local embedding is not being used
+  openaiclient = new openai.OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+}

 async function generateQueryEmbedding(query) {
+  if (useLocalEmbedding) {
+    throw new Error('Local embedding mode is enabled, but no local embedding function is provided here.');
+  }
+
+  if (!openaiclient) {
+    throw new Error('OpenAI client is not initialized.');
+  }
+
   const response = await openaiclient.embeddings.create({
     model: 'text-embedding-ada-002',
     input: query,
@@ -26,7 +41,14 @@ async function init() {
 async function storeEmbedding(content, id) {
   try {
     console.log(`Generating embedding for ${id}...`);
-    const embedding = await generateQueryEmbedding(content);
+
+    let embedding;
+    if (useLocalEmbedding) {
+      throw new Error('Local embedding mode is enabled, but storeEmbedding function is not set up for local embedding.');
+    } else {
+      embedding = await generateQueryEmbedding(content);
+    }
+
     console.log(`Embedding generated for ${id}.`);

     console.log(`Initializing Couchbase connection for ${id}...`);

server.js

Lines changed: 7 additions & 2 deletions
@@ -10,8 +10,13 @@ const app = express();
 app.use(express.json());
 app.use(cors());

-// Initialize OpenAI client
-const openaiclient = new openai.OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+const useLocalEmbedding = process.env.USE_LOCAL_EMBEDDING === 'true';
+
+let openaiclient = null;
+if (!useLocalEmbedding) {
+  // Initialize OpenAI client only if local embedding is not being used
+  openaiclient = new openai.OpenAI({ apiKey: process.env.OPENAI_API_KEY });
+}

 // Import the helper functions
 const { generateQueryEmbedding, storeEmbedding } = require('./helpers');
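One detail worth noting about the flag check used in both files: `process.env` values are always strings, so `process.env.USE_LOCAL_EMBEDDING === 'true'` enables local mode only for the exact lowercase string `true`; values like `TRUE` or `1`, or an unset variable, all fall through to the OpenAI path. A more forgiving parse (purely illustrative, not part of this commit) might look like:

```javascript
// Env vars are always strings; this variant also accepts 'TRUE', '1',
// and 'yes', while an unset variable still means false.
function parseBoolFlag(value) {
  if (value === undefined) return false;
  return ['true', '1', 'yes'].includes(String(value).trim().toLowerCase());
}

console.log(parseBoolFlag('true'));    // true
console.log(parseBoolFlag('TRUE'));    // true
console.log(parseBoolFlag(undefined)); // false
```

The strict `=== 'true'` comparison in the commit is a reasonable choice too; it just means the `.env` value must be spelled exactly `true`.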
