watson-developer-cloud · zach-shu · Oct 10, 2025 · Oct 10, 2025 · Oct 13, 2025 · Oct 21, 2025
diff --git a/...nowledge/elasticsearch-install-and-setup/ICD_Elasticsearch_install_and_setup.md b/...nowledge/elasticsearch-install-and-setup/ICD_Elasticsearch_install_and_setup.md
@@ -0,0 +1,235 @@
+# How to set up Elasticsearch from IBM Cloud and integrate it with Agent Knowledge in watsonx Orchestrate
+This guide explains how to set up Elasticsearch on IBM Cloud and create Agent Knowledge in watsonx Orchestrate from an Elasticsearch index.
+
+## Table of contents:
+* [Step 1: Provision an Elasticsearch instance on IBM Cloud](#step-1-provision-an-elasticsearch-instance-on-ibm-cloud)
+* [Step 2: Set up Kibana to connect to Elasticsearch](#step-2-set-up-kibana-to-connect-to-elasticsearch)
+* [Step 3: Create an Elasticsearch index (keyword-search)](#step-3-create-an-elasticsearch-index-keyword-search)
+* [Step 4: Enable semantic search with ELSER](#step-4-enable-semantic-search-with-elser)
+
+
+## Step 1: Provision an Elasticsearch instance on IBM Cloud
+* Create an [IBM Cloud account](https://cloud.ibm.com/registration) if you don't have one.
+* Provision a Databases for Elasticsearch instance from the [IBM Cloud catalog](https://cloud.ibm.com/catalog/databases-for-elasticsearch).  
+  **A platinum plan with at least 4GB RAM is required in order to use the advanced ML features,
+  such as [Elastic Learned Sparse EncodeR (ELSER)](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html)**
+* Create service credentials from the left-side menu and note the `hostname`, `port`, `username` and `password`.
+  These credentials are used to connect Kibana and watsonx Orchestrate to Elasticsearch in the next steps. You can also use the admin user ID and password.
+  See [this doc](https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-user-management&interface=ui#user-management-elasticsearch-ibm-superuser) to learn more about the different user roles.
+
+
+## Step 2: Set up Kibana to connect to Elasticsearch
+* Install Docker so that you can pull the Kibana container image later. See the detailed [Docker install guide](./how_to_install_docker.md).
+* Create a Kibana config folder, for example:
+  `mkdir -p ~/.kibana/config`
+* Download the certificate from the overview page of the Elasticsearch instance, and move the downloaded file to the Kibana config folder.
+* In the Kibana config folder, create a YAML file called `kibana.yml` with the following Kibana configuration settings:
+    ```YAML
+    elasticsearch.ssl.certificateAuthorities: "/usr/share/kibana/config/<your-certificate-file-name>"
+    elasticsearch.username: "<username>"
+    elasticsearch.password: "<password>"
+    elasticsearch.hosts: ["https://<hostname:port>"]
+    server.name: "kibana"
+    server.host: "0.0.0.0"
+    ```
+  Notes:
+    - Find the `hostname`, `port`, `username`, and `password` in the service credentials created in Step 1.
+    - `elasticsearch.ssl.certificateAuthorities` is the path where Kibana looks for the certificate inside the Docker container.
+      `/usr/share/kibana/config/` is Kibana's default config directory in the container.
+
+* Verify the Elasticsearch instance endpoint and find its version
+    * Run
+      ```bash
+      curl -u <username>:<password> --cacert <path-to-cert> https://<hostname:port>
+      ```
+    * Find the version number from the output
+
+* Download and start the Kibana container
+  ```bash
+  docker run -it --name kibana --rm \
+  -v <path_to_your_kibana_config_folder>:/usr/share/kibana/config \
+  -p 5601:5601 docker.elastic.co/kibana/kibana:<kibana_version>
+  ```
+  Once Kibana has connected to your Databases for Elasticsearch deployment and is running, you will see output like the following in your terminal:
+  ```
+  [2024-01-02T16:43:29.378+00:00][INFO ][http.server.Kibana] http server running at http://0.0.0.0:5601
+  [2024-01-02T16:46:13.777+00:00][INFO ][status] Kibana is now available
+  ```
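+
+The version check and the container start above can be chained so that the Kibana image tag follows the Elasticsearch version. The following is a minimal sketch, not part of the original steps: `extract_es_version` is a hypothetical helper name, and it assumes the endpoint returns JSON with a `"number"` field under `version`, as the Elasticsearch root endpoint normally does.
+
+```bash
+# Hypothetical helper: print the first "number" value found in the JSON on stdin.
+extract_es_version() {
+  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
+}
+
+# Usage (needs network access; fill in the placeholders from Step 1):
+# ES_VERSION=$(curl -s -u "<username>:<password>" --cacert "<path-to-cert>" \
+#   "https://<hostname:port>" | extract_es_version)
+# docker run -it --name kibana --rm \
+#   -v "<path_to_your_kibana_config_folder>:/usr/share/kibana/config" \
+#   -p 5601:5601 "docker.elastic.co/kibana/kibana:${ES_VERSION}"
+```
+
+Deriving the tag from the cluster response keeps the Kibana version aligned with the Elasticsearch version found in the verification step.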
+
+## Step 3: Create an Elasticsearch index (keyword-search)
+This step creates an Elasticsearch index with default settings for quick testing and verification.
+With default settings, an Elasticsearch index performs keyword search.
+
+* Open http://localhost:5601 in a browser and log in to Kibana using the `username` and `password` from the service credentials of the Elasticsearch instance
+* Navigate to the indices page http://localhost:5601/app/enterprise_search/content/search_indices
+* Click on `Create a new index`, choose `Use the API`, and follow the steps there to create a new Elasticsearch index with default settings
+* Go to the overview page for your newly created index, follow the steps there to verify your Elasticsearch index.  
+  Notes:
+    * Generate an API key. You will use it for authentication and authorization against this specific Elasticsearch index
+    * Use your `hostname` and `port` from the service credentials of the Elasticsearch instance to build `ES_URL`
+      ```bash 
+      export ES_URL=https://<hostname:port>
+      ```
+    * Append `--cacert <path-to-your-cert>` to the cURL commands to verify the SSL connection, or append `--insecure` to skip certificate verification
+    * If you are able to run the `Build your first search query` command at the last step, your Elasticsearch index has been set up successfully!
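+
+The notes above can be combined into one small sketch. The helper name `build_es_url` and the host and port values are hypothetical placeholders, and the commented cURL call assumes the API key is passed in the standard Elasticsearch `Authorization: ApiKey` header.
+
+```bash
+# Hypothetical helper: join hostname and port into the base URL used by the cURL commands.
+build_es_url() {
+  printf 'https://%s:%s\n' "$1" "$2"
+}
+
+# Placeholder values; replace them with the ones from your service credentials.
+ES_URL=$(build_es_url "es.example.com" "31234")
+export ES_URL
+echo "${ES_URL}"
+
+# Search the index with the API key and certificate (needs network access):
+# curl -X POST "${ES_URL}/<your-index-name>/_search?pretty" \
+#   -H "Authorization: ApiKey ${API_KEY}" \
+#   -H "Content-Type: application/json" \
+#   --cacert "<path-to-your-cert>" \
+#   -d '{"query": {"query_string": {"query": "<search-terms>"}}}'
+```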
+
+
+## Step 4: Enable semantic search with ELSER
+This step enables semantic search using ELSER. See the tutorials in the Elasticsearch documentation:  
+ELSER v1: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/semantic-search-elser.html  
+ELSER v2: https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-elser.html
+
+**IMPORTANT NOTE**: ELSER v2 has been available since Elasticsearch 8.11. Use ELSER v2 when it is available.
+
+The following steps are based on the ELSER v2 model:
+### Create environment variables for ES credentials
+  ```bash
+  export ES_URL=https://<hostname:port>
+  export ES_USER=<username>
+  export ES_PASSWORD=<password>
+  export ES_CACERT=<path-to-your-cert>
+  ```  
+You can find these values in the service credentials of your Elasticsearch instance.
+
+### Enable ELSER model (v2)
+The ELSER model is not enabled by default, but you can enable it in Kibana. Follow the [download-deploy-elser instructions](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#download-deploy-elser) to do so.
+
+Note: `.elser_model_2_linux-x86_64` is an optimized version of the ELSER v2 model and is preferred when available. Otherwise, use `.elser_model_2` for the regular ELSER v2 model or `.elser_model_1` for ELSER v1.
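+
+The preference order in the note can be written as a small selection routine. This is a sketch under assumptions: `pick_elser_model` is a hypothetical helper, and it expects the model IDs present on your cluster as plain text, one per line, however you obtain them.
+
+```bash
+# Hypothetical helper: read model IDs (one per line) on stdin and print the preferred ELSER model.
+pick_elser_model() {
+  available=$(cat)
+  for candidate in .elser_model_2_linux-x86_64 .elser_model_2 .elser_model_1; do
+    # -F: fixed string, -x: whole-line match, -q: no output
+    if printf '%s\n' "$available" | grep -Fqx "$candidate"; then
+      printf '%s\n' "$candidate"
+      return 0
+    fi
+  done
+  return 1
+}
+
+# Example with a hard-coded list of model IDs:
+printf '%s\n' lang_ident_model_1 .elser_model_2 .elser_model_1 | pick_elser_model
+
+# Against a live cluster (assumes jq is installed; needs network access):
+# curl -s -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}" "${ES_URL}/_ml/trained_models" \
+#   | jq -r '.trained_model_configs[].model_id' | pick_elser_model
+```
+
+The hard-coded example prints `.elser_model_2`, because the optimized variant is not in the list. The function returns a non-zero status when no ELSER model is found.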
+
+
+### Load data into Elasticsearch
+In Kibana, you can upload a data file to the Elasticsearch cluster using the Data Visualizer in the Machine Learning UI at http://localhost:5601/app/ml/filedatavisualizer.  
+
+As an example, you can download [wa-docs-100](./assets/wa_docs_100.tsv) TSV data and upload it to Elasticsearch. 
+This dataset contains documents processed from the watsonx Assistant product documents. There are three columns in this TSV file, 
+`title`, `section_title` and `text`. The columns are extracted from the original documents. Specifically, 
+each `text` value is a small chunk of text split from the original document. 
+
+In Kibana,
+* Select your downloaded file to upload  
+  <img src="assets/upload_file_though_data_visualizer.png" width="463" height="248" />
+* Click `Override settings` and then select the `Has header row` checkbox because the example dataset has a header row  
+  <img src="assets/override_settings_for_uploaded_file.png" width="553" height="446" />
+* Import the data to a new Elasticsearch index and name it `wa-docs`  
+  <img src="assets/import_data_to_new_index.png" width="509" height="356" />  
+Once finished, you have created an index for the data you just uploaded.
+
+### Create an index with mappings for ELSER output
+  ```bash
+  curl -X PUT "${ES_URL}/search-wa-docs?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
+  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
+  {
+    "mappings": {
+      "_source": {
+          "excludes": [
+            "ml.tokens"
+          ]
+      },
+      "properties": {
+        "ml.tokens": {
+          "type": "sparse_vector"
+        },
+        "text": {
+          "type": "text"
+        }
+      }
+    }
+  }'
+  ```
+Notes:
+* `search-wa-docs` will be your index name.
+* `ml.tokens` is the field that stores the ELSER output when data is ingested.
+* `text` is the input field for the inference processor. In the example dataset, the input field is named `text`, and the ELSER model processes its contents.
+* The `sparse_vector` type is for ELSER v2. For ELSER v1, use the `rank_features` type.
+* Learn more about [elser-mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-elser.html#elser-mappings) from the tutorial.
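+
+The choice between `sparse_vector` and `rank_features` described in the notes can be captured in a short sketch that generates the mapping body for either ELSER version. The helper names `elser_field_type` and `elser_mapping_body` are hypothetical; the body mirrors the mapping shown above.
+
+```bash
+# Hypothetical helper: map an ELSER major version (1 or 2) to its field type.
+elser_field_type() {
+  case "$1" in
+    1) echo "rank_features" ;;
+    2) echo "sparse_vector" ;;
+    *) echo "unsupported ELSER version: $1" >&2; return 1 ;;
+  esac
+}
+
+# Hypothetical helper: print the index mapping body for the given ELSER version.
+elser_mapping_body() {
+  field_type=$(elser_field_type "$1") || return 1
+  cat <<EOF
+{
+  "mappings": {
+    "_source": { "excludes": ["ml.tokens"] },
+    "properties": {
+      "ml.tokens": { "type": "${field_type}" },
+      "text": { "type": "text" }
+    }
+  }
+}
+EOF
+}
+
+# Print the body for ELSER v2:
+elser_mapping_body 2
+
+# Create the index from the generated body (needs network access):
+# elser_mapping_body 2 | curl -X PUT "${ES_URL}/search-wa-docs?pretty" \
+#   -u "${ES_USER}:${ES_PASSWORD}" -H "Content-Type: application/json" \
+#   --cacert "${ES_CACERT}" -d @-
+```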
+
+### Create an ingest pipeline with an inference processor
+Create an ingest pipeline with an inference processor so that ELSER runs inference on the data ingested through the pipeline.
+  ```bash
+  curl -X PUT "${ES_URL}/_ingest/pipeline/elser-v2-test?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
+  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
+  {
+    "processors": [
+      {
+        "inference": {
+          "model_id": ".elser_model_2_linux-x86_64",
+          "target_field": "ml",
+          "field_map": {
+            "text": "text_field"
+          },
+          "inference_config": {
+            "text_expansion": {
+              "results_field": "tokens"
+            }
+          }
+        }
+      }
+    ]
+  }'
+  ```
+Notes:
+* `elser-v2-test` is the name of the ingest pipeline with an inference processor using ELSER v2 model.
+* `.elser_model_2_linux-x86_64` is an optimized version of the ELSER v2 model and is preferred when available. Otherwise, use `.elser_model_2` for the regular ELSER v2 model or `.elser_model_1` for ELSER v1.
+* `"text": "text_field"` maps the `text` field from an index to the input field of the ELSER model. `text_field` is the default input field of the ELSER model when it is deployed. You may need to update it if you configure a different input field when deploying your ELSER model.
+* Learn more about [inference-ingest-pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-elser.html#inference-ingest-pipeline) from the tutorial
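+
+Before reindexing, the pipeline can be tried on a single document with the Elasticsearch simulate pipeline API. This is an optional sketch rather than a step from the tutorial: `simulate_request_body` is a hypothetical helper and the sample sentence is illustrative, not taken from the dataset.
+
+```bash
+# Hypothetical helper: print a _simulate request body with one sample document.
+# The sample sentence is an illustrative assumption.
+simulate_request_body() {
+  cat <<'EOF'
+{
+  "docs": [
+    { "_source": { "text": "How do I set up a custom extension?" } }
+  ]
+}
+EOF
+}
+
+# Usage (needs network access and a deployed ELSER model):
+# simulate_request_body | curl -X POST "${ES_URL}/_ingest/pipeline/elser-v2-test/_simulate?pretty" \
+#   -u "${ES_USER}:${ES_PASSWORD}" -H "Content-Type: application/json" \
+#   --cacert "${ES_CACERT}" -d @-
+```
+
+If the pipeline is configured as above, the simulated document should come back with an `ml.tokens` field next to `text`.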
+
+### Ingest the data through the inference ingest pipeline
+Create the tokens from the text by reindexing the data through the inference pipeline that uses ELSER as the inference model.
+  ```bash
+  curl -X POST "${ES_URL}/_reindex?wait_for_completion=false&pretty" -u "${ES_USER}:${ES_PASSWORD}" \
+  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
+  {
+    "source": {
+      "index": "wa-docs"
+    },
+    "dest": {
+      "index": "search-wa-docs",
+      "pipeline": "elser-v2-test"
+    }
+  }'
+  ```
+Notes:
+* `wa-docs` is the index you created when uploading the example file to the Elasticsearch cluster. It contains the text data.
+* `search-wa-docs` is the search index that has the ELSER output field.
+* `elser-v2-test` is the ingest pipeline with an inference processor using the ELSER v2 model.
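+
+Because the request uses `wait_for_completion=false`, Elasticsearch replies with a task ID instead of waiting for the reindex to finish. The sketch below shows one way to pull that ID out of the response and check progress with the task management API. `extract_task_id` is a hypothetical helper, and it assumes the response contains a `"task"` field.
+
+```bash
+# Hypothetical helper: print the "task" value found in the JSON on stdin.
+extract_task_id() {
+  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
+}
+
+# Usage (needs network access): save the reindex response, then poll the task.
+# TASK_ID=$(curl -s -X POST "${ES_URL}/_reindex?wait_for_completion=false" \
+#   -u "${ES_USER}:${ES_PASSWORD}" -H "Content-Type: application/json" \
+#   --cacert "${ES_CACERT}" \
+#   -d '{"source": {"index": "wa-docs"}, "dest": {"index": "search-wa-docs", "pipeline": "elser-v2-test"}}' \
+#   | extract_task_id)
+# curl -X GET "${ES_URL}/_tasks/${TASK_ID}?pretty" \
+#   -u "${ES_USER}:${ES_PASSWORD}" --cacert "${ES_CACERT}"
+```
+
+The task response reports `"completed" : true` once all documents have passed through the pipeline.
+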
+### Semantic search by using the text_expansion query
+To perform semantic search, use the `text_expansion` query and provide the query text and the ELSER model ID.
+The example below uses the query text "How to set up custom extension?" and searches the `ml.tokens` field, which contains
+the generated ELSER output:
+  ```bash
+  curl -X GET "${ES_URL}/search-wa-docs/_search?pretty" -u "${ES_USER}:${ES_PASSWORD}" \
+  -H "Content-Type: application/json" --cacert "${ES_CACERT}" -d'
+  {
+     "query":{
+        "text_expansion":{
+           "ml.tokens":{
+              "model_id":".elser_model_2_linux-x86_64",
+              "model_text":"how to set up custom extension?"
+           }
+        }
+     }
+  }'
+  ```
+Notes:
+* You can also use `API_KEY` for authorization. You can generate an `API_KEY` for your search index on the index overview page in Kibana.
+* Learn more about [text-expansion-query](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-elser.html#text-expansion-query) from the tutorial.
+
+### Enable semantic search for Agent Knowledge on watsonx Orchestrate
+To enable semantic search for your Agent Knowledge on watsonx Orchestrate, specify the following query body in your Elasticsearch Knowledge source configuration:
+  ```json
+  {
+    "query":{
+      "text_expansion":{
+        "ml.tokens":{
+          "model_id":".elser_model_2_linux-x86_64",
+          "model_text":"$QUERY"
+        }
+      }
+    }
+  }
+  ```
+  <img src="assets/query_body_for_elasticsearch.png" width="547" height="638" />  
+
+Notes:
+* `$QUERY` is the query variable that contains the user search query by default.
+* `.elser_model_2_linux-x86_64` is an optimized version of the ELSER v2 model and is preferred when available. Otherwise, use `.elser_model_2` for the regular ELSER v2 model or `.elser_model_1` for ELSER v1.
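+
+To try the query body against your index before configuring it in watsonx Orchestrate, you can replace `$QUERY` locally and send the result with cURL. The sketch below is a rough local stand-in for that substitution, not a description of how watsonx Orchestrate performs it. `render_query_body` is a hypothetical helper, and it does not JSON-escape the query text, so avoid double quotes and backslashes in test queries.
+
+```bash
+# Hypothetical helper: replace every literal $QUERY in the template on stdin
+# with the text given as the first argument.
+render_query_body() {
+  QUERY_TEXT="$1" awk '
+    {
+      line = $0
+      out = ""
+      # "$QUERY" is 6 characters long; copy the text before it, then the query text.
+      while ((i = index(line, "$QUERY")) > 0) {
+        out = out substr(line, 1, i - 1) ENVIRON["QUERY_TEXT"]
+        line = substr(line, i + 6)
+      }
+      print out line
+    }'
+}
+
+# Example: render the query body from this section with a sample question.
+printf '%s\n' '{"query":{"text_expansion":{"ml.tokens":{"model_id":".elser_model_2_linux-x86_64","model_text":"$QUERY"}}}}' \
+  | render_query_body "how to set up custom extension?"
+
+# Send the rendered body to the index (needs network access):
+# ... | render_query_body "<your question>" | curl -X GET "${ES_URL}/search-wa-docs/_search?pretty" \
+#   -u "${ES_USER}:${ES_PASSWORD}" -H "Content-Type: application/json" \
+#   --cacert "${ES_CACERT}" -d @-
+```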
+
+Learn more about configuring Agent Knowledge from [Elasticsearch integration with Agent Knowledge in watsonx Orchestrate](README.md#elasticsearch-integration-with-agent-knowledge-in-watsonx-orchestrate)
diff --git a/agent_knowledge/elasticsearch-install-and-setup/README.md b/agent_knowledge/elasticsearch-install-and-setup/README.md
@@ -0,0 +1,26 @@
+# Elasticsearch Installation and Setup Documentation
+
+This directory contains documentation for installing and setting up Elasticsearch along with related guides and integrations.
+
+## Elasticsearch Setup
+- [Install Docker or Docker alternatives](how_to_install_docker.md): A guide explaining Docker and Docker Compose installation options, essential for running Elasticsearch-related applications.
+- [Set up Elasticsearch from IBM Cloud and integrate it with watsonx Orchestrate](ICD_Elasticsearch_install_and_setup.md): Instructions for provisioning an Elasticsearch instance on IBM Cloud and setting up Agent Knowledge in watsonx Orchestrate.
+- [Set up watsonx Discovery (aka Elasticsearch on-prem) and integrate it with watsonx Orchestrate on-prem](watsonx_discovery_install_and_setup.md): Documentation for setting up watsonx Discovery (aka Elasticsearch on-prem) and integrating it with watsonx Orchestrate on-prem.
+
+## Elasticsearch integration with Agent Knowledge in watsonx Orchestrate 
+### Option 1: Add Knowledge to your agents in the Agent Builder UI
+See [Connecting to an Elasticsearch content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-elasticsearch-content-repository) in watsonx Orchestrate documentation for more details.
+
+### Option 2: Create Knowledge bases via watsonx Orchestrate ADK (Agent Development Kit)
+See [Creating external knowledge bases with Elasticsearch](https://developer.watson-orchestrate.ibm.com/knowledge_base/build_kb#elasticsearch) in ADK documentation for more details.
+
+### Configure the Advanced Elasticsearch Settings
+There are two settings under `Advanced Elasticsearch Settings` that let you use a custom query body and custom filters for advanced search use cases. See the guide [How to configure Advanced Elasticsearch Settings](./how_to_configure_advanced_elasticsearch_settings.md) for more details.
+
+### Federated search
+You can follow the guide [here](federated_search.md) to run queries across multiple indexes within your Elasticsearch cluster.
+
+## Document Ingestion with Elasticsearch
+- [Set up the web crawler in Elasticsearch](how_to_use_web_crawler_in_elasticsearch.md): Guide for setting up and using the web crawler in Elasticsearch and connecting it to Agent Knowledge in watsonx Orchestrate.
+- [Working with PDF and office documents in Elasticsearch](how_to_index_pdf_and_office_documents_elasticsearch.md): Guide for working with PDF and Office Documents in Elasticsearch, including indexing and connecting to Agent Knowledge in watsonx Orchestrate.
+- [Set up text embedding models in Elasticsearch](text_embedding_deploy_and_use.md): Instructions for setting up and using 3rd-party text embeddings for dense vector search in Elasticsearch.
diff --git a/...edge/elasticsearch-install-and-setup/assets/add_crawl_rules_for_web_crawler.png b/...edge/elasticsearch-install-and-setup/assets/add_crawl_rules_for_web_crawler.png
diff --git a/...edge/elasticsearch-install-and-setup/assets/advanced_elasticsearch_settings.png b/...edge/elasticsearch-install-and-setup/assets/advanced_elasticsearch_settings.png
diff --git a/...icsearch-install-and-setup/assets/config_query_source_when_use_es_extension.png b/...icsearch-install-and-setup/assets/config_query_source_when_use_es_extension.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/config_result_content.png b/agent_knowledge/elasticsearch-install-and-setup/assets/config_result_content.png
diff --git a/...ledge/elasticsearch-install-and-setup/assets/connect_to_elasticsearch_index.png b/...ledge/elasticsearch-install-and-setup/assets/connect_to_elasticsearch_index.png
diff --git a/...earch-install-and-setup/assets/conversation-search-example-with-web-crawler.png b/...earch-install-and-setup/assets/conversation-search-example-with-web-crawler.png
diff --git a/...install-and-setup/assets/conversational_search_example_python_doc_ingestion.png b/...install-and-setup/assets/conversational_search_example_python_doc_ingestion.png
diff --git a/..._knowledge/elasticsearch-install-and-setup/assets/elasticsearch-integration.png b/..._knowledge/elasticsearch-install-and-setup/assets/elasticsearch-integration.png
diff --git a/...search-install-and-setup/assets/elasticsearch-pdfofficedocs-watsonx-example.png b/...search-install-and-setup/assets/elasticsearch-pdfofficedocs-watsonx-example.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/example_volume_mount.png b/agent_knowledge/elasticsearch-install-and-setup/assets/example_volume_mount.png
diff --git a/...csearch-install-and-setup/assets/federated_search_official_search_extension.png b/...csearch-install-and-setup/assets/federated_search_official_search_extension.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/fscrawler-docker-config.zip b/agent_knowledge/elasticsearch-install-and-setup/assets/fscrawler-docker-config.zip
diff --git a/...t_knowledge/elasticsearch-install-and-setup/assets/import_data_to_new_index.png b/...t_knowledge/elasticsearch-install-and-setup/assets/import_data_to_new_index.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/jupyter_play_button.png b/agent_knowledge/elasticsearch-install-and-setup/assets/jupyter_play_button.png
diff --git a/.../elasticsearch-install-and-setup/assets/override_settings_for_uploaded_file.png b/.../elasticsearch-install-and-setup/assets/override_settings_for_uploaded_file.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/podman_extensions.png b/agent_knowledge/elasticsearch-install-and-setup/assets/podman_extensions.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/podman_machine_edit.png b/agent_knowledge/elasticsearch-install-and-setup/assets/podman_machine_edit.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/podman_machine_running.png b/agent_knowledge/elasticsearch-install-and-setup/assets/podman_machine_running.png
diff --git a/...owledge/elasticsearch-install-and-setup/assets/query_body_for_elasticsearch.png b/...owledge/elasticsearch-install-and-setup/assets/query_body_for_elasticsearch.png
diff --git a/...ledge/elasticsearch-install-and-setup/assets/query_body_with_custom_filters.png b/...ledge/elasticsearch-install-and-setup/assets/query_body_with_custom_filters.png
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/rancher_init.png b/agent_knowledge/elasticsearch-install-and-setup/assets/rancher_init.png
diff --git a/...nowledge/elasticsearch-install-and-setup/assets/sample_pdf_docs/lendyr_preferred_card.pdf b/...nowledge/elasticsearch-install-and-setup/assets/sample_pdf_docs/lendyr_preferred_card.pdf
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/sample_pdf_docs/lendyr_topaz_card.pdf b/agent_knowledge/elasticsearch-install-and-setup/assets/sample_pdf_docs/lendyr_topaz_card.pdf
diff --git a/agent_knowledge/elasticsearch-install-and-setup/assets/search_for_the_answer.png b/agent_knowledge/elasticsearch-install-and-setup/assets/search_for_the_answer.png
diff --git a/..._knowledge/elasticsearch-install-and-setup/assets/synchronize_trained_model.png b/..._knowledge/elasticsearch-install-and-setup/assets/synchronize_trained_model.png
diff --git a/...e/elasticsearch-install-and-setup/assets/upload_file_though_data_visualizer.png b/...e/elasticsearch-install-and-setup/assets/upload_file_though_data_visualizer.png
diff --git a/...ch-install-and-setup/assets/use_nested_query_in_search_integration_settings.png b/...ch-install-and-setup/assets/use_nested_query_in_search_integration_settings.png
diff --git a/...-install-and-setup/assets/use_session_variable_in_custom_filters_for_search.png b/...-install-and-setup/assets/use_session_variable_in_custom_filters_for_search.png
diff --git a/...edge/elasticsearch-install-and-setup/assets/wa_conversational_search_result.png b/...edge/elasticsearch-install-and-setup/assets/wa_conversational_search_result.png