From d98fb75f2ca5d37dd7ddaa3665c5d224e878bff5 Mon Sep 17 00:00:00 2001 From: TimLFletcher Date: Fri, 19 Sep 2025 11:28:15 +0100 Subject: [PATCH 01/17] added note --- modules/ROOT/pages/howto-xdcr.adoc | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/modules/ROOT/pages/howto-xdcr.adoc b/modules/ROOT/pages/howto-xdcr.adoc index 8dd5d59..b8e336d 100644 --- a/modules/ROOT/pages/howto-xdcr.adoc +++ b/modules/ROOT/pages/howto-xdcr.adoc @@ -230,6 +230,20 @@ In this discouraged scenario, there is no shared DNS between two Kubernetes clus Pods are exposed by using Kubernetes `NodePort` type services. As there is no DNS, TLS is not supported, so security must be maintained between the two clusters using a VPN. +When you use NodePorts on a remote Couchbase cluster for XDCR connections, you risk interruptions if Kubernetes deletes and recreates the Service. +Kubernetes does not guarantee that a new Service reuses the same NodePort. +If the port number changes, XDCR connections that rely on the old IP:NodePort become invalid. + +The exact outcome depends on the Kubernetes CNI (Container Networking Interface) implementation. +In some cases, Kubernetes removes the old port before it creates the new one, which causes a brief loss of connectivity. +In other cases, Kubernetes creates the new NodePort first, which reduces or avoids downtime. + +- Single-node Couchbase cluster: When the node's NodePort changes, XDCR cannot reconnect automatically. +In this case, you must manually update the replication configuration at couchbaseclusters.spec.xdcr.remoteClusters.hostname with the new IP:NodePort of the Couchbase node. + +- Multi-node Couchbase cluster: When a node's NodePort changes, XDCR reconnects to another node that still exposes a valid NodePort. +However, if XDCR tries to reconnect through the updated node, you may still need to update couchbaseclusters.spec.xdcr.remoteClusters.hostname with the new port. + [IMPORTANT] ==== When using Istio or another service mesh, remember that strict mode mTLS cannot be used with Kubernetes node ports. From ac3906ec3ec5d2f4d308462f43dcc429f94bb285 Mon Sep 17 00:00:00 2001 From: TimLFletcher Date: Fri, 19 Sep 2025 11:38:44 +0100 Subject: [PATCH 02/17] Provided a small cleanup of the worst Vale errors --- modules/ROOT/pages/howto-xdcr.adoc | 36 +++++++++++++++--------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/modules/ROOT/pages/howto-xdcr.adoc b/modules/ROOT/pages/howto-xdcr.adoc index b8e336d..4e31648 100644 --- a/modules/ROOT/pages/howto-xdcr.adoc +++ b/modules/ROOT/pages/howto-xdcr.adoc @@ -14,8 +14,8 @@ This page documents how to setup XDCR to replicate data to a different Kubernete In this scenario the remote cluster is accessible with Kubernetes based DNS. This applies to both xref:concept-couchbase-networking.adoc#intra-kubernetes-networking[intra-Kubernetes networking] and xref:concept-couchbase-networking.adoc#inter-kubernetes-networking-with-forwarded-dns[inter-Kubernetes networking with forwarded DNS]. -When using inter-Kubernetes networking, the local XDCR client must forward DNS requests to the remote cluster in order to resolve DNS names of the target Couchbase instances. -Refer to the xref:tutorial-remote-dns.adoc[Inter-Kubernetes Networking with Forwarded DNS] tutorial to understand how to configure forwarding DNS servers. +When using inter-Kubernetes networking, the local XDCR client must forward DNS requests to the remote cluster to resolve DNS names of the target Couchbase instances. 
+For more information, see the xref:tutorial-remote-dns.adoc[Inter-Kubernetes Networking with Forwarded DNS] tutorial to understand how to configure forwarding DNS servers. TLS is optional with this configuration, but shown for completeness. To configure without TLS, omit any TLS related attributes. @@ -56,10 +56,10 @@ spec: remoteBucket: destination ---- -<.> The resource is labeled with `replication:from-my-cluster-to-remote-cluster` to avoid any ambiguity because by default the Operator will select all `CouchbaseReplication` resources in the namespace and apply them to all remote clusters. -Thus the label is specific to the source cluster and target cluster. +<.> The resource is labeled with `replication:from-my-cluster-to-remote-cluster` to avoid any ambiguity because by default the Couchbase Autonomous Operator selects all `CouchbaseReplication` resources in the namespace and apply them to all remote clusters. +The label is specific to the source cluster and target cluster. -We define a remote cluster on our local resource: +Define a remote cluster on the local resource: [source,yaml] ---- @@ -101,12 +101,12 @@ spec: <.> The correct hostname to use is the remote cluster's console service to provide stable naming and service discovery. The hostname is calculated as per the xref:howto-client-sdks.adoc#dns-based-addressing[SDK configuration how-to]. -<.> As we are not using client certificate authentication we specify a secret containing a username and password on the remote system. +<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system. <.> **TLS only:** For TLS connections you need to specify the remote cluster CA certificate in order to verify the remote cluster is trusted. xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-xdcr-remoteclusters-tls-secret[`couchbaseclusters.spec.xdcr.remoteClusters.tls.secret`] documents the secret format. -<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote one. +<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster. <.> **Inter-Kubernetes networking with forwarded DNS only:** the xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-servers-pod[`couchbaseclusters.spec.servers.pod.spec.dnsPolicy`] field tells Kubernetes to provide no default DNS configuration. @@ -184,7 +184,7 @@ spec: ---- <.> The resource is labeled with `replication:from-my-cluster-to-remote-cluster` to avoid any ambiguity because by default the Operator will select all `CouchbaseReplication` resources in the namespace and apply them to all remote clusters. -Thus the label is specific to the source cluster and target cluster. +The label is specific to the source cluster and target cluster. We define a remote cluster on our local resource: @@ -217,25 +217,25 @@ spec: <.> The correct hostname to use is the remote cluster's console service to provide stable naming and service discovery. The hostname is calculated as per the xref:howto-client-sdks.adoc#dns-based-addressing-with-external-dns[SDK configuration how-to]. -<.> As we are not using client certificate authentication we specify a secret containing a username and password on the remote system. +<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system. 
<.> For TLS connections you need to specify the remote cluster CA certificate in order to verify the remote cluster is trusted. xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-xdcr-remoteclusters-tls-secret[`couchbaseclusters.spec.xdcr.remoteClusters.tls.secret`] documents the secret format. -<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote one. +<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster. == IP Based Addressing -In this discouraged scenario, there is no shared DNS between two Kubernetes clusters - we must use IP based addressing. +In this discouraged scenario, there is no shared DNS between 2 Kubernetes clusters - we must use IP based addressing. Pods are exposed by using Kubernetes `NodePort` type services. -As there is no DNS, TLS is not supported, so security must be maintained between the two clusters using a VPN. +As there is no DNS, TLS is not supported, so security must be maintained between the 2 clusters using a VPN. When you use NodePorts on a remote Couchbase cluster for XDCR connections, you risk interruptions if Kubernetes deletes and recreates the Service. Kubernetes does not guarantee that a new Service reuses the same NodePort. If the port number changes, XDCR connections that rely on the old IP:NodePort become invalid. The exact outcome depends on the Kubernetes CNI (Container Networking Interface) implementation. -In some cases, Kubernetes removes the old port before it creates the new one, which causes a brief loss of connectivity. +In some cases, Kubernetes removes the old port before it creates the new service, which causes a brief loss of connectivity. In other cases, Kubernetes creates the new NodePort first, which reduces or avoids downtime. - Single-node Couchbase cluster: When the node's NodePort changes, XDCR cannot reconnect automatically. @@ -301,7 +301,7 @@ spec: ---- <.> The resource is labeled with `replication:from-my-cluster-to-remote-cluster` to avoid any ambiguity because by default the Operator will select all `CouchbaseReplication` resources in the namespace and apply them to all remote clusters. -Thus the label is specific to the source cluster and target cluster. +The label is specific to the source cluster and target cluster. We define a remote cluster on our local resource: @@ -332,14 +332,14 @@ spec: <.> The correct hostname to use. The hostname is calculated as per the xref:howto-client-sdks.adoc#ip-based-addressing[SDK configuration how-to]. -<.> As we are not using client certificate authentication we specify a secret containing a username and password on the remote system. +<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system. -<.> Finally we select replications that match the labels we specify, in this instance the ones that go from this cluster to the remote one. +<.> Finally we select replications that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster. == Scopes and collections support With Couchbase Server version 7 and greater, scope and collections support is now present for XDCR. 
-The Couchbase Kubernetes Operator fully supports the various options available to the Couchbase Server version it is running with, full details can be found in the xref:server:manage:manage-xdcr/replicate-using-scopes-and-collections.html[official documentation]. +The Couchbase Kubernetes Operator fully supports the various options available to the Couchbase Server version it's running with. Full details can be found in the xref:server:manage:manage-xdcr/replicate-using-scopes-and-collections.html[official documentation]. [NOTE] ==== @@ -410,7 +410,7 @@ Eventual consistency rules apply so if the bucket is still being created then we <.> This is an example of replicating only a specific collection `collection1` in scope `scope1`. -<.> The target keyspace must be of identical size so as we are replicating from a collection we must also specify a target collection. +<.> The target keyspace must be of identical size so as we're replicating from a collection we must also specify a target collection. <.> Deny rules can be used to prevent replication of specific keyspaces. This is useful if for example you have a scope with a large number of collections and you want to replicate all but a small number. From ecd05fc0652813a6f3146b78dc07e62fbcc746a3 Mon Sep 17 00:00:00 2001 From: BenMotts Date: Tue, 14 Oct 2025 13:48:43 +0100 Subject: [PATCH 03/17] Update pv compatibility to include any CSI --- modules/ROOT/pages/prerequisite-and-setup.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/modules/ROOT/pages/prerequisite-and-setup.adoc b/modules/ROOT/pages/prerequisite-and-setup.adoc index 522d7f6..3bf9c47 100644 --- a/modules/ROOT/pages/prerequisite-and-setup.adoc +++ b/modules/ROOT/pages/prerequisite-and-setup.adoc @@ -146,6 +146,7 @@ This release supports the following managed Kubernetes services and utilities: == Persistent Volume Compatibility Persistent volumes are mandatory for production deployments. +The Couchbase Operator is designed to work with any CSI-compliant storage driver, and compatibility with different CSI implementations can be validated using the xref:concept-platform-certification.adoc[Operator Self-Certification Lifecycle] tooling. Review the Kubernetes Operator xref:best-practices.adoc#persistent-volumes-best-practices[best practices] for more information about cluster supportability requirements. 
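As an illustration only, a cluster that keeps its data on a specific CSI-backed storage class might declare a volume claim template along the following lines (the storage class name, image tag, Secret name, and sizes are placeholders, not recommendations):

[source,yaml]
----
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  image: couchbase/server:7.6.6              # placeholder image tag
  security:
    adminSecret: cb-example-auth             # Secret holding the admin username and password
  servers:
  - name: data
    size: 3
    services:
    - data
    volumeMounts:
      default: couchbase                     # refers to the claim template defined below
  volumeClaimTemplates:
  - metadata:
      name: couchbase
    spec:
      storageClassName: my-csi-storage-class # any CSI-compliant storage class available in the cluster
      resources:
        requests:
          storage: 10Gi
----

Storage class names and volume sizes are environment specific, so validate the chosen driver and settings with the self-certification tooling referenced above before relying on them in production.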
== Hardware Requirements From 95be8ec285b920985c9214a91c9ed5f1501722fa Mon Sep 17 00:00:00 2001 From: usamah jassat Date: Tue, 21 Oct 2025 09:59:41 +0100 Subject: [PATCH 04/17] K8S-4272: Add encryption at rest documentation --- modules/ROOT/nav.adoc | 5 + .../pages/concept-encryption-at-rest.adoc | 169 +++++++ .../pages/tutorial-encryption-at-rest.adoc | 423 ++++++++++++++++++ 3 files changed, 597 insertions(+) create mode 100644 modules/ROOT/pages/concept-encryption-at-rest.adoc create mode 100644 modules/ROOT/pages/tutorial-encryption-at-rest.adoc diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 10fa530..3771de2 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -37,6 +37,8 @@ *** xref:concept-data-save-restore.adoc[Data Topology Save, Restore and Synchronization] *** xref:howto-guide-save-restore.adoc[How-to Guide: Data Topology Save and Restore] *** xref:howto-guide-data-topology-sync.adoc[How-to Guide: Data Topology Synchronization] +** Data Encryption + *** xref:concept-encryption-at-rest.adoc[Encryption at Rest] ** Hibernation *** xref:concept-hibernation.adoc[Couchbase Cluster Hibernation] ** Logging @@ -89,6 +91,7 @@ *** xref:howto-manage-couchbase-logging.adoc[Manage Couchbase Logging] *** xref:howto-couchbase-log-forwarding.adoc[Configure Log Forwarding] *** xref:howto-non-root-install.adoc[Configure Non-Root Installs] + *** xref:howto-encryption-at-rest.adoc[Configure Encryption at Rest] ** Connect *** xref:howto-ui.adoc[Access the Couchbase User Interface] *** xref:howto-client-sdks.adoc[Configure Client SDKs] @@ -132,6 +135,8 @@ include::partial$autogen-reference.adoc[] ** xref:tutorial-autoscale-query.adoc[Autoscaling Couchbase Query Service] * Backup ** xref:tutorial-velero-backup.adoc[Backup with VMware Velero] +* Encryption at Rest + ** xref:tutorial-encryption-at-rest.adoc[Encryption at Rest] * Logging ** xref:tutorial-couchbase-log-forwarding.adoc[] * Monitoring diff --git a/modules/ROOT/pages/concept-encryption-at-rest.adoc b/modules/ROOT/pages/concept-encryption-at-rest.adoc new file mode 100644 index 0000000..730d504 --- /dev/null +++ b/modules/ROOT/pages/concept-encryption-at-rest.adoc @@ -0,0 +1,169 @@ += Encryption At Rest +:description: Understand encryption at rest in Couchbase Server and how to configure it using the Autonomous Operator. + +[abstract] +{description} + +== Overview + +Encryption at rest is a security feature introduced in Couchbase Server 8.0.0 that protects your data by encrypting it on disk. When enabled, sensitive data stored on the Couchbase nodes is encrypted, ensuring that even if the underlying storage is compromised, the data remains secure. + +== What Data Can Be Encrypted? + +Encryption at rest supports encrypting multiple types of data within your Couchbase deployment: + +* *Data in buckets* - The actual documents and data stored in your buckets +* *Cluster configuration* - Sensitive cluster settings and configurations +* *Logs* - Server log files (note: encrypting logs will break fluent-bit log streaming) +* *Audit logs* - Security audit trail data + +== Key Types + +Couchbase offers flexibility in how encryption keys are managed through three different key types: + +=== Couchbase Server Managed Keys + +Also called AutoGenerated keys, these are the simplest option. Couchbase Server automatically generates and manages these keys without requiring external services. 
This is ideal for: + +* Environments without external key management infrastructure +* Use cases where key management can be handled within Couchbase + +=== AWS KMS Keys + +AWS Key Management Service (KMS) integration allows you to use AWS-managed encryption keys. This is recommended when: + +* Running Couchbase in AWS (EKS or EC2) +* Your organization uses AWS KMS for centralized key management +* You need compliance with AWS security standards + +=== KMIP Keys + +Key Management Interoperability Protocol (KMIP) is an industry standard that works with enterprise key management systems from vendors like Thales, IBM, or HashiCorp Vault. Choose KMIP when: + +* You have an existing enterprise key management system +* You need vendor-neutral key management +* Compliance requires external key management + +== Key Concepts + +=== Key Encryption Keys (KEK) and Data Encryption Keys (DEK) + +Couchbase uses a two-tier key hierarchy: + +* *Key Encryption Keys (KEK)* - The master keys you define through `CouchbaseEncryptionKey` resources. These encrypt other keys or data. +* *Data Encryption Keys (DEK)* - Temporary keys generated by Couchbase to encrypt actual data. These are encrypted by KEKs. + +=== Key Rotation + +Key rotation is an important security practice. With encryption at rest: + +* KEK rotation can be scheduled through the `CouchbaseEncryptionKey` resource +* DEK rotation happens automatically based on the `rotationInterval` setting +* When a key rotates, new data is encrypted with the new key while old data remains accessible + +=== Key Usage Restrictions + +You can restrict what each key encrypts by setting usage parameters: + +* `configuration` - Cluster configuration data +* `key` - Other encryption keys +* `log` - Log files +* `audit` - Audit logs +* `allBuckets` - All bucket data + +By default, keys can encrypt anything. Restricting usage improves security through separation of concerns. + +== How to Enable Encryption At Rest + +Enabling encryption at rest with the Autonomous Operator involves three main steps: + +=== Step 1: Enable Encryption Management + +First, enable encryption at rest management on your `CouchbaseCluster` resource: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true +---- + +=== Step 2: Create Encryption Keys + +Create one or more `CouchbaseEncryptionKey` resources. Here's a simple example with an auto-generated key: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-key +spec: + keyType: AutoGenerated +---- + +For AWS KMS or KMIP keys, additional configuration is required (see xref:tutorial-encryption-at-rest.adoc[]). 
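For illustration, an AWS KMS-backed key names the KMS key ARN and region, and references a Kubernetes Secret that holds AWS credentials. The ARN, region, and Secret name below are placeholders:

[source,yaml]
----
apiVersion: couchbase.com/v2
kind: CouchbaseEncryptionKey
metadata:
  name: my-aws-key
spec:
  keyType: AWS
  awsKey:
    keyARN: "arn:aws:kms:us-east-1:111122223333:key/example-key-id" # placeholder ARN
    keyRegion: "us-east-1"
    credentialsSecret: "aws-credentials" # Secret created from an AWS credentials file
----

KMIP-backed keys follow the same pattern but point at a KMIP server and a client certificate Secret instead; both are covered in detail in the tutorial.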
+ +=== Step 3: Apply Encryption to Data + +Configure which data should be encrypted on your cluster or buckets: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true + configuration: + enabled: true + keyName: "my-key" + audit: + enabled: true + keyName: "my-key" +---- + +For bucket-level encryption: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseBucket +metadata: + name: secure-bucket +spec: + name: secure-bucket + memoryQuota: 512Mi + encryptionAtRest: + keyName: "my-key" +---- + +== Security Considerations + +When implementing encryption at rest: + +* *Key Protection* - Consider encrypting your data keys with a dedicated Key Encryption Key (KEK) rather than using the cluster master password +* *Key Rotation* - Implement regular key rotation schedules appropriate for your security requirements +* *External Key Management* - For sensitive environments, consider using AWS KMS or KMIP instead of auto-generated keys +* *Log Encryption Trade-offs* - Be aware that encrypting logs prevents log streaming to monitoring systems + +== Next Steps + +For detailed configuration instructions and advanced features, see: + +* xref:tutorial-encryption-at-rest.adoc[How to Configure Encryption At Rest] - Complete configuration guide with all options + +== Related Information + +* xref:concept-security.adoc[Security Concepts] +* xref:howto-manage-buckets.adoc[Managing Buckets] +* xref:howto-manage-cluster.adoc[Managing Clusters] + diff --git a/modules/ROOT/pages/tutorial-encryption-at-rest.adoc b/modules/ROOT/pages/tutorial-encryption-at-rest.adoc new file mode 100644 index 0000000..67d7117 --- /dev/null +++ b/modules/ROOT/pages/tutorial-encryption-at-rest.adoc @@ -0,0 +1,423 @@ += Couchbase Encryption At Rest + +[abstract] +How to configure Couchbase Server with encryption at rest. This guide covers operator-managed keys, AWS KMS-backed keys, and KMIP-backed keys, + +Couchbase Server supports encryption at rest. +This is a feature that allows you to encrypt the data at rest on the disk. + +== Prerequisites +* Couchbase Server 8.0.0 or later + +== Overview + +In Couchbase 8.0.0 Encryption at Rest was introduced which allows data on the Couchbase Nodes to be encrypted at rest. The data that can be encrypted at rest includes: + +- Data in buckets +- Cluster configuration +- Logs +- Audit + +Couchbase offers three types of Keys that can be used to encrypt data: + +- Couchbase Server Managed Keys +- AWS KMS Keys +- KMIP Keys + +== Enabling Encryption at Rest Management + +To use any Encryption at Rest features through the Operator, you must first enable encryption at rest management on the Couchbase Cluster resource. + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true # <.> +---- +<.> Enable operator-managed encryption at rest. By default, this is disabled. + +Once enabled, the operator will manage encryption keys and apply encryption settings to your cluster. + +=== Selecting Encryption Keys + +By default, the operator will use all `CouchbaseEncryptionKey` resources in the same namespace as the cluster. 
You can use a label selector to control which keys the operator manages: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true + selector: # <.> + matchLabels: + cluster: my-cluster +---- +<.> Only encryption keys with the label `cluster: my-cluster` will be managed for this cluster. + +== Managing Couchbase Server Managed Keys + +Couchbase Server Managed Keys (also called AutoGenerated keys) are the simplest type of encryption key. These keys are generated and managed automatically by Couchbase Server without requiring external key management services. + +=== Basic Example + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-key +spec: + keyType: AutoGenerated +---- + +=== Usage + +Keys can be used to encrypt different types of data, and this usage can be enforced by setting the usage of the key. Setting the usage of the key restricts what it can be be used to encrypt, with the options being: + +- Keys +- Configuration +- Logs +- Audit +- Buckets + +By default keys can be used to encrypt anything. To restrict the usage of the key the `spec.usage` object on the key can be used to set the usage of the key. + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-key +spec: + keyType: AutoGenerated + usage: + configuration: true # <.> + key: true # <.> + log: true # <.> + audit: true # <.> + allBuckets: true # <.> +---- + +<.> The `spec.usage.configuration` field defines whether the key should be used for configurations. This is set to true by default. +<.> The `spec.usage.key` field defines whether the key should be used for keys. This is set to true by default. +<.> The `spec.usage.log` field defines whether the key should be used for logs. This is set to true by default. +<.> The `spec.usage.audit` field defines whether the key should be used for audit. This is set to true by default. +<.> The `spec.usage.allBuckets` field defines whether the key should be used for all buckets. This is set to true by default. + +=== Additional Options + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-key +spec: + keyType: AutoGenerated + autoGenerated: + rotation: + intervalDays: 30 # <.> + StartTime: 2025-01-01T00:00:00Z # <.> + canBeCached: true +---- + +<.> The `spec.autoGenerated.rotation.intervalDays` field defines the interval in days at which the key should be rotated. +<.> The `spec.autoGenerated.rotation.startTime` field defines the first time at which the key rotation will start. + +==== Key Encryption with Another Key + +For enhanced security, AutoGenerated keys can be encrypted with another encryption key instead of the cluster's master password: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: master-key +spec: + keyType: AutoGenerated + usage: # <.> + configuration: false + key: true # <.> + log: false + audit: false + allBuckets: false +--- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: bucket-key +spec: + keyType: AutoGenerated + autoGenerated: + encryptWithKey: master-key # <.> + usage: + configuration: false + key: false + log: false + audit: false + allBuckets: true # <.> +---- +<.> Restrict the master key's usage to only encrypt other keys (Key Encryption Key). +<.> Allow this key to be used for encrypting other keys. 
+<.> Encrypt this key using the `master-key` encryption key. +<.> This key will only be used to encrypt bucket data. + +== Managing AWS KMS Keys + +AWS Key Management Service (KMS) is a fully managed service that makes it easy to create, manage, and control cryptographic keys used to protect your data. AWS KMS Keys can be used to encrypt the data at rest in the Couchbase Cluster. To use AWS KMS Keys you will need to provide a way to authenticate with AWS, either using IMDS or providing a secret with credentials. + +=== Prerequisites + +* An AWS account with KMS key creation permissions +* A KMS key created in AWS +* Either: + - AWS credentials with permission to use the KMS key, or + - IAM role attached to the Kubernetes nodes with KMS permissions (for IMDS) + +=== Basic Example with AWS Credentials + +To provide AWS credentials via a Kubernetes secret a secret with the AWS credentials must be created. The credentials file should follow the standard AWS credentials format: +[source,ini] +---- +[default] +aws_access_key_id = YOUR_ACCESS_KEY_ID +aws_secret_access_key = YOUR_SECRET_ACCESS_KEY +---- + +The secret can be created using the following command: + +.Step 1: Create an AWS credentials secret +[source,bash] +---- +kubectl create secret generic aws-credentials \ + --from-file=credentials=/path/to/.aws/credentials +---- + +.Step 2: Create the encryption key resource +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-aws-key +spec: + keyType: AWS # <.> + awsKey: + keyARN: "arn:aws:kms:us-east-1:123456789012:key/abcd1234-ab12-cd34-ef56-abcdef123456" # <.> + keyRegion: "us-east-1" # <.> + credentialsSecret: "aws-credentials" # <.> + profileName: # <.> +---- +<.> Specifies that this is an AWS KMS key. +<.> The ARN of your KMS key from AWS. +<.> The AWS region where the KMS key is located. +<.> The name of the Kubernetes secret containing AWS credentials. +<.> The optional profile name to use for the AWS credentials if multiple profiles are present in the credentials file. + +=== Authenticating with IMDS +When running in AWS (EKS or Kubernetes on EC2), you can use Instance Metadata Service (IMDS) to authenticate using the IAM role attached to the nodes: + + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-aws-key-imds +spec: + keyType: AWS + awsKey: + keyARN: "arn:aws:kms:us-east-1:123456789012:key/abcd1234-ab12-cd34-ef56-abcdef123456" + keyRegion: "us-east-1" + useIMDS: true # <.> +---- +<.> Enable authentication using IMDS. No credentials secret is required. + +== Managing KMIP Keys + +Key Management Interoperability Protocol (KMIP) is an OASIS standard for communication between key management systems and applications. KMIP allows you to use external key management solutions from vendors like Thales, IBM, or HashiCorp Vault. 
+ +=== Prerequisites + +* A KMIP-compliant server +* Client certificate and private key in PKCS#8 format +* KMIP server host and port +* A key ID for an existing key on the KMIP server + + +=== Basic Example + +.Step 1: Create a Kubernetes secret with client credentials + + +[source,yaml] +---- +apiVersion: v1 +kind: Secret +metadata: + name: kmip-client-secret +type: Opaque +data: + passphrase: + tls.key: + tls.crt: +---- + +The secret must contain three keys : + +* `tls.crt` - The client certificate +* `tls.key` - The client private key in encrypted PKCS#8 format +* `passphrase` - The passphrase for decrypting the private key + +.Step 2: Create the KMIP encryption key resource +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-kmip-key +spec: + keyType: KMIP # <.> + kmipKey: + host: "kmip.example.com" # <.> + port: 5696 # <.> + timeoutInMs: 5000 # <.> + clientSecret: "kmip-client-cert" # <.> + verifyWithSystemCA: true # <.> + verifyWithCouchbaseCA: true # <.> + keyID: "existing-key-identifier" # <.> + +---- +<.> Specifies that this is a KMIP-managed key. +<.> The hostname of your KMIP server. +<.> The port number of your KMIP server (standard KMIP port is 5696). +<.> Connection timeout in milliseconds (must be between 1000 and 300000). +<.> The name of the Kubernetes secret containing client certificates. +<.> Verify the KMIP server certificate against the system CA bundle. +<.> Verify the KMIP server certificate against the Couchbase CA bundle. +<.> The unique identifier of the existing key in the KMIP server. + +=== Encryption Approaches + +KMIP supports two encryption approaches: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseEncryptionKey +metadata: + name: my-kmip-key-native +spec: + keyType: KMIP + kmipKey: + host: "kmip.example.com" + port: 5696 + timeoutInMs: 5000 + clientSecret: "kmip-client-cert" + encryptionApproach: NativeEncryptDecrypt # <.> +---- +<.> Use native encrypt/decrypt operations on the KMIP server. + +Available approaches: + +* `LocalEncrypt` (default) - Key material is retrieved and encryption/decryption happens locally on Couchbase nodes. Better performance. +* `NativeEncryptDecrypt` - Encryption/decryption operations are performed by the KMIP server. Key material never leaves the KMIP server. More secure but higher latency. + +Choose `NativeEncryptDecrypt` when security requirements mandate that key material never leaves the key management system. Choose `LocalEncrypt` for better performance when the security model allows it. + +== Encrypting Cluster Data + +Once encryption keys are created, you can enable encryption for different types of cluster data. + +=== Encrypting Configuration, Logs, and Audit + +Configuration, logs, and audit logs can be encrypted at the cluster level: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true + configuration: # <.> + enabled: true + keyName: "my-autogen-key" # <.> + keyLifetime: "8760h" # <.> + rotationInterval: "720h" # <.> + audit: # <.> + enabled: true + keyName: "my-autogen-key" + keyLifetime: "8760h" + rotationInterval: "720h" + log: # <.> + enabled: true + keyName: "my-autogen-key" + keyLifetime: "8760h" + rotationInterval: "720h" +---- +<.> Enable encryption for cluster configuration. +<.> Use the `my-autogen-key` encryption key. If not specified, the cluster master password is used. +<.> Data Encryption Key (DEK) lifetime in hours. 
Default is 8760 hours (1 year). Must be at least 720 hours (30 days). +<.> DEK rotation interval in hours. Default is 720 hours (30 days). Must be at least 168 hours (7 days). +<.> Enable encryption for audit logs. +<.> Enable encryption for log files. + +WARNING: Enabling encryption for log files will break fluent-bit log streaming, as the logs will be encrypted and unreadable by the log collector. Only enable log encryption if you don't rely on log streaming. + +=== Using Default Encryption (Master Password) + +You can enable encryption without specifying a key name. In this case, the cluster's master password is used: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: my-cluster +spec: + security: + encryptionAtRest: + managed: true + configuration: + enabled: true # <.> + # keyName not specified - uses master password +---- +<.> Encrypt configuration using the cluster master password instead of an encryption key. + +=== Encrypting Buckets + +Individual buckets can be encrypted with specific keys. This is configured at the bucket level: + +[source,yaml] +---- +apiVersion: couchbase.com/v2 +kind: CouchbaseBucket +metadata: + name: my-encrypted-bucket +spec: + name: my-encrypted-bucket + memoryQuota: 512Mi + encryptionAtRest: # <.> + keyName: "my-autogen-key" # <.> + keyLifetime: "8760h" # <.> + rotationInterval: "720h" # <.> +---- +<.> Enable bucket encryption. +<.> The encryption key to use. +<.> DEK lifetime. Default is 8760 hours (1 year). +<.> DEK rotation interval. Default is 720 hours (30 days). + From d23015c28128e3d64e2a0194e8a77ea4711a794c Mon Sep 17 00:00:00 2001 From: Ray Offiah <77050471+rayoffiah@users.noreply.github.com> Date: Wed, 22 Oct 2025 09:55:31 +0100 Subject: [PATCH 05/17] DOC-13656-Create-release-note-for-Couchbase-Operator-2.9.0 Added release notes for Couchbase Operator 2.9.0 including fixed issues, updates to constants, and preview configuration adjustments. Removed obsolete release notes for versions 2.8.0 and 2.8.1. Updated Antora configuration to reflect the 2.9 release. Signed-off-by: Ray Offiah --- antora.yml | 2 +- modules/ROOT/pages/release-notes.adoc | 85 +------------------ modules/ROOT/partials/constants.adoc | 4 +- ...ouchbase-operator-release-notes-2.8.0.adoc | 84 ------------------ ...ouchbase-operator-release-notes-2.8.1.adoc | 29 ------- ...ouchbase-operator-release-notes-2.9.0.adoc | 47 ++++++++++ preview/HEAD.yml | 6 ++ 7 files changed, 60 insertions(+), 197 deletions(-) delete mode 100644 modules/ROOT/partials/couchbase-operator-release-notes-2.8.0.adoc delete mode 100644 modules/ROOT/partials/couchbase-operator-release-notes-2.8.1.adoc create mode 100644 modules/ROOT/partials/couchbase-operator-release-notes-2.9.0.adoc create mode 100644 preview/HEAD.yml diff --git a/antora.yml b/antora.yml index c94abfc..f1b3a13 100644 --- a/antora.yml +++ b/antora.yml @@ -1,6 +1,6 @@ name: operator title: Kubernetes Operator -version: '2.8' +version: '2.9' prerelease: false start_page: ROOT:overview.adoc nav: diff --git a/modules/ROOT/pages/release-notes.adoc b/modules/ROOT/pages/release-notes.adoc index c76b663..264a68d 100644 --- a/modules/ROOT/pages/release-notes.adoc +++ b/modules/ROOT/pages/release-notes.adoc @@ -20,92 +20,15 @@ The necessary steps needed to upgrade to this release depend on which version of There is no direct upgrade path from versions prior to 2.2.0. 
To upgrade from a 1.x, 2.0.x, or 2.1.x release, you must first upgrade to 2.4.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. -Refer to the xref:2.4@operator::howto-operator-upgrade.adoc[Operator 2.4 upgrade steps] if upgrading from a pre-2.2 release. - -=== Upgrading from 2.2, 2.3, 2.4, 2.5, 2.6, or 2.7 - -There are no additional upgrade steps when upgrading from these versions, and you may follow the xref:howto-operator-upgrade.adoc[standard upgrade process]. +See xref:2.4@operator::howto-operator-upgrade.adoc[Couchbase Operator 2.4 upgrade steps] if upgrading from a pre-2.2 release. +include::partial$couchbase-operator-release-notes-2.9.0.adoc[] For further information read the xref:concept-upgrade.adoc[Couchbase Upgrade] concepts page. -include::partial$couchbase-operator-release-notes-2.8.1.adoc[] - -[#release-v280] -== Release 2.8.0 - -Couchbase Kubernetes Operator 2.8.0 was released in March 2025. - -[#changes-in-behavior-v280] -=== Changes in Behaviour - -==== Admission Controller Changes - -The Dynamic Admission Controller (DAC) will now warn if any cluster settings don't match our xref:best-practices.adoc#production-deployments[Best Practices for Production Deployments]. - -The DAC will now prevent changes to the `CouchbaseCluster` spec while a hibernation is taking place. -If hibernation is enabled while a cluster is migrating, upgrading, scaling, or rebalancing, that process will conclude before the cluster enters hibernation. The DAC will warn when this is the case, and it will be visible in the operator logs. - -To prevent any invalid resources failing to reconcile (i.e. if the DAC is not deployed in the current environment), the DAC Validation is now run at the beginning of the reconciliation loop. -Any invalid resources will be skipped for reconciliation, marked as `NotValid`, and logged. - -==== Bucket and Index Service Settings - -In a previous version of the Operator, `enablePageBloomFilter` was unfortunately missed from the Index Service settings. -This has been addressed in CAO 2.8.0, and it is now available as xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-cluster-indexer-enablepagebloomfilter[`couchbaseclusters.spec.cluster.indexer.enablePageBloomFilter`]. - -Until CAO 2.8.0, Bucket Compaction settings were only available to be set in the xref:resource/couchbasecluster.adoc[`CouchbaseCluster`] resource, at xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-cluster-autocompaction[`couchbaseclusters.spec.cluster.autoCompaction`]. -These settings have now been added to the xref:resource/couchbasebucket.adoc[`CouchbaseBucket`] resource at xref:resource/couchbasebucket.adoc#couchbasebuckets-spec-autocompaction[`couchbasebuckets.spec.autoCompaction`]. - -[IMPORTANT] -==== -Prior to Operator 2.8.0, the above settings could still be set directly on the cluster. - -To avoid these being reset to default values during the CAO upgrade, any of the above settings that have been changed must be added to the appropriate resource during the upgrade. - -Specifically, this needs to be done _after_ updating the CRDs, and _before_ installing the new Operator - -For further information see xref:howto-operator-upgrade.adoc#update-existing-resources[Update Existing Resources]. -==== - -==== Metrics Changes - -A number of new metrics have been added, see xref:reference-prometheus-metrics.adoc[Prometheus Metrics Reference] for details. 
- -It is now possible to include the Couchbase Cluster UUID, or Cluster UUID and Cluster Name, as labels with any Operator metric that is related to a specific Couchbase Cluster. -This can be enabled by setting `optional-metric-labels` to either `uuid-only` or `uuid-and-name`, when using xref:tools/cao.adoc#cao-create-operator-flags[cao create operator] or xref:tools/cao.adoc#cao-generate-operator-flags[cao generate operator]. - -While adding the Couchbase Cluster UUID and Cluster Name labels, it was discovered that there were inconsistencies regarding the Kubernetes Namespace and Cluster Resource Name labels in some of the existing metrics. -Some had separate labels for `namespace` and `name`, and some had a combined `namespace/name` label. -In order to provide consistency, all metrics by default now have separate `name` and `namespace` labels. -The previous behavior, where a small number of metrics had the combined form of the label, can be achieved by setting `separate-cluster-namespace-and-name` to `false`, when using xref:tools/cao.adoc#cao-create-operator-flags[cao create operator] or xref:tools/cao.adoc#cao-generate-operator-flags[cao generate operator]. - -==== Annotation Changes - -===== Storage Backend Migration - -As an enhancement to the Couchstore/Magma migration functionality added in Operator 2.7, CAO 2.8.0 adds two new annotations: - -* Bucket Migrations are now disabled by default, to prevent unexpected node rebalances. These can be enabled with xref:reference-annotations.adoc#cao-couchbase-combuckets-enablebucketmigrationroutines[`cao.couchbase.com/buckets.enableBucketMigrationRoutines`]. -* Similar to a maintenance upgrade, it is now possible to specify how many Pods can be migrated at a time with xref:reference-annotations.adoc#cao-couchbase-combuckets-maxconcurrentpodswaps[`cao.couchbase.com/buckets.maxConcurrentPodSwaps`]. - -===== History Retention - -The annotations related to History Retention, that were added in Operator 2.4.1, have now been added to the xref:resource/couchbasebucket.adoc[`CouchbaseBucket`], and xref:resource/couchbasecollection.adoc[`CouchbaseCollection`] resources, at xref:resource/couchbasebucket.adoc#couchbasebuckets-spec-historyretention[`couchbasebuckets.spec.historyRetention`], and xref:resource/couchbasecollection.adoc#couchbasecollections-spec-history[`couchbasecollections.spec.history`], respectively. - -The History Retention annotations should be considered deprecated, and it should be noted that if used, they will take precedence over the equivalent values in the resources. -Care should be taken to make sure that the annotations are removed as soon as the resources have been updated with the new attributes. - -include::partial$couchbase-operator-release-notes-2.8.0.adoc[] - -== Feedback - -You can have a big impact on future versions of the Operator (and its documentation) by providing Couchbase with your direct feedback and observations. -Please feel free to post your questions and comments to the https://forums.couchbase.com/c/couchbase-server/Kubernetes[Couchbase Forums]. +== Previous Release Notes -== Licenses for Third-Party Components +* xref:2.8@operator::release-notes.adoc[Couchbase Kubernetes Operator 2.8 Release Notes] -The complete list of licenses for Couchbase products is available on the https://www.couchbase.com/legal/agreements[Legal Agreements] page. -Couchbase is thankful to all of the individuals that have created these third-party components. 
== More Information diff --git a/modules/ROOT/partials/constants.adoc b/modules/ROOT/partials/constants.adoc index f17e54f..fcae046 100644 --- a/modules/ROOT/partials/constants.adoc +++ b/modules/ROOT/partials/constants.adoc @@ -1,5 +1,5 @@ -:operator-version: 2.8.1 -:operator-version-minor: 2.8 +:operator-version: 2.9.0 +:operator-version-minor: 2.9 :admission-controller-version: 2.8.1 :couchbase-version: 7.6.6 :couchbase-version-upgrade-from: 7.0.0 diff --git a/modules/ROOT/partials/couchbase-operator-release-notes-2.8.0.adoc b/modules/ROOT/partials/couchbase-operator-release-notes-2.8.0.adoc deleted file mode 100644 index c152a67..0000000 --- a/modules/ROOT/partials/couchbase-operator-release-notes-2.8.0.adoc +++ /dev/null @@ -1,84 +0,0 @@ - -[#fixed-issues-v280] -=== Fixed Issues - - -*https://jira.issues.couchbase.com/browse/K8S-3558[K8S-3558^]*:: - -Couchbase Autonomous Operator commences an In-place Upgrade when the cluster is under-resourced. - -*https://jira.issues.couchbase.com/browse/K8S-3579[K8S-3579^]*:: - -Couchbase Autonomous Operator tries to change invalid bucket configurations in a loop. - -*https://jira.issues.couchbase.com/browse/K8S-3591[K8S-3591^]*:: - -Couchbase Autonomous Operator crashes if Incremental Backup is missing schedule. - -*https://jira.issues.couchbase.com/browse/K8S-3596[K8S-3596^]*:: - -Crash in Operator due to invalid memory access. - -*https://jira.issues.couchbase.com/browse/K8S-3605[K8S-3605^]*:: - -Upgrade Swap Rebalance is retried with different parameters on Operator Pod deletion. - -*https://jira.issues.couchbase.com/browse/K8S-3609[K8S-3609^]*:: - - Hibernation fails to bring back any Pod with error extracting image version. - -*https://jira.issues.couchbase.com/browse/K8S-3621[K8S-3621^]*:: - -Shadowed Secret did not get updated. - -*https://jira.issues.couchbase.com/browse/K8S-3632[K8S-3632^]*:: - -Unable to set -1 for Collection-level `maxTTL`. - -*https://jira.issues.couchbase.com/browse/K8S-3639[K8S-3639^]*:: - -Operator loses track of pending Pods when an Eviction of the Operator Pod occurs. - -*https://jira.issues.couchbase.com/browse/K8S-3641[K8S-3641^]*:: - - Crash in `handleVolumeExpansion` if `enableOnlineVolumeExpansion` is True but no Volume Mounts configured. - -*https://jira.issues.couchbase.com/browse/K8S-3655[K8S-3655^]*:: - - Clear Upgrade condition if the Operator is not performing an upgrade. - -*https://jira.issues.couchbase.com/browse/K8S-3659[K8S-3659^]*:: - - When scaling down, Cluster does not maintain balance across Server Groups. - -*https://jira.issues.couchbase.com/browse/K8S-3696[K8S-3696^]*:: - - DAC prevents configuration of multiple XDCR Replications of same Buckets to different remote Clusters. - -*https://jira.issues.couchbase.com/browse/K8S-3772[K8S-3772^]*:: - -Self-Certification: Artifacts PVC should use `--storage-class` parameter when creating the Certification Pod. - -*https://jira.issues.couchbase.com/browse/K8S-3788[K8S-3788^]*:: - -Operator container crashes when there is a managed Scope/Collection Group added for the Ephemeral Bucket. - - -[#known-issues-v280] -=== Known Issues - -*https://jira.issues.couchbase.com/browse/K8S-3617[K8S-3617^]*:: - -It's not possible to set xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-cluster-indexer-redistributeindexes[`couchbaseclusters.spec.cluster.indexer.redistributeIndexes`] from True to False during a reconciliation. 
- -*https://jira.issues.couchbase.com/browse/K8S-3908[K8S-3908^]*:: - -Metric `couchbase_operator_memory_under_management_bytes` is incorrectly showing 0. - -*https://jira.issues.couchbase.com/browse/K8S-3909[K8S-3909^]*:: - -Metric `couchbase_operator_cpu_under_management` is incorrectly showing 0. - -*https://jira.issues.couchbase.com/browse/K8S-3910[K8S-3910^]*:: - -Operator tries to migrate storage backend of buckets even before Couchbase cluster is in 7.6.0+. diff --git a/modules/ROOT/partials/couchbase-operator-release-notes-2.8.1.adoc b/modules/ROOT/partials/couchbase-operator-release-notes-2.8.1.adoc deleted file mode 100644 index c52caf7..0000000 --- a/modules/ROOT/partials/couchbase-operator-release-notes-2.8.1.adoc +++ /dev/null @@ -1,29 +0,0 @@ -[#release-281] -== Release 2.8.1 (June 2025) - -Couchbase Operator 2.8.1 was released in June 2025. -This maintenance release contains fixes to issues. - -[#fixed-issues-v281] -== Fixed Issues - - -[#table-fixed-issues-v281,cols="25,66"] - - -*https://jira.issues.couchbase.com/browse/K8S-3793/[K8S-3793^]*:: - -Fixed a bug in Local Persistent Volume comparison logic that previously triggered unnecessary pod rebalancing when comparing existing and desired states, despite no actual differences being detected. - -*https://jira.issues.couchbase.com/browse/K8S-3840/[K8S-3840^]*:: - -Due to ephemeral volumes removing the staging directory, backups will fail if the defaultRecoveryMethod is set to resume. The admission controller will now invalidate backups using ephemeral volumes unless the defaultRecoveryMethod is set to either purge or none. - -*https://jira.issues.couchbase.com/browse/K8S-3889/[K8S-3889^]*:: - -Inplace upgrades are not supported prior to Couchbase Server Versions 7.2.x due to a required change in the startup files required by Couchbase Server. - - - - - diff --git a/modules/ROOT/partials/couchbase-operator-release-notes-2.9.0.adoc b/modules/ROOT/partials/couchbase-operator-release-notes-2.9.0.adoc new file mode 100644 index 0000000..ca7f5fb --- /dev/null +++ b/modules/ROOT/partials/couchbase-operator-release-notes-2.9.0.adoc @@ -0,0 +1,47 @@ +[#release-290] +== Release 2.9.0 (November 2025) + +Couchbase Operator 2.9.0 was released in November 2025. +This maintenance release contains fixes to issues. + +[#fixed-issues-v290] +== Fixed Issues + + +*https://jira.issues.couchbase.com/browse/K8S-3258/[K8S-3258]*:: + +Added a new `logging.configNameReleasePrefix` boolean to the helm chart. This defaults to false, but setting it to true will prefix the fluent-bit config with the release name. Setting this to true for existing clusters will trigger recreation of all pods so should only really be used for new clusters. + +*https://jira.issues.couchbase.com/browse/K8S-4091/[K8S-4091]*:: + +Updated the `spec.networking.addressFamily` field to accept `IPv4Only`, `IPv4Priority`, `IPv6Only` and `IPv6Priority`. The current `IPv4/IPv6` values will have the `Ipv4/6Only` functionality. I.e. customers that have set the fields will not see any change. ++ +These should be considered deprecated and will be removed in a future release. + +The priority/only choice determines whether `addressFamilyOnly` is true or false. 
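+
For example, a cluster that should prefer IPv4 without requiring it exclusively might set the new field as follows (illustrative snippet):
+
[source,yaml]
----
spec:
  networking:
    addressFamily: IPv4Priority
----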
+ +*https://jira.issues.couchbase.com/browse/K8S-4097/[K8S-4097]*:: + +Manual Intervention Required is a new state that the couchbase cluster will enter in the unlikely scenario that the operator is unable to reconcile the cluster due to reasons outside of its control/capabilities, and which therefore require manual intervention by a user to resolve. ++ +If the cluster enters this condition, it will: ++ +* Set the cluster_manual_intervention metric by 1 +* Add (where possible) the `ManualInterventionRequired` condition to the cluster, with a message detailing the reason for entering the MIR state. +* Raise a `ManualInterventionRequired` Kubernetes event, with the event message set to the reason for entering manual intervention +* Most importantly, reconciliation will be skipped until the manual intervention required state has been resolved, i.e. the issue that put the cluster into that condition has been fixed. + +*https://jira.issues.couchbase.com/browse/K8S-4144/[K8S-4144]*:: + +In prior versions of Couchbase Operator, the metrics port annotation (`prometheus.io/port`) was set to 8091, even if TLS was enabled. It will now correctly set to 18091. + +*https://jira.issues.couchbase.com/browse/K8S-4161/[K8S-4161]*:: + +The latest update includes the addition of the analytics numReplicas setting in the Couchbase Operator. This enhancement allows users to configure the number of replicas for analytics service, offering improved flexibility and reliability. The update is part of the ongoing improvements to enhance functionality and user experience. + +*https://jira.issues.couchbase.com/browse/K8S-4270/[K8S-4270]*:: + +Potentially where we use `kubectl apply` for CRDS, we add a note that this error is possible in 2.9+, and to add `--server-side` to the `kubectl apply` command. + + + + diff --git a/preview/HEAD.yml b/preview/HEAD.yml new file mode 100644 index 0000000..3736c35 --- /dev/null +++ b/preview/HEAD.yml @@ -0,0 +1,6 @@ +sources: + docs-server: + branches: [release/8.0] + + docs-operator: + branches: [DOC-13656-Create-release-note-for-Couchbase-Operator-2.9.0, release/2.8] \ No newline at end of file From c749f739d92fcc535c912143de40df9a5cfe6921 Mon Sep 17 00:00:00 2001 From: Yusuf Ramzan Date: Wed, 22 Oct 2025 10:33:01 +0100 Subject: [PATCH 06/17] K8S-3607 Updated self certification docs to remove reference to creating K8S ticket --- modules/ROOT/pages/concept-platform-certification.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/modules/ROOT/pages/concept-platform-certification.adoc b/modules/ROOT/pages/concept-platform-certification.adoc index 6be074f..718bc75 100644 --- a/modules/ROOT/pages/concept-platform-certification.adoc +++ b/modules/ROOT/pages/concept-platform-certification.adoc @@ -127,9 +127,9 @@ To submit the self-certification results to Couchbase, follow these steps: . Capture the Kubernetes platform's version information and other platform-specific components such as storage and networking. -. To upload the results to Couchbase, you will need a JIRA account for Couchbase; you can request a JIRA account here: https://issues.couchbase.com/secure/ContactAdministrators!default.jspa. +. If you are an existing customer of Couchbase, create a support ticket for instructions on how to submit your certification archive. -. Create a new JIRA ticket, project - Couchbase Kubernetes (K8S), and Summary - [Operator Self-Certification Lifecycle]. +. 
If you are a new customer of Couchbase, contact your Couchbase Account Team or use our general https://www.couchbase.com/contact/[contact page]. == Platform Requirements From ff776a1781aee8b75cce9f57cebcd7e112335c5b Mon Sep 17 00:00:00 2001 From: Ray Offiah Date: Wed, 5 Nov 2025 12:05:21 +0000 Subject: [PATCH 07/17] DOC-13656-Create-release-note-for-Couchbase-Operator-2.9.0 Added release notes for Couchbase Operator 2.9.0, detailing fixed issues and feature updates. Signed-off-by: Ray Offiah --- modules/ROOT/pages/release-notes.adoc | 73 +++++++++++++++++++-------- 1 file changed, 53 insertions(+), 20 deletions(-) diff --git a/modules/ROOT/pages/release-notes.adoc b/modules/ROOT/pages/release-notes.adoc index 264a68d..bd4fb6b 100644 --- a/modules/ROOT/pages/release-notes.adoc +++ b/modules/ROOT/pages/release-notes.adoc @@ -1,35 +1,68 @@ -= Release Notes for Couchbase Kubernetes Operator {operator-version-minor} -include::partial$constants.adoc[] +[#release-290] +== Release 2.9.0 (November 2025) -Autonomous Operator {operator-version-minor} introduces our new Cluster Migration functionality well as a number of other improvements and minor fixes. +Couchbase Operator 2.9.0 was released in November 2025. +This maintenance release contains fixes to issues. -Take a look at the xref:whats-new.adoc[What's New] page for a list of new features and improvements that are available in this release. +[#fixed-issues-v290] +== Fixed Issues -== Installation -For installation instructions, refer to: +*https://jira.issues.couchbase.com/browse/K8S-3258/[K8S-3258]*:: -* xref:install-kubernetes.adoc[] -* xref:install-openshift.adoc[] +Added a new `logging.configNameReleasePrefix` boolean to the helm chart. +This defaults to false, but setting it to true will prefix the fluent-bit config with the release name. +Setting this to true for existing clusters will trigger recreation of all pods so should only really be used for new clusters. -== Upgrading to Kubernetes Operator {operator-version-minor} +*https://jira.issues.couchbase.com/browse/K8S-4091/[K8S-4091]*:: -The necessary steps needed to upgrade to this release depend on which version of the Kubernetes Operator you are upgrading from. +Updated the `spec.networking.addressFamily` field to accept `IPv4Only`, `IPv4Priority`, `IPv6Only` and `IPv6Priority`. +The current `IPv4/IPv6` values will have the `Ipv4/6Only` functionality. +I.e. +customers that have set the fields will not see any change. -=== Upgrading from 1.x, 2.0, or 2.1 ++ +These should be considered deprecated and will be removed in a future release. + + +The priority/only choice determines whether `addressFamilyOnly` is true or false. -There is no direct upgrade path from versions prior to 2.2.0. -To upgrade from a 1.x, 2.0.x, or 2.1.x release, you must first upgrade to 2.4.x, paying particular attention to supported Kubernetes platforms and Couchbase Server versions. -See xref:2.4@operator::howto-operator-upgrade.adoc[Couchbase Operator 2.4 upgrade steps] if upgrading from a pre-2.2 release. +*https://jira.issues.couchbase.com/browse/K8S-4097/[K8S-4097]*:: -include::partial$couchbase-operator-release-notes-2.9.0.adoc[] -For further information read the xref:concept-upgrade.adoc[Couchbase Upgrade] concepts page. +The MirWatchdog is an out-of-band check that allows for additional alerting to be in place in the unlikely scenario that an Operator is unable to reconcile a cluster due to reasons outside of its controls/capabilities and which therefore require manual intervention by a user to resolve. 
+Scenarios include but are not limited to, tls expiration, couchbase authentication errors and loss of quorum. +By default this is disabled, but can be enabled and configured using the `mirWatchdog` field in the couchbase cluster CRD. +If the cluster enters this condition, it will: ++ +* Set the cluster_manual_intervention gauge metric to 1 +* Add (where possible) the `ManualInterventionRequired` condition to the cluster, with a message detailing the reason for entering the MIR state. +* Raise a `ManualInterventionRequired` Kubernetes event, with the event message set to the reason for entering manual intervention +* Optionally, reconciliation will be skipped until the manual intervention required state has been resolved, i.e. +the issue that put the cluster into that condition has been fixed. -== Previous Release Notes +*https://jira.issues.couchbase.com/browse/K8S-4144/[K8S-4144]*:: -* xref:2.8@operator::release-notes.adoc[Couchbase Kubernetes Operator 2.8 Release Notes] +In prior versions of Couchbase Operator, the metrics port annotation (`prometheus.io/port`) was set to 8091, even if TLS was enabled. + It will now correctly set to 18091. + +*https://jira.issues.couchbase.com/browse/K8S-4161/[K8S-4161]*:: + +Operator 2.9.0 now allows you to set `spec.cluster.analytics.numReplicas`. +This feature is only supported for couchbase server versions 7.6+. + +*https://jira.issues.couchbase.com/browse/K8S-4270/[K8S-4270]*:: + +Potentially where we use `kubectl apply` for CRDS, we add a note that this error is possible in 2.9+, and to add `--server-side` to the `kubectl apply` command. + +*https://jira.issues.couchbase.com/browse/K8S-4286/[K8S-4286]*:: + +In the latest build, +the mirWatchdog feature is now set to off by default. +The sequence has been adjusted to move the skip function after the validationRunner, +and changes have been included in the CRD. +Additionally, +the system now skips the DAC during status changes. +These updates aim to streamline operations and improve efficiency. +// Generated by [chatgpt:gpt-4o] -== More Information -* xref:server:release-notes:relnotes.adoc[Couchbase Server Release Notes] \ No newline at end of file From 35c43a35eefbfadafa5d17c4c08dbee09478efe1 Mon Sep 17 00:00:00 2001 From: Tim Fletcher Date: Wed, 10 Dec 2025 03:55:23 +0000 Subject: [PATCH 08/17] Apply suggestions from code review Co-authored-by: tech-comm-team-couchbase --- modules/ROOT/pages/howto-xdcr.adoc | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/modules/ROOT/pages/howto-xdcr.adoc b/modules/ROOT/pages/howto-xdcr.adoc index 4e31648..0c9ece5 100644 --- a/modules/ROOT/pages/howto-xdcr.adoc +++ b/modules/ROOT/pages/howto-xdcr.adoc @@ -101,12 +101,12 @@ spec: <.> The correct hostname to use is the remote cluster's console service to provide stable naming and service discovery. The hostname is calculated as per the xref:howto-client-sdks.adoc#dns-based-addressing[SDK configuration how-to]. -<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system. +<.> As we're not using client certificate authentication, specify a secret containing a username and password on the remote system. <.> **TLS only:** For TLS connections you need to specify the remote cluster CA certificate in order to verify the remote cluster is trusted. 
 xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-xdcr-remoteclusters-tls-secret[`couchbaseclusters.spec.xdcr.remoteClusters.tls.secret`] documents the secret format.
 
-<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster.
+<.> Replications are selected that match the labels specified, in this instance the ones that go from this cluster to the remote cluster.
 
 <.> **Inter-Kubernetes networking with forwarded DNS only:** the xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-servers-pod[`couchbaseclusters.spec.servers.pod.spec.dnsPolicy`] field tells Kubernetes to provide no default DNS configuration.
 
@@ -217,16 +217,16 @@ spec:
 <.> The correct hostname to use is the remote cluster's console service to provide stable naming and service discovery.
 The hostname is calculated as per the xref:howto-client-sdks.adoc#dns-based-addressing-with-external-dns[SDK configuration how-to].
 
-<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system.
+<.> As we're not using client certificate authentication, specify a secret containing a username and password on the remote system.
 
 <.> For TLS connections you need to specify the remote cluster CA certificate in order to verify the remote cluster is trusted.
 xref:resource/couchbasecluster.adoc#couchbaseclusters-spec-xdcr-remoteclusters-tls-secret[`couchbaseclusters.spec.xdcr.remoteClusters.tls.secret`] documents the secret format.
 
-<.> Replications are selected that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster.
+<.> Replications are selected that match the labels specified, in this instance the ones that go from this cluster to the remote cluster.
 
 == IP Based Addressing
 
-In this discouraged scenario, there is no shared DNS between 2 Kubernetes clusters - we must use IP based addressing.
+In this discouraged scenario, there is no shared DNS between the 2 Kubernetes clusters, so IP-based addressing must be used.
 Pods are exposed by using Kubernetes `NodePort` type services.
 As there is no DNS, TLS is not supported, so security must be maintained between the 2 clusters using a VPN.
 
@@ -332,9 +332,9 @@ spec:
 <.> The correct hostname to use.
 The hostname is calculated as per the xref:howto-client-sdks.adoc#ip-based-addressing[SDK configuration how-to].
 
-<.> As we're not using client certificate authentication we specify a secret containing a username and password on the remote system.
+<.> As we're not using client certificate authentication, specify a secret containing a username and password on the remote system.
 
-<.> Finally we select replications that match the labels we specify, in this instance the ones that go from this cluster to the remote cluster.
+<.> Finally, select replications that match the labels specified, in this instance the ones that go from this cluster to the remote cluster.
 
 == Scopes and collections support
 
@@ -410,7 +410,7 @@ Eventual consistency rules apply so if the bucket is still being created then we
 <.> This is an example of replicating only a specific collection `collection1` in scope `scope1`.
 
-<.> The target keyspace must be of identical size so as we're replicating from a collection we must also specify a target collection.
+<.> The target keyspace must be of identical size; because the source is a collection, a target collection must also be specified.
 
<.> Deny rules can be used to prevent replication of specific keyspaces. This is useful if for example you have a scope with a large number of collections and you want to replicate all but a small number. From 7a62a819136f5344ef0f0c95d5691a00db03b0eb Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 08:02:36 +0530 Subject: [PATCH 09/17] Added content from pr 48 --- .../ROOT/pages/tutorial-avx2-scheduling.adoc | 503 ++++++++++++++++++ 1 file changed, 503 insertions(+) create mode 100644 modules/ROOT/pages/tutorial-avx2-scheduling.adoc diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc new file mode 100644 index 0000000..140a6dc --- /dev/null +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -0,0 +1,503 @@ += AVX2-Aware Scheduling for Couchbase Server + +[abstract] +This tutorial covers how to detect AVX2 CPU extension / x86-64-v3 microarchitecture on Kubernetes nodes, label nodes accordingly, and configure CouchbaseCluster resources to schedule pods only on compatible nodes. + +include::partial$tutorial.adoc[] + +== Background and Motivation + +Starting with **Couchbase Server 8.0**, vector search performance (FTS/GSI) benefits significantly from **AVX2-capable CPUs** on x86-64 nodes. + +=== What is AVX2? + +AVX2 (Advanced Vector Extensions 2) is: + +* A SIMD instruction set available on modern Intel and AMD x86-64 CPUs +* Required for high-performance vectorized operations +* Part of the x86-64-v3 microarchitecture level (along with BMI1, BMI2, and FMA) +* **Not guaranteed** on all cloud VM types +* **Not automatically enforced** by Kubernetes scheduling + +[IMPORTANT] +==== +Kubernetes clusters *must explicitly detect CPU capabilities and constrain scheduling* to ensure Couchbase Server pods land on AVX2-capable nodes. +==== + +== Solution Overview + +This tutorial solves the problem in three layers: + +1. **Node labeling** — detect which nodes support AVX2 +2. **Scheduler constraints** — ensure pods only land on valid nodes +3. **Cloud provisioning** — ensure node pools contain AVX2-capable CPUs + +Two node-labeling approaches are covered: + +* A **simple custom DaemonSet** (lightweight, minimal dependencies) +* **Node Feature Discovery (NFD)** (recommended for production) + +== Method 1: Simple AVX2 Node Labeling via DaemonSet + +This is a lightweight solution when NFD is unavailable or when you prefer minimal dependencies. 
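+
+Before you deploy the labeler, you can spot-check a single node for AVX2 support.
+The command below is a sketch that assumes your cluster allows `kubectl debug` access to nodes; because `/proc/cpuinfo` is not namespaced, any container running on a node reports the host CPU flags.
+
+[source,console]
+----
+# Replace <node-name> with the node you want to inspect.
+kubectl debug node/<node-name> -it --image=busybox -- sh -c 'grep -q avx2 /proc/cpuinfo && echo "AVX2 supported"'
+----
+
+If the node supports AVX2, the command prints `AVX2 supported`, and the DaemonSet below will label it.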
+ +=== How It Works + +* Runs on every node as a DaemonSet +* Reads `/proc/cpuinfo` from the host +* Checks for the `avx2` flag +* Labels the node if AVX2 is present + +=== Label Applied + +[source] +---- +cpu.feature/AVX2=true +---- + +=== DaemonSet YAML + +Create a file named `avx2-node-labeler.yaml`: + +[source,yaml] +---- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: avx2-labeler-sa + namespace: kube-system +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: avx2-labeler-role +rules: +- apiGroups: [""] + resources: ["nodes"] + verbs: ["get", "patch", "update"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: avx2-labeler-binding +subjects: +- kind: ServiceAccount + name: avx2-labeler-sa + namespace: kube-system +roleRef: + kind: ClusterRole + name: avx2-labeler-role + apiGroup: rbac.authorization.k8s.io +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: avx2-node-labeler + namespace: kube-system +spec: + selector: + matchLabels: + app: avx2-node-labeler + template: + metadata: + labels: + app: avx2-node-labeler + spec: + serviceAccountName: avx2-labeler-sa + containers: + - name: labeler + image: bitnami/kubectl:latest + command: + - /bin/bash + - -c + - | + if grep -qi "avx2" /host/proc/cpuinfo; then + kubectl label node "$NODE_NAME" cpu.feature/AVX2=true --overwrite + fi + sleep infinity + env: + - name: NODE_NAME + valueFrom: + fieldRef: + fieldPath: spec.nodeName + volumeMounts: + - name: host-proc + mountPath: /host/proc + readOnly: true + volumes: + - name: host-proc + hostPath: + path: /proc +---- + +=== Apply the DaemonSet + +[source,console] +---- +kubectl apply -f avx2-node-labeler.yaml +---- + +=== Verify Labels + +[source,console] +---- +kubectl get nodes -L cpu.feature/AVX2 +---- + +== Method 2: Node Feature Discovery (NFD) — Recommended + +**Node Feature Discovery (NFD)** is a Kubernetes SIG project that automatically detects hardware features and labels nodes. + +=== NFD AVX2 Label + +NFD uses the following standardized label for AVX2: + +[source] +---- +feature.node.kubernetes.io/cpu-cpuid.AVX2=true +---- + +This label is standardized and safe to rely on across all environments. + +=== Install NFD Using kubectl + +[source,console] +---- +kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.18.3" +---- + +Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. + +=== Install NFD Using Helm + +[source,console] +---- +helm install nfd \ + oci://registry.k8s.io/nfd/charts/node-feature-discovery \ + --version 0.18.3 \ + --namespace node-feature-discovery \ + --create-namespace + +---- + +Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. + +=== Verify NFD Labels + +[source,console] +---- +kubectl get nodes -L feature.node.kubernetes.io/cpu-cpuid.AVX2 +---- + +== Pod Scheduling with nodeAffinity + +Once nodes are labeled, configure your CouchbaseCluster to schedule pods only on AVX2-capable nodes. 
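+
+After you apply one of the affinity configurations below, you can confirm that the Couchbase pods actually landed on AVX2-capable nodes.
+The selector in this sketch (`couchbase_cluster=cb-example`) is an assumption about how your pods are labelled; adjust it to match your deployment.
+
+[source,console]
+----
+# Show each Couchbase pod together with the node it was scheduled onto.
+kubectl get pods -l couchbase_cluster=cb-example -o wide
+
+# Cross-check that those nodes carry an AVX2 label.
+kubectl get nodes -L feature.node.kubernetes.io/cpu-cpuid.AVX2 -L cpu.feature/AVX2
+----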
+ +=== Strict AVX2 Scheduling (Recommended) + +Use `requiredDuringSchedulingIgnoredDuringExecution` to enforce AVX2 requirements: + +[source,yaml] +---- +spec: + servers: + - name: data-nodes + size: 3 + services: + - data + - index + - query + pod: + spec: + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: feature.node.kubernetes.io/cpu-cpuid.AVX2 + operator: In + values: + - "true" +---- + +=== Soft Preference (Fallback Allowed) + +Use `preferredDuringSchedulingIgnoredDuringExecution` if you want AVX2 to be preferred but not required: + +[source,yaml] +---- +spec: + servers: + - name: data-nodes + size: 3 + services: + - data + pod: + spec: + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 100 + preference: + matchExpressions: + - key: feature.node.kubernetes.io/cpu-cpuid.AVX2 + operator: In + values: + - "true" +---- + +== Google Kubernetes Engine (GKE) + +GKE requires special care because node pools may use mixed CPU generations and AVX2 is not guaranteed by default. + +=== GKE AVX2 Guarantees + +[cols="1,1"] +|=== +|Guarantee |Status + +|AVX2 by machine type +|Not guaranteed + +|AVX2 by region +|Not guaranteed + +|AVX2 by default +|Not guaranteed + +|AVX2 via min CPU platform +|Guaranteed +|=== + +=== Creating a GKE Node Pool with AVX2 + +**Step 1:** Choose a modern machine family (`n2`, `c2`, `c3`, `n4`, `m2`, `m3`, ...) + +**Step 2:** Enforce minimum CPU platform: + +[source,console] +---- +gcloud container node-pools create avx2-pool \ + --cluster=my-cluster \ + --region=us-central1 \ + --machine-type=n2-standard-4 \ + --min-cpu-platform="Intel Cascade Lake" \ + --num-nodes=3 \ + --node-labels=cpu=avx2 +---- + +Pin min-cpu-platform ≥ Intel Haswell or AMD Rome +Verify online for a comprehensive list of AVX2-capable VM series. + +This guarantees AVX2 at the infrastructure level. + +=== GKE Automatic Node Labels + +GKE automatically applies the following label: + +[source] +---- +cloud.google.com/gke-nodepool= +---- + +=== GKE nodeAffinity Pattern + +[source,yaml] +---- +spec: + servers: + - name: data-nodes + size: 3 + services: + - data + - index + - query + pod: + spec: + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: cloud.google.com/gke-nodepool + operator: In + values: + - avx2-pool + +---- + +== Amazon EKS + +=== AVX2-Capable Instance Types + +The following EC2 instance families support AVX2: + +* **Intel**: M5, C5, R5, M6i, C6i, R6i, M7i, C7i (and newer) +* **AMD**: M5a, C5a, R5a, M6a, C6a, R6a (and newer) + +Verify online for a comprehensive list of AVX2-capable instance types. 
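+
+If you are unsure whether a particular instance type exposes AVX2, you can check it empirically once a node of that type has joined the cluster.
+This is a sketch: it relies on the standard `node.kubernetes.io/instance-type` node label, and the instance type shown is only an example.
+
+[source,console]
+----
+# Run a throwaway pod pinned to a c6i.large node and look for the avx2 flag.
+kubectl run avx2-check --rm -it --restart=Never --image=busybox \
+  --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"node.kubernetes.io/instance-type":"c6i.large"}}}' \
+  -- sh -c 'grep -q avx2 /proc/cpuinfo && echo "AVX2 supported"'
+----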
+ +=== Creating an EKS Node Group + +[source,console] +---- +eksctl create nodegroup \ + --cluster my-cluster \ + --name avx2-ng \ + --node-type c6i.large \ + --nodes 3 \ + --node-labels cpu=avx2 +---- + +=== EKS nodeAffinity Pattern + +[source,yaml] +---- +spec: + servers: + - name: data-nodes + size: 3 + services: + - data + - index + - query + pod: + spec: + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: cpu + operator: In + values: + - avx2 +---- + +You can also use the automatic instance type label: + +[source,yaml] +---- +- key: node.kubernetes.io/instance-type + operator: In + values: + - c6i.large + - c6i.xlarge +---- + +== Azure AKS + +=== AVX2-Capable VM Series + +The following Azure VM series support AVX2: + +* **Dv3, Ev3** (Haswell/Broadwell) +* **Dv4, Ev4** (Cascade Lake) +* **Dv5, Ev5** (Ice Lake) + +Verify online for a comprehensive list of AVX2-capable VM series. + +=== Creating an AKS Node Pool + +[source,console] +---- +az aks nodepool add \ + --resource-group rg \ + --cluster-name my-aks \ + --name avx2pool \ + --node-vm-size Standard_D8s_v5 \ + --node-count 3 \ + --labels cpu=avx2 +---- + +=== AKS nodeAffinity Pattern + +[source,yaml] +---- +spec: + servers: + - name: data-nodes + size: 3 + services: + - data + - index + - query + pod: + spec: + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: cpu + operator: In + values: + - avx2 +---- + +== Complete CouchbaseCluster Example + +Here is a complete example combining all best practices: + +[source,yaml] +---- +apiVersion: v1 +kind: Secret +metadata: + name: cb-example-auth +type: Opaque +data: + username: QWRtaW5pc3RyYXRvcg== + password: cGFzc3dvcmQ= +--- +apiVersion: couchbase.com/v2 +kind: CouchbaseCluster +metadata: + name: cb-example +spec: + image: couchbase/server:8.0.0 + security: + adminSecret: cb-example-auth + buckets: + managed: true + servers: + - name: data-nodes + size: 3 + services: + - data + - index + - query + pod: + spec: + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: feature.node.kubernetes.io/cpu-cpuid.AVX2 + operator: In + values: + - "true" + # Alternative using custom DaemonSet label: + # - key: cpu.feature/AVX2 + # operator: In + # values: + # - "true" +---- + +== Troubleshooting + + +=== Verify Node Labels + +[source,console] +---- +# For NFD labels +kubectl get nodes -o custom-columns=\ +NAME:.metadata.name,\ +AVX2:.metadata.labels."feature\.node\.kubernetes\.io/cpu-cpuid\.AVX2" + +# For custom labels (Using the DaemonSet) +kubectl get nodes -L cpu.feature/AVX2 +---- + From c6302a22c545f93a16c0a518fc8ef75bd3e9c182 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 09:07:14 +0530 Subject: [PATCH 10/17] Updated nav n prerequisite-and-setup files --- modules/ROOT/nav.adoc | 2 ++ modules/ROOT/pages/prerequisite-and-setup.adoc | 2 ++ 2 files changed, 4 insertions(+) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 3771de2..2cc92aa 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -147,6 +147,8 @@ include::partial$autogen-reference.adoc[] ** xref:tutorial-kubernetes-network-policy.adoc[Kubernetes Network Policies Using Deny-All Default] * Persistent Volumes ** xref:tutorial-volume-expansion.adoc[Persistent Volume Expansion] +* Scheduling + ** xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase 
Server] * Sync Gateway ** xref:tutorial-sync-gateway.adoc[Connecting Sync-Gateway to a Couchbase Cluster] ** xref:tutorial-sync-gateway-clients.adoc[Exposing Sync-Gateway to Couchbase Lite Clients] diff --git a/modules/ROOT/pages/prerequisite-and-setup.adoc b/modules/ROOT/pages/prerequisite-and-setup.adoc index 3d4354a..44674c5 100644 --- a/modules/ROOT/pages/prerequisite-and-setup.adoc +++ b/modules/ROOT/pages/prerequisite-and-setup.adoc @@ -177,6 +177,8 @@ The architecture of each node must be uniform across the cluster as the use of m NOTE: The official Couchbase docker repository contains multi-arch images which do not require explicit references to architecture tags when being pulled and deployed. However, when pulling from a private repository, or performing intermediate processing on a machine with a different architecture than the deployed cluster, the use of explicit tags may be required to ensure the correct images are deployed. +IMPORTANT: For optimal performance with Couchbase Server 8.0+, especially for vector search (FTS/GSI) workloads, ensure your nodes support AVX2 CPU instructions (x86-64-v3 microarchitecture). Refer to xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server] for detailed guidance on detecting and scheduling pods on AVX2-capable nodes. + == RBAC and Networking Requirements Preparing the Kubernetes cluster to run the Operator may require setting up proper RBAC and network settings in your Kubernetes cluster. From 0633d9c408c28ed6bee0970a34b83803fc6a98ec Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 09:20:47 +0530 Subject: [PATCH 11/17] Added preview yml --- modules/ROOT/pages/prerequisite-and-setup.adoc | 3 ++- preview/HEAD.yml | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/modules/ROOT/pages/prerequisite-and-setup.adoc b/modules/ROOT/pages/prerequisite-and-setup.adoc index 44674c5..6bea8fe 100644 --- a/modules/ROOT/pages/prerequisite-and-setup.adoc +++ b/modules/ROOT/pages/prerequisite-and-setup.adoc @@ -177,7 +177,8 @@ The architecture of each node must be uniform across the cluster as the use of m NOTE: The official Couchbase docker repository contains multi-arch images which do not require explicit references to architecture tags when being pulled and deployed. However, when pulling from a private repository, or performing intermediate processing on a machine with a different architecture than the deployed cluster, the use of explicit tags may be required to ensure the correct images are deployed. -IMPORTANT: For optimal performance with Couchbase Server 8.0+, especially for vector search (FTS/GSI) workloads, ensure your nodes support AVX2 CPU instructions (x86-64-v3 microarchitecture). Refer to xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server] for detailed guidance on detecting and scheduling pods on AVX2-capable nodes. +IMPORTANT: For optimal performance with Couchbase Server 8.0 and later versions, in particular for vector search (FTS and GSI) workloads, use nodes that support AVX2 CPU instructions (x86-64-v3 Microarchitecture). +For guidance on detecting AVX2 support and scheduling pods on AVX2-capable nodes, see xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server]. 
== RBAC and Networking Requirements diff --git a/preview/HEAD.yml b/preview/HEAD.yml index 3736c35..a29fd69 100644 --- a/preview/HEAD.yml +++ b/preview/HEAD.yml @@ -3,4 +3,4 @@ sources: branches: [release/8.0] docs-operator: - branches: [DOC-13656-Create-release-note-for-Couchbase-Operator-2.9.0, release/2.8] \ No newline at end of file + branches: [DOC-13857-tutorial-to-detect-avx2, release/2.8] \ No newline at end of file From 201029468508ed6802a1c00d70bfc6d393d1ba57 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 14:14:10 +0530 Subject: [PATCH 12/17] Edited-structured-added-lead-in-and-then-rewrote --- modules/ROOT/nav.adoc | 2 +- .../ROOT/pages/tutorial-avx2-scheduling.adoc | 317 +++++++++++------- 2 files changed, 203 insertions(+), 116 deletions(-) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 2cc92aa..6698900 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -148,7 +148,7 @@ include::partial$autogen-reference.adoc[] * Persistent Volumes ** xref:tutorial-volume-expansion.adoc[Persistent Volume Expansion] * Scheduling - ** xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server] + ** xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server] * Sync Gateway ** xref:tutorial-sync-gateway.adoc[Connecting Sync-Gateway to a Couchbase Cluster] ** xref:tutorial-sync-gateway-clients.adoc[Exposing Sync-Gateway to Couchbase Lite Clients] diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index 140a6dc..e29b9c5 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -1,63 +1,141 @@ = AVX2-Aware Scheduling for Couchbase Server [abstract] -This tutorial covers how to detect AVX2 CPU extension / x86-64-v3 microarchitecture on Kubernetes nodes, label nodes accordingly, and configure CouchbaseCluster resources to schedule pods only on compatible nodes. +This tutorial explains how to detect the AVX2 CPU extension and x86-64-v3 Microarchitecture on Kubernetes nodes, label nodes accordingly, and configure CouchbaseCluster resources to schedule pods only on compatible nodes. include::partial$tutorial.adoc[] -== Background and Motivation +== Background -Starting with **Couchbase Server 8.0**, vector search performance (FTS/GSI) benefits significantly from **AVX2-capable CPUs** on x86-64 nodes. +Starting with Couchbase Server 8.0, Vector Search (FTS and GSI) performance benefits from AVX2-capable CPUs on x86-64 nodes. -=== What is AVX2? +=== What's Advanced Vector Extensions 2 (AVX2) -AVX2 (Advanced Vector Extensions 2) is: +AVX2 is: -* A SIMD instruction set available on modern Intel and AMD x86-64 CPUs +* An SIMD instruction set available on modern Intel and AMD x86-64 CPUs * Required for high-performance vectorized operations -* Part of the x86-64-v3 microarchitecture level (along with BMI1, BMI2, and FMA) -* **Not guaranteed** on all cloud VM types -* **Not automatically enforced** by Kubernetes scheduling +* Part of the x86-64-v3 Microarchitecture level, along with BMI1, BMI2, and FMA +* Not guaranteed on all cloud VM types +* Not enforced by default in Kubernetes scheduling -[IMPORTANT] -==== -Kubernetes clusters *must explicitly detect CPU capabilities and constrain scheduling* to ensure Couchbase Server pods land on AVX2-capable nodes. 
-==== +IMPORTANT: Kubernetes clusters must explicitly detect CPU capabilities and restrict scheduling to make sure Couchbase Server pods run on AVX2-capable nodes. -== Solution Overview +== AVX2-Aware Scheduling Approach -This tutorial solves the problem in three layers: +This tutorial approaches the problem through the following layers: -1. **Node labeling** — detect which nodes support AVX2 -2. **Scheduler constraints** — ensure pods only land on valid nodes -3. **Cloud provisioning** — ensure node pools contain AVX2-capable CPUs +* <<#node-labeling-methods,*Node labeling*>>: Detect nodes that support AVX2. +* <<#pod-scheduling-with-nodeaffinity,*Scheduler constraints*>>: Schedule pods only on compatible nodes. +* <<#cloud-specific-node-provisioning,*Cloud provisioning*>>: Make sure node pools use AVX2-capable CPUs. -Two node-labeling approaches are covered: +[#node-labeling-methods] +== Node Labeling Methods -* A **simple custom DaemonSet** (lightweight, minimal dependencies) -* **Node Feature Discovery (NFD)** (recommended for production) +Use one of the following methods to label Kubernetes nodes that support AVX2: -== Method 1: Simple AVX2 Node Labeling via DaemonSet +* <<#node-labeling-via-nfd, *Node Feature Discovery (NFD)*>>: Recommended for production environments +* <<#node-labeling-via-daemonset, *A custom DaemonSet*>>: Provides a direct, lightweight option with minimal dependencies -This is a lightweight solution when NFD is unavailable or when you prefer minimal dependencies. +[#node-labeling-via-nfd] +=== Method 1: Node Feature Discovery (Recommended) -=== How It Works +Node Feature Discovery (NFD) is a Kubernetes SIG project that detects hardware features and labels nodes automatically. -* Runs on every node as a DaemonSet +IMPORTANT: Couchbase recommends this method for production environments. + +Use the following steps to label Kubernetes nodes that support AVX2 using NFD: + +. <<#avx2-node-label-used-by-nfd, NFD to detect AVX2 support>> +. Install NFD by using your preferred method +** <<#install-nfd-kubectl, Install NFD by Using kubectl>> +** <<#install-nfd-helm, Install NFD by Using Helm>> +. <<#verify-nfd-node-labels, Verify NFD Node Labels>> + +[#avx2-node-label-used-by-nfd] +==== AVX2 Node Label Used by NFD + +NFD applies the following standardized node label to indicate AVX2 support. + +[source] +---- +feature.node.kubernetes.io/cpu-cpuid.AVX2=true +---- + +This label follows a standard format and is safe to use across environments. + +[#install-nfd-kubectl] +==== Install NFD by Using kubectl + +Install NFD on the cluster by using `kubectl`. +Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. + +[source,console] +---- +kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.18.3" +---- + +[#install-nfd-helm] +==== Install NFD by Using Helm + +Install NFD on the cluster by using Helm. +Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. + +[source,console] +---- +helm install nfd \ + oci://registry.k8s.io/nfd/charts/node-feature-discovery \ + --version 0.18.3 \ + --namespace node-feature-discovery \ + --create-namespace + +---- + +[#verify-nfd-node-labels] +==== Verify NFD Node Labels + +Verify that NFD applies the AVX2 label to supported nodes. 
+ +[source,console] +---- +kubectl get nodes -L feature.node.kubernetes.io/cpu-cpuid.AVX2 +---- + +[#node-labeling-via-daemonset] +=== Method 2: AVX2 Node Labeling via DaemonSet + +This approach provides a lightweight option when NFD is unavailable or when you want to limit dependencies. + +==== AVX2 Node Labeling Process + +The DaemonSet uses the following process to detect AVX2 support and label nodes: + +* Runs as a DaemonSet on every node * Reads `/proc/cpuinfo` from the host * Checks for the `avx2` flag -* Labels the node if AVX2 is present +* Labels the node when AVX2 support is present -=== Label Applied +Use the following steps to label Kubernetes nodes that support AVX2: + +. <<#define-avx2-label, Define the AVX2 node label>> +. <<#create-daemonset-manifest, Create the DaemonSet manifest>> +. <<#deploy-daemonset, Deploy the DaemonSet>> +. <<#verify-node-labels, Verify node labels>> + +[#define-avx2-label] +==== Define the AVX2 Node Label + +Define the AVX2 node label to identify nodes that support the AVX2 CPU extension. [source] ---- cpu.feature/AVX2=true ---- -=== DaemonSet YAML +[#create-daemonset-manifest] +==== Create the DaemonSet Manifest -Create a file named `avx2-node-labeler.yaml`: +Create a DaemonSet manifest named `avx2-node-labeler.yaml` with the following content that detects AVX2 support and applies the node label. [source,yaml] ---- @@ -130,72 +208,38 @@ spec: path: /proc ---- -=== Apply the DaemonSet +[#deploy-daemonset] +==== Deploy the DaemonSet -[source,console] ----- -kubectl apply -f avx2-node-labeler.yaml ----- - -=== Verify Labels - -[source,console] ----- -kubectl get nodes -L cpu.feature/AVX2 ----- - -== Method 2: Node Feature Discovery (NFD) — Recommended - -**Node Feature Discovery (NFD)** is a Kubernetes SIG project that automatically detects hardware features and labels nodes. - -=== NFD AVX2 Label - -NFD uses the following standardized label for AVX2: - -[source] ----- -feature.node.kubernetes.io/cpu-cpuid.AVX2=true ----- - -This label is standardized and safe to rely on across all environments. - -=== Install NFD Using kubectl +Deploy the DaemonSet to run the AVX2 detection process on all nodes. [source,console] ---- -kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.18.3" +kubectl apply -f avx2-node-labeler.yaml ---- -Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. +[#verify-node-labels] +==== Verify Node Labels -=== Install NFD Using Helm +Verify that Kubernetes correctly applies the AVX2 label to supported nodes. [source,console] ---- -helm install nfd \ - oci://registry.k8s.io/nfd/charts/node-feature-discovery \ - --version 0.18.3 \ - --namespace node-feature-discovery \ - --create-namespace - +kubectl get nodes -L cpu.feature/AVX2 ---- -Replace `v0.18.3` with the latest release tag from the https://github.com/kubernetes-sigs/node-feature-discovery/releases[NFD releases page]. - -=== Verify NFD Labels +[#pod-scheduling-with-nodeaffinity] +== Pod Scheduling by Using nodeAffinity -[source,console] ----- -kubectl get nodes -L feature.node.kubernetes.io/cpu-cpuid.AVX2 ----- - -== Pod Scheduling with nodeAffinity +After you label nodes, configure the CouchbaseCluster resource to restrict pod scheduling to AVX2-capable nodes in one of the following ways: -Once nodes are labeled, configure your CouchbaseCluster to schedule pods only on AVX2-capable nodes. 
+* <<#enforce-avx2-scheduling, *Enforce AVX2 Scheduling*>>: Recommended +* <<#prefer-avx2-scheduling, *Prefer AVX2 Scheduling*>>: Fallback allowed -=== Strict AVX2 Scheduling (Recommended) +[#enforce-avx2-scheduling] +=== Enforce AVX2 Scheduling (Recommended) -Use `requiredDuringSchedulingIgnoredDuringExecution` to enforce AVX2 requirements: +Use `requiredDuringSchedulingIgnoredDuringExecution` to enforce AVX2 requirements during pod scheduling. [source,yaml] ---- @@ -220,9 +264,10 @@ spec: - "true" ---- -=== Soft Preference (Fallback Allowed) +[#prefer-avx2-scheduling] +=== Prefer AVX2 Scheduling (Fallback Allowed) -Use `preferredDuringSchedulingIgnoredDuringExecution` if you want AVX2 to be preferred but not required: +Use `preferredDuringSchedulingIgnoredDuringExecution` to prefer AVX2-capable nodes while allowing scheduling on other nodes. [source,yaml] ---- @@ -246,11 +291,21 @@ spec: - "true" ---- -== Google Kubernetes Engine (GKE) +[#cloud-specific-node-provisioning] +== Cloud-Specific Node Provisioning + +Cloud providers expose CPU capabilities and node selection options differently. +Use the following cloud platform-specific guidance to provision nodes with AVX2 support. -GKE requires special care because node pools may use mixed CPU generations and AVX2 is not guaranteed by default. +[#google-gke] +=== Google Kubernetes Engine (GKE) -=== GKE AVX2 Guarantees +GKE requires additional consideration because node pools can include mixed CPU generations and do not guarantee AVX2 support by default. + +[#gke-avx2-guarantees] +==== AVX2 Support Guarantees in GKE + +The following table summarizes how GKE guarantees AVX2 support under different configurations. [cols="1,1"] |=== @@ -269,12 +324,16 @@ GKE requires special care because node pools may use mixed CPU generations and A |Guaranteed |=== -=== Creating a GKE Node Pool with AVX2 +[#creating-gke-node-pool-with-avx2] +==== Create a GKE Node Pool with AVX2 Support -**Step 1:** Choose a modern machine family (`n2`, `c2`, `c3`, `n4`, `m2`, `m3`, ...) +Use the following steps to create a GKE node pool that guarantees AVX2 support. -**Step 2:** Enforce minimum CPU platform: +. Select a compatible machine family, such as `n2`, `c2`, `c3`, `n4`, `m2`, `m3`, and so on. +. Enforce a minimum CPU platform that supports AVX2. ++ +-- [source,console] ---- gcloud container node-pools create avx2-pool \ @@ -285,22 +344,28 @@ gcloud container node-pools create avx2-pool \ --num-nodes=3 \ --node-labels=cpu=avx2 ---- +-- + +. Set the minimum CPU platform (`min-cpu-platform`) to Intel Haswell or AMD Rome, or a newer generation. -Pin min-cpu-platform ≥ Intel Haswell or AMD Rome -Verify online for a comprehensive list of AVX2-capable VM series. +. Verify the selected VM series supports AVX2 by referring to the provider documentation. -This guarantees AVX2 at the infrastructure level. +This configuration guarantees AVX2 support at the infrastructure level. -=== GKE Automatic Node Labels +[#gke-automatic-node-labels] +==== GKE Automatic Node Labels -GKE automatically applies the following label: +GKE automatically applies node labels that identify the node pool associated with each node. [source] ---- cloud.google.com/gke-nodepool= ---- -=== GKE nodeAffinity Pattern +[#gke-node-affinity-pattern] +==== GKE nodeAffinity Pattern + +Use node affinity to restrict pod scheduling to a specific GKE node pool. 
[source,yaml] ---- @@ -326,18 +391,25 @@ spec: ---- -== Amazon EKS +[#amazon-eks] +=== Amazon Elastic Kubernetes Service (EKS) + +Use the following sections to provision AVX2-capable nodes and configure pod scheduling in Amazon Elastic Kubernetes Service (EKS). + +[#eks-avx2-capable-instance-types] +==== AVX2-Capable EC2 Instance Types -=== AVX2-Capable Instance Types +The following EC2 instance families support AVX2 instructions: -The following EC2 instance families support AVX2: +* *Intel*: M5, C5, R5, M6i, C6i, R6i, M7i, C7i and newer generations +* *AMD*: M5a, C5a, R5a, M6a, C6a, R6a and newer generations -* **Intel**: M5, C5, R5, M6i, C6i, R6i, M7i, C7i (and newer) -* **AMD**: M5a, C5a, R5a, M6a, C6a, R6a (and newer) +Verify the selected instance type supports AVX2 by referring to the provider documentation. -Verify online for a comprehensive list of AVX2-capable instance types. +[#creating-eks-node-group-with-avx2] +==== Create an EKS Node Group with AVX2 Support -=== Creating an EKS Node Group +Create an EKS node group by using AVX2-capable instance types and apply a node label to identify supported nodes. [source,console] ---- @@ -349,7 +421,10 @@ eksctl create nodegroup \ --node-labels cpu=avx2 ---- -=== EKS nodeAffinity Pattern +[#eks-node-affinity-configuration] +==== EKS nodeAffinity Configuration + +Use node affinity to restrict pod scheduling to AVX2-capable nodes. [source,yaml] ---- @@ -374,7 +449,7 @@ spec: - avx2 ---- -You can also use the automatic instance type label: +You can also restrict scheduling by using the automatic instance type label: [source,yaml] ---- @@ -385,19 +460,26 @@ You can also use the automatic instance type label: - c6i.xlarge ---- -== Azure AKS +[#azure-aks] +=== Azure Kubernetes Service (AKS) + +Use the following sections to provision AVX2-capable nodes and configure pod scheduling in Azure AKS. + +[#aks-avx2-capable-vm-series] +==== AVX2-Capable Azure VM Series -=== AVX2-Capable VM Series +The following Azure VM series support AVX2 instructions: -The following Azure VM series support AVX2: +* Dv3 and Ev3 VM series, based on Intel Haswell and Broadwell processors +* Dv4 and Ev4 VM series, based on Intel Cascade Lake processors +* Dv5 and Ev5 VM series, based on Intel Ice Lake processors -* **Dv3, Ev3** (Haswell/Broadwell) -* **Dv4, Ev4** (Cascade Lake) -* **Dv5, Ev5** (Ice Lake) +Verify the selected VM series supports AVX2 by referring to the Azure documentation. -Verify online for a comprehensive list of AVX2-capable VM series. +[#creating-aks-node-pool-with-avx2] +==== Create an AKS Node Pool with AVX2 Support -=== Creating an AKS Node Pool +Create an AKS node pool by using an AVX2-capable VM series and apply a node label to identify supported nodes. [source,console] ---- @@ -410,7 +492,10 @@ az aks nodepool add \ --labels cpu=avx2 ---- -=== AKS nodeAffinity Pattern +[#aks-node-affinity-pattern] +==== AKS nodeAffinity Configuration + +Use node affinity to restrict pod scheduling to AVX2-capable nodes. [source,yaml] ---- @@ -435,9 +520,9 @@ spec: - avx2 ---- -== Complete CouchbaseCluster Example +== A Complete CouchbaseCluster Example -Here is a complete example combining all best practices: +Here's a complete example combining all best practices. [source,yaml] ---- @@ -487,8 +572,11 @@ spec: == Troubleshooting +Use the following checks to confirm that Kubernetes applies AVX2 node labels as expected. + +=== Verify AVX2 Node Labels -=== Verify Node Labels +Verify that nodes expose the expected AVX2 labels, based on the labeling method you use. 
[source,console] ---- @@ -500,4 +588,3 @@ AVX2:.metadata.labels."feature\.node\.kubernetes\.io/cpu-cpuid\.AVX2" # For custom labels (Using the DaemonSet) kubectl get nodes -L cpu.feature/AVX2 ---- - From bda3ddf7601381e312af0f23acd2f1dcf420af87 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 14:36:52 +0530 Subject: [PATCH 13/17] Set toc levels --- modules/ROOT/pages/tutorial-avx2-scheduling.adoc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index e29b9c5..328edc1 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -1,5 +1,8 @@ = AVX2-Aware Scheduling for Couchbase Server +:page-toclevels: 2 +:page-category: Tutorials + [abstract] This tutorial explains how to detect the AVX2 CPU extension and x86-64-v3 Microarchitecture on Kubernetes nodes, label nodes accordingly, and configure CouchbaseCluster resources to schedule pods only on compatible nodes. @@ -115,7 +118,7 @@ The DaemonSet uses the following process to detect AVX2 support and label nodes: * Checks for the `avx2` flag * Labels the node when AVX2 support is present -Use the following steps to label Kubernetes nodes that support AVX2: +Use the following steps to label Kubernetes nodes that support AVX2 by using a custom DaemonSet: . <<#define-avx2-label, Define the AVX2 node label>> . <<#create-daemonset-manifest, Create the DaemonSet manifest>> From 21a68b5c878d4c5445410582ea87765df2547cfa Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 14:46:03 +0530 Subject: [PATCH 14/17] Fixed the header --- modules/ROOT/nav.adoc | 2 +- modules/ROOT/pages/tutorial-avx2-scheduling.adoc | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 6698900..6a13c23 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -148,7 +148,7 @@ include::partial$autogen-reference.adoc[] * Persistent Volumes ** xref:tutorial-volume-expansion.adoc[Persistent Volume Expansion] * Scheduling - ** xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling for Couchbase Server] + ** xref:tutorial-avx2-scheduling.adoc[AVX2-Aware Scheduling] * Sync Gateway ** xref:tutorial-sync-gateway.adoc[Connecting Sync-Gateway to a Couchbase Cluster] ** xref:tutorial-sync-gateway-clients.adoc[Exposing Sync-Gateway to Couchbase Lite Clients] diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index 328edc1..42fedd4 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -1,5 +1,4 @@ = AVX2-Aware Scheduling for Couchbase Server - :page-toclevels: 2 :page-category: Tutorials From 207ddb493ae08118eb8fe80645fe6cb798ea9fe4 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 15:45:35 +0530 Subject: [PATCH 15/17] Removed page category variable --- modules/ROOT/pages/tutorial-avx2-scheduling.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index 42fedd4..320a347 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -1,6 +1,5 @@ = AVX2-Aware Scheduling for Couchbase Server :page-toclevels: 2 -:page-category: Tutorials [abstract] This tutorial explains how to detect the AVX2 CPU extension and 
x86-64-v3 Microarchitecture on Kubernetes nodes, label nodes accordingly, and configure CouchbaseCluster resources to schedule pods only on compatible nodes. From 345cd678b11765e1d5582c4b0d6301e127b7f725 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 15:53:23 +0530 Subject: [PATCH 16/17] Minor edit --- modules/ROOT/pages/tutorial-avx2-scheduling.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index 320a347..2597c0d 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -10,7 +10,7 @@ include::partial$tutorial.adoc[] Starting with Couchbase Server 8.0, Vector Search (FTS and GSI) performance benefits from AVX2-capable CPUs on x86-64 nodes. -=== What's Advanced Vector Extensions 2 (AVX2) +=== What's an Advanced Vector Extensions 2 (AVX2) AVX2 is: From c679ca1478b3adc9129efcda11ee6f2f0c3d0b97 Mon Sep 17 00:00:00 2001 From: Shwetha Rao Date: Thu, 18 Dec 2025 20:49:05 +0530 Subject: [PATCH 17/17] Implemented peer review comments --- .../ROOT/pages/tutorial-avx2-scheduling.adoc | 39 ++++++++++--------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc index 2597c0d..2fb0cc9 100644 --- a/modules/ROOT/pages/tutorial-avx2-scheduling.adoc +++ b/modules/ROOT/pages/tutorial-avx2-scheduling.adoc @@ -10,15 +10,15 @@ include::partial$tutorial.adoc[] Starting with Couchbase Server 8.0, Vector Search (FTS and GSI) performance benefits from AVX2-capable CPUs on x86-64 nodes. -=== What's an Advanced Vector Extensions 2 (AVX2) +=== What is Advanced Vector Extensions 2 (AVX2) AVX2 is: -* An SIMD instruction set available on modern Intel and AMD x86-64 CPUs -* Required for high-performance vectorized operations -* Part of the x86-64-v3 Microarchitecture level, along with BMI1, BMI2, and FMA -* Not guaranteed on all cloud VM types -* Not enforced by default in Kubernetes scheduling +* An SIMD instruction set available on modern Intel and AMD x86-64 CPUs. +* Required for high-performance vectorized operations. +* Part of the x86-64-v3 Microarchitecture level, along with BMI1, BMI2, and FMA. +* Not guaranteed on all cloud VM types. +* Not enforced by default in Kubernetes scheduling. IMPORTANT: Kubernetes clusters must explicitly detect CPU capabilities and restrict scheduling to make sure Couchbase Server pods run on AVX2-capable nodes. @@ -35,8 +35,8 @@ This tutorial approaches the problem through the following layers: Use one of the following methods to label Kubernetes nodes that support AVX2: -* <<#node-labeling-via-nfd, *Node Feature Discovery (NFD)*>>: Recommended for production environments -* <<#node-labeling-via-daemonset, *A custom DaemonSet*>>: Provides a direct, lightweight option with minimal dependencies +* <<#node-labeling-via-nfd, *Node Feature Discovery (NFD)*>>: Recommended for production environments. +* <<#node-labeling-via-daemonset, *A custom DaemonSet*>>: Provides a direct, lightweight option with minimal dependencies. 
[#node-labeling-via-nfd] === Method 1: Node Feature Discovery (Recommended) @@ -111,10 +111,10 @@ This approach provides a lightweight option when NFD is unavailable or when you The DaemonSet uses the following process to detect AVX2 support and label nodes: -* Runs as a DaemonSet on every node -* Reads `/proc/cpuinfo` from the host -* Checks for the `avx2` flag -* Labels the node when AVX2 support is present +* Runs as a DaemonSet on every node. +* Reads `/proc/cpuinfo` from the host. +* Checks for the `avx2` flag. +* Labels the node when AVX2 support is present. Use the following steps to label Kubernetes nodes that support AVX2 by using a custom DaemonSet: @@ -234,8 +234,8 @@ kubectl get nodes -L cpu.feature/AVX2 After you label nodes, configure the CouchbaseCluster resource to restrict pod scheduling to AVX2-capable nodes in one of the following ways: -* <<#enforce-avx2-scheduling, *Enforce AVX2 Scheduling*>>: Recommended -* <<#prefer-avx2-scheduling, *Prefer AVX2 Scheduling*>>: Fallback allowed +* <<#enforce-avx2-scheduling, *Enforce AVX2 Scheduling*>>: Recommended. +* <<#prefer-avx2-scheduling, *Prefer AVX2 Scheduling*>>: Fallback allowed. [#enforce-avx2-scheduling] === Enforce AVX2 Scheduling (Recommended) @@ -333,6 +333,7 @@ Use the following steps to create a GKE node pool that guarantees AVX2 support. . Select a compatible machine family, such as `n2`, `c2`, `c3`, `n4`, `m2`, `m3`, and so on. . Enforce a minimum CPU platform that supports AVX2. +For example: + -- [source,console] @@ -402,8 +403,8 @@ Use the following sections to provision AVX2-capable nodes and configure pod sch The following EC2 instance families support AVX2 instructions: -* *Intel*: M5, C5, R5, M6i, C6i, R6i, M7i, C7i and newer generations -* *AMD*: M5a, C5a, R5a, M6a, C6a, R6a and newer generations +* *Intel*: M5, C5, R5, M6i, C6i, R6i, M7i, C7i and newer generations. +* *AMD*: M5a, C5a, R5a, M6a, C6a, R6a and newer generations. Verify the selected instance type supports AVX2 by referring to the provider documentation. @@ -471,9 +472,9 @@ Use the following sections to provision AVX2-capable nodes and configure pod sch The following Azure VM series support AVX2 instructions: -* Dv3 and Ev3 VM series, based on Intel Haswell and Broadwell processors -* Dv4 and Ev4 VM series, based on Intel Cascade Lake processors -* Dv5 and Ev5 VM series, based on Intel Ice Lake processors +* Dv3 and Ev3 VM series, based on Intel Haswell and Broadwell processors. +* Dv4 and Ev4 VM series, based on Intel Cascade Lake processors. +* Dv5 and Ev5 VM series, based on Intel Ice Lake processors. Verify the selected VM series supports AVX2 by referring to the Azure documentation.