From bf0aa16333647766aa7662c4898424a8a6da1081 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Fri, 13 Feb 2026 20:28:42 +0100
Subject: [PATCH 1/5] [HWORKS-2538] Add in-place restore documentation to DR
 page

Restructure the Restore section to cover both new cluster restore and
in-place restore modes. The new in-place restore section documents
prerequisites, backup ID identification (including Velero schedules),
required and optional helm values, and re-run instructions.

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 132 +++++++++++++++++++++-
 1 file changed, 127 insertions(+), 5 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index fbab1820e6..22ef3050cf 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -121,15 +121,22 @@ For S3 object storage, you can also configure a bucket lifecycle policy to expir
 
 ## Restore
 
+Hopsworks supports two restore modes:
+
+- **New cluster restore**: Install a fresh cluster and restore data from a backup during installation.
+- **In-place restore**: Restore data onto an existing running cluster via `helm upgrade`.
+
 !!! Note
-    Restore is only supported in a newly created cluster; in-place restore is not supported. Use the exact Hopsworks version that was used to create the backup.
+    Use the exact Hopsworks version that was used to create the backup.
 
-The restore process has two phases:
+### New Cluster Restore
+
+The new cluster restore process has two phases:
 
 - Restore Kubernetes objects required for the cluster restore.
 - Install the cluster with Helm using the correct backup IDs.
 
-### Restore Kubernetes objects
+#### Restore Kubernetes objects
 
 Restore the Kubernetes objects that were backed up using Velero.
@@ -248,7 +255,7 @@ kubectl get configmap opensearch-backups-metadata -n hopsworks -o json \
 | sort -nr
 ```
 
-### Restore on Cluster installation
+#### Restore on Cluster installation
 
 To restore a cluster during installation, configure the backup ID in the values YAML file:
 
@@ -262,7 +269,7 @@ global:
       backupId: "254811200"
 ```
 
-#### Customizations
+##### Customizations
 
 !!! Warning
     Even if you override the backup IDs for RonDB and Opensearch, you must still set `.global._hopsworks.restoreFromBackup.backupId` to ensure HopsFS is restored.
@@ -327,3 +334,118 @@ olk:
     payload:
       indices: "-myindex"
 ```
+
+### In-Place Restore
+
+In-place restore allows you to restore data onto an existing running cluster using `helm upgrade`. Unlike a new cluster restore, this does not require provisioning a fresh cluster — the existing stateful services are shut down, wiped if necessary, and restored from backup.
+
+!!! Warning
+    In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
+
+#### Prerequisites
+
+- A running Hopsworks cluster deployed via Helm.
+- A previously created backup with a known backup ID.
+- Object storage configured and accessible with the backup data.
+- Velero installed and configured as described in the [prerequisites](#prerequisites).
+
+#### Identify the backup ID
+
+Get the backup ID from the **Cluster Settings > Backup** tab or by using the following commands.
+
+```bash
+# RonDB backup IDs (newest first)
+kubectl get configmap rondb-backups-metadata -n hopsworks -o json \
+| jq -r '.data | to_entries[] | select(.value | fromjson | .state == "SUCCESS") | .key' \
+| sort -nr
+
+# Opensearch backup IDs (newest first)
+kubectl get configmap opensearch-backups-metadata -n hopsworks -o json \
+| jq -r '.data | to_entries[] | select(.value | fromjson | .state == "SUCCESS") | .key' \
+| sort -nr
+
+# Velero backup IDs for the main schedule (newest first)
+kubectl get backups -n velero -o json \
+| jq -r '[.items[] | select(.spec.storageLocation == "hopsworks-bsl" and .metadata.labels["velero.io/schedule-name"] == "k8s-backups-main" and .status.phase == "Completed")] | sort_by(.status.completionTimestamp) | reverse[] | .metadata.name'
+
+# Velero backup IDs for the users schedule (newest first)
+kubectl get backups -n velero -o json \
+| jq -r '[.items[] | select(.spec.storageLocation == "hopsworks-bsl" and .metadata.labels["velero.io/schedule-name"] == "k8s-backups-users-resources" and .status.phase == "Completed")] | sort_by(.status.completionTimestamp) | reverse[] | .metadata.name'
+```
+
+#### Run the in-place restore
+
+Configure the restore in the values file and run `helm upgrade`:
+
+```yaml
+global:
+  _hopsworks:
+    backups:
+      enabled: true
+      schedule: "@weekly"
+    restoreFromBackup:
+      backupId: "254811200"
+      inPlace: true
+      forceDataClear: true
+
+# Optional: specify Velero backup IDs. If not set, the latest completed backup is used.
+hopsworks:
+  velero:
+    restore:
+      mainScheduleBackupId: "k8s-backups-main-20260213T153627Z"
+      usersScheduleBackupId: "k8s-backups-users-resources-20260213T153627Z"
+```
+
+Then run:
+
+```bash
+helm upgrade hopsworks --version <version> \
+  --namespace hopsworks \
+  -f values.yaml \
+  --timeout 1200s
+```
+
+You can also pass the restore flags directly on the command line:
+
+```bash
+helm upgrade hopsworks --version <version> \
+  --namespace hopsworks \
+  --set-string global._hopsworks.restoreFromBackup.backupId="254811200" \
+  --set global._hopsworks.restoreFromBackup.inPlace=true \
+  --set global._hopsworks.restoreFromBackup.forceDataClear=true \
+  --set-string hopsworks.velero.restore.mainScheduleBackupId="k8s-backups-main-20260213T153627Z" \
+  --set-string hopsworks.velero.restore.usersScheduleBackupId="k8s-backups-users-resources-20260213T153627Z" \
+  --timeout 1200s
+```
+
+The required flags are:
+
+| Parameter | Description |
+|-----------|-------------|
+| `global._hopsworks.restoreFromBackup.backupId` | The backup ID to restore from. |
+| `global._hopsworks.restoreFromBackup.inPlace` | Must be `true` to enable in-place restore mode. |
+| `global._hopsworks.restoreFromBackup.forceDataClear` | Must be `true` to confirm that existing data will be replaced. This is a safety mechanism to prevent accidental data loss. |
+
+The following flags are optional. If not set, the latest available Velero backup will be used:
+
+| Parameter | Description |
+|-----------|-------------|
+| `hopsworks.velero.restore.mainScheduleBackupId` | The Velero backup ID for the main schedule (`k8s-backups-main`). |
+| `hopsworks.velero.restore.usersScheduleBackupId` | The Velero backup ID for the users schedule (`k8s-backups-users-resources`). |
+
+#### Re-running an in-place restore
+
+In-place restore creates marker resources to prevent accidental re-runs. If you need to run the restore again with the same backup ID, delete the marker resources first:
+
+```bash
+# Delete the HopsFS restore job
+kubectl delete job hopsfs-inplace-restore-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the RonDB restore jobs
+kubectl delete job restore-native-backup-<backup-id> -n hopsworks --ignore-not-found=true
+kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-not-found=true
+```
+
+#### Customizations
+
+The same customization options for [RonDB and Opensearch](#customizations) backup IDs apply to in-place restore. You can override individual service backup IDs while keeping the global backup ID for HopsFS.

From a35ac629b32584a1d577625b1645b2e002c7d5d3 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Tue, 17 Feb 2026 14:44:34 +0100
Subject: [PATCH 2/5] [HWORKS-2538] fixes for rerun restore in place

---
 docs/setup_installation/admin/ha-dr/dr.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index 22ef3050cf..0869d9768c 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -444,6 +444,13 @@ kubectl delete job hopsfs-inplace-restore-<backup-id> -n hopsworks --ignore-not-
 # Delete the RonDB restore jobs
 kubectl delete job restore-native-backup-<backup-id> -n hopsworks --ignore-not-found=true
 kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the Opensearch restore job
+kubectl delete job opensearch-restore-default-default-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the velero restore objects, use th exact backup name or schedule name
+kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not-found=true
+kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
 #### Customizations

From 733f08ed75acd274b046f3bb1d0a0360b978468e Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Tue, 17 Feb 2026 16:26:32 +0100
Subject: [PATCH 3/5] [HWORKS-2538] Fix markdown linting errors in DR docs

Resolve duplicate heading names (MD024) and table column style issues
(MD060) in the in-place restore section.

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index 0869d9768c..da79e00d20 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -342,7 +342,7 @@ In-place restore allows you to restore data onto an existing running cluster usi
 !!! Warning
     In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
 
-#### Prerequisites
+#### In-place restore prerequisites
 
 - A running Hopsworks cluster deployed via Helm.
 - A previously created backup with a known backup ID.
@@ -421,7 +421,7 @@ The required flags are:
 
 | Parameter | Description |
-|-----------|-------------|
+| --------- | ----------- |
 | `global._hopsworks.restoreFromBackup.backupId` | The backup ID to restore from. |
 | `global._hopsworks.restoreFromBackup.inPlace` | Must be `true` to enable in-place restore mode. |
 | `global._hopsworks.restoreFromBackup.forceDataClear` | Must be `true` to confirm that existing data will be replaced. This is a safety mechanism to prevent accidental data loss. |
@@ -429,7 +429,7 @@ The required flags are:
 The following flags are optional. If not set, the latest available Velero backup will be used:
 
 | Parameter | Description |
-|-----------|-------------|
+| --------- | ----------- |
 | `hopsworks.velero.restore.mainScheduleBackupId` | The Velero backup ID for the main schedule (`k8s-backups-main`). |
 | `hopsworks.velero.restore.usersScheduleBackupId` | The Velero backup ID for the users schedule (`k8s-backups-users-resources`). |
@@ -453,6 +453,6 @@ kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not
 kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
-#### Customizations
+#### In-place restore customizations
 
 The same customization options for [RonDB and Opensearch](#customizations) backup IDs apply to in-place restore. You can override individual service backup IDs while keeping the global backup ID for HopsFS.

From aebe13568fe59fd691d1934e14808e988c944652 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Wed, 18 Feb 2026 12:02:20 +0100
Subject: [PATCH 4/5] [HWORKS-2538] Fix typo, helm chart ref, and velero
 restore names in DR docs

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index da79e00d20..d5e375df0d 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -209,19 +209,18 @@ done
 
 # Restores the latest - if specific backup is needed then backupName instead
 echo "=== Creating Velero Restore object for k8s-backups-main ==="
-RESTORE_SUFFIX=$(date +%s)
 kubectl apply -f - </dev/null)" = "Completed" ]; do
+until [ "$(kubectl get restore k8s-backups-main -n velero -o jsonpath='{.status.phase}' 2>/dev/null)" = "Completed" ]; do
   echo "Still waiting..."; sleep 5;
 done
@@ -231,14 +230,14 @@ kubectl apply -f - </dev/null)" = "Completed" ]; do
+until [ "$(kubectl get restore k8s-backups-users-resources -n velero -o jsonpath='{.status.phase}' 2>/dev/null)" = "Completed" ]; do
   echo "Still waiting..."; sleep 5;
 done
 ```
@@ -399,7 +398,7 @@ hopsworks:
 Then run:
 
 ```bash
-helm upgrade hopsworks --version <version> \
+helm upgrade hopsworks hopsworks/hopsworks --version <version> \
   --namespace hopsworks \
   -f values.yaml \
   --timeout 1200s
@@ -408,7 +407,7 @@ You can also pass the restore flags directly on the command line:
 ```bash
-helm upgrade hopsworks --version <version> \
+helm upgrade hopsworks hopsworks/hopsworks --version <version> \
   --namespace hopsworks \
   --set-string global._hopsworks.restoreFromBackup.backupId="254811200" \
   --set global._hopsworks.restoreFromBackup.inPlace=true \
   --set global._hopsworks.restoreFromBackup.forceDataClear=true \
   --set-string hopsworks.velero.restore.mainScheduleBackupId="k8s-backups-main-20260213T153627Z" \
   --set-string hopsworks.velero.restore.usersScheduleBackupId="k8s-backups-users-resources-20260213T153627Z" \
   --timeout 1200s
@@ -448,9 +447,9 @@ kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-no
 # Delete the Opensearch restore job
 kubectl delete job opensearch-restore-default-default-<backup-id> -n hopsworks --ignore-not-found=true
 
-# Delete the velero restore objects, use th exact backup name or schedule name
-kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not-found=true
-kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
+# Delete the velero restore objects, use the exact backup name or schedule name
+kubectl delete restore.velero.io k8s-backups-main -n velero --ignore-not-found=true
+kubectl delete restore.velero.io k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
 #### In-place restore customizations

From 2edcc1059fd1ccec11e2c3f56eeb54c4040bdf35 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Fri, 20 Feb 2026 09:14:33 +0100
Subject: [PATCH 5/5] [HWORKS-2538] Add in-place restore certificate constraint
 note to DR docs

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index d5e375df0d..694782eaef 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -341,6 +341,9 @@ In-place restore allows you to restore data onto an existing running cluster usi
 !!! Warning
     In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
 
+!!! Info
+    After a fresh install from backup (new cluster restore), in-place restores can only be performed using backups taken **after** that fresh install, because the cluster certificates are regenerated during installation. To restore to a backup that was taken **before** the fresh install, you must perform another new cluster restore from that backup instead of an in-place restore.
+
 #### In-place restore prerequisites
 
 - A running Hopsworks cluster deployed via Helm.