From bf0aa16333647766aa7662c4898424a8a6da1081 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Fri, 13 Feb 2026 20:28:42 +0100
Subject: [PATCH 1/5] [HWORKS-2538] Add in-place restore documentation to DR
 page

Restructure the Restore section to cover both new cluster restore and
in-place restore modes. The new in-place restore section documents
prerequisites, backup ID identification (including Velero schedules),
required and optional helm values, and re-run instructions.

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 132 +++++++++++++++++++++-
 1 file changed, 127 insertions(+), 5 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index fbab1820e6..22ef3050cf 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -121,15 +121,22 @@ For S3 object storage, you can also configure a bucket lifecycle policy to expir
 
 ## Restore
 
+Hopsworks supports two restore modes:
+
+- **New cluster restore**: Install a fresh cluster and restore data from a backup during installation.
+- **In-place restore**: Restore data onto an existing running cluster via `helm upgrade`.
+
 !!! Note
-    Restore is only supported in a newly created cluster; in-place restore is not supported. Use the exact Hopsworks version that was used to create the backup.
+    Use the exact Hopsworks version that was used to create the backup.
 
-The restore process has two phases:
+### New Cluster Restore
+
+The new cluster restore process has two phases:
 
 - Restore Kubernetes objects required for the cluster restore.
 - Install the cluster with Helm using the correct backup IDs.
 
-### Restore Kubernetes objects
+#### Restore Kubernetes objects
 
 Restore the Kubernetes objects that were backed up using Velero.
@@ -248,7 +255,7 @@ kubectl get configmap opensearch-backups-metadata -n hopsworks -o json \
 | sort -nr
 ```
 
-### Restore on Cluster installation
+#### Restore on Cluster installation
 
 To restore a cluster during installation, configure the backup ID in the values YAML file:
 
@@ -262,7 +269,7 @@ global:
       backupId: "254811200"
 ```
 
-#### Customizations
+##### Customizations
 
 !!! Warning
     Even if you override the backup IDs for RonDB and Opensearch, you must still set `.global._hopsworks.restoreFromBackup.backupId` to ensure HopsFS is restored.
@@ -327,3 +334,118 @@ olk:
     payload:
       indices: "-myindex"
 ```
+
+### In-Place Restore
+
+In-place restore allows you to restore data onto an existing running cluster using `helm upgrade`. Unlike a new cluster restore, this does not require provisioning a fresh cluster — the existing stateful services are shut down, wiped if necessary, and restored from backup.
+
+!!! Warning
+    In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
+
+#### Prerequisites
+
+- A running Hopsworks cluster deployed via Helm.
+- A previously created backup with a known backup ID.
+- Object storage configured and accessible with the backup data.
+- Velero installed and configured as described in the [prerequisites](#prerequisites).
+
+#### Identify the backup ID
+
+Get the backup ID from the **Cluster Settings > Backup** tab or by using the following commands.
+
+```bash
+# RonDB backup IDs (newest first)
+kubectl get configmap rondb-backups-metadata -n hopsworks -o json \
+| jq -r '.data | to_entries[] | select(.value | fromjson | .state == "SUCCESS") | .key' \
+| sort -nr
+
+# Opensearch backup IDs (newest first)
+kubectl get configmap opensearch-backups-metadata -n hopsworks -o json \
+| jq -r '.data | to_entries[] | select(.value | fromjson | .state == "SUCCESS") | .key' \
+| sort -nr
+
+# Velero backup IDs for the main schedule (newest first)
+kubectl get backups -n velero -o json \
+| jq -r '[.items[] | select(.spec.storageLocation == "hopsworks-bsl" and .metadata.labels["velero.io/schedule-name"] == "k8s-backups-main" and .status.phase == "Completed")] | sort_by(.status.completionTimestamp) | reverse[] | .metadata.name'
+
+# Velero backup IDs for the users schedule (newest first)
+kubectl get backups -n velero -o json \
+| jq -r '[.items[] | select(.spec.storageLocation == "hopsworks-bsl" and .metadata.labels["velero.io/schedule-name"] == "k8s-backups-users-resources" and .status.phase == "Completed")] | sort_by(.status.completionTimestamp) | reverse[] | .metadata.name'
+```
+
+#### Run the in-place restore
+
+Configure the restore in the values file and run `helm upgrade`:
+
+```yaml
+global:
+  _hopsworks:
+    backups:
+      enabled: true
+      schedule: "@weekly"
+    restoreFromBackup:
+      backupId: "254811200"
+      inPlace: true
+      forceDataClear: true
+
+# Optional: specify Velero backup IDs. If not set, the latest completed backup is used.
+hopsworks:
+  velero:
+    restore:
+      mainScheduleBackupId: "k8s-backups-main-20260213T153627Z"
+      usersScheduleBackupId: "k8s-backups-users-resources-20260213T153627Z"
+```
+
+Then run:
+
+```bash
+helm upgrade hopsworks --version <version> \
+  --namespace hopsworks \
+  -f values.yaml \
+  --timeout 1200s
+```
+
+You can also pass the restore flags directly on the command line:
+
+```bash
+helm upgrade hopsworks --version <version> \
+  --namespace hopsworks \
+  --set-string global._hopsworks.restoreFromBackup.backupId="254811200" \
+  --set global._hopsworks.restoreFromBackup.inPlace=true \
+  --set global._hopsworks.restoreFromBackup.forceDataClear=true \
+  --set-string hopsworks.velero.restore.mainScheduleBackupId="k8s-backups-main-20260213T153627Z" \
+  --set-string hopsworks.velero.restore.usersScheduleBackupId="k8s-backups-users-resources-20260213T153627Z" \
+  --timeout 1200s
+```
+
+The required flags are:
+
+| Parameter | Description |
+|-----------|-------------|
+| `global._hopsworks.restoreFromBackup.backupId` | The backup ID to restore from. |
+| `global._hopsworks.restoreFromBackup.inPlace` | Must be `true` to enable in-place restore mode. |
+| `global._hopsworks.restoreFromBackup.forceDataClear` | Must be `true` to confirm that existing data will be replaced. This is a safety mechanism to prevent accidental data loss. |
+
+The following flags are optional. If not set, the latest available Velero backup will be used:
+
+| Parameter | Description |
+|-----------|-------------|
+| `hopsworks.velero.restore.mainScheduleBackupId` | The Velero backup ID for the main schedule (`k8s-backups-main`). |
+| `hopsworks.velero.restore.usersScheduleBackupId` | The Velero backup ID for the users schedule (`k8s-backups-users-resources`). |
+
+#### Re-running an in-place restore
+
+In-place restore creates marker resources to prevent accidental re-runs. If you need to run the restore again with the same backup ID, delete the marker resources first:
+
+```bash
+# Delete the HopsFS restore job
+kubectl delete job hopsfs-inplace-restore-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the RonDB restore jobs
+kubectl delete job restore-native-backup-<backup-id> -n hopsworks --ignore-not-found=true
+kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-not-found=true
+```
+
+#### Customizations
+
+The same customization options for [RonDB and Opensearch](#customizations) backup IDs apply to in-place restore. You can override individual service backup IDs while keeping the global backup ID for HopsFS.

From a35ac629b32584a1d577625b1645b2e002c7d5d3 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Tue, 17 Feb 2026 14:44:34 +0100
Subject: [PATCH 2/5] [HWORKS-2538] fixes for rerun restore in place

---
 docs/setup_installation/admin/ha-dr/dr.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index 22ef3050cf..0869d9768c 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -444,6 +444,13 @@ kubectl delete job hopsfs-inplace-restore-<backup-id> -n hopsworks --ignore-not-
 # Delete the RonDB restore jobs
 kubectl delete job restore-native-backup-<backup-id> -n hopsworks --ignore-not-found=true
 kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the Opensearch restore job
+kubectl delete job opensearch-restore-default-default-<backup-id> -n hopsworks --ignore-not-found=true
+
+# Delete the velero restore objects, use th exact backup name or schedule name
+kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not-found=true
+kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
 #### Customizations

From 733f08ed75acd274b046f3bb1d0a0360b978468e Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Tue, 17 Feb 2026 16:26:32 +0100
Subject: [PATCH 3/5] [HWORKS-2538] Fix markdown linting errors in DR docs

Resolve duplicate heading names (MD024) and table column style issues
(MD060) in the in-place restore section.

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index 0869d9768c..da79e00d20 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -342,7 +342,7 @@ In-place restore allows you to restore data onto an existing running cluster usi
 !!! Warning
     In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
 
-#### Prerequisites
+#### In-place restore prerequisites
 
 - A running Hopsworks cluster deployed via Helm.
 - A previously created backup with a known backup ID.
@@ -421,7 +421,7 @@ The required flags are:
 
 | Parameter | Description |
-|-----------|-------------|
+| --------- | ----------- |
 | `global._hopsworks.restoreFromBackup.backupId` | The backup ID to restore from. |
 | `global._hopsworks.restoreFromBackup.inPlace` | Must be `true` to enable in-place restore mode. |
 | `global._hopsworks.restoreFromBackup.forceDataClear` | Must be `true` to confirm that existing data will be replaced. This is a safety mechanism to prevent accidental data loss. |
@@ -429,7 +429,7 @@ The required flags are:
 The following flags are optional. If not set, the latest available Velero backup will be used:
 
 | Parameter | Description |
-|-----------|-------------|
+| --------- | ----------- |
 | `hopsworks.velero.restore.mainScheduleBackupId` | The Velero backup ID for the main schedule (`k8s-backups-main`). |
 | `hopsworks.velero.restore.usersScheduleBackupId` | The Velero backup ID for the users schedule (`k8s-backups-users-resources`). |
@@ -453,6 +453,6 @@ kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not
 kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
-#### Customizations
+#### In-place restore customizations
 
 The same customization options for [RonDB and Opensearch](#customizations) backup IDs apply to in-place restore. You can override individual service backup IDs while keeping the global backup ID for HopsFS.

From aebe13568fe59fd691d1934e14808e988c944652 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Wed, 18 Feb 2026 12:02:20 +0100
Subject: [PATCH 4/5] [HWORKS-2538] Fix typo, helm chart ref, and velero
 restore names in DR docs

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index da79e00d20..d5e375df0d 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -209,19 +209,18 @@ done
 
 # Restores the latest - if specific backup is needed then backupName instead
 echo "=== Creating Velero Restore object for k8s-backups-main ==="
-RESTORE_SUFFIX=$(date +%s)
 kubectl apply -f - </dev/null)" = "Completed" ]; do
+until [ "$(kubectl get restore k8s-backups-main -n velero -o jsonpath='{.status.phase}' 2>/dev/null)" = "Completed" ]; do
   echo "Still waiting..."; sleep 5;
 done
@@ -231,14 +230,14 @@ kubectl apply -f - </dev/null)" = "Completed" ]; do
+until [ "$(kubectl get restore k8s-backups-users-resources -n velero -o jsonpath='{.status.phase}' 2>/dev/null)" = "Completed" ]; do
   echo "Still waiting..."; sleep 5;
 done
 ```
@@ -399,7 +398,7 @@ hopsworks:
 Then run:
 
 ```bash
-helm upgrade hopsworks --version <version> \
+helm upgrade hopsworks hopsworks/hopsworks --version <version> \
   --namespace hopsworks \
   -f values.yaml \
   --timeout 1200s
@@ -408,7 +407,7 @@ You can also pass the restore flags directly on the command line:
 ```bash
-helm upgrade hopsworks --version <version> \
+helm upgrade hopsworks hopsworks/hopsworks --version <version> \
   --namespace hopsworks \
   --set-string global._hopsworks.restoreFromBackup.backupId="254811200" \
   --set global._hopsworks.restoreFromBackup.inPlace=true \
   --set global._hopsworks.restoreFromBackup.forceDataClear=true \
   --set-string hopsworks.velero.restore.mainScheduleBackupId="k8s-backups-main-20260213T153627Z" \
   --set-string hopsworks.velero.restore.usersScheduleBackupId="k8s-backups-users-resources-20260213T153627Z" \
   --timeout 1200s
@@ -448,9 +447,9 @@ kubectl delete job setup-mysqld-dont-remove-<backup-id> -n hopsworks --ignore-no
 # Delete the Opensearch restore job
 kubectl delete job opensearch-restore-default-default-<backup-id> -n hopsworks --ignore-not-found=true
 
-# Delete the velero restore objects, use th exact backup name or schedule name
-kubectl delete restore.velero.io restore-k8s-backups-main -n velero --ignore-not-found=true
-kubectl delete restore.velero.io restore-k8s-backups-users-resources -n velero --ignore-not-found=true
+# Delete the velero restore objects, use the exact backup name or schedule name
+kubectl delete restore.velero.io k8s-backups-main -n velero --ignore-not-found=true
+kubectl delete restore.velero.io k8s-backups-users-resources -n velero --ignore-not-found=true
 ```
 
 #### In-place restore customizations

From 2edcc1059fd1ccec11e2c3f56eeb54c4040bdf35 Mon Sep 17 00:00:00 2001
From: Mahmoud Ismail
Date: Fri, 20 Feb 2026 09:14:33 +0100
Subject: [PATCH 5/5] [HWORKS-2538] Add in-place restore certificate constraint
 note to DR docs

Co-Authored-By: Claude Opus 4.6
---
 docs/setup_installation/admin/ha-dr/dr.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/setup_installation/admin/ha-dr/dr.md b/docs/setup_installation/admin/ha-dr/dr.md
index d5e375df0d..694782eaef 100644
--- a/docs/setup_installation/admin/ha-dr/dr.md
+++ b/docs/setup_installation/admin/ha-dr/dr.md
@@ -341,6 +341,9 @@ In-place restore allows you to restore data onto an existing running cluster usi
 !!! Warning
     In-place restore **replaces all existing data** in the cluster with the backup data. Any data written after the backup was taken will be lost.
 
+!!! Info
+    After a fresh install from backup (new cluster restore), in-place restores can only be performed using backups taken **after** that fresh install, because the cluster certificates are regenerated during installation. To restore to a backup that was taken **before** the fresh install, you must perform another new cluster restore from that backup instead of an in-place restore.
+
 #### In-place restore prerequisites
 
 - A running Hopsworks cluster deployed via Helm.