
Commit e2fc8f9

feat(workspace-backup): Backup to Azure blob storage
1 parent b7e9aeb commit e2fc8f9

File tree

10 files changed: +424 additions, -21 deletions


.gitignore

Lines changed: 10 additions & 0 deletions
@@ -12,6 +12,16 @@ packages/gooddata-sdk/tests/catalog/translate
 .vscode
 .ruff_cache
 
+# Python build artifacts
+.tox
+*.egg-info
+dist/
+build/
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+
 docs/node_modules
 docs/public
 docs/resources/_gen

docs/content/en/latest/pipelines/backup_and_restore/backup.md

Lines changed: 38 additions & 1 deletion
@@ -4,7 +4,7 @@ linkTitle: "Workspace Backup"
 weight: 2
 ---
 
-Workspace Backup allows you to create backups of one or more workspaces. Backups can be stored either locally or uploaded to an S3 bucket.
+Workspace Backup allows you to create backups of one or more workspaces. Backups can be stored locally, uploaded to an S3 bucket, or uploaded to Azure Blob Storage.
 
 The backup stores following definitions:
 
@@ -141,6 +141,43 @@ logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 backup_manager.logger.subscribe(logger)
 
+# Run the backup
+backup_manager.backup_workspaces(workspace_ids=["workspace_id_1", "workspace_id_2"])
+```
+
+### Example with Azure Blob Storage
+
+Here is an example using Azure Blob Storage with Workload Identity:
+
+```python
+import logging
+import os
+
+from gooddata_pipelines import (
+    BackupManager,
+    BackupRestoreConfig,
+    AzureStorageConfig,
+    StorageType,
+)
+
+# Create storage configuration
+azure_storage_config = AzureStorageConfig.from_workload_identity(
+    backup_path="backup_folder", account_name="mystorageaccount", container="my-container"
+)
+
+# Create backup configuration
+config = BackupRestoreConfig(storage_type=StorageType.AZURE, storage=azure_storage_config)
+
+# Initialize the BackupManager with your configuration and GoodData credentials
+backup_manager = BackupManager.create(
+    config, os.environ["GD_HOST"], os.environ["GD_TOKEN"]
+)
+
+# Optionally set up a logger and subscribe it to the logs from the BackupManager
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+backup_manager.logger.subscribe(logger)
+
 # Run the backup
 backup_manager.backup_workspaces(workspace_ids=["workspace_id_1", "workspace_id_2"])
 
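Note: the Workload Identity example above assumes the runtime (typically an AKS pod with the workload-identity webhook) already exposes federated credentials. A minimal pre-flight check, assuming the config ultimately authenticates through azure-identity's workload identity support; the environment variable names below are the standard ones used by azure-identity and the AKS webhook, not something defined by this commit:

```python
import os

# Variables the AKS workload-identity webhook injects and azure-identity's
# WorkloadIdentityCredential reads; they are conventions of azure-identity,
# not of gooddata-pipelines.
REQUIRED = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Workload identity not configured; missing env vars: {missing}")
```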

docs/content/en/latest/pipelines/backup_and_restore/configuration.md

Lines changed: 80 additions & 4 deletions
@@ -15,26 +15,26 @@ from gooddata_pipelines import BackupRestoreConfig
 
 ```
 
-If you plan on storing your backups on S3, you will also need to import the `StorageType` enum and `S3StorageConfig` class. You can find more details about configuration for the S3 storage below in the [S3 Storage](#s3-storage) section.
+If you plan on storing your backups on S3 or Azure Blob Storage, you will also need to import the `StorageType` enum and the appropriate storage config class (`S3StorageConfig` or `AzureStorageConfig`). You can find more details about configuration for each storage type below in the [S3 Storage](#s3-storage) and [Azure Blob Storage](#azure-blob-storage) sections.
 
 ```python
-from gooddata_pipelines import BackupRestoreConfig, S3StorageConfig, StorageType
+from gooddata_pipelines import BackupRestoreConfig, S3StorageConfig, AzureStorageConfig, StorageType
 
 ```
 
 The `BackupRestoreConfig` accepts following parameters:
 
 | name                 | description                                                                                                    |
 | -------------------- | ------------------------------------------------------------------------------------------------------------ |
-| storage_type         | The type of storage to use - either `local` or `s3`. Defaults to `local`.                                     |
+| storage_type         | The type of storage to use - either `local`, `s3`, or `azure`. Defaults to `local`.                           |
 | storage              | Configuration for the storage type. Defaults to local storage configuration.                                  |
 | api_page_size        | Page size for fetching workspace relationships. Defaults to 100 when unspecified.                             |
 | batch_size           | Configures how many workspaces are backed up in a single batch. Defaults to 100 when unspecified.             |
 | api_calls_per_second | Limits the maximum number of API calls to your GoodData instance. Defaults to 1. Only applied during Backup.  |
 
 ## Storage
 
-The configuration supports two types of storage - local and S3.
+The configuration supports three types of storage - local, S3, and Azure Blob Storage.
 
 The backups are organized in a tree with following nodes:
 
@@ -100,6 +100,63 @@ s3_storage_config = S3StorageConfig.from_aws_credentials(
 )
 ```
 
+### Azure Blob Storage
+
+To configure upload of the backups to Azure Blob Storage, use the AzureStorageConfig object:
+
+```python
+from gooddata_pipelines.backup_and_restore.models.storage import AzureStorageConfig
+
+```
+
+The configuration is responsible for establishing a valid connection to Azure Blob Storage, connecting to a storage account and container, and specifying the folder where the backups will be stored or read. You can create the object in three ways, depending on the type of Azure authentication you want to use. The common arguments for all three ways are:
+
+| name         | description                                                    |
+| ------------ | -------------------------------------------------------------- |
+| account_name | The name of the Azure storage account                          |
+| container    | The name of the blob container                                 |
+| backup_path  | Path to the folder serving as the root for the backup storage  |
+
+#### Config from Workload Identity
+
+Will use Azure Workload Identity (for Kubernetes environments). You only need to specify the `account_name`, `container`, and `backup_path` arguments.
+
+```python
+azure_storage_config = AzureStorageConfig.from_workload_identity(
+    backup_path="backups_folder", account_name="mystorageaccount", container="my-container"
+)
+
+```
+
+#### Config from Connection String
+
+Will use an Azure Storage connection string to authenticate.
+
+```python
+azure_storage_config = AzureStorageConfig.from_connection_string(
+    backup_path="backups_folder",
+    account_name="mystorageaccount",
+    container="my-container",
+    connection_string="DefaultEndpointsProtocol=https;AccountName=...",
+)
+
+```
+
+#### Config from Service Principal
+
+Will use Azure Service Principal credentials to authenticate.
+
+```python
+azure_storage_config = AzureStorageConfig.from_service_principal(
+    backup_path="backups_folder",
+    account_name="mystorageaccount",
+    container="my-container",
+    client_id="your-client-id",
+    client_secret="your-client-secret",
+    tenant_id="your-tenant-id",
+)
+```
+
 ## Examples
 
 Here is a couple of examples of different configuration cases.
@@ -133,3 +190,22 @@ s3_storage_config = S3StorageConfig.from_aws_profile(
 config = BackupRestoreConfig(storage_type=StorageType.S3, storage=s3_storage_config)
 
 ```
+
+### Config with Azure Blob Storage and Workload Identity
+
+If you plan to use Azure Blob Storage, your config might look like this:
+
+```python
+from gooddata_pipelines import (
+    BackupRestoreConfig,
+    AzureStorageConfig,
+    StorageType,
+)
+
+azure_storage_config = AzureStorageConfig.from_workload_identity(
+    backup_path="backups_folder", account_name="mystorageaccount", container="my-container"
+)
+
+config = BackupRestoreConfig(storage_type=StorageType.AZURE, storage=azure_storage_config)
+
+```
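Note: since the three `AzureStorageConfig` factory methods differ only in how they authenticate, a deployment script can pick one at runtime. A small sketch using only the factory methods documented in the diff above; the environment variable names are illustrative conventions, not part of gooddata-pipelines:

```python
import os

from gooddata_pipelines import AzureStorageConfig


def azure_config_from_env(backup_path: str, account_name: str, container: str) -> AzureStorageConfig:
    """Hypothetical helper: choose the Azure auth flavour based on the environment."""
    common = dict(backup_path=backup_path, account_name=account_name, container=container)

    # Illustrative env var name; any secret store could supply the connection string.
    if os.environ.get("AZURE_STORAGE_CONNECTION_STRING"):
        return AzureStorageConfig.from_connection_string(
            connection_string=os.environ["AZURE_STORAGE_CONNECTION_STRING"], **common
        )

    # Standard azure-identity service principal variables.
    if os.environ.get("AZURE_CLIENT_SECRET"):
        return AzureStorageConfig.from_service_principal(
            client_id=os.environ["AZURE_CLIENT_ID"],
            client_secret=os.environ["AZURE_CLIENT_SECRET"],
            tenant_id=os.environ["AZURE_TENANT_ID"],
            **common,
        )

    # Fall back to Workload Identity (e.g. on AKS).
    return AzureStorageConfig.from_workload_identity(**common)
```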

packages/gooddata-pipelines/README.md

Lines changed: 45 additions & 6 deletions
@@ -11,7 +11,7 @@ You can use the package to manage following resources in GDC:
    - User Data Filters
    - Child workspaces (incl. Workspace Data Filter settings)
 1. Backup and restore of workspaces
-   - Create and backup snapshots of workspace metadata.
+   - Create and backup snapshots of workspace metadata to local storage, AWS S3, or Azure Blob Storage
 1. LDM Extension
    - extend the Logical Data Model of a child workspace with custom datasets and fields
 
@@ -34,7 +34,7 @@ import logging
 from csv import DictReader
 from pathlib import Path
 
-# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
+# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner
 
 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -62,12 +62,51 @@ provisioner.full_load(full_load_data)
 
 ```
 
+Ready-made scripts covering the basic use cases can be found here in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository.
+
+## Backup and Restore of Workspaces
+
+The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. Backups can be stored locally, in AWS S3, or Azure Blob Storage.
+
+```python
+import os
+
+from gooddata_pipelines import BackupManager
+from gooddata_pipelines.backup_and_restore.models.storage import (
+    BackupRestoreConfig,
+    LocalStorageConfig,
+    StorageType,
+)
+
+# Configure backup storage
+config = BackupRestoreConfig(
+    storage_type=StorageType.LOCAL,
+    storage=LocalStorageConfig(),
+)
+
+# Create the BackupManager instance
+backup_manager = BackupManager.create(
+    config=config,
+    host=os.environ["GDC_HOSTNAME"],
+    token=os.environ["GDC_AUTH_TOKEN"]
+)
+
+# Backup specific workspaces
+backup_manager.backup_workspaces(workspace_ids=["workspace1", "workspace2"])
+
+# Backup workspace hierarchies (workspace + all children)
+backup_manager.backup_hierarchies(workspace_ids=["parent_workspace"])
+
+# Backup entire organization
+backup_manager.backup_entire_organization()
+```
+
+For S3 or Azure Blob Storage, configure the appropriate storage type and credentials in `BackupRestoreConfig`.
+
 ## Bugs & Requests
 
-Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs
-or request features.
+Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs or request features.
 
 ## Changelog
 
-See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
-and a list of changes.
+See [GitHub releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions and a list of changes.
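Note: the README example above backs up to local storage; only the storage config changes for cloud targets. A sketch of the Azure Blob Storage variant, reusing the `AzureStorageConfig` factory documented elsewhere in this commit (the account, container, and folder values are placeholders):

```python
import os

from gooddata_pipelines import (
    AzureStorageConfig,
    BackupManager,
    BackupRestoreConfig,
    StorageType,
)

# Same BackupManager flow as the local-storage README example, but the backups
# are written to an Azure Blob Storage container instead of the local disk.
config = BackupRestoreConfig(
    storage_type=StorageType.AZURE,
    storage=AzureStorageConfig.from_workload_identity(
        backup_path="backups",            # placeholder folder inside the container
        account_name="mystorageaccount",  # placeholder storage account
        container="my-container",         # placeholder container name
    ),
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)
backup_manager.backup_workspaces(workspace_ids=["workspace1", "workspace2"])
```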

packages/gooddata-pipelines/pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ dependencies = [
     "gooddata-sdk~=1.54.0",
     "boto3 (>=1.39.3,<2.0.0)",
     "boto3-stubs (>=1.39.3,<2.0.0)",
+    "azure-storage-blob (>=12.19.0,<13.0.0)",
+    "azure-identity (>=1.15.0,<2.0.0)",
     "types-pyyaml (>=6.0.12.20250326,<7.0.0)",
 ]
 

packages/gooddata-pipelines/src/gooddata_pipelines/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,7 @@
 # -------- Backup and Restore --------
 from .backup_and_restore.backup_manager import BackupManager
 from .backup_and_restore.models.storage import (
+    AzureStorageConfig,
     BackupRestoreConfig,
     LocalStorageConfig,
     S3StorageConfig,
@@ -14,6 +15,7 @@
     RestoreManager,
     WorkspaceToRestore,
 )
+from .backup_and_restore.storage.azure_storage import AzureStorage
 from .backup_and_restore.storage.local_storage import LocalStorage
 from .backup_and_restore.storage.s3_storage import S3Storage
 
@@ -67,13 +69,15 @@
     "StorageType",
     "LocalStorage",
     "S3Storage",
+    "AzureStorage",
     "WorkspaceFullLoad",
     "WorkspaceProvisioner",
     "UserIncrementalLoad",
     "UserGroupIncrementalLoad",
     "PermissionFullLoad",
     "LocalStorageConfig",
     "S3StorageConfig",
+    "AzureStorageConfig",
     "PermissionIncrementalLoad",
     "UserFullLoad",
     "UserGroupFullLoad",

packages/gooddata-pipelines/src/gooddata_pipelines/backup_and_restore/base_manager.py

Lines changed: 6 additions & 1 deletion
@@ -6,7 +6,7 @@
 
 from gooddata_sdk.utils import PROFILES_FILE_PATH, profile_content
 
-from gooddata_pipelines.api.gooddata_api_wrapper import GoodDataApi
+from gooddata_pipelines.api import GoodDataApi
 from gooddata_pipelines.backup_and_restore.models.storage import (
     BackupRestoreConfig,
     StorageType,
@@ -18,6 +18,9 @@
     LocalStorage,
 )
 from gooddata_pipelines.backup_and_restore.storage.s3_storage import S3Storage
+from gooddata_pipelines.backup_and_restore.storage.azure_storage import (
+    AzureStorage,
+)
 from gooddata_pipelines.logger import LogObserver
 from gooddata_pipelines.utils.file_utils import JsonUtils, YamlUtils
 
@@ -44,6 +47,8 @@ def _get_storage(self, conf: BackupRestoreConfig) -> BackupStorage:
         """Returns the storage class based on the storage type."""
         if conf.storage_type == StorageType.S3:
             return S3Storage(conf)
+        elif conf.storage_type == StorageType.AZURE:
+            return AzureStorage(conf)
         elif conf.storage_type == StorageType.LOCAL:
             return LocalStorage(conf)
         else:
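Note: the new `AzureStorage` backend that `_get_storage` dispatches to lives in `backup_and_restore/storage/azure_storage.py`, whose diff is not shown in this view. Purely as an illustration of what such a backend might do with the newly added azure-storage-blob and azure-identity dependencies — not the actual implementation from this commit — here is a minimal sketch; the class and method names are hypothetical:

```python
from pathlib import Path

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient


class AzureBlobUploader:
    """Illustrative uploader: walks a local backup folder and mirrors it into a container."""

    def __init__(self, account_name: str, container: str, backup_path: str) -> None:
        # DefaultAzureCredential covers workload identity, service principal env vars, etc.
        self._client = BlobServiceClient(
            account_url=f"https://{account_name}.blob.core.windows.net",
            credential=DefaultAzureCredential(),
        )
        self._container = self._client.get_container_client(container)
        self._backup_path = backup_path.strip("/")

    def upload_directory(self, local_dir: Path) -> None:
        """Upload every file under local_dir, preserving the relative layout as blob names."""
        for file_path in local_dir.rglob("*"):
            if not file_path.is_file():
                continue
            blob_name = f"{self._backup_path}/{file_path.relative_to(local_dir).as_posix()}"
            with file_path.open("rb") as data:
                self._container.upload_blob(name=blob_name, data=data, overwrite=True)
```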
