10 changes: 10 additions & 0 deletions .gitignore
@@ -12,6 +12,16 @@ packages/gooddata-sdk/tests/catalog/translate
.vscode
.ruff_cache

# Python build artifacts
.tox
*.egg-info
dist/
build/
__pycache__/
*.pyc
*.pyo
*.pyd

docs/node_modules
docs/public
docs/resources/_gen
39 changes: 38 additions & 1 deletion docs/content/en/latest/pipelines/backup_and_restore/backup.md
@@ -4,7 +4,7 @@ linkTitle: "Workspace Backup"
weight: 2
---

Workspace Backup allows you to create backups of one or more workspaces. Backups can be stored either locally or uploaded to an S3 bucket.
Workspace Backup allows you to create backups of one or more workspaces. Backups can be stored locally, uploaded to an S3 bucket, or uploaded to Azure Blob Storage.

The backup stores the following definitions:

@@ -141,6 +141,43 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
backup_manager.logger.subscribe(logger)

# Run the backup
backup_manager.backup_workspaces(workspace_ids=["workspace_id_1", "workspace_id_2"])
```

### Example with Azure Blob Storage

Here is an example using Azure Blob Storage with Workload Identity:

```python
import logging
import os

from gooddata_pipelines import (
BackupManager,
BackupRestoreConfig,
AzureStorageConfig,
StorageType,
)

# Create storage configuration
azure_storage_config = AzureStorageConfig.from_workload_identity(
backup_path="backup_folder", account_name="mystorageaccount", container="my-container"
)

# Create backup configuration
config = BackupRestoreConfig(storage_type=StorageType.AZURE, storage=azure_storage_config)

# Initialize the BackupManager with your configuration and GoodData credentials
backup_manager = BackupManager.create(
config, os.environ["GD_HOST"], os.environ["GD_TOKEN"]
)

# Optionally set up a logger and subscribe it to the logs from the BackupManager
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
backup_manager.logger.subscribe(logger)

# Run the backup
backup_manager.backup_workspaces(workspace_ids=["workspace_id_1", "workspace_id_2"])

@@ -15,26 +15,26 @@ from gooddata_pipelines import BackupRestoreConfig

```

If you plan on storing your backups on S3, you will also need to import the `StorageType` enum and `S3StorageConfig` class. You can find more details about configuration for the S3 storage below in the [S3 Storage](#s3-storage) section.
If you plan on storing your backups on S3 or Azure Blob Storage, you will also need to import the `StorageType` enum and the appropriate storage config class (`S3StorageConfig` or `AzureStorageConfig`). You can find more details about configuration for each storage type below in the [S3 Storage](#s3-storage) and [Azure Blob Storage](#azure-blob-storage) sections.

```python
from gooddata_pipelines import BackupRestoreConfig, S3StorageConfig, StorageType
from gooddata_pipelines import BackupRestoreConfig, S3StorageConfig, AzureStorageConfig, StorageType

```

The `BackupRestoreConfig` accepts the following parameters (a short configuration sketch follows the table):

| name | description |
| -------------------- | ------------------------------------------------------------------------------------------------------------ |
| storage_type | The type of storage to use - either `local` or `s3`. Defaults to `local`. |
| storage_type | The type of storage to use - either `local`, `s3`, or `azure`. Defaults to `local`. |
| storage | Configuration for the storage type. Defaults to local storage configuration. |
| api_page_size | Page size for fetching workspace relationships. Defaults to 100 when unspecified. |
| batch_size | Configures how many workspaces are backed up in a single batch. Defaults to 100 when unspecified. |
| api_calls_per_second | Limits the maximum number of API calls to your GoodData instance. Defaults to 1. Only applied during Backup. |
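
For illustration, here is a minimal sketch that combines these parameters with the default local storage. The values are arbitrary examples, not recommendations, and every parameter can be omitted to fall back to its default:

```python
from gooddata_pipelines import BackupRestoreConfig, LocalStorageConfig, StorageType

# Illustrative values only; omit any parameter to use its default.
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,  # local, s3, or azure
    storage=LocalStorageConfig(),  # must match the chosen storage type
    api_page_size=200,  # page size for fetching workspace relationships
    batch_size=50,  # workspaces backed up per batch
    api_calls_per_second=2,  # rate limit; only applied during backup
)
```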

## Storage

The configuration supports two types of storage - local and S3.
The configuration supports three types of storage - local, S3, and Azure Blob Storage.

The backups are organized in a tree with the following nodes:

@@ -100,6 +100,63 @@ s3_storage_config = S3StorageConfig.from_aws_credentials(
)
```

### Azure Blob Storage

To configure uploading backups to Azure Blob Storage, use the `AzureStorageConfig` object:

```python
from gooddata_pipelines.backup_and_restore.models.storage import AzureStorageConfig

```

The configuration establishes a connection to Azure Blob Storage: it identifies the storage account and container and specifies the folder where backups will be stored or read. You can create the object in three ways, depending on the type of Azure authentication you want to use. The arguments common to all three are:

| name | description |
| ------------ | ------------------------------------------------------------- |
| account_name | The name of the Azure storage account |
| container | The name of the blob container |
| backup_path | Path to the folder serving as the root for the backup storage |

#### Config from Workload Identity

Uses Azure Workload Identity (for Kubernetes environments). You only need to specify the `account_name`, `container`, and `backup_path` arguments.

```python
azure_storage_config = AzureStorageConfig.from_workload_identity(
backup_path="backups_folder", account_name="mystorageaccount", container="my-container"
)

```

#### Config from Connection String

Authenticates using an Azure Storage connection string.

```python
azure_storage_config = AzureStorageConfig.from_connection_string(
backup_path="backups_folder",
account_name="mystorageaccount",
container="my-container",
connection_string="DefaultEndpointsProtocol=https;AccountName=...",
)

```

#### Config from Service Principal

Authenticates using Azure Service Principal credentials.

```python
azure_storage_config = AzureStorageConfig.from_service_principal(
backup_path="backups_folder",
account_name="mystorageaccount",
container="my-container",
client_id="your-client-id",
client_secret="your-client-secret",
tenant_id="your-tenant-id",
)
```

## Examples

Here are a couple of examples of different configuration cases.
@@ -133,3 +190,22 @@ s3_storage_config = S3StorageConfig.from_aws_profile(
config = BackupRestoreConfig(storage_type=StorageType.S3, storage=s3_storage_config)

```

### Config with Azure Blob Storage and Workload Identity

If you plan to use Azure Blob Storage, your config might look like this:

```python
from gooddata_pipelines import (
BackupRestoreConfig,
AzureStorageConfig,
StorageType,
)

azure_storage_config = AzureStorageConfig.from_workload_identity(
backup_path="backups_folder", account_name="mystorageaccount", container="my-container"
)

config = BackupRestoreConfig(storage_type=StorageType.AZURE, storage=azure_storage_config)

```
51 changes: 45 additions & 6 deletions packages/gooddata-pipelines/README.md
@@ -11,7 +11,7 @@ You can use the package to manage following resources in GDC:
- User Data Filters
- Child workspaces (incl. Workspace Data Filter settings)
1. Backup and restore of workspaces
- Create and backup snapshots of workspace metadata.
   - Create and back up snapshots of workspace metadata to local storage, AWS S3, or Azure Blob Storage
1. LDM Extension
   - Extend the Logical Data Model of a child workspace with custom datasets and fields

@@ -34,7 +34,7 @@ import logging
from csv import DictReader
from pathlib import Path

# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
from gooddata_pipelines import UserFullLoad, UserProvisioner

# Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -62,12 +62,51 @@ provisioner.full_load(full_load_data)

```

Ready-made scripts covering the basic use cases can be found in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository.

## Backup and Restore of Workspaces

The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. Backups can be stored locally, in AWS S3, or in Azure Blob Storage.

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
BackupRestoreConfig,
LocalStorageConfig,
StorageType,
)

# Configure backup storage
config = BackupRestoreConfig(
storage_type=StorageType.LOCAL,
storage=LocalStorageConfig(),
)

# Create the BackupManager instance
backup_manager = BackupManager.create(
config=config,
host=os.environ["GDC_HOSTNAME"],
token=os.environ["GDC_AUTH_TOKEN"]
)

# Backup specific workspaces
backup_manager.backup_workspaces(workspace_ids=["workspace1", "workspace2"])

# Backup workspace hierarchies (workspace + all children)
backup_manager.backup_hierarchies(workspace_ids=["parent_workspace"])

# Backup entire organization
backup_manager.backup_entire_organization()
```

For S3 or Azure Blob Storage, configure the appropriate storage type and credentials in `BackupRestoreConfig`.
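
For example, an Azure Blob Storage setup using Workload Identity might look like the following sketch (the storage account, container, and folder names are placeholders):

```python
import os

from gooddata_pipelines import (
    AzureStorageConfig,
    BackupManager,
    BackupRestoreConfig,
    StorageType,
)

# Placeholder storage account, container, and backup folder names.
azure_storage_config = AzureStorageConfig.from_workload_identity(
    backup_path="backups_folder",
    account_name="mystorageaccount",
    container="my-container",
)

config = BackupRestoreConfig(
    storage_type=StorageType.AZURE,
    storage=azure_storage_config,
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)
```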

## Bugs & Requests

Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs
or request features.
Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs or request features.

## Changelog

See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
and a list of changes.
See [GitHub releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions and a list of changes.
2 changes: 2 additions & 0 deletions packages/gooddata-pipelines/pyproject.toml
@@ -14,6 +14,8 @@ dependencies = [
"gooddata-sdk~=1.54.0",
"boto3 (>=1.39.3,<2.0.0)",
"boto3-stubs (>=1.39.3,<2.0.0)",
"azure-storage-blob (>=12.19.0,<13.0.0)",
"azure-identity (>=1.15.0,<2.0.0)",
"types-pyyaml (>=6.0.12.20250326,<7.0.0)",
]

@@ -5,6 +5,7 @@
# -------- Backup and Restore --------
from .backup_and_restore.backup_manager import BackupManager
from .backup_and_restore.models.storage import (
AzureStorageConfig,
BackupRestoreConfig,
LocalStorageConfig,
S3StorageConfig,
@@ -14,6 +15,7 @@
RestoreManager,
WorkspaceToRestore,
)
from .backup_and_restore.storage.azure_storage import AzureStorage
from .backup_and_restore.storage.local_storage import LocalStorage
from .backup_and_restore.storage.s3_storage import S3Storage

@@ -67,13 +69,15 @@
"StorageType",
"LocalStorage",
"S3Storage",
"AzureStorage",
"WorkspaceFullLoad",
"WorkspaceProvisioner",
"UserIncrementalLoad",
"UserGroupIncrementalLoad",
"PermissionFullLoad",
"LocalStorageConfig",
"S3StorageConfig",
"AzureStorageConfig",
"PermissionIncrementalLoad",
"UserFullLoad",
"UserGroupFullLoad",
@@ -6,7 +6,7 @@

from gooddata_sdk.utils import PROFILES_FILE_PATH, profile_content

from gooddata_pipelines.api.gooddata_api_wrapper import GoodDataApi
from gooddata_pipelines.api import GoodDataApi
from gooddata_pipelines.backup_and_restore.models.storage import (
BackupRestoreConfig,
StorageType,
@@ -18,6 +18,9 @@
LocalStorage,
)
from gooddata_pipelines.backup_and_restore.storage.s3_storage import S3Storage
from gooddata_pipelines.backup_and_restore.storage.azure_storage import (
AzureStorage,
)
from gooddata_pipelines.logger import LogObserver
from gooddata_pipelines.utils.file_utils import JsonUtils, YamlUtils

@@ -44,6 +47,8 @@ def _get_storage(self, conf: BackupRestoreConfig) -> BackupStorage:
"""Returns the storage class based on the storage type."""
if conf.storage_type == StorageType.S3:
return S3Storage(conf)
elif conf.storage_type == StorageType.AZURE:
return AzureStorage(conf)
elif conf.storage_type == StorageType.LOCAL:
return LocalStorage(conf)
else: