15 changes: 15 additions & 0 deletions docs/content/en/latest/pipelines/backup_and_restore/_index.md
---
title: "Backup & Restore"
linkTitle: "Backup & Restore"
weight: 2
no_list: true
---

The Backup & Restore module lets you create snapshots of GoodData Cloud workspaces and restore them later. It is useful for:

- Backing up before major changes
- Migrating workspaces across environments
- Disaster recovery
- Cloning workspace configurations

Backup and restore share common configuration objects, documented on the [Configuration](configuration/) page. For detailed, step-by-step instructions, see the [Backup](backup/) and [Restore](restore/) guides.
147 changes: 147 additions & 0 deletions docs/content/en/latest/pipelines/backup_and_restore/backup.md
---
title: "Workspace Backup"
linkTitle: "Workspace Backup"
weight: 2
---

Workspace Backup allows you to create backups of one or more workspaces. Backups can be stored locally or uploaded to an S3 bucket.

The backup stores the following definitions:

- Logical Data Model
- Analytics Model
- User Data Filters
- Filter Views
- Automations

## Usage

Import `BackupManager` and `BackupRestoreConfig` from GoodData Pipelines, create a configuration, and initialize the manager:

```python
from gooddata_pipelines import BackupManager, BackupRestoreConfig

host = "http://localhost:3000"
token = "some_user_token"

# Create your customized backup configuration or use default values
config = BackupRestoreConfig(
    storage_type="local"
)

# Initialize the BackupManager with your configuration and GoodData Cloud credentials
backup_manager = BackupManager.create(config=config, host=host, token=token)

# Run a backup method. For example, the `backup_entire_organization` method backs up all workspaces in GoodData Cloud.
backup_manager.backup_entire_organization()

```

## Configuration

See [Configuration](/latest/pipelines/backup_and_restore/configuration/) for details on how to set up the configuration object.

## Backup Methods

You can use one of these methods to back up your workspaces:

### Back up specific workspaces

This method allows you to back up specific workspaces. You can supply the list of their IDs either directly or by specifying a path to a CSV file.

#### Usage with direct input:

```python
workspace_ids = ["workspace_1", "workspace_2", "workspace_3"]

backup_manager.backup_workspaces(workspace_ids=workspace_ids)

```

#### Usage with a CSV:

```python
path_to_csv = "path/to/local/file.csv"

backup_manager.backup_workspaces(path_to_csv=path_to_csv)

```

### Back up workspace hierarchies

This method accepts a list of parent workspace IDs and creates a backup of each workspace within their hierarchies. That includes the parent workspace and both its direct and indirect children (i.e., the children of child workspaces, and so on). The IDs can be provided either directly as a list or as a path to a CSV file containing the IDs.

#### Usage with direct input:

```python
parent_workspace_ids = ["parent_1", "parent_2", "parent_3"]

backup_manager.backup_hierarchies(workspace_ids=parent_workspace_ids)

```

#### Usage with a CSV:

```python
path_to_csv = "path/to/local/file.csv"

backup_manager.backup_hierarchies(path_to_csv=path_to_csv)

```

### Back up entire organization

Create a backup of all workspaces within the GoodData organization. The method requires no arguments.

```python
backup_manager.backup_entire_organization()

```

### Input CSV Format

When using a CSV as input for backup, the following format is expected (a sketch of generating such a file follows the table):

| **workspace_id** |
| ---------------- |
| parent_1 |
| parent_2 |
| parent_3 |
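
For reference, here is a minimal sketch of producing such a file with Python's standard `csv` module (the file name and workspace IDs are placeholders):

```python
import csv

# Write an input CSV with the expected "workspace_id" header column.
with open("workspaces_to_back_up.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["workspace_id"])
    for workspace_id in ["parent_1", "parent_2", "parent_3"]:
        writer.writerow([workspace_id])
```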

## Example

Here is a full example of a workspace backup process:

```python
import logging
import os

from gooddata_pipelines import (
    BackupManager,
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

# Create storage configuration
s3_storage_config = S3StorageConfig.from_aws_profile(
    backup_path="backup_folder", bucket="backup_bucket", profile="dev"
)

# Create backup configuration
config = BackupRestoreConfig(storage_type=StorageType.S3, storage=s3_storage_config)

# Initialize the BackupManager with your configuration and GoodData credentials
backup_manager = BackupManager.create(
    config, os.environ["GD_HOST"], os.environ["GD_TOKEN"]
)

# Optionally set up a logger and subscribe it to the logs from the BackupManager
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
backup_manager.logger.subscribe(logger)

# Run the backup
backup_manager.backup_workspaces(workspace_ids=["workspace_id_1", "workspace_id_2"])

```
135 changes: 135 additions & 0 deletions docs/content/en/latest/pipelines/backup_and_restore/configuration.md
---
title: "Configuration"
linkTitle: "Configuration"
weight: 1
---

Both backup and restore are configured via the `BackupRestoreConfig` class.

## Usage

Import `BackupRestoreConfig` from GoodData Pipelines.

```python
from gooddata_pipelines import BackupRestoreConfig

```

If you plan on storing your backups on S3, you will also need to import the `StorageType` enum and the `S3StorageConfig` class. You can find more details about configuring S3 storage in the [S3 Storage](#s3-storage) section below.

```python
from gooddata_pipelines import BackupRestoreConfig, S3StorageConfig, StorageType

```

The `BackupRestoreConfig` accepts the following parameters (a sketch of a fully specified configuration follows the table):

| name | description |
| -------------------- | ------------------------------------------------------------------------------------------------------------ |
| storage_type | The type of storage to use - either `local` or `s3`. Defaults to `local`. |
| storage | Configuration for the storage type. Defaults to local storage configuration. |
| api_page_size | Page size for fetching workspace relationships. Defaults to 100 when unspecified. |
| batch_size | Configures how many workspaces are backed up in a single batch. Defaults to 100 when unspecified. |
| api_calls_per_second | Limits the maximum number of API calls to your GoodData instance. Defaults to 1. Only applied during Backup. |
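
As a sketch of how these parameters fit together, assuming they can all be passed as keyword arguments (the values below are illustrative, not recommendations):

```python
from gooddata_pipelines import BackupRestoreConfig

# Any parameter you omit falls back to the defaults listed in the table above.
config = BackupRestoreConfig(
    storage_type="local",    # or "s3" together with an S3StorageConfig instance
    api_page_size=500,       # fetch workspace relationships in pages of 500
    batch_size=50,           # back up 50 workspaces per batch
    api_calls_per_second=2,  # throttle calls to GoodData (applied during backup only)
)
```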

## Storage

The configuration supports two types of storage - local and S3.

The backups are organized in a tree with the following nodes:

- Organization ID
- Workspace ID
- Timestamped folder

The timestamped folder will contain a `gooddata_layouts.zip` file containing the stored definitions.
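
For illustration, a single workspace backup might be laid out roughly like this (the folder names and timestamp format below are placeholders, not the exact naming used by the tool):

```
<organization_id>/
└── <workspace_id>/
    └── <timestamp>/
        └── gooddata_layouts.zip
```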

### Local Storage

Local storage requires a single parameter, `backup_path`, which defines where the backup tree will be saved in your file system. If not set, the script defaults to creating a `local_backups` folder in the current working directory and storing the backups there.

### S3 Storage

To upload your backups to S3, use the `S3StorageConfig` object:

```python
from gooddata_pipelines.backup_and_restore.models.storage import S3StorageConfig

```

The configuration is responsible for establishing a valid connection to S3, connecting to a bucket, and specifying the folder where the backups will be stored or read from. You can create the object in three ways, depending on the type of AWS credentials you want to use. The arguments common to all three are:

| name | description |
| ----------- | ------------------------------------------------------------- |
| bucket | The name of the bucket to use |
| backup_path | Path to the folder serving as the root for the backup storage |

#### Config from IAM Role

Will use the default IAM role or environment credentials. You only need to specify the `bucket` and `backup_path` arguments.

```python
s3_storage_config = S3StorageConfig.from_iam_role(
    backup_path="backups_folder", bucket="backup_bucket"
)

```

#### Config from AWS Profile

Will use an existing profile to authenticate with AWS.

```python
s3_storage_config = S3StorageConfig.from_aws_profile(
    backup_path="backups_folder", bucket="backup_bucket", profile="dev"
)

```

#### Config from AWS Credentials

Will use long-lived AWS access keys to authenticate with AWS.

```python
s3_storage_config = S3StorageConfig.from_aws_credentials(
    backup_path="backups_folder",
    bucket="backup_bucket",
    aws_access_key_id="AWS_ACCESS_KEY_ID",
    aws_secret_access_key="AWS_SECRET_ACCESS_KEY",
    aws_default_region="us-east-1",
)
```

## Examples

Here are a couple of examples of different configuration cases.

### Simple Local Backups

If you want to store your backups locally and are okay with the default values, you can create the configuration object without having to specify any values:

```python
from gooddata_pipelines import BackupRestoreConfig

config = BackupRestoreConfig()

```

### Config with S3 and AWS Profile

If you plan to use S3, your config might look like this:

```python
from gooddata_pipelines import (
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

s3_storage_config = S3StorageConfig.from_aws_profile(
    backup_path="backups_folder", bucket="backup_bucket", profile="dev"
)

config = BackupRestoreConfig(storage_type=StorageType.S3, storage=s3_storage_config)

```