
Commit 0aae03c

feat(gooddata-pipelines): Docs for workspace backup
1 parent 9c40e48 commit 0aae03c

File tree

1 file changed: +155 -2 lines

gooddata-pipelines/README.md

Lines changed: 155 additions & 2 deletions
@@ -10,7 +10,7 @@ You can use the package to manage following resources in GDC:
 - User/Group permissions
 - User Data Filters
 - Child workspaces (incl. Workspace Data Filter settings)
-1. _[PLANNED]:_ Backup and restore of workspaces
+1. Backup and restore of workspaces
 1. _[PLANNED]:_ Custom fields management
 - extend the Logical Data Model of a child workspace

@@ -33,7 +33,7 @@ import logging
 from csv import DictReader
 from pathlib import Path

-# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
+# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner

 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -70,3 +70,156 @@ or request features.
 See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
 and a list of changes.

## Backup and restore of workspaces

The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. This is useful for:

- Creating backups before major changes
- Migrating workspaces between environments
- Disaster recovery scenarios
- Copying workspace configurations

### Backup

The module supports three backup modes:

1. **List of workspaces** - Back up specific workspaces by providing a list of workspace IDs
2. **Workspace hierarchies** - Back up a workspace and all its direct and indirect children
3. **Entire organization** - Back up all workspaces in the organization

Each backup includes:

- Workspace declarative model (logical data model, analytics model, permissions)
- User data filters
- Filter views
- Automations

#### Storage Options

Backups can be stored in:

- **Local storage** - Save backups to a local directory
- **S3 storage** - Upload backups to an AWS S3 bucket

#### Basic Usage

```python
import os
from pathlib import Path

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)
from gooddata_pipelines.logger.logger import LogObserver

# Optionally, subscribe a standard Python logger to the LogObserver
import logging

logger = logging.getLogger(__name__)
LogObserver().subscribe(logger)

# Configure backup storage
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=10,  # Number of workspaces to process in one batch
    api_calls_per_second=10,  # Rate limit for API calls
)

# Create the BackupManager instance
backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"]
)

# Back up specific workspaces
workspace_ids = ["workspace1", "workspace2", "workspace3"]
backup_manager.backup_workspaces(workspace_ids=workspace_ids)

# Or read workspace IDs from a CSV file
backup_manager.backup_workspaces(path_to_csv="workspaces.csv")

# Back up workspace hierarchies (workspace + all children)
backup_manager.backup_hierarchies(workspace_ids=["parent_workspace"])

# Back up the entire organization
backup_manager.backup_entire_organization()
```

#### Using S3 Storage

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

# Configure S3 storage with explicit credentials
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_default_region="us-east-1"
    ),
)

# Or use an AWS profile
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        profile="my-aws-profile"
    ),
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"]
)

backup_manager.backup_workspaces(workspace_ids=["workspace1"])
```

#### Using GoodData Profile

You can also create the `BackupManager` from a GoodData profile file:

```python
from pathlib import Path

backup_manager = BackupManager.create_from_profile(
    config=config,
    profile="production",
    profiles_path=Path.home() / ".gooddata" / "profiles.yaml"
)
```

#### CSV File Format

When providing workspace IDs via a CSV file, the file should have a `workspace_id` column:

```csv
workspace_id
workspace1
workspace2
workspace3
```

#### Configuration Options

The `BackupRestoreConfig` class accepts the following parameters (see the example after this list):

- `storage_type` - Type of storage (`StorageType.LOCAL` or `StorageType.S3`)
- `storage` - Storage-specific configuration (`LocalStorageConfig` or `S3StorageConfig`)
- `batch_size` (optional, default: 10) - Number of workspaces to process in one batch
- `api_calls_per_second` (optional, default: 10) - Rate limit for API calls to avoid throttling
- `api_page_size` (optional, default: 500) - Page size for paginated API calls

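For illustration, a configuration that sets all of these options explicitly; the values below are arbitrary examples, not recommended defaults:

```python
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)

config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=20,            # process 20 workspaces per batch
    api_calls_per_second=5,   # throttle API usage to 5 calls per second
    api_page_size=250,        # page size for paginated API calls
)
```
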
#### Error Handling and Retries

The backup process includes automatic retry logic with exponential backoff. If a batch fails, it will retry up to 3 times before failing completely. Individual workspace errors are logged but don't stop the entire backup process.

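As a rough illustration of that exponential-backoff pattern (this is not the library's internal code; the helper name, delays, and structure below are hypothetical):

```python
import time

def run_batch_with_retries(run_batch, batch, max_retries=3, base_delay=1.0):
    """Illustrative sketch only: retry a failing batch with exponential backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            run_batch(batch)
            return
        except Exception:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # wait 1s, 2s, 4s, ... before trying the batch again
            time.sleep(base_delay * 2 ** (attempt - 1))
```
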
### Restore

Note: Restore functionality is currently in development.
