
Commit 97c7ea8

feat(gooddata-pipelines): Docs for workspace backup
1 parent 9c40e48 commit 97c7ea8

File tree

1 file changed (+158 -2 lines)


gooddata-pipelines/README.md

@@ -10,7 +10,7 @@ You can use the package to manage following resources in GDC:
 - User/Group permissions
 - User Data Filters
 - Child workspaces (incl. Workspace Data Filter settings)
-1. _[PLANNED]:_ Backup and restore of workspaces
+1. Backup and restore of workspaces
 1. _[PLANNED]:_ Custom fields management
 - extend the Logical Data Model of a child workspace

@@ -33,7 +33,7 @@ import logging
 from csv import DictReader
 from pathlib import Path

-# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
+# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner

 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -70,3 +70,159 @@ or request features.

See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
and a list of changes.

Ready-made scripts covering the basic use cases can be found in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository.

## Backup and restore of workspaces

The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. This is useful for:

- Creating backups before major changes
- Migrating workspaces between environments
- Disaster recovery scenarios
- Copying workspace configurations

### Backup

The module supports three backup modes:

1. **List of workspaces** - Back up specific workspaces by providing a list of workspace IDs
2. **Workspace hierarchies** - Back up a workspace and all its direct and indirect children
3. **Entire organization** - Back up all workspaces in the organization

Each backup includes:

- Workspace declarative model (logical data model, analytics model, permissions)
- User data filters
- Filter views
- Automations

#### Storage Options

Backups can be stored in:

- **Local storage** - Save backups to a local directory
- **S3 storage** - Upload backups to an AWS S3 bucket

#### Basic Usage

```python
import os
from pathlib import Path

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)
from gooddata_pipelines.logger.logger import LogObserver

# Optionally, subscribe a standard Python logger to the LogObserver
import logging
logger = logging.getLogger(__name__)
LogObserver().subscribe(logger)

# Configure backup storage
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=10,  # Number of workspaces to process in one batch
    api_calls_per_second=10,  # Rate limit for API calls
)

# Create the BackupManager instance
backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)

# Back up specific workspaces
workspace_ids = ["workspace1", "workspace2", "workspace3"]
backup_manager.backup_workspaces(workspace_ids=workspace_ids)

# Or read workspace IDs from a CSV file
backup_manager.backup_workspaces(path_to_csv="workspaces.csv")

# Back up workspace hierarchies (workspace + all children)
backup_manager.backup_hierarchies(workspace_ids=["parent_workspace"])

# Back up the entire organization
backup_manager.backup_entire_organization()
```

#### Using S3 Storage

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

# Configure S3 storage with explicit credentials
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_default_region="us-east-1",
    ),
)

# Or use an AWS profile
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        profile="my-aws-profile",
    ),
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)

backup_manager.backup_workspaces(workspace_ids=["workspace1"])
```

#### Using GoodData Profile

You can also create the BackupManager from a GoodData profile file:

```python
from pathlib import Path

from gooddata_pipelines import BackupManager

# Reuses the BackupRestoreConfig created in the examples above
backup_manager = BackupManager.create_from_profile(
    config=config,
    profile="production",
    profiles_path=Path.home() / ".gooddata" / "profiles.yaml",
)
```
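
For reference, a minimal sketch of what such a profile file might contain, assuming the standard GoodData profiles layout where each top-level key is a profile name holding `host` and `token` (the values below are placeholders, not real credentials):

```yaml
# ~/.gooddata/profiles.yaml - illustrative placeholders only
production:
    host: https://your-instance.cloud.gooddata.com
    token: your_api_token
```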

#### CSV File Format

When providing workspace IDs via a CSV file, the file should have a `workspace_id` column:

```csv
workspace_id
workspace1
workspace2
workspace3
```

#### Configuration Options

The `BackupRestoreConfig` class accepts the following parameters:

- `storage_type` - Type of storage (`StorageType.LOCAL` or `StorageType.S3`)
- `storage` - Storage-specific configuration (`LocalStorageConfig` or `S3StorageConfig`)
- `batch_size` (optional, default: 10) - Number of workspaces to process in one batch
- `api_calls_per_second` (optional, default: 10) - Rate limit for API calls to avoid throttling
- `api_page_size` (optional, default: 500) - Page size for paginated API calls
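
For illustration, a minimal sketch that spells out all of these options explicitly, reusing the classes from the examples above (the values shown are simply the documented defaults, not recommendations):

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)

# Every parameter spelled out with its documented default value
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=10,  # workspaces processed per batch
    api_calls_per_second=10,  # client-side rate limit
    api_page_size=500,  # page size for paginated API calls
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)
```

Lowering `api_calls_per_second` or `batch_size` may help if you run into API throttling.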

#### Error Handling and Retries

The backup process includes automatic retry logic with exponential backoff. If a batch fails, it will retry up to 3 times before failing completely. Individual workspace errors are logged but don't stop the entire backup process.
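
For intuition, the retry behavior described above corresponds to the usual exponential backoff pattern. The sketch below only illustrates that pattern with a hypothetical `run_batch` callable and made-up delays; it is not the library's internal code:

```python
import time


def run_with_backoff(run_batch, batch, max_attempts=3, base_delay=1.0):
    """Retry a failing batch, doubling the wait before each new attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_batch(batch)
        except Exception as error:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            wait = base_delay * 2 ** (attempt - 1)  # e.g. 1s, 2s, 4s
            print(f"Batch failed ({error!r}); retrying in {wait:.0f}s")
            time.sleep(wait)
```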

### Restore

Note: Restore functionality is currently in development.
