
Commit af8fa34

feat(gooddata-pipelines): Docs for workspace backup
1 parent 9c40e48 commit af8fa34

gooddata-pipelines/README.md

Lines changed: 159 additions & 2 deletions
@@ -10,7 +10,7 @@ You can use the package to manage following resources in GDC:
 - User/Group permissions
 - User Data Filters
 - Child workspaces (incl. Workspace Data Filter settings)
-1. _[PLANNED]:_ Backup and restore of workspaces
+1. Backup and restore of workspaces
 1. _[PLANNED]:_ Custom fields management
 - extend the Logical Data Model of a child workspace

@@ -33,7 +33,7 @@ import logging
 from csv import DictReader
 from pathlib import Path
 
-# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
+# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner
 
 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -61,6 +61,7 @@ provisioner.full_load(full_load_data)

## Bugs & Requests

Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs

@@ -70,3 +71,159 @@ or request features.

See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
and a list of changes.

Ready-made scripts covering the basic use cases can be found in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository.

## Backup and restore of workspaces

The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. This is useful for:

- Creating backups before major changes
- Migrating workspaces between environments
- Disaster recovery scenarios
- Copying workspace configurations

### Backup

The module supports three backup modes:

1. **List of workspaces** - Back up specific workspaces by providing a list of workspace IDs
2. **Workspace hierarchies** - Back up a workspace and all of its direct and indirect children
3. **Entire organization** - Back up all workspaces in the organization

Each backup includes:

- Workspace declarative model (logical data model, analytics model, permissions)
- User data filters
- Filter views
- Automations

#### Storage Options

Backups can be stored in:

- **Local storage** - Save backups to a local directory
- **S3 storage** - Upload backups to an AWS S3 bucket

#### Basic Usage

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)
from gooddata_pipelines.logger.logger import LogObserver

# Optionally, subscribe a standard Python logger to the LogObserver
import logging

logger = logging.getLogger(__name__)
LogObserver().subscribe(logger)

# Configure backup storage
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=10,  # Number of workspaces to process in one batch
    api_calls_per_second=10,  # Rate limit for API calls
)

# Create the BackupManager instance
backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)

# Back up specific workspaces
workspace_ids = ["workspace1", "workspace2", "workspace3"]
backup_manager.backup_workspaces(workspace_ids=workspace_ids)

# Or read workspace IDs from a CSV file
backup_manager.backup_workspaces(path_to_csv="workspaces.csv")

# Back up workspace hierarchies (workspace + all children)
backup_manager.backup_hierarchies(workspace_ids=["parent_workspace"])

# Back up the entire organization
backup_manager.backup_entire_organization()
```

#### Using S3 Storage

```python
import os

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

# Configure S3 storage with explicit credentials
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_default_region="us-east-1",
    ),
)

# Or use an AWS profile
config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-backup-bucket",
        backup_path="gooddata-backups/",
        profile="my-aws-profile",
    ),
)

backup_manager = BackupManager.create(
    config=config,
    host=os.environ["GDC_HOSTNAME"],
    token=os.environ["GDC_AUTH_TOKEN"],
)

backup_manager.backup_workspaces(workspace_ids=["workspace1"])
```

#### Using GoodData Profile

You can also create the BackupManager from a GoodData profile file:

```python
from pathlib import Path

backup_manager = BackupManager.create_from_profile(
    config=config,
    profile="production",
    profiles_path=Path.home() / ".gooddata" / "profiles.yaml",
)
```

#### CSV File Format

When providing workspace IDs via a CSV file, the file should have a `workspace_id` column:

```csv
workspace_id
workspace1
workspace2
workspace3
```
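
If you assemble the workspace IDs programmatically, you can also generate this file with Python's standard `csv` module. A minimal sketch (the file name and IDs below are placeholders):

```python
import csv

# Hypothetical list of workspace IDs collected elsewhere
workspace_ids = ["workspace1", "workspace2", "workspace3"]

# Write the IDs into the CSV format expected by backup_workspaces(path_to_csv=...)
with open("workspaces.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["workspace_id"])
    writer.writeheader()
    writer.writerows({"workspace_id": ws_id} for ws_id in workspace_ids)
```
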
#### Configuration Options

The `BackupRestoreConfig` class accepts the following parameters:

- `storage_type` - Type of storage (`StorageType.LOCAL` or `StorageType.S3`)
- `storage` - Storage-specific configuration (`LocalStorageConfig` or `S3StorageConfig`)
- `batch_size` (optional, default: 10) - Number of workspaces to process in one batch
- `api_calls_per_second` (optional, default: 10) - Rate limit for API calls to avoid throttling
- `api_page_size` (optional, default: 500) - Page size for paginated API calls
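
For reference, a configuration that spells out all of the optional tuning parameters might look like the following sketch (the values are arbitrary examples, not recommendations):

```python
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)

# Local storage with every optional parameter set explicitly
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=20,  # process 20 workspaces per batch
    api_calls_per_second=5,  # throttle API calls to 5 per second
    api_page_size=500,  # page size for paginated API calls
)
```
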
#### Error Handling and Retries

The backup process includes automatic retry logic with exponential backoff. If a batch fails, it is retried up to 3 times before the backup fails completely. Individual workspace errors are logged but do not stop the entire backup process.
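
You do not need to implement the retries yourself; the library handles them. Purely to illustrate the pattern described above, a generic exponential-backoff loop looks roughly like this standalone sketch (this is not the library's internal code):

```python
import time


def run_with_retries(action, max_attempts=3, base_delay=1.0):
    """Generic sketch: retry a callable with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # wait 1s, 2s, 4s, ... before trying again
            time.sleep(base_delay * 2 ** (attempt - 1))
```
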
### Restore

Note: Restore functionality is currently in development.
