gooddata-pipelines/README.md
@@ -10,7 +10,7 @@ You can use the package to manage following resources in GDC:
    - User/Group permissions
    - User Data Filters
    - Child workspaces (incl. Workspace Data Filter settings)
-1. _[PLANNED]:_ Backup and restore of workspaces
+1. Backup and restore of workspaces
 1. _[PLANNED]:_ Custom fields management
    - extend the Logical Data Model of a child workspace
@@ -33,7 +33,7 @@ import logging
 from csv import DictReader
 from pathlib import Path

-# Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
+# Import the Entity Provisioner class and corresponding model from the gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner

 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
@@ -70,3 +70,159 @@ or request features.
 See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
 and a list of changes.

Ready-made scripts covering the basic use cases can be found in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository.
## Backup and restore of workspaces

The backup and restore module allows you to create snapshots of GoodData Cloud workspaces and restore them later. This is useful for:

- Creating backups before major changes
- Migrating workspaces between environments
- Disaster recovery scenarios
- Copying workspace configurations
### Backup

The module supports three backup modes:

1. **List of workspaces** - Back up specific workspaces by providing a list of workspace IDs
2. **Workspace hierarchies** - Back up a workspace and all its direct and indirect children
3. **Entire organization** - Back up all workspaces in the organization
Each backup includes:

- Workspace declarative model (logical data model, analytics model, permissions)
- User data filters
- Filter views
- Automations
#### Storage Options

Backups can be stored in:

- **Local storage** - Save backups to a local directory
- **S3 storage** - Upload backups to an AWS S3 bucket
#### Basic Usage

```python
import logging
import os
from pathlib import Path

from gooddata_pipelines import BackupManager
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    LocalStorageConfig,
    StorageType,
)
from gooddata_pipelines.logger.logger import LogObserver

# Optionally, subscribe a standard Python logger to the LogObserver
logger = logging.getLogger(__name__)
LogObserver().subscribe(logger)

# Configure backup storage
config = BackupRestoreConfig(
    storage_type=StorageType.LOCAL,
    storage=LocalStorageConfig(),
    batch_size=10,  # Number of workspaces to process in one batch
    api_calls_per_second=10,  # Rate limit for API calls
)
```
When providing workspace IDs via a CSV file, the file should have a `workspace_id` column:

```csv
workspace_id
workspace1
workspace2
workspace3
```
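The IDs in such a file can be read with Python's standard `csv` module before being handed to the backup step; a minimal sketch (the `workspaces.csv` path is an illustrative assumption):

```python
from csv import DictReader
from pathlib import Path

csv_path = Path("workspaces.csv")  # illustrative file name

# Collect the workspace_id column into a plain list of IDs
with csv_path.open(newline="") as f:
    workspace_ids = [row["workspace_id"] for row in DictReader(f)]

print(workspace_ids)  # ['workspace1', 'workspace2', 'workspace3']
```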
#### Configuration Options

The `BackupRestoreConfig` class accepts the following parameters:

- `storage_type` - Type of storage (`StorageType.LOCAL` or `StorageType.S3`)
- `storage` - Storage-specific configuration (`LocalStorageConfig` or `S3StorageConfig`)
- `batch_size` (optional, default: 10) - Number of workspaces to process in one batch
- `api_calls_per_second` (optional, default: 10) - Rate limit for API calls to avoid throttling
- `api_page_size` (optional, default: 500) - Page size for paginated API calls
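For the S3 variant, a minimal sketch follows. The `S3StorageConfig` field names used here (`bucket`, `backup_path`) are illustrative assumptions; check the class definition for the actual fields:

```python
from gooddata_pipelines.backup_and_restore.models.storage import (
    BackupRestoreConfig,
    S3StorageConfig,
    StorageType,
)

# NOTE: the S3StorageConfig fields below are assumed for illustration;
# consult the class definition for the real field names.
s3_config = BackupRestoreConfig(
    storage_type=StorageType.S3,
    storage=S3StorageConfig(
        bucket="my-gooddata-backups",  # hypothetical bucket name
        backup_path="prod/backups",    # hypothetical key prefix
    ),
    api_page_size=500,  # documented default, shown explicitly
)
```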
#### Error Handling and Retries

The backup process includes automatic retry logic with exponential backoff. If a batch fails, it will retry up to 3 times before failing completely. Individual workspace errors are logged but don't stop the entire backup process.
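The retry behavior is built into the library; the standalone sketch below only illustrates the retry-with-exponential-backoff pattern described above (`run_with_backoff` and the delay values are illustrative, not the library's internals):

```python
import time

def run_with_backoff(action, max_retries: int = 3, base_delay: float = 1.0):
    """Run `action`; on failure retry up to `max_retries` times with
    exponentially growing delays (1s, 2s, 4s)."""
    for attempt in range(max_retries + 1):  # initial try + retries
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted - give up, as a failing batch would
            time.sleep(base_delay * 2**attempt)

# Illustrative usage with a stand-in for a backup batch:
run_with_backoff(lambda: print("backing up batch..."))
```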
### Restore

Note: Restore functionality is currently in development.