Commit 9c40e48

Merge pull request #1172 from janmatzek/jmat-SVS-1262-generic-provisioning-function-for-good-data-pipelines

feat(gooddata-pipelines): add generic provisioning function

2 parents: 0ea44f2 + 018a8b8

File tree

11 files changed (+302 -25 lines)

README.md

Lines changed: 7 additions & 0 deletions
````diff
@@ -27,6 +27,12 @@ create pandas series and data frames.
 
 Check out the GoodData Pandas [documentation](https://gooddata-pandas.readthedocs.io/en/latest/) to learn more and get started.
 
+### GoodData Pipelines
+
+The [gooddata-pipelines](./gooddata-pipelines/) package provides easy ways to manage the lifecycle of GoodData Cloud.
+
+Check out the GoodData Pipelines [documentation](https://www.gooddata.com/docs/python-sdk/latest/pipelines-overview/) to learn more and get started.
+
 ### GoodData FlexConnect
 
 The [gooddata-flexconnect](./gooddata-flexconnect) package is the foundation for writing custom FlexConnect data sources.
@@ -45,5 +51,6 @@ into PostgreSQL as foreign tables that you can then query using SQL.
 Check out the GoodData Foreign Data Wrapper [documentation](https://gooddata-fdw.readthedocs.io/en/latest/) to learn more and get started.
 
 ## Contributing
+
 If you would like to improve, extend or fix a feature in the repository, read and follow the
 [Contributing guide](./CONTRIBUTING.md).
````

docs/content/en/latest/pipelines/provisioning/_index.md

Lines changed: 100 additions & 8 deletions
````diff
@@ -15,7 +15,7 @@ Resources you can provision using GoodData Pipelines:
 - [Users](users/)
 - [User Groups](user_groups/)
 - [Workspace Permissions](workspace-permissions/)
-- [User Data Filters](user_data_filters/)
+- [User Data Filters](user_data_filters/)
 
 ## Workflow Types
 
@@ -30,8 +30,8 @@ The provisioning types employ different algorithms and expect different structur
 
 Full load provisioning aims to fully synchronize the state of your GoodData instance with the provided input. This workflow will create new resources and update existing ones based on the input. Any resources existing on GoodData Cloud not included in the input will be deleted.
 
-{{% alert color="warning" title="Full loads are destrucitve"%}}
-Full load provisioning will delete any existing resources not included in your input data. Test in non-production environment.
+{{% alert color="warning" title="Full loads are destructive"%}}
+Full load provisioning will delete any existing resources not included in your input data. Test in a non-production environment.
 {{% /alert %}}
 
 ### Incremental Load
@@ -40,14 +40,20 @@ During incremental provisioning, the algorithm will only interact with resources
 
 ### Workflow Comparison
 
-| **Aspect** | **Full Load** | **Incremental Load** |
-|------------|---------------|----------------------|
-| **Scope** | Synchronizes entire state | Only specified resources |
+| **Aspect**   | **Full Load**                 | **Incremental Load**                             |
+| ------------ | ----------------------------- | ------------------------------------------------ |
+| **Scope**    | Synchronizes entire state     | Only specified resources                         |
 | **Deletion** | Deletes unspecified resources | Only deletes resources marked `is_active: False` |
-| **Use Case** | Complete environment setup | Targeted updates |
+| **Use Case** | Complete environment setup    | Targeted updates                                 |
 
 ## Usage
 
+You can use either resource-specific Provisioner objects or a generic function to handle the provisioning logic.
+
+The generic function validates the data, creates a provisioner instance, and runs the provisioning under the hood, reducing boilerplate code. On the other hand, the resource-specific approach is more transparent about the expected data structures.
+
+### Provisioner Objects
+
 Regardless of workflow type or resource being provisioned, the typical usage follows these steps:
 
 1. Initialize the provisioner
@@ -56,9 +62,38 @@ Regardless of workflow type or resource being provisioned, the typical usage fol
 
 1. Run the selected provisioning method (`.full_load()` or `.incremental_load()`) with your validated data
 
-
 Check the [resource pages](#supported-resources) for detailed instructions and examples of workflow implementations.
 
+### Generic Function
+
+You can also use a generic provisioning function:
+
+```python
+from gooddata_pipelines import WorkflowType, provision
+```
+
+The function requires the following arguments:
+
+| name          | description                                            |
+| ------------- | ------------------------------------------------------ |
+| data          | Raw data as a list of dictionaries                     |
+| workflow_type | Enum indicating provisioned resource and workflow type |
+| host          | URL of your GoodData instance                          |
+| token         | GoodData Personal Access Token                         |
+| logger        | Logger object to subscribe to the logs _[optional]_    |
+
+The function will validate the raw data against the model corresponding to the selected `workflow_type` value. Note that the function only supports resources listed in the `WorkflowType` enum.
+
+To see the expected data structure, check out the pages dedicated to individual resources. The raw dictionaries should have the same structure as the validation models outlined there.
+
+To run the provisioning, simply call the function with its required arguments.
+
+```python
+provision(raw_data, WorkflowType.WORKSPACE_INCREMENTAL_LOAD, host, token)
+```
+
 ## Logs
 
 By default, the provisioners operate silently. To monitor progress and troubleshoot issues, you can subscribe to the emitted logs using the `.subscribe()` method on the `logger` property of the provisioner instance.
@@ -89,3 +124,60 @@ provisioner.logger.subscribe(logger)
 # Continue with the provisioning
 ...
 ```
+
+## Example
+
+Here is an example of workspace provisioning using the generic function.
+
+```python
+import logging
+
+# Import the WorkflowType enum and the generic function from GoodData Pipelines
+from gooddata_pipelines import WorkflowType, provision
+
+# Optional: set up a logger to pass to the function. The logger will be subscribed
+# to the logs emitted by the provisioning scripts.
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+host = "http://localhost:3000"
+token = "some_user_token"
+
+# Prepare your raw data
+raw_data: list[dict] = [
+    {
+        "parent_id": "parent_workspace_id",
+        "workspace_id": "workspace_id_1",
+        "workspace_name": "Workspace 1",
+        "workspace_data_filter_id": "wdf__id",
+        "workspace_data_filter_values": ["wdf_value_1"],
+        "is_active": True,
+    },
+    {
+        "parent_id": "parent_workspace_id",
+        "workspace_id": "workspace_id_2",
+        "workspace_name": "Workspace 2",
+        "workspace_data_filter_id": "wdf__id",
+        "workspace_data_filter_values": ["wdf_value_2"],
+        "is_active": True,
+    },
+    {
+        "parent_id": "parent_workspace_id",
+        "workspace_id": "child_workspace_id_1",
+        "workspace_name": "Workspace 3",
+        "workspace_data_filter_id": "wdf__id",
+        "workspace_data_filter_values": ["wdf_value_3"],
+        "is_active": True,
+    },
+]
+
+# Run the provisioning function
+provision(
+    data=raw_data,
+    workflow_type=WorkflowType.WORKSPACE_INCREMENTAL_LOAD,
+    host=host,
+    token=token,
+    logger=logger,
+)
+```
````
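For comparison with the generic-function example above, here is a minimal sketch of the same kind of incremental workspace load written with the resource-specific flow this diff describes (initialize, validate, run). It assumes `WorkspaceIncrementalLoad` is importable from `gooddata_pipelines` and exposes the same `from_list_of_dicts` helper that `UserFullLoad` does in the package README further down; the IDs are illustrative.

```python
from gooddata_pipelines import WorkspaceIncrementalLoad, WorkspaceProvisioner

host = "http://localhost:3000"
token = "some_user_token"

# 1. Initialize the provisioner
provisioner = WorkspaceProvisioner.create(host=host, token=token)

# 2. Validate the raw dictionaries against the workspace input model
#    (from_list_of_dicts is assumed here by analogy with UserFullLoad)
raw_data = [
    {
        "parent_id": "parent_workspace_id",
        "workspace_id": "workspace_id_1",
        "workspace_name": "Workspace 1",
        "workspace_data_filter_id": "wdf__id",
        "workspace_data_filter_values": ["wdf_value_1"],
        "is_active": True,
    },
]
validated_data = WorkspaceIncrementalLoad.from_list_of_dicts(raw_data)

# 3. Run the selected provisioning method
provisioner.incremental_load(validated_data)
```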

docs/content/en/latest/pipelines/provisioning/user_groups.md

Lines changed: 3 additions & 2 deletions
````diff
@@ -10,6 +10,8 @@ User groups enable you to organize users and manage permissions at scale by assi
 
 You can provision user groups using full or incremental load methods. Each of these methods requires a specific input type.
 
+{{% alert color="info" %}} This section covers the usage with manual data validation. You can also take advantage of the generic provisioning function. You can read more about it on the [Provisioning](../#generic-function) page. {{% /alert %}}
+
 ## Usage
 
 Start by importing and initializing the UserGroupProvisioner.
@@ -26,10 +28,10 @@ provisioner = UserGroupProvisioner.create(host=host, token=token)
 
 ```
 
-
 Then validate your data using an input model corresponding to the provisioned resource and selected workflow type, i.e., `UserGroupFullLoad` if you intend to run the provisioning in full load mode, or `UserGroupIncrementalLoad` if you want to provision incrementally.
 
 The models expect the following fields:
+
 - **user_group_id**: ID of the user group.
 - **user_group_name**: Name of the user group.
 - **parent_user_groups**: A list of parent user group IDs.
@@ -130,7 +132,6 @@ provisioner.full_load(validated_data)
 
 ```
 
-
 ### Incremental Load
 
 ```python
````
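To make the full-load flow in this diff concrete, here is a hedged sketch built only from the fields listed above. The group IDs are made up for illustration, and `from_list_of_dicts` on `UserGroupFullLoad` is assumed by analogy with `UserFullLoad` in the package README.

```python
from gooddata_pipelines import UserGroupFullLoad, UserGroupProvisioner

provisioner = UserGroupProvisioner.create(
    host="http://localhost:3000", token="some_user_token"
)

# Rows matching the documented fields; IDs are illustrative only
raw_rows = [
    {
        "user_group_id": "analysts",
        "user_group_name": "Analysts",
        "parent_user_groups": ["all_users"],
    },
]

# Validate, then run the full load. Remember that full loads are
# destructive: groups missing from the input are deleted.
validated_data = UserGroupFullLoad.from_list_of_dicts(raw_rows)
provisioner.full_load(validated_data)
```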

docs/content/en/latest/pipelines/provisioning/users.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -4,11 +4,12 @@ linkTitle: "Users"
 weight: 2
 ---
 
-
 User provisioning allows you to create, update, or delete user profiles in your GoodData environment.
 
 You can provision users using full or incremental load methods. Each of these methods requires a specific input type.
 
+{{% alert color="info" %}} This section covers the usage with manual data validation. You can also take advantage of the generic provisioning function. You can read more about it on the [Provisioning](../#generic-function) page. {{% /alert %}}
+
 ## Usage
 
 Start by importing and initializing the UserProvisioner.
@@ -25,7 +26,6 @@ provisioner = UserProvisioner.create(host=host, token=token)
 
 ```
 
-
 Then validate your data using an input model corresponding to the provisioned resource and selected workflow type, i.e., `UserFullLoad` if you intend to run the provisioning in full load mode, or `UserIncrementalLoad` if you want to provision incrementally.
 
 The models expect the following fields:
@@ -147,7 +147,6 @@ provisioner.full_load(validated_data)
 
 ```
 
-
 ### Incremental Load
 
 ```python
````
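The incremental path named in this diff (`UserIncrementalLoad`) can be sketched the same way. The exact user fields are documented on the full Users page rather than in this excerpt, so the rows are left empty here, and `from_list_of_dicts` is again an assumption carried over from the `UserFullLoad` example in the package README.

```python
from gooddata_pipelines import UserIncrementalLoad, UserProvisioner

provisioner = UserProvisioner.create(
    host="http://localhost:3000", token="some_user_token"
)

# Fill in rows shaped like the UserIncrementalLoad model from the Users page;
# in incremental loads, rows marked is_active: False are deleted.
raw_rows: list[dict] = []

validated_data = UserIncrementalLoad.from_list_of_dicts(raw_rows)
provisioner.incremental_load(validated_data)
```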

docs/content/en/latest/pipelines/provisioning/workspace-permissions.md

Lines changed: 3 additions & 2 deletions
````diff
@@ -8,6 +8,8 @@ Workspace permission provisioning allows you to create, update, or delete user p
 
 You can provision workspace permissions using full or incremental load methods. Each of these methods requires a specific input type.
 
+{{% alert color="info" %}} This section covers the usage with manual data validation. You can also take advantage of the generic provisioning function. You can read more about it on the [Provisioning](../#generic-function) page. {{% /alert %}}
+
 ## Usage
 
 Start by importing and initializing the PermissionProvisioner.
@@ -24,10 +26,10 @@ provisioner = PermissionProvisioner.create(host=host, token=token)
 
 ```
 
-
 Then validate your data using an input model corresponding to the provisioned resource and selected workflow type, i.e., `PermissionFullLoad` if you intend to run the provisioning in full load mode, or `PermissionIncrementalLoad` if you want to provision incrementally.
 
 The models expect the following fields:
+
 - **permission**: Permission you want to grant, e.g., `VIEW`, `ANALYZE`, `MANAGE`.
 - **workspace_id**: ID of the workspace the permission will be applied to.
 - **entity_id**: ID of the entity (user or user group) which will receive the permission.
@@ -138,7 +140,6 @@ provisioner.full_load(validated_data)
 
 ```
 
-
 ### Incremental Load
 
 ```python
````
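As a hedged sketch of the full-load flow using only the three fields visible in this hunk (the full page may document additional ones), with `from_list_of_dicts` on `PermissionFullLoad` assumed by analogy with the package README and illustrative IDs throughout:

```python
from gooddata_pipelines import PermissionFullLoad, PermissionProvisioner

provisioner = PermissionProvisioner.create(
    host="http://localhost:3000", token="some_user_token"
)

# Grant the analysts user group VIEW access to one workspace; the model
# may require more fields than this excerpt shows
raw_rows = [
    {
        "permission": "VIEW",
        "workspace_id": "workspace_id_1",
        "entity_id": "analysts",
    },
]

validated_data = PermissionFullLoad.from_list_of_dicts(raw_rows)
provisioner.full_load(validated_data)
```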

docs/content/en/latest/pipelines/provisioning/workspaces.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -12,6 +12,7 @@ See [Multitenancy: One Platform, Many Customers](https://www.gooddata.com/resour
 
 You can provision child workspaces using full or incremental load methods. Each of these methods requires a specific input type.
 
+{{% alert color="info" %}} This section covers the usage with manual data validation. You can also take advantage of the generic provisioning function. You can read more about it on the [Provisioning](../#generic-function) page. {{% /alert %}}
 
 ## Usage
 
@@ -29,7 +30,6 @@ provisioner = WorkspaceProvisioner.create(host=host, token=token)
 
 ```
 
-
 Then validate your data using an input model corresponding to the provisioned resource and selected workflow type, i.e., `WorkspaceFullLoad` if you intend to run the provisioning in full load mode, or `WorkspaceIncrementalLoad` if you want to provision incrementally.
 
 The models expect the following fields:
@@ -93,7 +93,6 @@ Now with the provisioner initialized and your data validated, you can run the pr
 provisioner.full_load(validated_data)
 ```
 
-
 ## Workspace Data Filters
 
 If you want to apply Workspace Data Filters to a child workspace, the filter must be set up on the parent workspace before you run the provisioning.
````
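The alert added in this diff points at the generic function as an alternative to manual validation. For workspaces that path uses the `WorkflowType.WORKSPACE_INCREMENTAL_LOAD` member and field names confirmed in the _index.md example above, so only the IDs below are illustrative:

```python
from gooddata_pipelines import WorkflowType, provision

host = "http://localhost:3000"
token = "some_user_token"

# One child workspace row; validation against the workspace model
# happens inside provision() itself
raw_data = [
    {
        "parent_id": "parent_workspace_id",
        "workspace_id": "child_workspace_id_1",
        "workspace_name": "Child Workspace 1",
        "workspace_data_filter_id": "wdf__id",
        "workspace_data_filter_values": ["wdf_value_1"],
        "is_active": True,
    },
]

provision(raw_data, WorkflowType.WORKSPACE_INCREMENTAL_LOAD, host, token)
```

Note that any Workspace Data Filter referenced this way must already exist on the parent workspace before the provisioning runs.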

gooddata-pipelines/README.md

Lines changed: 12 additions & 8 deletions
````diff
@@ -28,33 +28,37 @@ The provisioning module exposes _Provisioner_ classes reflecting the different e
 
 ```python
 import os
+import logging
+
 from csv import DictReader
 from pathlib import Path
 
 # Import the Entity Provisioner class and corresponding model from gooddata_pipelines library
 from gooddata_pipelines import UserFullLoad, UserProvisioner
-from gooddata_pipelines.logger.logger import LogObserver
-
-# Optionally, subscribe a standard Python logger to the LogObserver
-import logging
-logger = logging.getLogger(__name__)
-LogObserver().subscribe(logger)
 
 # Create the Provisioner instance - you can also create the instance from a GDC yaml profile
 provisioner = UserProvisioner(
     host=os.environ["GDC_HOSTNAME"], token=os.environ["GDC_AUTH_TOKEN"]
 )
 
+# Optional: set up logging and subscribe to logs emitted by the provisioner
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+provisioner.logger.subscribe(logger)
+
 # Load your data from your data source
 source_data_path: Path = Path("path/to/some.csv")
 source_data_reader = DictReader(source_data_path.read_text().splitlines())
 source_data = [row for row in source_data_reader]
 
-# Validate your input data with
+# Validate your input data
 full_load_data: list[UserFullLoad] = UserFullLoad.from_list_of_dicts(
     source_data
 )
+
+# Run the provisioning
 provisioner.full_load(full_load_data)
+
 ```
 
 ## Bugs & Requests
@@ -64,5 +68,5 @@ or request features.
 
 ## Changelog
 
-See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
+See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
 and a list of changes.
````

gooddata-pipelines/gooddata_pipelines/__init__.py

Lines changed: 6 additions & 0 deletions
````diff
@@ -51,6 +51,10 @@
 )
 from .provisioning.entities.workspaces.workspace import WorkspaceProvisioner
 
+# -------- Generic Provisioning --------
+from .provisioning.generic.config import WorkflowType
+from .provisioning.generic.provision import provision
+
 __all__ = [
     "BackupManager",
     "BackupRestoreConfig",
@@ -79,5 +83,7 @@
     "CustomFieldDefinition",
     "ColumnDataType",
     "CustomFieldType",
+    "provision",
+    "WorkflowType",
     "__version__",
 ]
````
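These two re-exports are what let the docs above import from the package root instead of the internal module paths. A quick sanity-check sketch, assuming only what the docs state (that `WorkflowType` is an enum of supported resource/workflow combinations):

```python
# Both names now resolve from the package root
from gooddata_pipelines import WorkflowType, provision

# Enumerate the supported resource/workflow combinations,
# e.g. WORKSPACE_INCREMENTAL_LOAD
print([member.name for member in WorkflowType])
```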
Lines changed: 1 addition & 0 deletions

````diff
@@ -0,0 +1 @@
+# (C) 2025 GoodData Corporation
````
