Description
publish_table() returns a PublishedTable object with a pipeline_id field that is inconsistently populated. Identical operations produce valid pipeline IDs in some cases and None in others.
A None value is ambiguous:
- Publish succeeded, pipeline exists, ID not returned
- Publish failed silently, no pipeline created
- Race condition where pipeline registration is incomplete
Reproduction
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")
result = fe.publish_table(
    online_store=online_store,
    source_table_name="catalog.schema.offline_feature_table",
    online_table_name="catalog.schema.online_feature_table"
)
# Inconsistent results:
# <PublishedTable: online_table_name='...', pipeline_id='abc123-def456'>
# <PublishedTable: online_table_name='...', pipeline_id=None>
Observed Behavior
| Scenario | pipeline_id | Actual Outcome |
|---|---|---|
| Successful publish | Valid ID | Table syncs correctly |
| Successful publish | None | Table syncs correctly (verified in UI) |
| Silent failure | None | Table never created |
pipeline_id=None maps to both success and failure states, so conditional logic based on this field cannot tell them apart.
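Until the field is reliable, one possible workaround is to pair pipeline_id with a second, independent signal. The following is a minimal sketch, not a library feature: `classify_publish`, `PublishOutcome`, and the caller-supplied `table_exists` callable are all hypothetical names, and `table_exists` stands in for whatever existence check is available (for example a catalog lookup).

```python
from enum import Enum
from typing import Callable, Optional


class PublishOutcome(Enum):
    TRACKED = "tracked"      # a pipeline ID was returned
    UNTRACKED = "untracked"  # no pipeline ID, but the online table exists
    FAILED = "failed"        # no pipeline ID and no online table


def classify_publish(
    pipeline_id: Optional[str],
    online_table_name: str,
    table_exists: Callable[[str], bool],
) -> PublishOutcome:
    """Resolve the ambiguity of pipeline_id=None with a second signal.

    table_exists is a hypothetical caller-supplied check; it is not an
    API of databricks-feature-engineering.
    """
    if pipeline_id:
        return PublishOutcome.TRACKED
    if table_exists(online_table_name):
        return PublishOutcome.UNTRACKED
    return PublishOutcome.FAILED
```

This maps the three rows of the table above onto distinct outcomes. It does not cover the race-condition case: a table that has not been registered yet would still be classified as FAILED.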
Expected Behavior
pipeline_id must be populated whenever a sync pipeline is created:
result = fe.publish_table(...)
if result.pipeline_id:
    print(f"Tracking pipeline: {result.pipeline_id}")
else:
    raise Exception("Publish failed - no pipeline created")
Alternative: provide a separate method to retrieve sync status if pipeline_id cannot be reliably returned.
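If the None results come from incomplete pipeline registration, a status method like the one requested above could be polled by callers. A rough sketch under that assumption follows; `wait_for_pipeline_id` and `fetch_pipeline_id` are hypothetical, and `fetch_pipeline_id` stands in for a status lookup that, as far as this report can tell, the client does not currently expose.

```python
import time
from typing import Callable, Optional


def wait_for_pipeline_id(
    fetch_pipeline_id: Callable[[], Optional[str]],
    timeout_s: float = 60.0,
    interval_s: float = 5.0,
) -> Optional[str]:
    """Poll until a pipeline ID appears or the timeout elapses.

    fetch_pipeline_id is a hypothetical callable standing in for the
    requested status method. Returns the ID, or None if no ID appeared
    within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pipeline_id = fetch_pipeline_id()
        if pipeline_id:
            return pipeline_id
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A None return after the timeout still means only "no ID observed", so it would need to be combined with some other check before treating the publish as failed.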
Impact
- No programmatic way to track sync progress
- No way to distinguish successful publish from failed publish
- CI/CD pipelines and monitoring cannot rely on return values
- Defensive checks like `if result.pipeline_id` have no semantic meaning
Environment
- Databricks Runtime: 16.4 LTS ML
- databricks-feature-engineering: tested with >=0.13.0a3 and >=0.13.0
- Compute: Serverless
Debug Logs
Available on request; I am currently busy switching application architecture.
Additional Context
When pipeline_id is None but publish succeeded, the table and sync status are visible in Unity Catalog UI. The pipeline ID probably exists in the backend but is not returned by the API.