
[ISSUE] publish_table() inconsistently returns pipeline_id=None - unreliable for sync tracking #1133

@msetsma

Description

publish_table() returns a PublishedTable object with a pipeline_id field that is inconsistently populated. Identical operations produce valid pipeline IDs in some cases and None in others.

A None value is ambiguous:

  • Publish succeeded, pipeline exists, ID not returned
  • Publish failed silently, no pipeline created
  • Race condition where pipeline registration is incomplete

Reproduction

from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
online_store = fe.get_online_store(name="my-online-store")

result = fe.publish_table(
    online_store=online_store,
    source_table_name="catalog.schema.offline_feature_table",
    online_table_name="catalog.schema.online_feature_table"
)

# Inconsistent results:
# <PublishedTable: online_table_name='...', pipeline_id='abc123-def456'>
# <PublishedTable: online_table_name='...', pipeline_id=None>

Observed Behavior

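| Scenario           | pipeline_id | Actual Outcome                          |
|--------------------|-------------|-----------------------------------------|
| Successful publish | Valid ID    | Table syncs correctly                   |
| Successful publish | None        | Table syncs correctly (verified in UI)  |
| Silent failure     | None        | Table never created                     |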

pipeline_id=None maps to both success and failure states, so conditional logic based on this field cannot tell them apart.
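The three rows above can be told apart only by combining pipeline_id with a second, independent signal. Below is a minimal sketch of that workaround. The names PublishOutcome, classify_publish, and online_table_exists are hypothetical and not part of databricks-feature-engineering; the existence check is passed in as a callable because this issue does not establish which API should be used for it.

```python
from enum import Enum
from typing import Callable, Optional


class PublishOutcome(Enum):
    """Hypothetical labels for the three rows of the table above."""
    TRACKED = "pipeline_id returned"
    UNTRACKED_SUCCESS = "table exists, pipeline_id missing"
    FAILED = "no pipeline_id and no table"


def classify_publish(
    pipeline_id: Optional[str],
    online_table_exists: Callable[[], bool],
) -> PublishOutcome:
    """Disambiguate a publish result using a caller-supplied existence check.

    online_table_exists is an assumption: any zero-argument callable that
    reports whether the online table is visible (for example via a catalog
    lookup the caller already trusts).
    """
    if pipeline_id:
        # An ID was returned, so the pipeline can be tracked directly.
        return PublishOutcome.TRACKED
    if online_table_exists():
        # Publish worked, but the ID was not returned to the caller.
        return PublishOutcome.UNTRACKED_SUCCESS
    # No ID and no table: treat as a silent failure.
    return PublishOutcome.FAILED
```

Called as classify_publish(result.pipeline_id, check), this gives a meaningful branch where a plain truthiness test on pipeline_id does not. It does not cover the race-condition case, where the table may simply not be visible yet.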

Expected Behavior

pipeline_id must be populated whenever a sync pipeline is created:

result = fe.publish_table(...)
if result.pipeline_id:
    print(f"Tracking pipeline: {result.pipeline_id}")
else:
    raise Exception("Publish failed - no pipeline created")

Alternative: provide a separate method to retrieve sync status if pipeline_id cannot be reliably returned.
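Until such a method exists, callers would have to poll for the table themselves. The following is a rough sketch of what that polling could look like, not an existing API. wait_for_online_table and its online_table_exists argument are hypothetical names, and the timeout and interval defaults are arbitrary placeholders.

```python
import time
from typing import Callable


def wait_for_online_table(
    online_table_exists: Callable[[], bool],
    timeout_s: float = 300.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a caller-supplied check until it passes or the timeout expires.

    online_table_exists is an assumption: a zero-argument callable that
    returns True once the online table is visible. sleep and clock are
    injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if online_table_exists():
            return True
        if clock() >= deadline:
            # Timed out: the table never appeared within the window.
            return False
        sleep(poll_interval_s)
```

A False return here still cannot distinguish a failed publish from a slow one, which is why a first-class status method or a reliably populated pipeline_id would be preferable.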

Impact

  • No programmatic way to track sync progress
  • No way to distinguish successful publish from failed publish
  • CI/CD pipelines and monitoring cannot rely on return values
  • Defensive checks like if result.pipeline_id have no semantic meaning

Environment

  • Databricks Runtime: 16.4 LTS ML
  • databricks-feature-engineering: tested with >=0.13.0a3 and >=0.13.0
  • Compute: Serverless

Debug Logs

Available on request; I am currently busy switching application architecture.

Additional Context

When pipeline_id is None but the publish succeeded, the table and its sync status are visible in the Unity Catalog UI. The pipeline ID probably exists in the backend but is not returned by the API.
