[lake/iceberg] Add rest catalog cache#2622

Open
bakjos wants to merge 1 commit into apache:main from bakjos:bakjos/iceberg_rest_cache

bakjos commented Feb 9, 2026

Purpose

Queries against the data lake table are much slower than queries against remote storage. This PR adds per-task lazy caching of the Iceberg Catalog and Table inside IcebergLakeSource, so that createRecordReader reuses a single loadTable result for all lake splits in a Flink source task. With a REST catalog, this eliminates the O(splits) REST round-trips.

Before: N splits → N × (createCatalog + loadTable) → N REST calls per task.
After: N splits → 1 × (createCatalog + loadTable) on the first split, then N-1 reuses → 1 REST loadTable call per task. With TTL enabled, the cached entry is reloaded once the TTL expires, so externally changed table metadata is picked up.
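
The before/after behavior described above can be illustrated with a minimal sketch. This is not the code in this PR: `LazyTtlCache`, `loader`, `ttlNanos`, and `clock` are hypothetical names, and the `loader` supplier stands in for the `createCatalog` + `loadTable` pair. It assumes a TTL of zero or less means "never expire" and that the clock is injectable so expiry can be tested.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical lazy cache with an optional TTL; illustrative only, not the PR's class. */
final class LazyTtlCache<T> {
    private final Supplier<T> loader;   // stands in for createCatalog + loadTable
    private final long ttlNanos;        // assumption: <= 0 disables expiry
    private final LongSupplier clock;   // nanosecond clock, injectable for tests

    private T value;
    private long loadedAtNanos;
    private boolean loaded;

    LazyTtlCache(Supplier<T> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    /** Loads on first use, then reuses the value until the TTL (if any) has elapsed. */
    synchronized T get() {
        long now = clock.getAsLong();
        boolean expired = loaded && ttlNanos > 0 && now - loadedAtNanos >= ttlNanos;
        if (!loaded || expired) {
            value = loader.get();
            loadedAtNanos = now;
            loaded = true;
        }
        return value;
    }
}

/** Simulates one source task reading eight splits through a single cache. */
final class LazyTtlCacheDemo {
    public static void main(String[] args) {
        AtomicInteger restCalls = new AtomicInteger();
        LazyTtlCache<String> tableCache = new LazyTtlCache<>(
                () -> "table-metadata-v" + restCalls.incrementAndGet(),
                0L,                 // no TTL: load once per task
                System::nanoTime);

        for (int split = 0; split < 8; split++) {
            tableCache.get();       // each split asks the cache, not the catalog
        }
        System.out.println("splits=8 loads=" + restCalls.get());
    }
}
```

With no TTL, the loader runs once regardless of the number of splits. With a positive TTL, the first `get()` at or after the expiry time triggers one reload, which is how a refreshed table would pick up externally changed metadata in this sketch.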

Linked issue: close #2619

Brief change log

Tests

API and Format

Documentation

Development

Successfully merging this pull request may close these issues.

Iceberg s3 table tearing bugs

1 participant