
Conversation

@arthurpassos
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Accept a table function as the destination for part export, inheriting the schema if it is not explicitly provided. Example:

ALTER TABLE src_merge_tree EXPORT PART '2020_1_1_0' TO TABLE FUNCTION s3(s3_conn, filename='...', partition_strategy...) PARTITION BY ..

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Jan 19, 2026

Workflow [PR], commit [d613c80]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c7f7c7c90

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +72 to 74
const auto & destination_storage = manifest.destination_storage_ptr;
const auto destination_storage_id = destination_storage->getStorageID();


P2: Revalidate destination table before export task runs

The export task now uses the captured destination_storage_ptr directly without rechecking whether the destination table still exists or is dropped/detached. If the destination table is dropped after the ALTER EXPORT is queued but before this background task executes, destination_storage->import(...) will still run and can write data for a table that no longer exists (or has been recreated with a different definition). Previously the task re-resolved the destination from the catalog and aborted on UNKNOWN_TABLE; consider re-looking it up or at least checking is_dropped/locking before proceeding.

Useful? React with 👍 / 👎.
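
A minimal, self-contained sketch of the re-check suggested above. The StorageID/IStorage types and the in-memory catalog below are simplified stand-ins rather than the actual ClickHouse API; they only illustrate re-resolving the destination right before the background task writes anything:

#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <tuple>

// Simplified stand-ins; the real classes carry the same names but much richer APIs.
struct StorageID
{
    std::string database_name;
    std::string table_name;
    bool operator<(const StorageID & rhs) const
    {
        return std::tie(database_name, table_name) < std::tie(rhs.database_name, rhs.table_name);
    }
};

struct IStorage
{
    StorageID id;
    bool is_dropped = false;   // flipped once the table has been dropped
};
using StoragePtr = std::shared_ptr<IStorage>;

// Stand-in for the catalog lookup the task previously performed before exporting.
std::map<StorageID, StoragePtr> catalog;

void runExportTask(const StorageID & destination_id)
{
    // Re-resolve instead of trusting a StoragePtr captured when the ALTER was queued.
    auto it = catalog.find(destination_id);
    if (it == catalog.end() || it->second->is_dropped)
        throw std::runtime_error("Destination table no longer exists, aborting export");

    // ... only now stream the part into the (still valid) destination storage ...
}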

Collaborator Author


This is an interesting one, but it is kind of an existing problem, not exclusive to this PR.

Plain object storage tables don't have state, and there is no trigger / event for when these tables are dropped. At the same time, it is safe (in terms of thread and memory safety) to write to S3 even after the table has been dropped.

That being said, I think we have a few options:

  1. Document it and leave it as is, meaning that if a table gets deleted after the export process has started, the export will continue writing to the S3 bucket.
  2. Somehow trigger an event when the table gets deleted and cancel exports. This might leak abstractions.
  3. Constantly check in the pipeline whether the table is still alive.

Collaborator Author


In any case, I don't think this is a major issue. We probably have bigger fish to fry.

Comment on lines 80 to 84
bool operator<(const MergeTreePartExportManifest & rhs) const
{
const auto lhs_storage_id = destination_storage_ptr->getStorageID();
const auto rhs_storage_id = rhs.destination_storage_ptr->getStorageID();
// Lexicographic comparison: first compare destination storage, then part name


P2: Avoid set ordering on mutable StorageID

The manifest comparison now derives ordering from destination_storage_ptr->getStorageID(), but StorageID for a table can change on RENAME/DETACH while an export is in progress. Because export_manifests is a std::set, mutating the key of an element already in the set violates ordering invariants and can lead to missed lookups/erasures or undefined behavior. Consider storing a stable StorageID (or a hash of table-function arguments) inside the manifest for ordering, even if you keep the StoragePtr for execution.

Useful? React with 👍 / 👎.
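
A rough sketch of the suggestion above (a hypothetical struct, not the actual manifest): capture an immutable ordering key when the manifest is created, so a later RENAME/DETACH of the destination cannot change the relative order of elements already stored in the std::set:

#include <set>
#include <string>
#include <tuple>

struct PartExportManifest
{
    // Snapshot of the destination identity (or a hash of the table-function arguments),
    // taken at creation time and never mutated afterwards.
    std::string destination_key;
    std::string part_name;

    bool operator<(const PartExportManifest & rhs) const
    {
        // Only immutable members participate in ordering; the live StoragePtr used for
        // execution is stored separately and never affects the set's invariants.
        return std::tie(destination_key, part_name) < std::tie(rhs.destination_key, rhs.part_name);
    }
};

std::set<PartExportManifest> export_manifests;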

Collaborator Author


This is a fairly interesting one, and kind of related to #1320 (comment).

@arthurpassos
Collaborator Author

arthurpassos commented Jan 19, 2026

There is one interesting thing to validate on this one: exports are considered duplicates based on source table + part name + destination table name. I need to check what name is used for table functions.

Edit1:

arthur :) alter table replicated_source export part '2020_0_0_0' to table function s3(s3_conn, filename='function', structure='id UInt32, year UInt16', format=Parquet, partition_strategy='hive') PARTITION BY year;

ALTER TABLE replicated_source
    (EXPORT PART '2020_0_0_0' TO TABLE FUNCTION s3(s3_conn, filename = 'function', structure = 'id UInt32, year UInt16', format = Parquet, partition_strategy = 'hive') PARTITION BY year)

Query id: 5ca93a65-5ab3-45b3-97c0-b0f87d10e060

Ok.

0 rows in set. Elapsed: 50.131 sec. 

arthur :) alter table replicated_source export part '2020_0_0_0' to table function s3(s3_conn, filename='function2', structure='id UInt32, year UInt16', format=Parquet, partition_strategy='hive') PARTITION BY year;

ALTER TABLE replicated_source
    (EXPORT PART '2020_0_0_0' TO TABLE FUNCTION s3(s3_conn, filename = 'function2', structure = 'id UInt32, year UInt16', format = Parquet, partition_strategy = 'hive') PARTITION BY year)

Query id: 89710dcd-0e8c-48c7-9c3a-f003a82b1dcb


Elapsed: 2.795 sec. 

Received exception from server (version 25.8.12):
Code: 236. DB::Exception: Received from localhost:9000. DB::Exception: Data part '2020_0_0_0' is already being exported to table '_table_function.s3'. (ABORTED)


Edit2:

As observed above, table function names are all the same by default.

As stated earlier, the list of manifests is deduplicated based on part name + destination table name. But why? IIRC, the only reason is that we wanted to avoid colliding exports (i.e., exporting the same part to the same file path concurrently). According to the S3 docs (ChatGPT, actually), the last upload wins in non-versioned buckets: whoever sends the last PutObject or CompleteMultipartUpload wins.

This isn't a big problem, I suppose, but it sounds like a nice protection to have. At the same time, it is not a very good protection, as it is based on the table name rather than the destination storage. For instance, if you have two s3 tables with different names pointing to the same location, the export will still be allowed and the writes can collide.

The options off the top of my head:

  1. Drop this protection. Easiest; leaves it up to the user.
  2. Generate a unique id (query id) for each table function. This allows multiple exports to table functions, but does not prevent the user from exporting the same part to the same location.
  3. Use a key based on the s3 path (a rough sketch of this follows below).
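
A rough, self-contained sketch of option 3, assuming the resolved object-storage prefix (bucket + path) is available as a string by the time the export is registered; all names below are hypothetical and not part of the existing code:

#include <set>
#include <string>
#include <utility>

// Key: (resolved destination path, part name). Two tables or table functions with
// different names but the same location now collide, which a table-name-based key misses.
using ExportKey = std::pair<std::string, std::string>;

std::set<ExportKey> exports_in_progress;

// Returns false if the same part is already being exported to the same location.
bool tryRegisterExport(const std::string & destination_path, const std::string & part_name)
{
    return exports_in_progress.emplace(destination_path, part_name).second;
}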

@arthurpassos
Collaborator Author

arthurpassos commented Jan 19, 2026

Discussed with Misha and Dima, two options:

  1. Just use part name as the key, meaning a given part can't be exported to different locations at the same time.
  2. Use a key based on the s3 path (option 3 above).

For now, I implemented option 1.

@arthurpassos
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d613c808e3


Comment on lines 80 to 83
bool operator<(const MergeTreePartExportManifest & rhs) const
{
// Lexicographic comparison: first compare destination storage, then part name
auto lhs_storage = destination_storage_id.getQualifiedName();
auto rhs_storage = rhs.destination_storage_id.getQualifiedName();

if (lhs_storage != rhs_storage)
return lhs_storage < rhs_storage;

return data_part->name < rhs.data_part->name;
}


P2: Include destination in export manifest ordering

Because MergeTreeData::exportPartToTable stores manifests in a std::set and relies on the comparator for uniqueness, comparing only data_part->name makes all exports of the same part equivalent even if the destination differs. This means concurrent exports of the same part to two different tables or table functions will now collide and the second will fail with “already being exported,” which is a regression from the previous per-destination behavior. Consider including the destination (e.g., storage ID or pointer) in operator< / operator== so the set only deduplicates identical part+destination pairs.

Useful? React with 👍 / 👎.
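
A small self-contained demonstration of the behaviour described above (a hypothetical Manifest type, not the real one): when operator< looks only at the part name, a std::set treats two exports of the same part to different destinations as duplicates, so the second registration fails:

#include <cassert>
#include <set>
#include <string>

struct Manifest
{
    std::string destination;
    std::string part_name;
    bool operator<(const Manifest & rhs) const { return part_name < rhs.part_name; }
};

int main()
{
    std::set<Manifest> manifests;
    assert(manifests.insert({"s3_table_a", "2020_0_0_0"}).second);
    // Same part, different destination: rejected because ordering ignores the destination.
    assert(!manifests.insert({"s3_table_b", "2020_0_0_0"}).second);
}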
