From 15488ac4717a83b35abd0846b891b496808e4133 Mon Sep 17 00:00:00 2001
From: Aditya41150
Date: Sun, 4 Jan 2026 13:49:47 +0530
Subject: [PATCH 1/4] [FLUSS-2298] Move 'Merge Engine' to the top-level of 'Table Design'

This commit restructures the documentation hierarchy to elevate Merge Engines to a top-level concept under Table Design, improving visibility and discoverability.

Changes:
- Moved merge-engines directory from table-design/table-types/pk-table/merge-engines to table-design/merge-engines
- Updated sidebar positions: Overview (1), Table Types (2), Merge Engines (3), Data Distribution (4), Data Types (5)
- Fixed all internal documentation links to reflect the new structure
- Updated references in engine-flink/options.md and pk-table/index.md

This change highlights the strategic importance of merge logic in Fluss's architecture, especially the newly introduced Aggregation Merge Engine for real-time feature engineering.

Closes #2298
---
 website/docs/engine-flink/options.md | 2 +-
 website/docs/table-design/data-types.md | 2 +-
 .../pk-table => }/merge-engines/_category_.json | 2 +-
 .../pk-table => }/merge-engines/aggregation.md | 4 ++--
 .../{table-types/pk-table => }/merge-engines/default.md | 0
 .../{table-types/pk-table => }/merge-engines/first-row.md | 0
 .../{table-types/pk-table => }/merge-engines/index.md | 8 ++++----
 .../{table-types/pk-table => }/merge-engines/versioned.md | 0
 website/docs/table-design/overview.md | 2 +-
 website/docs/table-design/table-types/pk-table/index.md | 8 ++++----
 10 files changed, 14 insertions(+), 14 deletions(-)
 rename website/docs/table-design/{table-types/pk-table => }/merge-engines/_category_.json (66%)
 rename website/docs/table-design/{table-types/pk-table => }/merge-engines/aggregation.md (99%)
 rename website/docs/table-design/{table-types/pk-table => }/merge-engines/default.md (100%)
 rename website/docs/table-design/{table-types/pk-table => }/merge-engines/first-row.md (100%)
 rename
website/docs/table-design/{table-types/pk-table => }/merge-engines/index.md (57%) rename website/docs/table-design/{table-types/pk-table => }/merge-engines/versioned.md (100%) diff --git a/website/docs/engine-flink/options.md b/website/docs/engine-flink/options.md index a7114af492..1d1d48cba6 100644 --- a/website/docs/engine-flink/options.md +++ b/website/docs/engine-flink/options.md @@ -83,7 +83,7 @@ See more details about [ALTER TABLE ... SET](engine-flink/ddl.md#set-properties) | table.datalake.freshness | Duration | 3min | It defines the maximum amount of time that the datalake table's content should lag behind updates to the Fluss table. Based on this target freshness, the Fluss service automatically moves data from the Fluss table and updates to the datalake table, so that the data in the datalake table is kept up to date within this target. If the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs. | | table.datalake.auto-compaction | Boolean | false | If true, compaction will be triggered automatically when tiering service writes to the datalake. It is disabled by default. | | table.datalake.auto-expire-snapshot | Boolean | false | If true, snapshot expiration will be triggered automatically when tiering service commits to the datalake. It is disabled by default. | -| table.merge-engine | Enum | (None) | Defines the merge engine for the primary key table. By default, primary key table uses the [default merge engine(last_row)](table-design/table-types/pk-table/merge-engines/default.md). It also supports two merge engines are `first_row`, `versioned` and `aggregation`. The [first_row merge engine](table-design/table-types/pk-table/merge-engines/first-row.md) will keep the first row of the same primary key. The [versioned merge engine](table-design/table-types/pk-table/merge-engines/versioned.md) will keep the row with the largest version of the same primary key. 
The `aggregation` merge engine will aggregate rows with the same primary key using field-level aggregate functions. | +| table.merge-engine | Enum | (None) | Defines the merge engine for the primary key table. By default, primary key table uses the [default merge engine(last_row)](table-design/merge-engines/default.md). It also supports two merge engines are `first_row`, `versioned` and `aggregation`. The [first_row merge engine](table-design/merge-engines/first-row.md) will keep the first row of the same primary key. The [versioned merge engine](table-design/merge-engines/versioned.md) will keep the row with the largest version of the same primary key. The `aggregation` merge engine will aggregate rows with the same primary key using field-level aggregate functions. | | table.merge-engine.versioned.ver-column | String | (None) | The column name of the version column for the `versioned` merge engine. If the merge engine is set to `versioned`, the version column must be set. | | table.delete.behavior | Enum | ALLOW | Controls the behavior of delete operations on primary key tables. Three modes are supported: `ALLOW` (default for default merge engine) - allows normal delete operations; `IGNORE` - silently ignores delete requests without errors; `DISABLE` - rejects delete requests and throws explicit errors. This configuration provides system-level guarantees for some downstream pipelines (e.g., Flink Delta Join) that must not receive any delete events in the changelog of the table. For tables with `first_row` or `versioned` or `aggregation` merge engines, this option is automatically set to `IGNORE` and cannot be overridden. Note: For `aggregation` merge engine, when set to `allow`, delete operations will remove the entire record. This configuration only applicable to primary key tables. | | table.changelog.image | Enum | FULL | Defines the changelog image mode for primary key tables. 
This configuration is inspired by similar settings in database systems like MySQL's `binlog_row_image` and PostgreSQL's `replica identity`. Two modes are supported: `FULL` (default) - produces both UPDATE_BEFORE and UPDATE_AFTER records for update operations, capturing complete information about updates and allowing tracking of previous values; `WAL` - does not produce UPDATE_BEFORE records. Only INSERT, UPDATE_AFTER (and DELETE if allowed) records are emitted. When WAL mode is enabled with default merge engine (no merge engine configured) and full row updates (not partial update), an optimization is applied to skip looking up old values, and in this case INSERT operations are converted to UPDATE_AFTER events. This mode reduces storage and transmission costs but loses the ability to track previous values. Only applicable to primary key tables. | diff --git a/website/docs/table-design/data-types.md b/website/docs/table-design/data-types.md index 0cbf9b9e61..9a20550080 100644 --- a/website/docs/table-design/data-types.md +++ b/website/docs/table-design/data-types.md @@ -1,6 +1,6 @@ --- title: "Data Types" -sidebar_position: 10 +sidebar_position: 5 --- # Data Types diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/_category_.json b/website/docs/table-design/merge-engines/_category_.json similarity index 66% rename from website/docs/table-design/table-types/pk-table/merge-engines/_category_.json rename to website/docs/table-design/merge-engines/_category_.json index 1fd102371d..12edf31d62 100644 --- a/website/docs/table-design/table-types/pk-table/merge-engines/_category_.json +++ b/website/docs/table-design/merge-engines/_category_.json @@ -1,4 +1,4 @@ { "label": "Merge Engines", - "position": 2 + "position": 3 } diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/aggregation.md b/website/docs/table-design/merge-engines/aggregation.md similarity index 99% rename from 
website/docs/table-design/table-types/pk-table/merge-engines/aggregation.md rename to website/docs/table-design/merge-engines/aggregation.md index ed44e2e256..2c8a983478 100644 --- a/website/docs/table-design/table-types/pk-table/merge-engines/aggregation.md +++ b/website/docs/table-design/merge-engines/aggregation.md @@ -1015,5 +1015,5 @@ For detailed information about Exactly-Once implementation, please refer to: [FI - [Default Merge Engine](./default.md) - [FirstRow Merge Engine](./first-row.md) - [Versioned Merge Engine](./versioned.md) -- [Primary Key Tables](../index.md) -- [Fluss Client API](../../../../apis/java-client.md) +- [Primary Key Tables](../table-types/pk-table/index.md) +- [Fluss Client API](../../apis/java-client.md) diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/default.md b/website/docs/table-design/merge-engines/default.md similarity index 100% rename from website/docs/table-design/table-types/pk-table/merge-engines/default.md rename to website/docs/table-design/merge-engines/default.md diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/first-row.md b/website/docs/table-design/merge-engines/first-row.md similarity index 100% rename from website/docs/table-design/table-types/pk-table/merge-engines/first-row.md rename to website/docs/table-design/merge-engines/first-row.md diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/index.md b/website/docs/table-design/merge-engines/index.md similarity index 57% rename from website/docs/table-design/table-types/pk-table/merge-engines/index.md rename to website/docs/table-design/merge-engines/index.md index dfb6798853..1fc7f9bb13 100644 --- a/website/docs/table-design/table-types/pk-table/merge-engines/index.md +++ b/website/docs/table-design/merge-engines/index.md @@ -11,7 +11,7 @@ However, users can specify a different merge engine to customize the merging beh The following merge engines are supported: -1. 
[Default Merge Engine (LastRow)](table-design/table-types/pk-table/merge-engines/default.md) -2. [FirstRow Merge Engine](table-design/table-types/pk-table/merge-engines/first-row.md) -3. [Versioned Merge Engine](table-design/table-types/pk-table/merge-engines/versioned.md) -4. [Aggregation Merge Engine](table-design/table-types/pk-table/merge-engines/aggregation.md) +1. [Default Merge Engine (LastRow)](default.md) +2. [FirstRow Merge Engine](first-row.md) +3. [Versioned Merge Engine](versioned.md) +4. [Aggregation Merge Engine](aggregation.md) diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/versioned.md b/website/docs/table-design/merge-engines/versioned.md similarity index 100% rename from website/docs/table-design/table-types/pk-table/merge-engines/versioned.md rename to website/docs/table-design/merge-engines/versioned.md diff --git a/website/docs/table-design/overview.md b/website/docs/table-design/overview.md index 700d40749c..4afb74ef16 100644 --- a/website/docs/table-design/overview.md +++ b/website/docs/table-design/overview.md @@ -1,7 +1,7 @@ --- sidebar_label: Overview title: Table Overview -sidebar_position: 2 +sidebar_position: 1 --- # Table Overview diff --git a/website/docs/table-design/table-types/pk-table/index.md b/website/docs/table-design/table-types/pk-table/index.md index 261424b5b9..d9ab0ba350 100644 --- a/website/docs/table-design/table-types/pk-table/index.md +++ b/website/docs/table-design/table-types/pk-table/index.md @@ -82,10 +82,10 @@ However, users can specify a different merge engine to customize the merging beh The following merge engines are supported: -1. [Default Merge Engine (LastRow)](merge-engines/default.md) -2. [FirstRow Merge Engine](merge-engines/first-row.md) -3. [Versioned Merge Engine](merge-engines/versioned.md) -4. [Aggregation Merge Engine](merge-engines/aggregation.md) +1. [Default Merge Engine (LastRow)](../../merge-engines/default.md) +2. 
[FirstRow Merge Engine](../../merge-engines/first-row.md) +3. [Versioned Merge Engine](../../merge-engines/versioned.md) +4. [Aggregation Merge Engine](../../merge-engines/aggregation.md) ## Changelog Generation From b2c8ecc0a30e9d3d82846451962fcc274a709cfd Mon Sep 17 00:00:00 2001 From: Aditya41150 Date: Sun, 4 Jan 2026 13:52:43 +0530 Subject: [PATCH 2/4] docs: fix relative link in default merge engine documentation --- website/docs/table-design/merge-engines/default.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/table-design/merge-engines/default.md b/website/docs/table-design/merge-engines/default.md index 189582f9c2..ffe2d1b1e8 100644 --- a/website/docs/table-design/merge-engines/default.md +++ b/website/docs/table-design/merge-engines/default.md @@ -9,7 +9,7 @@ sidebar_position: 2 ## Overview The **Default Merge Engine** behaves as a LastRow merge engine that retains the latest record for a given primary key. It supports all the operations: `INSERT`, `UPDATE`, `DELETE`. -Additionally, the default merge engine supports [Partial Update](table-design/table-types/pk-table/index.md#partial-update), which preserves the latest values for the specified update columns. +Additionally, the default merge engine supports [Partial Update](../table-types/pk-table/index.md#partial-update), which preserves the latest values for the specified update columns. If the `'table.merge-engine'` property is not explicitly defined in the table properties when creating a Primary Key Table, the default merge engine will be applied automatically. 
From 83fee5fbf01cf4043c73d56d4a69927aecca1f64 Mon Sep 17 00:00:00 2001 From: Aditya41150 Date: Sun, 4 Jan 2026 22:03:20 +0530 Subject: [PATCH 3/4] docs: convert pk-table directory to pk-table.md and fix internal link paths --- website/docs/concepts/architecture.md | 2 +- website/docs/engine-flink/ddl.md | 4 ++-- website/docs/engine-flink/delta-joins.md | 4 ++-- website/docs/engine-flink/options.md | 2 +- website/docs/engine-flink/procedures.md | 2 +- .../maintenance/operations/graceful-shutdown.md | 4 ++-- .../maintenance/operations/updating-configs.md | 2 +- website/docs/streaming-lakehouse/overview.md | 4 ++-- .../docs/table-design/merge-engines/aggregation.md | 10 +++++----- website/docs/table-design/merge-engines/default.md | 2 +- website/docs/table-design/overview.md | 4 ++-- .../table-types/{pk-table/index.md => pk-table.md} | 14 +++++++------- .../table-types/pk-table/_category_.json | 4 ---- 13 files changed, 27 insertions(+), 31 deletions(-) rename website/docs/table-design/table-types/{pk-table/index.md => pk-table.md} (91%) delete mode 100644 website/docs/table-design/table-types/pk-table/_category_.json diff --git a/website/docs/concepts/architecture.md b/website/docs/concepts/architecture.md index 74f616fa77..54f8bd6aa8 100644 --- a/website/docs/concepts/architecture.md +++ b/website/docs/concepts/architecture.md @@ -6,7 +6,7 @@ sidebar_position: 1 # Architecture A Fluss cluster consists of two main processes: the **CoordinatorServer** and the **TabletServer**. -![Fluss Architecture](../assets/architecture.png) +![Fluss Architecture](assets/architecture.png) ## CoordinatorServer The **CoordinatorServer** serves as the central control and management component of the cluster. It is responsible for maintaining metadata, managing tablet allocation, listing nodes, and handling permissions. 
diff --git a/website/docs/engine-flink/ddl.md b/website/docs/engine-flink/ddl.md index 324c753050..b333a14001 100644 --- a/website/docs/engine-flink/ddl.md +++ b/website/docs/engine-flink/ddl.md @@ -27,7 +27,7 @@ The following properties can be set if using the Fluss catalog: | bootstrap.servers | required | (none) | Comma separated list of Fluss servers. | | default-database | optional | fluss | The default database to use when switching to this catalog. | | client.security.protocol | optional | PLAINTEXT | The security protocol used to communicate with brokers. Currently, only `PLAINTEXT` and `SASL` are supported, the configuration value is case insensitive. | -| `client.security.{protocol}.*` | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) | +| `client.security.{protocol}.*` | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](security/authentication.md) | | `{lake-format}.*` | optional | (none) | Extra properties to be passed to the lake catalog. This is useful for configuring sensitive settings, such as the username and password required for lake catalog authentication. E.g., `paimon.jdbc.password = pass`. | The following statements assume that the current catalog has been switched to the Fluss catalog using the `USE CATALOG ` statement. @@ -62,7 +62,7 @@ DROP DATABASE my_db; ### Primary Key Table -The following SQL statement will create a [Primary Key Table](table-design/table-types/pk-table/index.md) with a primary key consisting of shop_id and user_id. +The following SQL statement will create a [Primary Key Table](table-design/table-types/pk-table.md) with a primary key consisting of shop_id and user_id. 
```sql title="Flink SQL" CREATE TABLE my_pk_table ( shop_id BIGINT, diff --git a/website/docs/engine-flink/delta-joins.md b/website/docs/engine-flink/delta-joins.md index deeb9ed74f..5a20ef5a67 100644 --- a/website/docs/engine-flink/delta-joins.md +++ b/website/docs/engine-flink/delta-joins.md @@ -19,7 +19,7 @@ Starting with **Apache Fluss 0.8**, streaming join jobs running on **Flink 2.1 o Traditional streaming joins in Flink require maintaining both input sides entirely in state to match records across streams. Delta join, by contrast, uses a **index-key lookup mechanism** to transform the behavior of querying data from the state into querying data from the Fluss source table, thereby avoiding redundant storage of the same data in both the Fluss source table and the state. This drastically reduces state size and improves performance for many streaming analytics and enrichment workloads. -![](../assets/delta_join.jpg) +![](assets/delta_join.jpg) ## Example: Delta Join in Flink 2.1 @@ -130,7 +130,7 @@ For example: - Full primary key: `(city_id, order_id)` - Bucket key: `city_id` -This yields an **index** on the prefix key `city_id`, so that you can perform [Prefix Key Lookup](/docs/engine-flink/lookups/#prefix-lookup) by the `city_id`. +This yields an **index** on the prefix key `city_id`, so that you can perform [Prefix Key Lookup](engine-flink/lookups.md#prefix-lookup) by the `city_id`. In this setup: * The delta join operator uses the prefix key (`city_id`) to retrieve only relevant right-side records matching each left-side event. diff --git a/website/docs/engine-flink/options.md b/website/docs/engine-flink/options.md index 1d1d48cba6..3c06c3ccbe 100644 --- a/website/docs/engine-flink/options.md +++ b/website/docs/engine-flink/options.md @@ -157,4 +157,4 @@ See more details about [ALTER TABLE ... 
SET](engine-flink/ddl.md#set-properties) | client.filesystem.security.token.renewal.time-ratio | Double | 0.75 | Ratio of the token's expiration time when new credentials for access filesystem should be re-obtained. | | client.metrics.enabled | Boolean | false | Enable metrics for client. When metrics is enabled, the client will collect metrics and report by the JMX metrics reporter. | | client.security.protocol | String | PLAINTEXT | The security protocol used to communicate with brokers. Currently, only `PLAINTEXT` and `SASL` are supported, the configuration value is case insensitive. | -| client.security.\{protocol\}.* | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) | +| client.security.\{protocol\}.* | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](security/authentication.md) | diff --git a/website/docs/engine-flink/procedures.md b/website/docs/engine-flink/procedures.md index ac63113346..2dfda96d8d 100644 --- a/website/docs/engine-flink/procedures.md +++ b/website/docs/engine-flink/procedures.md @@ -18,7 +18,7 @@ SHOW PROCEDURES; ## Access Control Procedures -Fluss provides procedures to manage Access Control Lists (ACLs) for security and authorization. See the [Security](../security/overview.md) documentation for more details. +Fluss provides procedures to manage Access Control Lists (ACLs) for security and authorization. See the [Security](security/overview.md) documentation for more details. 
### add_acl diff --git a/website/docs/maintenance/operations/graceful-shutdown.md b/website/docs/maintenance/operations/graceful-shutdown.md index feb4120cb5..423409fc79 100644 --- a/website/docs/maintenance/operations/graceful-shutdown.md +++ b/website/docs/maintenance/operations/graceful-shutdown.md @@ -131,6 +131,6 @@ Monitor shutdown-related metrics: ## See Also -- [Configuration](../configuration.md) -- [Monitoring and Observability](../observability/monitor-metrics.md) +- [Configuration](maintenance/configuration.md) +- [Monitoring and Observability](maintenance/observability/monitor-metrics.md) - [Upgrading Fluss](upgrading.md) \ No newline at end of file diff --git a/website/docs/maintenance/operations/updating-configs.md b/website/docs/maintenance/operations/updating-configs.md index 8b7a6a1e10..22cc77fa95 100644 --- a/website/docs/maintenance/operations/updating-configs.md +++ b/website/docs/maintenance/operations/updating-configs.md @@ -18,7 +18,7 @@ Currently, the supported dynamically updatable server configurations include: - `kv.rocksdb.shared-rate-limiter.bytes-per-sec`: Control RocksDB flush and compaction write rate shared across all RocksDB instances on the TabletServer. The rate limiter is always enabled. Set to a lower value (e.g., 100MB) to limit the rate, or a very high value to effectively disable rate limiting. -You can update the configuration of a cluster with [Java client](#using-java-client) or [Flink Procedures](../../engine-flink/procedures.md#cluster-configuration-procedures). +You can update the configuration of a cluster with [Java client](#using-java-client) or [Flink Procedures](engine-flink/procedures.md#cluster-configuration-procedures). 
### Using Java Client diff --git a/website/docs/streaming-lakehouse/overview.md b/website/docs/streaming-lakehouse/overview.md index 1b6f9088f5..2fcf388753 100644 --- a/website/docs/streaming-lakehouse/overview.md +++ b/website/docs/streaming-lakehouse/overview.md @@ -32,7 +32,7 @@ To build a Streaming Lakehouse, Fluss maintains a tiering service that compacts The data in the Fluss cluster, stored in streaming Arrow format, is optimized for low-latency read and write operations, making it ideal for short-term data storage. In contrast, the compacted data in the Lakehouse, stored in Parquet format with higher compression, is optimized for efficient analytics and long-term storage. The data in the Fluss cluster serves as a real-time data layer, retaining days of data with sub-second-level freshness. In contrast, the data in the Lakehouse serves as a historical data layer, retaining months of data with minute-level freshness. -![streamhouse](../assets/streamhouse.png) +![streamhouse](assets/streamhouse.png) The core idea of Streaming Lakehouse is shared data and shared metadata between stream and Lakehouse, avoiding data duplication and metadata inconsistency. Some powerful features it provides are: @@ -43,4 +43,4 @@ Some powerful features it provides are: - **Analytical Streams**: The union reads help data streams to have the powerful analytics capabilities. This reduces complexity when developing streaming applications, simplifies debugging, and allows for immediate access to live data insights. - **Connect to Lakehouse Ecosystem**: Fluss keeps the table metadata in sync with data lake catalogs while compacting data into Lakehouse. As a result, external engines like Spark, StarRocks, Flink, and Trino can read the data directly. They simply connect to the data lake catalog. 
-Currently, Fluss supports [Paimon](integrate-data-lakes/paimon.md), [Iceberg](integrate-data-lakes/iceberg.md), and [Lance](integrate-data-lakes/lance.md) as Lakehouse Storage, more kinds of data lake formats are on the roadmap. +Currently, Fluss supports [Paimon](streaming-lakehouse/integrate-data-lakes/paimon.md), [Iceberg](streaming-lakehouse/integrate-data-lakes/iceberg.md), and [Lance](streaming-lakehouse/integrate-data-lakes/lance.md) as Lakehouse Storage, more kinds of data lake formats are on the roadmap. diff --git a/website/docs/table-design/merge-engines/aggregation.md b/website/docs/table-design/merge-engines/aggregation.md index 2c8a983478..c416c8a399 100644 --- a/website/docs/table-design/merge-engines/aggregation.md +++ b/website/docs/table-design/merge-engines/aggregation.md @@ -1012,8 +1012,8 @@ For detailed information about Exactly-Once implementation, please refer to: [FI ## See Also -- [Default Merge Engine](./default.md) -- [FirstRow Merge Engine](./first-row.md) -- [Versioned Merge Engine](./versioned.md) -- [Primary Key Tables](../table-types/pk-table/index.md) -- [Fluss Client API](../../apis/java-client.md) +- [Default Merge Engine](table-design/merge-engines/default.md) +- [FirstRow Merge Engine](table-design/merge-engines/first-row.md) +- [Versioned Merge Engine](table-design/merge-engines/versioned.md) +- [Primary Key Tables](table-design/table-types/pk-table.md) +- [Fluss Client API](apis/java-client.md) diff --git a/website/docs/table-design/merge-engines/default.md b/website/docs/table-design/merge-engines/default.md index ffe2d1b1e8..d4bc4c8c65 100644 --- a/website/docs/table-design/merge-engines/default.md +++ b/website/docs/table-design/merge-engines/default.md @@ -9,7 +9,7 @@ sidebar_position: 2 ## Overview The **Default Merge Engine** behaves as a LastRow merge engine that retains the latest record for a given primary key. It supports all the operations: `INSERT`, `UPDATE`, `DELETE`. 
-Additionally, the default merge engine supports [Partial Update](../table-types/pk-table/index.md#partial-update), which preserves the latest values for the specified update columns. +Additionally, the default merge engine supports [Partial Update](table-design/table-types/pk-table.md#partial-update), which preserves the latest values for the specified update columns. If the `'table.merge-engine'` property is not explicitly defined in the table properties when creating a Primary Key Table, the default merge engine will be applied automatically. diff --git a/website/docs/table-design/overview.md b/website/docs/table-design/overview.md index 4afb74ef16..3cf177c99a 100644 --- a/website/docs/table-design/overview.md +++ b/website/docs/table-design/overview.md @@ -20,7 +20,7 @@ Tables are classified into two types based on the presence of a primary key: - Used for updating and managing data in business databases. - Support INSERT, UPDATE, and DELETE operations based on the defined primary key. -A Table becomes a [Partitioned Table](data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations: +A Table becomes a [Partitioned Table](table-design/data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations: - **For Log Tables**, partitioning is commonly used for log data, typically based on date columns, to facilitate data separation and cleaning. - **For Primary Key Tables**, the partition column must be a subset of the primary key to ensure uniqueness. 
@@ -28,7 +28,7 @@ This design ensures efficient data organization, flexibility in handling differe ## Table Data Organization -![Table Data Organization](../assets/data_organization.png) +![Table Data Organization](assets/data_organization.png) ### Partition diff --git a/website/docs/table-design/table-types/pk-table/index.md b/website/docs/table-design/table-types/pk-table.md similarity index 91% rename from website/docs/table-design/table-types/pk-table/index.md rename to website/docs/table-design/table-types/pk-table.md index d9ab0ba350..b6adc23a2e 100644 --- a/website/docs/table-design/table-types/pk-table/index.md +++ b/website/docs/table-design/table-types/pk-table.md @@ -82,10 +82,10 @@ However, users can specify a different merge engine to customize the merging beh The following merge engines are supported: -1. [Default Merge Engine (LastRow)](../../merge-engines/default.md) -2. [FirstRow Merge Engine](../../merge-engines/first-row.md) -3. [Versioned Merge Engine](../../merge-engines/versioned.md) -4. [Aggregation Merge Engine](../../merge-engines/aggregation.md) +1. [Default Merge Engine (LastRow)](table-design/merge-engines/default.md) +2. [FirstRow Merge Engine](table-design/merge-engines/first-row.md) +3. [Versioned Merge Engine](table-design/merge-engines/versioned.md) +4. [Aggregation Merge Engine](table-design/merge-engines/aggregation.md) ## Changelog Generation @@ -147,13 +147,13 @@ For primary key tables, Fluss supports various kinds of querying abilities. For a primary key table, the default read method is a full snapshot followed by incremental data. First, the snapshot data of the table is consumed, followed by the changelog data of the table. -It is also possible to only consume the changelog data of the table. For more details, please refer to the [Flink Reads](../../../engine-flink/reads.md) +It is also possible to only consume the changelog data of the table. 
For more details, please refer to the [Flink Reads](engine-flink/reads.md) ### Lookup -Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. It is always used in [Flink Lookup Join](../../../engine-flink/lookups.md#lookup). +Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. It is always used in [Flink Lookup Join](engine-flink/lookups.md#lookup). ### Prefix Lookup Fluss primary key table can also do prefix lookup by the prefix subset primary keys. Unlike lookup, prefix lookup -will scan data based on the prefix of primary keys and may return multiple rows. It is always used in [Flink Prefix Lookup Join](../../../engine-flink/lookups.md#prefix-lookup). +will scan data based on the prefix of primary keys and may return multiple rows. It is always used in [Flink Prefix Lookup Join](engine-flink/lookups.md#prefix-lookup). diff --git a/website/docs/table-design/table-types/pk-table/_category_.json b/website/docs/table-design/table-types/pk-table/_category_.json deleted file mode 100644 index 7374558c6a..0000000000 --- a/website/docs/table-design/table-types/pk-table/_category_.json +++ /dev/null @@ -1,4 +0,0 @@ -{ - "label": "Primary Key Table", - "position": 1 -} From a51c0b9ea34ead7bba79ad581d7bbfd1fdfba4af Mon Sep 17 00:00:00 2001 From: Aditya41150 Date: Mon, 5 Jan 2026 14:21:12 +0530 Subject: [PATCH 4/4] docs: fix broken image and documentation links - Fix all image paths to use correct relative paths (../assets/) instead of root-relative paths - Fix all internal documentation links to use root-relative paths (/path/to/doc.md) as per Fluss documentation guidelines - Affected files: - concepts/architecture.md: Fixed architecture.png path - engine-flink/delta-joins.md: Fixed delta_join.jpg path and lookups.md link - engine-flink/procedures.md: Fixed security/overview.md link - streaming-lakehouse/overview.md: Fixed streamhouse.png 
path - table-design/overview.md: Fixed data_organization.png path and partitioning.md link - table-design/table-types/pk-table.md: Fixed all merge engine links and Flink documentation links --- website/docs/concepts/architecture.md | 2 +- website/docs/engine-flink/delta-joins.md | 4 ++-- website/docs/engine-flink/procedures.md | 2 +- website/docs/streaming-lakehouse/overview.md | 2 +- website/docs/table-design/overview.md | 4 ++-- .../docs/table-design/table-types/pk-table.md | 16 ++++++++-------- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/website/docs/concepts/architecture.md b/website/docs/concepts/architecture.md index 54f8bd6aa8..74f616fa77 100644 --- a/website/docs/concepts/architecture.md +++ b/website/docs/concepts/architecture.md @@ -6,7 +6,7 @@ sidebar_position: 1 # Architecture A Fluss cluster consists of two main processes: the **CoordinatorServer** and the **TabletServer**. -![Fluss Architecture](assets/architecture.png) +![Fluss Architecture](../assets/architecture.png) ## CoordinatorServer The **CoordinatorServer** serves as the central control and management component of the cluster. It is responsible for maintaining metadata, managing tablet allocation, listing nodes, and handling permissions. diff --git a/website/docs/engine-flink/delta-joins.md b/website/docs/engine-flink/delta-joins.md index 5a20ef5a67..96f0974963 100644 --- a/website/docs/engine-flink/delta-joins.md +++ b/website/docs/engine-flink/delta-joins.md @@ -19,7 +19,7 @@ Starting with **Apache Fluss 0.8**, streaming join jobs running on **Flink 2.1 o Traditional streaming joins in Flink require maintaining both input sides entirely in state to match records across streams. Delta join, by contrast, uses a **index-key lookup mechanism** to transform the behavior of querying data from the state into querying data from the Fluss source table, thereby avoiding redundant storage of the same data in both the Fluss source table and the state. 
This drastically reduces state size and improves performance for many streaming analytics and enrichment workloads. -![](assets/delta_join.jpg) +![](../assets/delta_join.jpg) ## Example: Delta Join in Flink 2.1 @@ -130,7 +130,7 @@ For example: - Full primary key: `(city_id, order_id)` - Bucket key: `city_id` -This yields an **index** on the prefix key `city_id`, so that you can perform [Prefix Key Lookup](engine-flink/lookups.md#prefix-lookup) by the `city_id`. +This yields an **index** on the prefix key `city_id`, so that you can perform [Prefix Key Lookup](/engine-flink/lookups.md#prefix-lookup) by the `city_id`. In this setup: * The delta join operator uses the prefix key (`city_id`) to retrieve only relevant right-side records matching each left-side event. diff --git a/website/docs/engine-flink/procedures.md b/website/docs/engine-flink/procedures.md index 2dfda96d8d..95e8724eb0 100644 --- a/website/docs/engine-flink/procedures.md +++ b/website/docs/engine-flink/procedures.md @@ -18,7 +18,7 @@ SHOW PROCEDURES; ## Access Control Procedures -Fluss provides procedures to manage Access Control Lists (ACLs) for security and authorization. See the [Security](security/overview.md) documentation for more details. +Fluss provides procedures to manage Access Control Lists (ACLs) for security and authorization. See the [Security](/security/overview.md) documentation for more details. ### add_acl diff --git a/website/docs/streaming-lakehouse/overview.md b/website/docs/streaming-lakehouse/overview.md index 2fcf388753..626c6ae0ef 100644 --- a/website/docs/streaming-lakehouse/overview.md +++ b/website/docs/streaming-lakehouse/overview.md @@ -32,7 +32,7 @@ To build a Streaming Lakehouse, Fluss maintains a tiering service that compacts The data in the Fluss cluster, stored in streaming Arrow format, is optimized for low-latency read and write operations, making it ideal for short-term data storage. 
In contrast, the compacted data in the Lakehouse, stored in Parquet format with higher compression, is optimized for efficient analytics and long-term storage. The data in the Fluss cluster serves as a real-time data layer, retaining days of data with sub-second-level freshness. In contrast, the data in the Lakehouse serves as a historical data layer, retaining months of data with minute-level freshness. -![streamhouse](assets/streamhouse.png) +![streamhouse](../assets/streamhouse.png) The core idea of Streaming Lakehouse is shared data and shared metadata between stream and Lakehouse, avoiding data duplication and metadata inconsistency. Some powerful features it provides are: diff --git a/website/docs/table-design/overview.md b/website/docs/table-design/overview.md index 3cf177c99a..99997d244c 100644 --- a/website/docs/table-design/overview.md +++ b/website/docs/table-design/overview.md @@ -20,7 +20,7 @@ Tables are classified into two types based on the presence of a primary key: - Used for updating and managing data in business databases. - Support INSERT, UPDATE, and DELETE operations based on the defined primary key. -A Table becomes a [Partitioned Table](table-design/data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations: +A Table becomes a [Partitioned Table](/table-design/data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations: - **For Log Tables**, partitioning is commonly used for log data, typically based on date columns, to facilitate data separation and cleaning. - **For Primary Key Tables**, the partition column must be a subset of the primary key to ensure uniqueness. 
@@ -28,7 +28,7 @@ This design ensures efficient data organization, flexibility in handling differe ## Table Data Organization -![Table Data Organization](assets/data_organization.png) +![Table Data Organization](../assets/data_organization.png) ### Partition diff --git a/website/docs/table-design/table-types/pk-table.md b/website/docs/table-design/table-types/pk-table.md index b6adc23a2e..7331d76198 100644 --- a/website/docs/table-design/table-types/pk-table.md +++ b/website/docs/table-design/table-types/pk-table.md @@ -31,7 +31,7 @@ In Fluss primary key table, each row of data has a unique primary key. If multiple entries with the same primary key are written to the Fluss primary key table, only the last entry will be retained. -For [Partitioned Primary Key Table](table-design/data-distribution/partitioning.md), the primary key must contain the +For [Partitioned Primary Key Table](/table-design/data-distribution/partitioning.md), the primary key must contain the partition key. ## Bucket Assigning @@ -82,10 +82,10 @@ However, users can specify a different merge engine to customize the merging beh The following merge engines are supported: -1. [Default Merge Engine (LastRow)](table-design/merge-engines/default.md) -2. [FirstRow Merge Engine](table-design/merge-engines/first-row.md) -3. [Versioned Merge Engine](table-design/merge-engines/versioned.md) -4. [Aggregation Merge Engine](table-design/merge-engines/aggregation.md) +1. [Default Merge Engine (LastRow)](/table-design/merge-engines/default.md) +2. [FirstRow Merge Engine](/table-design/merge-engines/first-row.md) +3. [Versioned Merge Engine](/table-design/merge-engines/versioned.md) +4. [Aggregation Merge Engine](/table-design/merge-engines/aggregation.md) ## Changelog Generation @@ -147,13 +147,13 @@ For primary key tables, Fluss supports various kinds of querying abilities. For a primary key table, the default read method is a full snapshot followed by incremental data. 
First, the snapshot data of the table is consumed, followed by the changelog data of the table. -It is also possible to only consume the changelog data of the table. For more details, please refer to the [Flink Reads](engine-flink/reads.md) +It is also possible to only consume the changelog data of the table. For more details, please refer to the [Flink Reads](/engine-flink/reads.md) ### Lookup -Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. It is always used in [Flink Lookup Join](engine-flink/lookups.md#lookup). +Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. It is always used in [Flink Lookup Join](/engine-flink/lookups.md#lookup). ### Prefix Lookup Fluss primary key table can also do prefix lookup by the prefix subset primary keys. Unlike lookup, prefix lookup -will scan data based on the prefix of primary keys and may return multiple rows. It is always used in [Flink Prefix Lookup Join](engine-flink/lookups.md#prefix-lookup). +will scan data based on the prefix of primary keys and may return multiple rows. It is always used in [Flink Prefix Lookup Join](/engine-flink/lookups.md#prefix-lookup).