From afc97927eb02c595fb080678e030afafc231269c Mon Sep 17 00:00:00 2001 From: Priyamanjare54 <163539431+Priyamanjare54@users.noreply.github.com> Date: Sat, 27 Dec 2025 12:54:15 +0530 Subject: [PATCH 1/7] [docs] Document COMPACTED table format --- website/docs/table-design/compacted-format.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 website/docs/table-design/compacted-format.md diff --git a/website/docs/table-design/compacted-format.md b/website/docs/table-design/compacted-format.md new file mode 100644 index 0000000000..e69de29bb2 From 677e242b6e86d85fd31c3f4cdb0ed961799f614f Mon Sep 17 00:00:00 2001 From: Priyamanjare54 <163539431+Priyamanjare54@users.noreply.github.com> Date: Tue, 30 Dec 2025 18:31:24 +0530 Subject: [PATCH 2/7] [docs] Add detailed explanation for COMPACTED format --- website/docs/table-design/compacted-format.md | 72 +++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/website/docs/table-design/compacted-format.md b/website/docs/table-design/compacted-format.md index e69de29bb2..309dbcf9c6 100644 --- a/website/docs/table-design/compacted-format.md +++ b/website/docs/table-design/compacted-format.md @@ -0,0 +1,72 @@ +--- +title: COMPACTED Table Format +--- + +## Overview + +The COMPACTED table format is designed for key-based workloads where only the +latest value per key is required. It is supported by both Log tables and KV +tables. + +In COMPACTED format, older records with the same key are compacted away, +resulting in a lightweight and efficient storage layout. 
+ +--- + +## Supported Table Types + +- Log tables +- KV tables + +--- + +## Configuration + +The COMPACTED format can be enabled using the following table options: + +```sql +CREATE TABLE kv_store ( + k STRING, + v STRING, + PRIMARY KEY (k) NOT ENFORCED +) WITH ( + 'table.format' = 'COMPACTED', + 'table.changelog.image' = 'WAL' +); + +--- + +## COMPACTED with WAL Changelog Image + +When combined with `table.changelog.image = WAL`, the COMPACTED format enables +an efficient and lightweight KV store. + +In this mode: + +- Only the latest value per key is stored +- Previous values do not need to be looked up +- Records are not deserialized into an intermediate log format + +This reduces both **latency** and **CPU overhead**, and is especially suitable +for internal and system-level tables. + +--- + +## Use Cases + +The COMPACTED format is well suited for: + +- System tables +- Metadata storage +- Lookup-heavy KV workloads +- Tables that do not require full changelog reads + +--- + +## Limitations + +The COMPACTED format is not recommended when: + +- Full changelog history needs to be read +- Column projections are required +- Historical state changes must be preserved \ No newline at end of file From 2a8b6eb39a2b5fe0122fb7782d8289d293f04cb6 Mon Sep 17 00:00:00 2001 From: Priyamanjare54 <163539431+Priyamanjare54@users.noreply.github.com> Date: Wed, 31 Dec 2025 18:34:46 +0530 Subject: [PATCH 3/7] [docs] Add Data Encodings documentation --- website/docs/table-design/data-encodings.md | 125 ++++++++++++++++++++ 1 file changed, 125 insertions(+) create mode 100644 website/docs/table-design/data-encodings.md diff --git a/website/docs/table-design/data-encodings.md b/website/docs/table-design/data-encodings.md new file mode 100644 index 0000000000..2ee2d5d64a --- /dev/null +++ b/website/docs/table-design/data-encodings.md @@ -0,0 +1,125 @@ +--- +title: Data Encodings +--- + +## Overview + +Fluss supports multiple **data encodings** to optimize storage layout and access 
+patterns for different workloads. Each encoding represents a set of trade-offs +between storage efficiency, read performance, and supported access patterns. + +Encodings can be applied to different table types, such as **Log tables** and +**KV tables**, depending on the encoding’s capabilities and design goals. + +This page provides an overview of the encoding landscape in Fluss and guidance +on when to choose a particular encoding. + +--- + +## Log Encoding vs KV Encoding + +Encodings in Fluss can be used in two different contexts: + +- **Log encoding** focuses on efficient sequential reads and streaming + consumption. It is commonly used for append-only or changelog-style access + patterns. +- **KV encoding** optimizes for key-based access patterns, such as point lookups + and updates, where only the latest value per key is relevant. + +Not all encodings support both table types. The following sections describe the +supported encodings and their applicable use cases. + +--- + +## ARROW Encoding + +### Characteristics + +ARROW is a **columnar encoding** designed for analytical and streaming workloads. +It can **only be used as a log encoding**. + +By storing data in a columnar layout, ARROW enables efficient column access and +better CPU cache utilization. It also integrates well with Arrow-based systems. + +### Primary Use Cases + +- Streaming analytical workloads +- Queries that benefit from column projection +- Integration with Arrow-based processing frameworks + +### Trade-offs + +- Not supported for KV tables +- Not optimized for key-based point lookups + +--- + +## COMPACTED Encoding + +### Characteristics + +COMPACTED is a **row-oriented encoding** that leverages variable-length coding to +minimize binary data size. It is designed for key-based workloads where only the +**latest value per key** is required. + +COMPACTED can be used as both a **log encoding** and a **KV encoding**. 
+ +### Supported Table Types + +- Log tables +- KV tables + +### Configuration + +The COMPACTED encoding can be enabled using the following table options: + +```sql +CREATE TABLE kv_store ( + k STRING, + v STRING, + PRIMARY KEY (k) NOT ENFORCED +) WITH ( + 'table.format' = 'COMPACTED', + 'table.changelog.image' = 'WAL' +); + +### COMPACTED with WAL Changelog Image + +When combined with `table.changelog.image = WAL`, the COMPACTED encoding enables +an efficient and lightweight KV store optimized for key-based access patterns. + +In this mode: + +- Only the latest value for each key is stored +- Previous values do not need to be looked up +- Records are not deserialized into an intermediate log format + +As a result, this approach reduces both **latency** and **CPU overhead**, making +it especially suitable for internal or system-level tables. + +### Primary Use Cases: + +- System tables +- Internal metadata storage +- Lookup-heavy KV workloads +- Use cases that do not require full changelog reads + +### Trade-offs and Limitations + +The COMPACTED encoding is not recommended when: + +- Full changelog history needs to be read +- Column projection is required +- Historical state changes must be preserved + +## INDEXED Encoding (Deprecated) + +### Characteristics + +INDEXED is a **row-oriented encoding** that supports both log and KV tables. +However, it is **deprecated** and should not be used for new applications. + +### Recommendation + +Existing users may continue to rely on INDEXED, but new tables should prefer +COMPACTED or ARROW, depending on workload characteristics. 
From bc5e50aabacfc11ffaedb0484ff5f90df9bf0d30 Mon Sep 17 00:00:00 2001 From: Priyamanjare54 <163539431+Priyamanjare54@users.noreply.github.com> Date: Wed, 31 Dec 2025 18:39:07 +0530 Subject: [PATCH 4/7] [docs] Remove standalone COMPACTED doc in favor of Data Encodings page --- website/docs/table-design/compacted-format.md | 72 ------------------- 1 file changed, 72 deletions(-) delete mode 100644 website/docs/table-design/compacted-format.md diff --git a/website/docs/table-design/compacted-format.md b/website/docs/table-design/compacted-format.md deleted file mode 100644 index 309dbcf9c6..0000000000 --- a/website/docs/table-design/compacted-format.md +++ /dev/null @@ -1,72 +0,0 @@ ---- -title: COMPACTED Table Format ---- - -## Overview - -The COMPACTED table format is designed for key-based workloads where only the -latest value per key is required. It is supported by both Log tables and KV -tables. - -In COMPACTED format, older records with the same key are compacted away, -resulting in a lightweight and efficient storage layout. - ---- - -## Supported Table Types - -- Log tables -- KV tables - ---- - -## Configuration - -The COMPACTED format can be enabled using the following table options: - -```sql -CREATE TABLE kv_store ( - k STRING, - v STRING, - PRIMARY KEY (k) NOT ENFORCED -) WITH ( - 'table.format' = 'COMPACTED', - 'table.changelog.image' = 'WAL' -); - ---- - -## COMPACTED with WAL Changelog Image - -When combined with `table.changelog.image = WAL`, the COMPACTED format enables -an efficient and lightweight KV store. - -In this mode: - -- Only the latest value per key is stored -- Previous values do not need to be looked up -- Records are not deserialized into an intermediate log format - -This reduces both **latency** and **CPU overhead**, and is especially suitable -for internal and system-level tables. 
- ---- - -## Use Cases - -The COMPACTED format is well suited for: - -- System tables -- Metadata storage -- Lookup-heavy KV workloads -- Tables that do not require full changelog reads - ---- - -## Limitations - -The COMPACTED format is not recommended when: - -- Full changelog history needs to be read -- Column projections are required -- Historical state changes must be preserved \ No newline at end of file From dc8bf90994d399be11be6087d6ff31c0b597c1d6 Mon Sep 17 00:00:00 2001 From: Priyamanjare54 <163539431+Priyamanjare54@users.noreply.github.com> Date: Sun, 4 Jan 2026 11:54:33 +0530 Subject: [PATCH 5/7] docs: updated Data Encodings page --- website/docs/table-design/data-encodings.md | 157 +++++++++----------- 1 file changed, 72 insertions(+), 85 deletions(-) diff --git a/website/docs/table-design/data-encodings.md b/website/docs/table-design/data-encodings.md index 2ee2d5d64a..40d375ebad 100644 --- a/website/docs/table-design/data-encodings.md +++ b/website/docs/table-design/data-encodings.md @@ -4,122 +4,109 @@ title: Data Encodings ## Overview -Fluss supports multiple **data encodings** to optimize storage layout and access -patterns for different workloads. Each encoding represents a set of trade-offs -between storage efficiency, read performance, and supported access patterns. +Fluss supports multiple **data encodings** to optimize how data is stored and read for different workloads. Each encoding is designed to +balance storage efficiency, read performance, and query capabilities. -Encodings can be applied to different table types, such as **Log tables** and -**KV tables**, depending on the encoding’s capabilities and design goals. - -This page provides an overview of the encoding landscape in Fluss and guidance -on when to choose a particular encoding. +This page describes the available encodings in Fluss and provides guidance on selecting the appropriate encoding based on workload characteristics. 
--- -## Log Encoding vs KV Encoding +## Log Encoding and KV Encoding -Encodings in Fluss can be used in two different contexts: +In Fluss, data encodings can be used in two different ways, depending on how the data is accessed. -- **Log encoding** focuses on efficient sequential reads and streaming - consumption. It is commonly used for append-only or changelog-style access - patterns. -- **KV encoding** optimizes for key-based access patterns, such as point lookups - and updates, where only the latest value per key is relevant. +- **Log encoding** is designed for reading data in order, as it is written. + It is commonly used for streaming workloads, append-only tables, and changelog-style data. -Not all encodings support both table types. The following sections describe the -supported encodings and their applicable use cases. +- **KV encoding** is designed for accessing data by key. + It is used for workloads where queries look up or update values using a key and only the most recent value for each key is needed. ---- +ARROW can be used as log encoding, while COMPACTED supports both. -## ARROW Encoding +## ARROW Encoding (Default) -### Characteristics +### Overview -ARROW is a **columnar encoding** designed for analytical and streaming workloads. -It can **only be used as a log encoding**. +ARROW is the **default encoding** in Fluss. It stores data in a columnar layout, organizing information by columns rather than rows. This layout is well suited for analytical and streaming workloads. -By storing data in a columnar layout, ARROW enables efficient column access and -better CPU cache utilization. It also integrates well with Arrow-based systems. 
+### Key Features -### Primary Use Cases +- **Column pruning**: Reads only the columns required by a query +- **Predicate pushdown**: Applies filters efficiently at the storage layer +- **Arrow ecosystem integration**: Compatible with Arrow-based processing frameworks -- Streaming analytical workloads -- Queries that benefit from column projection -- Integration with Arrow-based processing frameworks +### When to Use ARROW -### Trade-offs +ARROW is recommended for: +- Analytical queries that access a subset of columns +- Streaming workloads with selective column reads +- General-purpose tables with varying query patterns +- Workloads that benefit from predicate pushdown -- Not supported for KV tables -- Not optimized for key-based point lookups +### ARROW Trade-offs ---- +ARROW is less efficient for workloads that: +- Always read all columns +- Workloads that mostly access individual rows by key + +--- ## COMPACTED Encoding -### Characteristics +### Overview -COMPACTED is a **row-oriented encoding** that leverages variable-length coding to -minimize binary data size. It is designed for key-based workloads where only the -**latest value per key** is required. +COMPACTED uses a **row-oriented encoding** that focuses on reducing storage size and CPU usage. It is optimized for workloads where queries typically access entire rows rather than individual columns. -COMPACTED can be used as both a **log encoding** and a **KV encoding**. 
+### Key Features -### Supported Table Types +- **Reduced storage overhead**: Variable-length encoding minimizes disk usage +- **Lower CPU overhead**: Efficient when all columns are accessed together +- **Row-oriented access**: Optimized for full-row reads +- **Key-value support**: Can be configured for key-based access patterns -- Log tables -- KV tables +### When to Use COMPACTED + +COMPACTED is recommended for: +- Tables where queries usually select all columns +- Large vector or embedding tables +- Pre-aggregated results or materialized views +- Denormalized or joined tables +- Workloads that prioritize storage efficiency over selective column access + +--- -### Configuration +## Configuration -The COMPACTED encoding can be enabled using the following table options: +To enable the COMPACTED encoding, set the `table.format` option: ```sql -CREATE TABLE kv_store ( - k STRING, - v STRING, - PRIMARY KEY (k) NOT ENFORCED +CREATE TABLE my_table ( + id BIGINT, + data STRING, + PRIMARY KEY (id) NOT ENFORCED ) WITH ( - 'table.format' = 'COMPACTED', - 'table.changelog.image' = 'WAL' + 'table.format' = 'COMPACTED' ); +``` ### COMPACTED with WAL Changelog Image -When combined with `table.changelog.image = WAL`, the COMPACTED encoding enables -an efficient and lightweight KV store optimized for key-based access patterns. - -In this mode: - -- Only the latest value for each key is stored -- Previous values do not need to be looked up -- Records are not deserialized into an intermediate log format - -As a result, this approach reduces both **latency** and **CPU overhead**, making -it especially suitable for internal or system-level tables. - -### Primary Use Cases: +For key-based workloads that only require the **latest value per key**, the COMPACTED encoding can be combined with the WAL changelog image mode. 
-- System tables -- Internal metadata storage -- Lookup-heavy KV workloads -- Use cases that do not require full changelog reads - -### Trade-offs and Limitations - -The COMPACTED encoding is not recommended when: - -- Full changelog history needs to be read -- Column projection is required -- Historical state changes must be preserved - -## INDEXED Encoding (Deprecated) - -### Characteristics - -INDEXED is a **row-oriented encoding** that supports both log and KV tables. -However, it is **deprecated** and should not be used for new applications. - -### Recommendation - -Existing users may continue to rely on INDEXED, but new tables should prefer -COMPACTED or ARROW, depending on workload characteristics. +```sql +CREATE TABLE kv_table ( + key STRING, + value STRING, + PRIMARY KEY (key) NOT ENFORCED +) WITH ( + 'table.format' = 'COMPACTED', + 'table.changelog.image' = 'WAL' +); +``` +### COMPACTED Trade-offs + +COMPACTED is not recommended when: +- Queries need to read only a few columns from a table +- Filters are applied to reduce the amount of data read +- Analytical workloads require flexible access to individual columns +- Historical changes or full changelog data must be preserved From b1961447b72dd57090991f71f3dc047132bc6689 Mon Sep 17 00:00:00 2001 From: Priya Manjare <163539431+Priyamanjare54@users.noreply.github.com> Date: Mon, 5 Jan 2026 10:40:57 +0000 Subject: [PATCH 6/7] docs: improve Data Encodings overview --- website/docs/table-design/data-encodings.md | 36 ++++++++++++++++++--- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/website/docs/table-design/data-encodings.md b/website/docs/table-design/data-encodings.md index 40d375ebad..1b373642fd 100644 --- a/website/docs/table-design/data-encodings.md +++ b/website/docs/table-design/data-encodings.md @@ -2,10 +2,22 @@ title: Data Encodings --- -## Overview +### How to Think About Encodings in Fluss -Fluss supports multiple **data encodings** to optimize how data is stored and read for 
different workloads. Each encoding is designed to -balance storage efficiency, read performance, and query capabilities. +In Fluss, a data encoding primarily determines: + +- How data is laid out on disk (columnar vs row-oriented) +- How efficiently data can be scanned, filtered, or projected +- Whether the workload is optimized for streaming scans or key-based access + +Encodings in Fluss determine: + +- **CPU vs IO trade-offs** +- **Scan-heavy vs lookup-heavy workloads** +- **Analytical vs operational access patterns** + + +In Fluss, a data encoding primarily defines **how data is stored and accessed**. Each encoding is designed to balance storage efficiency, read performance, and query capabilities. This page describes the available encodings in Fluss and provides guidance on selecting the appropriate encoding based on workload characteristics. @@ -21,7 +33,7 @@ In Fluss, data encodings can be used in two different ways, depending on how the - **KV encoding** is designed for accessing data by key. It is used for workloads where queries look up or update values using a key and only the most recent value for each key is needed. -ARROW can be used as log encoding, while COMPACTED supports both. +ARROW can be used as log encoding, while COMPACTED supports both log and KV encodings. 
## ARROW Encoding (Default) @@ -47,7 +59,7 @@ ARROW is recommended for: ARROW is less efficient for workloads that: - Always read all columns -- Workloads that mostly access individual rows by key +- Mostly access individual rows by key --- @@ -110,3 +122,17 @@ COMPACTED is not recommended when: - Filters are applied to reduce the amount of data read - Analytical workloads require flexible access to individual columns - Historical changes or full changelog data must be preserved + +## ARROW vs COMPACTED + +| Feature | ARROW | COMPACTED | +|------------------------|-------------------------------------|------------------------------------| +| Physical layout | Columnar | Row-oriented | +| Typical access pattern | Scans with projection & filters | Full-row reads or key lookups | +| Column pruning | ✅ Yes | ❌ No | +| Predicate pushdown | ✅ Yes | ❌ No | +| Storage efficiency | Good | Excellent | +| CPU efficiency | Better for selective reads | Better for full-row reads | +| Log encoding | ✅ Yes | ✅ Yes | +| KV encoding | ❌ No | ✅ Yes | +| Best suited for | Analytics workloads | State tables / materialized data | \ No newline at end of file From 32651f2d646fdc3d87fe59571cb382c1e6f4fb4c Mon Sep 17 00:00:00 2001 From: ipolyzos Date: Mon, 5 Jan 2026 21:01:30 +0200 Subject: [PATCH 7/7] small improvements --- website/docs/engine-flink/options.md | 2 +- .../{data-encodings.md => data-formats.md} | 53 +++++++++---------- 2 files changed, 27 insertions(+), 28 deletions(-) rename website/docs/table-design/{data-encodings.md => data-formats.md} (65%) diff --git a/website/docs/engine-flink/options.md b/website/docs/engine-flink/options.md index 2fecb25dd6..3eb1f198ed 100644 --- a/website/docs/engine-flink/options.md +++ b/website/docs/engine-flink/options.md @@ -73,7 +73,7 @@ See more details about [ALTER TABLE ... 
SET](engine-flink/ddl.md#set-properties) | table.auto-partition.num-retention | Integer | 7 | The number of history partitions to retain for auto created partitions in each check for auto partition. For example, if the current check time is 2024-11-11, time-unit is DAY, and the value is configured as 3, then the history partitions 20241108, 20241109, 20241110 will be retained. The partitions earlier than 20241108 will be deleted. The default value is 7, which means that 7 partitions will be retained. | | table.auto-partition.time-zone | String | the system time zone | The time zone for auto partitions, which is by default the same as the system time zone. | | table.replication.factor | Integer | (None) | The replication factor for the log of the new table. When it's not set, Fluss will use the cluster's default replication factor configured by default.replication.factor. It should be a positive number and not larger than the number of tablet servers in the Fluss cluster. A value larger than the number of tablet servers in Fluss cluster will result in an error when the new table is created. | -| table.log.format | Enum | ARROW | The format of the log records in log store. The default value is `ARROW`. The supported formats are `ARROW` and `INDEXED`. | +| table.log.format | Enum | ARROW | The format of the log records in log store. The default value is `ARROW`. The supported formats are `ARROW`, `INDEXED` and `COMPACTED`. | | table.log.arrow.compression.type | Enum | ZSTD | The compression type of the log records if the log format is set to `ARROW`. The candidate compression type is `NONE`, `LZ4_FRAME`, `ZSTD`. The default value is `ZSTD`. | | table.log.arrow.compression.zstd.level | Integer | 3 | The compression level of the log records if the log format is set to `ARROW` and the compression type is set to `ZSTD`. The valid range is 1 to 22. The default value is 3. | | table.kv.format | Enum | COMPACTED | The format of the kv records in kv store. 
The default value is `COMPACTED`. The supported formats are `COMPACTED` and `INDEXED`. | diff --git a/website/docs/table-design/data-encodings.md b/website/docs/table-design/data-formats.md similarity index 65% rename from website/docs/table-design/data-encodings.md rename to website/docs/table-design/data-formats.md index 1b373642fd..d509312a29 100644 --- a/website/docs/table-design/data-encodings.md +++ b/website/docs/table-design/data-formats.md @@ -1,45 +1,43 @@ --- -title: Data Encodings +title: Storage Formats --- -### How to Think About Encodings in Fluss +In Fluss, a storage format primarily defines **how data is stored and accessed**. Each format is designed to balance storage efficiency, read performance, and query capabilities. -In Fluss, a data encoding primarily determines: +This page describes the available formats in Fluss and provides guidance on selecting the appropriate format based on workload characteristics. +### How to Think About Formats in Fluss + +At a high level, a format determines: - How data is laid out on disk (columnar vs row-oriented) - How efficiently data can be scanned, filtered, or projected - Whether the workload is optimized for streaming scans or key-based access -Encodings in Fluss determine: - -- **CPU vs IO trade-offs** -- **Scan-heavy vs lookup-heavy workloads** -- **Analytical vs operational access patterns** - - -In Fluss, a data encoding primarily defines **how data is stored and accessed**. Each encoding is designed to balance storage efficiency, read performance, and query capabilities. +Choosing a format therefore means weighing: -This page describes the available encodings in Fluss and provides guidance on selecting the appropriate encoding based on workload characteristics. 
+- CPU vs IO trade-offs +- Scan-heavy vs lookup-heavy workloads +- Analytical vs operational access patterns --- -## Log Encoding and KV Encoding +## Log Format and KV Format -In Fluss, data encodings can be used in two different ways, depending on how the data is accessed. +In Fluss, storage formats can be used in two different ways, depending on how the data is accessed. -- **Log encoding** is designed for reading data in order, as it is written. +- **Log format** is designed for reading data sequentially, in the order it was written. It is commonly used for streaming workloads, append-only tables, and changelog-style data. -- **KV encoding** is designed for accessing data by key. +- **KV format** is designed for accessing data by key. It is used for workloads where queries look up or update values using a key and only the most recent value for each key is needed. -ARROW can be used as log encoding, while COMPACTED supports both log and KV encodings. +ARROW can be used only as a log format, while COMPACTED can be used as both a log format and a KV format. -## ARROW Encoding (Default) +## ARROW Format (Default) ### Overview -ARROW is the **default encoding** in Fluss. It stores data in a columnar layout, organizing information by columns rather than rows. This layout is well suited for analytical and streaming workloads. +ARROW is the **default log format** in Fluss. It stores data in a columnar layout, organizing information by columns rather than rows. This layout is well suited for analytical and streaming workloads. ### Key Features @@ -63,11 +61,11 @@ ARROW is less efficient for workloads that: --- -## COMPACTED Encoding +## COMPACTED Format ### Overview -COMPACTED uses a **row-oriented encoding** that focuses on reducing storage size and CPU usage. +COMPACTED uses a **row-oriented format** that focuses on reducing storage size and CPU usage. 
It is optimized for workloads where queries typically access entire rows rather than individual columns. ### Key Features @@ -89,7 +87,7 @@ COMPACTED is recommended for: ## Configuration -To enable the COMPACTED encoding, set the `table.format` option: +To enable the COMPACTED format for log data, set the `table.log.format` option: ```sql CREATE TABLE my_table ( @@ -97,13 +95,13 @@ CREATE TABLE my_table ( data STRING, PRIMARY KEY (id) NOT ENFORCED ) WITH ( - 'table.format' = 'COMPACTED' + 'table.log.format' = 'COMPACTED' ); ``` ### COMPACTED with WAL Changelog Image -For key-based workloads that only require the **latest value per key**, the COMPACTED encoding can be combined with the WAL changelog image mode. +For key-based workloads that require only the **latest value per key**, the COMPACTED format can be used for both log and KV data, in combination with the WAL changelog image mode. ```sql CREATE TABLE kv_table ( @@ -111,7 +109,8 @@ CREATE TABLE kv_table ( value STRING, PRIMARY KEY (key) NOT ENFORCED ) WITH ( - 'table.format' = 'COMPACTED', + 'table.log.format' = 'COMPACTED', + 'table.kv.format' = 'COMPACTED', 'table.changelog.image' = 'WAL' ); ``` @@ -133,6 +132,6 @@ COMPACTED is not recommended when: | Predicate pushdown | ✅ Yes | ❌ No | | Storage efficiency | Good | Excellent | | CPU efficiency | Better for selective reads | Better for full-row reads | -| Log encoding | ✅ Yes | ✅ Yes | -| KV encoding | ❌ No | ✅ Yes | +| Log format | ✅ Yes | ✅ Yes | +| KV format | ❌ No | ✅ Yes | | Best suited for | Analytics workloads | State tables / materialized data | \ No newline at end of file