[common] Introduce MAP type for ARROW, COMPACTED and INDEXED formats #2190
Conversation
Pull request overview
This PR introduces MAP type support for the INDEXED row format, achieving feature parity with ARROW and COMPACTED formats. The implementation adds MAP serialization/deserialization capabilities across the Fluss common layer, Flink integration, and lake integrations (Paimon, Iceberg placeholder).
Key Changes:
- Added MAP type support to the IndexedRow format with `IndexedRowReader.readMap()` and `IndexedRowWriter.writeMap()` methods
- Introduced `GenericMap`, `BinaryMap`, and format-specific map implementations (IndexedMap, CompactedMap, AlignedMap) with corresponding serializers (a conceptual sketch of the key/value-array layout follows this list)
- Extended the Flink, Paimon, and Arrow integrations to handle MAP type conversions and vector operations
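For orientation, here is a conceptual sketch in plain Java (no Fluss classes; the class and variable names are purely illustrative) of the key/value-array shape of a map that the summary above describes for `BinaryMap` and the `InternalMap` accessors:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual illustration only: a map stored as two parallel arrays, which is the
// general shape that BinaryMap's key/value arrays and InternalMap.keyArray()/valueArray()
// expose according to the summary above. No Fluss APIs are used here.
public class KeyValueArraySketch {
    public static void main(String[] args) {
        Map<Integer, String> source = new LinkedHashMap<>();
        source.put(1, "a");
        source.put(2, "b");
        source.put(3, "c");

        // Decompose into parallel arrays, preserving entry order.
        Object[] keys = source.keySet().toArray();
        Object[] values = source.values().toArray();

        // Reassemble: entry i pairs keys[i] with values[i].
        Map<Object, Object> rebuilt = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            rebuilt.put(keys[i], values[i]);
        }

        System.out.println(Arrays.toString(keys) + " / " + Arrays.toString(values));
        System.out.println(rebuilt.equals(source)); // true
    }
}
```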
Reviewed changes
Copilot reviewed 66 out of 66 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| IndexedRowReader.java | Added readMap() method and MAP case in field reader factory |
| IndexedRowWriter.java | Added writeMap() method for MAP serialization |
| BinaryMap.java | New binary representation of MAP type with key/value arrays |
| GenericMap.java | Generic MAP implementation wrapping Java maps |
| MapSerializer.java | Serializer for converting InternalMap to BinaryMap |
| FlinkAsFlussMap.java | Adapter for Flink MapData to Fluss InternalMap |
| PaimonMapAsFlussMap.java | Adapter for Paimon map to Fluss InternalMap |
| ArrowMapWriter.java / ArrowMapColumnVector.java | Arrow format MAP support |
| Various test files | Comprehensive test coverage for MAP operations |
```diff
 Object arr, int offset, int length, int elementSize) {
     final long headerInBytes = calculateHeaderInBytes(length);
-    final long valueRegionInBytes = elementSize * length;
+    final long valueRegionInBytes = (long) elementSize * length;
```
Copilot AI commented on Dec 26, 2025:
The cast to (long) is applied only to elementSize, not to the entire multiplication result. This could still cause integer overflow when elementSize * length exceeds Integer.MAX_VALUE before the cast is applied. The cast should wrap the entire multiplication expression: (long) elementSize * length.
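For reference, a minimal, self-contained Java sketch (hypothetical values, not from the PR) of how int multiplication overflows before it is widened to long, and how casting an operand changes the arithmetic:

```java
public class OverflowSketch {
    public static void main(String[] args) {
        int elementSize = 8;
        int length = 300_000_000; // hypothetical: large enough that the int product overflows

        // int * int is evaluated in 32-bit arithmetic; the overflowed result is then widened.
        long overflowed = elementSize * length;

        // Casting one operand first forces the multiplication to be done in 64-bit arithmetic.
        long widened = (long) elementSize * length;

        System.out.println(overflowed); // negative value due to int overflow
        System.out.println(widened);    // 2400000000
    }
}
```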
```java
java.util.Map<Object, Object> javaMap = new java.util.LinkedHashMap<>();
for (int i = 0; i < values.length; i += 2) {
    javaMap.put(values[i], (i + 1 < values.length) ? values[i + 1] : null);
```
Copilot AI commented on Dec 26, 2025:
This test is always true because of the loop condition, so the ternary can be simplified:

```diff
-javaMap.put(values[i], (i + 1 < values.length) ? values[i + 1] : null);
+javaMap.put(values[i], values[i + 1]);
```
```java
 * @since 0.9
 */
@PublicEvolving
public class BinaryMap extends BinarySection implements InternalMap {
```
Copilot AI commented on Dec 26, 2025:
Class BinaryMap overrides hashCode but not equals.
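For context, the conventional fix is to override `equals` and `hashCode` together so that equal instances hash equally; a generic sketch with placeholder fields (not BinaryMap's actual state, which lives in `BinarySection`):

```java
import java.util.Arrays;
import java.util.Objects;

// Generic illustration of keeping equals and hashCode consistent; the fields here are
// placeholders, not BinaryMap's real state.
final class Section {
    private final byte[] bytes;
    private final int offset;
    private final int sizeInBytes;

    Section(byte[] bytes, int offset, int sizeInBytes) {
        this.bytes = bytes;
        this.offset = offset;
        this.sizeInBytes = sizeInBytes;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Section)) {
            return false;
        }
        Section that = (Section) o;
        return offset == that.offset
                && sizeInBytes == that.sizeInBytes
                && Arrays.equals(bytes, that.bytes);
    }

    @Override
    public int hashCode() {
        // Derived from the same fields as equals, so equal sections hash equally.
        return Objects.hash(offset, sizeInBytes, Arrays.hashCode(bytes));
    }
}
```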
wuchong left a comment:
Thanks @XuQianJin-Stars for the contribution. I left some comments.
```java
assertThat(row.getMap(4)).isInstanceOf(GenericMap.class);
GenericMap simpleMap = (GenericMap) row.getMap(4);
assertThat(simpleMap.size()).isGreaterThan(0);

assertThat(row.getMap(5)).isInstanceOf(GenericMap.class);
GenericMap nestedMap = (GenericMap) row.getMap(5);
assertThat(nestedMap.size()).isGreaterThan(0);

assertThat(row.getMap(6)).isInstanceOf(GenericMap.class);
GenericMap mapWithArray = (GenericMap) row.getMap(6);
assertThat(mapWithArray.size()).isGreaterThan(0);
```
Actually, this approach doesn’t verify the actual field values, and it significantly inflates the line count, making the code harder to maintain.
Moreover, we’ll soon need to add assertions for column projection here as well.
My suggestion: prepare a list of expected rows upfront and perform a direct assert equals comparison with the actual result.
This keeps the test concise, expressive, and much easier to maintain.
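As a purely illustrative sketch of that pattern, with plain Java lists and maps standing in for the Fluss row and map types (the class name and values are hypothetical):

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExpectedRowsSketch {
    public static void main(String[] args) {
        Map<Integer, String> expectedMap = new LinkedHashMap<>();
        expectedMap.put(1, "a");
        expectedMap.put(2, "b");

        // Prepare the full expected rows up front...
        List<List<Object>> expectedRows =
                Arrays.asList(
                        Arrays.asList(100, "alice", expectedMap),
                        Arrays.asList(200, "bob", expectedMap));

        // ...then compare the whole result in one assertion instead of asserting
        // only sizes field by field.
        List<List<Object>> actualRows =
                Arrays.asList(
                        Arrays.asList(100, "alice", expectedMap),
                        Arrays.asList(200, "bob", expectedMap));

        assertThat(actualRows).isEqualTo(expectedRows);
    }
}
```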
```java
public InternalMap getMap(int pos) {
    assertIndexIsValid(pos);
    int fieldOffset = getFieldOffset(pos);
    final long offsetAndSize = BinarySegmentUtils.getLong(segments, fieldOffset);
```
Keep this aligned with the other accessors and use `final long offsetAndSize = segments[0].getLong(fieldOffset)`, since we don't support multiple segments yet.
```java
private void init() {
    if (!inited) {
        FieldVector mapVector = ((MapVector) vector).getDataVector();
        StructVector structVector = (StructVector) mapVector;
        FieldVector keyVector = structVector.getChildrenFromFields().get(0);
        FieldVector valueVector = structVector.getChildrenFromFields().get(1);
        this.keyColumnVector = ArrowUtils.createArrowColumnVector(keyVector, keyType);
        this.valueColumnVector = ArrowUtils.createArrowColumnVector(valueVector, valueType);
        inited = true;
    }
}
```
Initialize these in the constructor and remove this method.
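A possible shape for that refactoring, reusing the names from the snippet above; the constructor signature and the Fluss-side types (`DataType`, `ArrowUtils`, the `keyType`/`valueType` parameters) are assumptions rather than the actual API, so treat this as a sketch, not the implementation:

```java
// Sketch only: field and helper names are taken from the snippet above; the constructor
// shape is an assumption to be aligned with the existing ArrowColumnVector constructors.
public ArrowMapColumnVector(MapVector mapVector, DataType keyType, DataType valueType) {
    this.mapVector = mapVector; // keep the typed MapVector to avoid per-record casts
    StructVector structVector = (StructVector) mapVector.getDataVector();
    this.keyColumnVector =
            ArrowUtils.createArrowColumnVector(
                    structVector.getChildrenFromFields().get(0), keyType);
    this.valueColumnVector =
            ArrowUtils.createArrowColumnVector(
                    structVector.getChildrenFromFields().get(1), valueType);
}
```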
```java
public ColumnVector getKeyColumnVector() {
    init();
    return keyColumnVector;
}

public ColumnVector getValueColumnVector() {
    init();
    return valueColumnVector;
}
```
Unused, remove.
```java
@Override
public InternalMap getMap(int i) {
    init();
    MapVector mapVector = (MapVector) vector;
```
Store the vector directly as a MapVector field in ArrowMapColumnVector to avoid casting on every record.
```java
BinaryWriter.ValueWriter setter =
        BinaryWriter.createValueWriter(
                DataTypes.MAP(DataTypes.INT().copy(false), DataTypes.STRING()),
                BinaryRow.BinaryRowFormat.ALIGNED);
```
Use COMPACTED format in tests by default as it is more widely used than ALIGNED and INDEXED.
```java
}

@Test
public void testSize() {
```
Remove; this is already covered by other test methods.
```java
InternalArray keyArray = flussMap.keyArray();
InternalArray valueArray = flussMap.valueArray();
int size = flussMap.size();
java.util.Map<Object, Object> javaMap = new java.util.HashMap<>();
```
ditto
```java
}

@Test
void testMapTypesInLogTable() throws Exception {
```
I found that the current tests don't cover all the cases: PK table tests for the Map and Row types are missing, as is the partitioned table test for the Row type.
Could you merge the map type tests and row type tests into the existing testArrayTypesInPrimaryKeyTable, testArrayTypesInPartitionedLogTable, and testArrayTypesInLogTable, and rename them to testComplexTypesInPrimaryKeyTable, testComplexTypesInPartitionedLogTable, and testComplexTypesInLogTable? This would also reduce the test time.
Besides, could you add more nested type tests? For example:
- `array<map<string, double>>`
- `row<map<int, array<float>>>`
- `map<bigint, row<string, array<string>, array<int>>>`
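A hedged sketch of how those nested types might be declared with the `DataTypes` factory seen elsewhere in this PR; only `DataTypes.MAP`, `DataTypes.INT`, and `DataTypes.STRING` appear in the diff above, so the other factory methods are assumptions to verify against the DataTypes class:

```java
// Hedged sketch: DataTypes.ARRAY, ROW, DOUBLE, FLOAT, and BIGINT are assumed to exist
// by analogy with DataTypes.MAP/INT/STRING used in this PR's tests.
DataType t1 = DataTypes.ARRAY(DataTypes.MAP(DataTypes.STRING(), DataTypes.DOUBLE()));
DataType t2 = DataTypes.ROW(DataTypes.MAP(DataTypes.INT(), DataTypes.ARRAY(DataTypes.FLOAT())));
DataType t3 =
        DataTypes.MAP(
                DataTypes.BIGINT(),
                DataTypes.ROW(
                        DataTypes.STRING(),
                        DataTypes.ARRAY(DataTypes.STRING()),
                        DataTypes.ARRAY(DataTypes.INT())));
```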
```java
// map
assertThat(flinkRow.getMap(22)).isNotNull();
assertThat(flinkRow.getMap(22).size()).isEqualTo(3);
```
Assert the map values as well, not just the size.
Purpose
Linked issue: close #1973
Introduce MAP type support for INDEXED format, achieving feature parity with ARROW and COMPACTED formats for MAP data type handling.
Brief change log
- Added a `readMap()` method to `IndexedRowReader` for MAP type deserialization, following the same pattern as `readArray()`
- Removed the `UnsupportedOperationException` for MAP type in `IndexedRowReader.createFieldReader()`
- Added the `InternalMap` import to `IndexedRowReader`
- Renamed `BinaryWriterTest.testCreateValueSetterForMapThrowsException()` to `testValueSetterWithMapType()` - changed from an exception test to a positive validation test
- Updated `IndexedRowTest.assertAllTypeEquals()` to include MAP type assertions (validates size, key array, value array, and null handling)
- Updated `IndexedRowReaderTest.assertAllTypeEqualsForReader()` to verify MAP reading

Tests
Unit Tests:
- `IndexedRowTest` (7 tests) - Validates MAP read/write operations with IndexedRow
- `IndexedRowReaderTest` (2 tests) - Validates MAP deserialization with IndexedRowReader
- `BinaryWriterTest` (18 tests) - Validates MAP type ValueWriter creation and usage
- `ArrowReaderWriterTest` (2 tests) - Confirms ARROW format MAP support continues to work
- `CompactedRowTest` (20 tests) - Confirms COMPACTED format MAP support continues to work

Test Results: All 49 tests passed successfully (0 failures, 0 errors, 0 skipped)
Test Command:
mvn test -pl fluss-common -Dtest=IndexedRowTest,IndexedRowReaderTest,BinaryWriterTest,ArrowReaderWriterTest,CompactedRowTest -Dcheckstyle.skip=true -Dspotless.check.skip=true### API and Format
API: No breaking changes. This is purely additive functionality that removes an `UnsupportedOperationException` and enables the existing MAP type infrastructure for the INDEXED format.

Storage Format: No format changes. The MAP storage format in INDEXED rows was already defined and uses the same `BinaryMap` serialization mechanism as the other formats. This change only enables reading MAP data that was already writable.

Documentation
No documentation changes required. This change brings the INDEXED format to feature parity with the ARROW and COMPACTED formats for MAP type support. MAP type usage is already covered in the existing Fluss documentation. The change completes the internal implementation of existing functionality rather than introducing new user-facing features.