
Introduce UUID data type as an Extension Type (move from test) #902

@jhrotko

Move UUID Extension Type from Test Module to Main Source

Summary

Move the UUID extension type implementation from the test module to the main source code alongside other extension types like OpaqueType. This will make UUID available as a first-class extension type for users of the Arrow Java library.

Background

UUID (Universally Unique Identifier) is a widely-used data type in many applications and databases. Arrow Java currently has a complete UUID extension type implementation, but it's located in the test module rather than being available for general use.

Current State

The UUID extension type is currently implemented in the test module with the following components:

Type and Vector:

  • vector/src/test/java/org/apache/arrow/vector/types/pojo/UuidType.java - Extension type definition
  • vector/src/test/java/org/apache/arrow/vector/UuidVector.java - Vector implementation

Reader/Writer Support:

  • vector/src/test/java/org/apache/arrow/vector/complex/impl/UuidReaderImpl.java - Reader implementation
  • vector/src/test/java/org/apache/arrow/vector/complex/impl/UuidWriterImpl.java - Writer implementation
  • vector/src/test/java/org/apache/arrow/vector/complex/impl/UuidWriterFactory.java - Writer factory

Holder:

  • vector/src/test/java/org/apache/arrow/vector/holder/UuidHolder.java - Data holder
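To make the storage side concrete: the vector and holder above are assumed here to keep each value as 16 bytes in a FixedSizeBinary(16) storage vector, in big-endian (RFC 4122) byte order. The JDK-only sketch below illustrates that layout with one 16-byte slot per value. `UuidSlots` is a hypothetical helper, not an Arrow Java class, and the test implementation's exact encoding should be confirmed against its source.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Arrow Java): lays out UUID values in a
 * fixed-width buffer with one 16-byte slot per value, as a FixedSizeBinary(16)
 * storage vector would (assumption).
 */
final class UuidSlots {
    private static final int UUID_WIDTH = 16; // bytes per value

    private final ByteBuffer data;

    public UuidSlots(int capacity) {
        // ByteBuffer is big-endian by default, which yields RFC 4122 byte order.
        this.data = ByteBuffer.allocate(capacity * UUID_WIDTH);
    }

    /** Writes the UUID into slot {@code index}: 8 most-significant bytes, then 8 least. */
    public void set(int index, UUID value) {
        int offset = index * UUID_WIDTH;
        data.putLong(offset, value.getMostSignificantBits());
        data.putLong(offset + 8, value.getLeastSignificantBits());
    }

    /** Reads slot {@code index} back into a java.util.UUID. */
    public UUID get(int index) {
        int offset = index * UUID_WIDTH;
        return new UUID(data.getLong(offset), data.getLong(offset + 8));
    }

    /** Returns a copy of the raw 16 bytes stored in slot {@code index}. */
    public byte[] rawBytes(int index) {
        byte[] out = new byte[UUID_WIDTH];
        for (int i = 0; i < UUID_WIDTH; i++) {
            out[i] = data.get(index * UUID_WIDTH + i);
        }
        return out;
    }
}
```

Reading and writing the two `long` halves with absolute `ByteBuffer` operations round-trips a `java.util.UUID` without allocating an intermediate array, and the slot offset is simply `index * 16`.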

Desired State

Move these components to the main source tree following the pattern established by OpaqueType:

Type and Vector:

  • vector/src/main/java/org/apache/arrow/vector/extension/UuidType.java
  • vector/src/main/java/org/apache/arrow/vector/extension/UuidVector.java

Reader/Writer Support:

  • vector/src/main/java/org/apache/arrow/vector/complex/impl/UuidReaderImpl.java
  • vector/src/main/java/org/apache/arrow/vector/complex/impl/UuidWriterImpl.java
  • vector/src/main/java/org/apache/arrow/vector/complex/impl/UuidWriterFactory.java

Holder:

  • vector/src/main/java/org/apache/arrow/vector/holder/UuidHolder.java

Motivation

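  1. Standardization: UUID is a common data type used across many systems (PostgreSQL, MySQL, Cassandra, etc.). Having it available as a standard extension type improves interoperability; the Arrow specification also defines a canonical `arrow.uuid` extension type (storage: FixedSizeBinary(16)) that other implementations can recognize.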

  2. Consistency: The current implementation is fully functional and well-tested. It should be available to users rather than hidden in test code.

  3. Follows Established Pattern: The OpaqueType extension type is already in org.apache.arrow.vector.extension, establishing a clear pattern for where extension types should live.

  4. Reduces Code Duplication: Currently, projects that need UUID support must do one of the following:

    • Copy the test implementation into their own codebase
    • Depend on the test JAR (which is not recommended)
    • Implement their own UUID extension type

Benefits

  1. User-Friendly: Users can directly use UUID extension type without copying test code
  2. Better Testing: The implementation will be tested as part of the main codebase
  3. Improved Interoperability: Standardized UUID support across Arrow implementations
  4. Future-Proof: Establishes a pattern for adding more extension types in the future

Compatibility

This change is backward compatible:

  • Existing code using the test implementation will continue to work
  • The test module can keep stub classes that extend/reference the main implementation for a deprecation period
  • No breaking changes to the Arrow IPC format or wire protocol
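A minimal, JDK-only sketch of the "stub that extends the main implementation" option. The names `RelocatedUuidType`, `LegacyUuidType`, and `byteWidth()` are hypothetical stand-ins; in the actual change both classes would be named `UuidType` and live in different packages and source trees.

```java
/**
 * Hypothetical stand-in for the class moved to
 * org.apache.arrow.vector.extension in src/main.
 */
class RelocatedUuidType {
    // Illustrative method only; not taken from the real UuidType.
    public int byteWidth() {
        return 16;
    }
}

/** Hypothetical stand-in for the stub kept in src/test for one deprecation period. */
@Deprecated
class LegacyUuidType extends RelocatedUuidType {
    // Intentionally empty: all behavior is inherited from the relocated class,
    // so only one implementation has to be maintained.
}
```

Code that still refers to the old class compiles with a deprecation warning and runs the relocated logic. This only works if the relocated class is not declared `final`; otherwise the stub would have to delegate to it rather than extend it.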
