@haridsv (Contributor) commented Jan 12, 2026

This PR implements the key management feature for HBase encryption at rest, building on the API surface and refactoring introduced in the precursor PR (#7584). It supersedes PR #7421, which originally contained most of the changes in this PR as well as those in PR #7584.

Jira: HBASE-29368
Design doc: https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?usp=sharing
Discussion thread: https://lists.apache.org/thread/q7g2rr2xcgl64rkn9j3mnokf6fvohp2y

Cumulative changes from feature branch corresponding to the following sub-tasks:

  1. Phase 1: Key caching and minimal service (HBASE-29402)
  2. Phase 2: Integrate key management with existing encryption (HBASE-29495)
  3. Phase 2: Migration path from current encryption to managed encryption (HBASE-29617)
  4. Phase 2: Admin API to trigger System Key rotation detection as an alternative to failover (HBASE-29643)
  5. Phase 3: Additional key management APIs (HBASE-29666)

This feature adds a key management system that extends HBase's existing encryption-at-rest capabilities. It provides key lifecycle management with support for key rotation, hierarchical namespace resolution for key lookup, key caching, and tighter integration with external key management systems (KMS) to handle key lifecycles and external key changes.

1. Managed Keys Infrastructure

  • Introduction of the ManagedKeyProvider interface for pluggable key provider implementations, along the lines of the existing KeyProvider interface.
  • The new interface can also return Data Encryption Keys (DEKs) along with additional metadata about each key.
  • Comes with the default ManagedKeyStoreKeyProvider implementation using Java KeyStore, similar to the existing KeyStoreKeyProvider.
  • Enables logical key isolation for multi-tenant scenarios through custodian identifiers (future use cases) and the special default global custodian.
  • Hierarchical namespace resolution for DEKs with automatic fallback: explicit CF namespace attribute → constructed table/family namespace → table name → global namespace
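
The fallback order in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `NamespaceFallback`, its method names, and the `"*"` placeholder for the global namespace are assumptions made for the example, and key existence is modeled as a plain set of namespaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the DEK namespace fallback order; not the HBase implementation. */
final class NamespaceFallback {
  // Placeholder for the global namespace; the real identifier may differ.
  static final String GLOBAL = "*";

  /** Builds the candidate namespaces from most specific to least specific. */
  static List<String> candidates(String cfAttribute, String table, String family) {
    List<String> out = new ArrayList<>();
    if (cfAttribute != null && !cfAttribute.isEmpty()) {
      out.add(cfAttribute);        // 1. explicit CF namespace attribute
    }
    out.add(table + "/" + family); // 2. constructed table/family namespace
    out.add(table);                // 3. table name
    out.add(GLOBAL);               // 4. global namespace
    return out;
  }

  /** Returns the first candidate namespace that has a key, if any. */
  static Optional<String> resolve(List<String> candidates, Set<String> namespacesWithKeys) {
    return candidates.stream().filter(namespacesWithKeys::contains).findFirst();
  }

  public static void main(String[] args) {
    List<String> c = candidates(null, "orders", "cf1");
    // Only the table-level and global namespaces have keys in this example.
    System.out.println(resolve(c, Set.of("orders", GLOBAL)));
  }
}
```

In this example no explicit attribute is set and no key exists for `orders/cf1`, so the lookup falls through to the table name before it would reach the global namespace.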

2. System Key (STK) Management

  • Cluster-wide system key for wrapping data encryption keys (DEKs). This is equivalent to the existing master key, but better managed and easier to operate.
  • Secure storage in HDFS with support for automatic key rotation during startup.
  • Admin API to trigger key rotation and propagation to all RegionServers without a rolling restart.
  • Preserves the current double-wrapping architecture: DEKs wrapped by STK, STK sourced from external KMS
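
To make the wrapping relationship concrete, here is a minimal sketch of a DEK being wrapped and unwrapped by an STK. It uses the JDK's standard `AESWrap` cipher (RFC 3394 key wrap) as a stand-in, which is not necessarily the wrapping scheme HBase uses, and it generates the STK locally where a real deployment would source it from the external KMS. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical sketch: DEK wrapped by STK using the JDK's AES key wrap. */
final class KeyWrapSketch {
  /** Generates a random 128-bit AES key (stand-in for a KMS-provided key). */
  static SecretKey newAesKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      return gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Wraps the DEK with the STK; only the wrapped bytes would be persisted. */
  static byte[] wrap(SecretKey stk, SecretKey dek) {
    try {
      Cipher c = Cipher.getInstance("AESWrap");
      c.init(Cipher.WRAP_MODE, stk);
      return c.wrap(dek);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Recovers the DEK from its wrapped form using the same STK. */
  static SecretKey unwrap(SecretKey stk, byte[] wrapped) {
    try {
      Cipher c = Cipher.getInstance("AESWrap");
      c.init(Cipher.UNWRAP_MODE, stk);
      return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SecretKey stk = newAesKey(); // assumption: would come from the external KMS
    SecretKey dek = newAesKey();
    byte[] wrapped = wrap(stk, dek);
    SecretKey recovered = unwrap(stk, wrapped);
    System.out.println(Arrays.equals(dek.getEncoded(), recovered.getEncoded()));
  }
}
```

The point of the layering is that rotating the STK only requires re-wrapping DEKs, not re-encrypting the data they protect.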

3. KeymetaAdmin API

  • enableKeyManagement(keyCust, keyNamespace) - Enable key management for a custodian/namespace pair
  • getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
  • rotateSTK() - Check for and propagate new system keys
  • disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for a custodian/namespace
  • disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a specific key
  • rotateManagedKey(keyCust, keyNamespace) - Rotate the active key
  • refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS to validate all the keys.
  • Internal cache management operations for convenience and meeting SLAs.

4. Persistent Key Metadata Storage

  • New system table hbase:keymeta for storing key metadata and state, which also acts as an L2 cache.
  • Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
  • Stores wrapped DEKs and metadata for key lookup without depending on external KMS.
  • Optimized for high-priority access with in-memory column families
  • Key metadata tracking with cryptographic hashes for integrity verification
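
As a rough illustration of the last bullet, the sketch below hashes a key's metadata and later checks a stored hash against it. SHA-256, the JSON-like metadata string, and the class `MetadataHashSketch` are assumptions chosen for the example; the hash function and metadata encoding used by this PR may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of hash-based integrity checking for key metadata. */
final class MetadataHashSketch {
  /** Hashes the metadata string; SHA-256 is an assumption for this example. */
  static byte[] hash(String keyMetadata) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      return md.digest(keyMetadata.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns true when the metadata still matches the stored hash. */
  static boolean matches(String keyMetadata, byte[] storedHash) {
    return MessageDigest.isEqual(hash(keyMetadata), storedHash);
  }

  public static void main(String[] args) {
    String metadata = "{\"alias\":\"example-key\"}"; // made-up metadata
    byte[] stored = hash(metadata);
    System.out.println(matches(metadata, stored));
    System.out.println(matches(metadata + " ", stored));
  }
}
```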

5. Multi-Layer Caching

  • L1: In-memory Caffeine cache on RegionServers for hot key data
  • L2: Keymeta table for persistent key metadata that is shared across all RegionServers.
  • L3: Dynamic lookup from external KMS as fallback when not found in L2.
  • Cache invalidation mechanism for key rotation scenarios
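
The lookup order across the three layers can be sketched as below. This is a simplified model under stated assumptions: a `ConcurrentHashMap` stands in for the Caffeine cache, another map stands in for the keymeta table, and a function stands in for the external KMS. The class `TieredKeyLookup` and the write-back of L3 results into L2 and L1 are illustrative choices, not a description of the actual implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of an L1 -> L2 -> L3 key lookup with L1 invalidation. */
final class TieredKeyLookup {
  private final Map<String, String> l1 = new ConcurrentHashMap<>(); // stand-in for Caffeine
  private final Map<String, String> l2;                             // stand-in for keymeta table
  private final Function<String, Optional<String>> l3;              // stand-in for external KMS

  TieredKeyLookup(Map<String, String> l2, Function<String, Optional<String>> l3) {
    this.l2 = l2;
    this.l3 = l3;
  }

  /** Looks up a key, consulting each layer only when the previous one misses. */
  Optional<String> get(String keyId) {
    String hit = l1.get(keyId);
    if (hit != null) {
      return Optional.of(hit);
    }
    hit = l2.get(keyId);
    if (hit == null) {
      Optional<String> fromKms = l3.apply(keyId);
      if (!fromKms.isPresent()) {
        return Optional.empty();
      }
      hit = fromKms.get();
      l2.put(keyId, hit); // assumption: KMS results are persisted to L2
    }
    l1.put(keyId, hit);   // populate the hot cache for later lookups
    return Optional.of(hit);
  }

  /** Drops a key from L1 only, e.g. after a rotation; L2 remains the shared copy. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }

  public static void main(String[] args) {
    Map<String, String> table = new ConcurrentHashMap<>();
    TieredKeyLookup lookup = new TieredKeyLookup(table, id -> Optional.of("wrapped-" + id));
    System.out.println(lookup.get("k1")); // misses L1 and L2, falls back to L3
    System.out.println(lookup.get("k1")); // served from L1
  }
}
```

After `invalidate`, the next lookup for that key is served from L2 rather than the KMS, which is the property that keeps external KMS traffic low across RegionServers.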

6. HBase Shell Integration

  • enable_key_management - Enable key management for a custodian and namespace
  • show_key_status - Display key status and metadata
  • rotate_stk - Trigger system key rotation
  • disable_key_management - Disable key management for a custodian and namespace
  • disable_managed_key - Disable a specific key
  • rotate_managed_key - Rotate the active key
  • refresh_managed_keys - Refresh all keys for a custodian and namespace

  • Backward Compatibility: Changes are fully compatible with existing encryption-at-rest configuration

  • Gradual step-by-step migration: Well-defined migration path from existing configuration to new configuration

  • Performance: Minimal overhead through efficient caching and lazy key loading

  • Security: Cryptographic verification of key metadata, secure key wrapping

  • Operability: Administrative tools for key life cycle and cache management

  • Extensibility: Plugin architecture for custom key provider implementations

  • Testing: Comprehensive unit and integration test coverage

The implementation follows a layered architecture:

  1. Provider Layer: Pluggable ManagedKeyProvider for KMS integration
  2. Management Layer: KeymetaAdmin API for administrative operations
  3. Persistence Layer: KeymetaTableAccessor for metadata storage
  4. Cache Layer: ManagedKeyDataCache and SystemKeyCache for performance
  5. Service Layer: Coprocessor endpoints for client-server communication

I would particularly appreciate feedback on:

  1. API Design: Is the KeymetaAdmin API intuitive and complete for common key management scenarios?
  2. Security Model: Does the double-wrapping architecture (DEK wrapped by STK, STK from KMS) provide appropriate security guarantees?
  3. Performance: Are there potential bottlenecks in the caching strategy or table access patterns?
  4. Operational Aspects: Are the administrative commands sufficient for the needs of operations and monitoring?
  5. Testing Coverage: Are there additional test scenarios we should cover?
  6. Documentation: Is the design document clear? What additional documentation would be helpful?
  7. Compatibility: Any concerns about interaction with existing HBase features?

After incorporating community feedback, I plan to:

  1. Address any issues identified during review
  2. Implement the work identified for future phases
  3. Add additional documentation to the reference guide

This PR introduces changes across multiple modules, so I recommend focusing on these core components first:

Core Architecture:

  1. Design document (linked above) - architectural overview
  2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces (hbase-common)
  3. ManagedKeys.proto - protocol definitions
  4. HMaster and misc. procedure changes - initialization of keymeta in a predictable order
  5. FixedFileTrailer + reader/writer changes - encode/decode additional encryption key in store files

Key Implementation:

  1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils, SystemKeyManager, SystemKeyAccessor - admin operations and persistence
  2. ManagedKeyDataCache, SystemKeyCache - caching layer
  3. SecurityUtil - encryption context creation

Client & Shell:

  1. KeymetaAdminClient - client API
  2. Shell commands and Ruby wrappers

Tests & Examples:

  1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
  2. key_provider_keymeta_migration_test.rb - E2E migration steps

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 30s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | yamllint | 0m 0s | | yamllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 2m 36s | | master passed |
| +1 💚 | compile | 6m 41s | | master passed |
| +1 💚 | checkstyle | 1m 46s | | master passed |
| +1 💚 | spotbugs | 9m 8s | | master passed |
| +1 💚 | spotless | 0m 40s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 18s | | the patch passed |
| +1 💚 | compile | 6m 40s | | the patch passed |
| -0 ⚠️ | javac | 6m 40s | /results-compile-javac-root.txt | root generated 3 new + 1888 unchanged - 1 fixed = 1891 total (was 1889) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 1m 46s | /results-checkstyle-root.txt | root: The patch generated 1 new + 41 unchanged - 2 fixed = 42 total (was 43) |
| -0 ⚠️ | rubocop | 0m 21s | /results-rubocop.txt | The patch generated 28 new + 71 unchanged - 13 fixed = 99 total (was 84) |
| +1 💚 | spotbugs | 9m 38s | | the patch passed |
| +1 💚 | hadoopcheck | 8m 38s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.1. |
| -1 ❌ | spotless | 0m 10s | | patch has 21 errors when running spotless:check, run spotless:apply to fix. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 59m 20s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #7618 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless rubocop yamllint |
| uname | Linux 82292f446d1b 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 4ed7c00 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| spotless | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/artifact/yetus-general-check/output/patch-spotless.txt |
| Max. process+thread count | 193 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-client hbase-server hbase-thrift hbase-shell . U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 30s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 12s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 29s | | master passed |
| +1 💚 | compile | 2m 18s | | master passed |
| +1 💚 | javadoc | 3m 36s | | master passed |
| +1 💚 | shadedjars | 5m 59s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 13s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 7s | | the patch passed |
| +1 💚 | compile | 2m 21s | | the patch passed |
| +1 💚 | javac | 2m 21s | | the patch passed |
| -0 ⚠️ | javadoc | 0m 15s | /results-javadoc-javadoc-hbase-common.txt | hbase-common generated 1 new + 4 unchanged - 0 fixed = 5 total (was 4) |
| -0 ⚠️ | javadoc | 1m 58s | /results-javadoc-javadoc-root.txt | root generated 1 new + 212 unchanged - 0 fixed = 213 total (was 212) |
| +1 💚 | shadedjars | 5m 50s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 256m 56s | /patch-unit-root.txt | root in the patch failed. |
| | | 298m 17s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #7618 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 78ea855160f1 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 4ed7c00 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/testReport/ |
| Max. process+thread count | 7624 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-client hbase-server hbase-thrift hbase-shell . U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7618/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@virajjasani virajjasani self-requested a review January 13, 2026 05:16