@haridsv haridsv commented Oct 29, 2025

Jira: HBASE-29368
Design doc: https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?usp=sharing
Discussion thread: https://lists.apache.org/thread/q7g2rr2xcgl64rkn9j3mnokf6fvohp2y

Cumulative changes from feature branch corresponding to the following sub-tasks:

  1. Phase 1: Key caching and minimal service
  2. Phase 2: Integrate key management with existing encryption
  3. Phase 2: Migration path from current encryption to managed encryption
  4. Phase 2: Admin API to trigger System Key rotation detection as an alternative to failover.
  5. Phase 3: Additional key management APIs

Overview

This feature introduces a key management system that extends HBase's existing encryption-at-rest capabilities. The implementation provides key lifecycle management with support for key rotation, hierarchical namespace resolution for key lookup, key caching, and improved integration with key management systems to handle key lifecycles and external key changes.

Key Features

1. Managed Keys Infrastructure

  • Introduction of the ManagedKeyProvider interface for pluggable key provider implementations, along the lines of the existing KeyProvider interface.
  • The new interface can also return Data Encryption Keys (DEKs) along with additional metadata about each key.
  • Comes with the default ManagedKeyStoreKeyProvider implementation using Java KeyStore, similar to the existing KeyStoreKeyProvider.
  • Enables logical key isolation for multi-tenant scenarios through custodian identifiers (future use cases) and the special default global custodian.
  • Hierarchical namespace resolution for DEKs with automatic fallback: explicit CF namespace attribute → constructed table/family namespace → table name → global namespace
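
The fallback order in the last bullet can be sketched roughly as follows. This is only an illustration of the lookup order, not code from the PR: the class and method names, the "table/family" format of the constructed namespace, and the "*" marker for the global namespace are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative sketch only; class, method, and constant names are hypothetical. */
final class KeyNamespaceResolution {
  /** Assumed marker for the global namespace; the real constant may differ. */
  static final String GLOBAL_NAMESPACE = "*";

  /** Builds the candidate namespaces in the fallback order described above. */
  static List<String> candidates(String explicitCfNamespace, String table, String family) {
    List<String> out = new ArrayList<>();
    if (explicitCfNamespace != null && !explicitCfNamespace.isEmpty()) {
      out.add(explicitCfNamespace);  // 1. explicit CF namespace attribute
    }
    out.add(table + "/" + family);   // 2. constructed table/family namespace (format assumed)
    out.add(table);                  // 3. table name
    out.add(GLOBAL_NAMESPACE);       // 4. global namespace
    return out;
  }

  /** Returns the first candidate namespace for which a key is available, if any. */
  static Optional<String> resolve(List<String> candidates, Predicate<String> hasKey) {
    return candidates.stream().filter(hasKey).findFirst();
  }

  public static void main(String[] args) {
    // Pretend keys exist only for the table-level and global namespaces.
    Set<String> namespacesWithKeys = Set.of("orders", GLOBAL_NAMESPACE);
    List<String> order = candidates(null, "orders", "cf1");
    System.out.println(order);                                        // [orders/cf1, orders, *]
    System.out.println(resolve(order, namespacesWithKeys::contains)); // Optional[orders]
  }
}
```

Keeping the candidate list separate from the availability check makes the order easy to test without a real key provider.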

2. System Key (STK) Management

  • Cluster-wide system key for wrapping data encryption keys (DEKs). This is equivalent to the existing master key, but better managed and more operations-friendly.
  • Secure storage in HDFS with support for automatic key rotation during startup.
  • Admin API to trigger key rotation and propagation to all RegionServers without needing to do a rolling restart.
  • Preserves the current double-wrapping architecture: DEKs wrapped by STK, STK sourced from external KMS
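
To make the wrapping relationship concrete, here is a minimal sketch that wraps a DEK with an STK and unwraps it again. It uses the JDK's built-in "AESWrap" cipher as a stand-in and does not reflect how HBase's own wrapKey/unwrapKey utilities work internally. The class and method names are hypothetical, and the STK is generated locally here, whereas the description above has it sourced from an external KMS.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Illustrative sketch only; uses JDK key wrapping, not the HBase implementation. */
final class DoubleWrapSketch {
  /** Generates a random 128-bit AES key (stand-in for a key obtained elsewhere). */
  static SecretKey newAesKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      return gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Wraps the DEK under the STK so only the wrapped bytes need to be persisted. */
  static byte[] wrap(SecretKey stk, SecretKey dek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, stk);
      return cipher.wrap(dek);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Recovers the DEK from its wrapped form using the same STK. */
  static Key unwrap(SecretKey stk, byte[] wrappedDek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, stk);
      return cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SecretKey stk = newAesKey(); // in the design above this would come from the KMS
    SecretKey dek = newAesKey();
    byte[] wrapped = wrap(stk, dek);
    Key recovered = unwrap(stk, wrapped);
    System.out.println(Arrays.equals(recovered.getEncoded(), dek.getEncoded())); // true
  }
}
```

One consequence of this layering is that rotating the STK only requires re-wrapping the DEKs, not re-encrypting the data they protect.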

3. KeymetaAdmin API

  • enableKeyManagement(keyCust, keyNamespace) - Enable key management for a custodian/namespace pair
  • getManagedKeys(keyCust, keyNamespace) - Query key status and metadata
  • rotateSTK() - Check for and propagate new system keys
  • disableKeyManagement(keyCust, keyNamespace) - Disable all the keys for a custodian/namespace
  • disableManagedKey(keyCust, keyNamespace, keyMetadataHash) - Disable a specific key
  • rotateManagedKey(keyCust, keyNamespace) - Rotate the active key
  • refreshManagedKeys(keyCust, keyNamespace) - Refresh from external KMS to validate all the keys.
  • Internal cache management operations for convenience and meeting SLAs.

4. Persistent Key Metadata Storage

  • New system table hbase:keymeta for storing key metadata and state, which acts as an L2 cache.
  • Tracks key lifecycle: ACTIVE, INACTIVE, DISABLED, FAILED states
  • Stores wrapped DEKs and metadata for key lookup without depending on external KMS.
  • Optimized for high-priority access with in-memory column families
  • Key metadata tracking with cryptographic hashes for integrity verification
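
As a rough illustration of the last bullet, the sketch below computes a hash over a key's metadata and later checks stored metadata against it. The hash algorithm (SHA-256), the string form of the metadata, and all class and method names are assumptions for the example; the PR may use a different digest and encoding. The enum simply mirrors the lifecycle states listed above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; algorithm choice and names are assumptions. */
final class KeyMetadataHashSketch {
  /** Lifecycle states as listed in the PR description. */
  enum KeyState { ACTIVE, INACTIVE, DISABLED, FAILED }

  /** Hashes the key metadata; SHA-256 is assumed here purely for illustration. */
  static byte[] hash(String keyMetadata) {
    try {
      return MessageDigest.getInstance("SHA-256")
          .digest(keyMetadata.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /** Checks metadata against a previously stored hash using a constant-time comparison. */
  static boolean matches(String keyMetadata, byte[] storedHash) {
    return MessageDigest.isEqual(hash(keyMetadata), storedHash);
  }

  public static void main(String[] args) {
    byte[] stored = hash("alias=k1");
    System.out.println(matches("alias=k1", stored)); // true
    System.out.println(matches("alias=k2", stored)); // false
  }
}
```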

5. Multi-Layer Caching

  • L1: In-memory Caffeine cache on RegionServers for hot key data
  • L2: Keymeta table for persistent key metadata that is shared across all RegionServers.
  • L3: Dynamic lookup from external KMS as fallback when not found in L2.
  • Cache invalidation mechanism for key rotation scenarios
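
The read path through the three layers might look roughly like the sketch below. It is a simplified model, not the PR's ManagedKeyDataCache: a ConcurrentHashMap stands in for the Caffeine cache, a plain Map stands in for the keymeta table, and a function stands in for the external KMS. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative sketch only; stand-ins replace Caffeine, the keymeta table, and the KMS. */
final class LayeredKeyLookup {
  private final Map<String, byte[]> l1 = new ConcurrentHashMap<>(); // L1: in-memory cache
  private final Map<String, byte[]> l2;                             // L2: stand-in for keymeta table
  private final Function<String, byte[]> kms;                       // L3: stand-in for external KMS

  LayeredKeyLookup(Map<String, byte[]> l2, Function<String, byte[]> kms) {
    this.l2 = l2;
    this.kms = kms;
  }

  /** Looks a key up in L1, then L2, then L3, filling the faster layers on the way back. */
  byte[] get(String keyId) {
    byte[] key = l1.get(keyId);
    if (key != null) {
      return key;
    }
    key = l2.get(keyId);
    if (key == null) {
      key = kms.apply(keyId);
      if (key == null) {
        return null; // unknown to every layer
      }
      l2.put(keyId, key); // persist so other servers can skip the KMS
    }
    l1.put(keyId, key);
    return key;
  }

  /** Drops the L1 entry, for example after a rotation, so the next read goes to L2. */
  void invalidate(String keyId) {
    l1.remove(keyId);
  }

  public static void main(String[] args) {
    Map<String, byte[]> table = new HashMap<>();
    LayeredKeyLookup lookup = new LayeredKeyLookup(table, id -> new byte[] {42});
    lookup.get("k1");
    System.out.println(table.containsKey("k1")); // true: the L3 result was written to L2
  }
}
```

In this model, invalidation only touches L1, so a read after invalidation is served from the shared L2 layer without another KMS call.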

6. HBase Shell Integration

  • enable_key_management - Enable key management for a custodian and namespace
  • show_key_status - Display key status and metadata
  • rotate_stk - Trigger system key rotation
  • disable_key_management - Disable key management for a custodian and namespace
  • disable_managed_key - Disable a specific key
  • rotate_managed_key - Rotate the active key
  • refresh_managed_keys - Refresh all keys for a custodian and namespace

Implementation Highlights

  • Backward Compatibility: Changes are fully compatible with existing encryption-at-rest configuration
  • Gradual step-by-step migration: Well-defined migration path from existing configuration to new configuration
  • Performance: Minimal overhead through efficient caching and lazy key loading
  • Security: Cryptographic verification of key metadata, secure key wrapping
  • Operability: Administrative tools for key life cycle and cache management
  • Extensibility: Plugin architecture for custom key provider implementations
  • Testing: Comprehensive unit and integration test coverage

Architecture

The implementation follows a layered architecture:

  1. Provider Layer: Pluggable ManagedKeyProvider for KMS integration
  2. Management Layer: KeymetaAdmin API for administrative operations
  3. Persistence Layer: KeymetaTableAccessor for metadata storage
  4. Cache Layer: ManagedKeyDataCache and SystemKeyCache for performance
  5. Service Layer: Coprocessor endpoints for client-server communication

Areas for Review

I would particularly appreciate feedback on:

  1. API Design: Is the KeymetaAdmin API intuitive and complete for common key management scenarios?
  2. Security Model: Does the double-wrapping architecture (DEK wrapped by STK, STK from KMS) provide appropriate security guarantees?
  3. Performance: Are there potential bottlenecks in the caching strategy or table access patterns?
  4. Operational Aspects: Are the administrative commands sufficient for the needs of operations and monitoring?
  5. Testing Coverage: Are there additional test scenarios we should cover?
  6. Documentation: Is the design document clear? What additional documentation would be helpful?
  7. Compatibility: Any concerns about interaction with existing HBase features?

Next Steps

After incorporating community feedback, I plan to:

  1. Address any issues identified during review
  2. Implement the work identified for future phases
  3. Add further documentation to the reference guide

How to Review

This PR introduces changes across multiple modules. Rather than reviewing all 143 files, I recommend focusing on these core components first:

Core Architecture:

  1. Design document (linked above) - architectural overview
  2. ManagedKeyProvider, KeymetaAdmin, ManagedKeyData interfaces (hbase-common)
  3. ManagedKeys.proto - protocol definitions
  4. HMaster and misc. procedure changes - initialization of keymeta in a predictable order
  5. FixedFileTrailer + reader/writer changes - encode/decode additional encryption key in store files

Key Implementation:

  1. KeymetaAdminImpl, KeymetaTableAccessor, ManagedKeyUtils, SystemKeyManager, SystemKeyAccessor - admin operations and persistence
  2. ManagedKeyDataCache, SystemKeyCache - caching layer
  3. SecurityUtil - encryption context creation

Client & Shell:

  1. KeymetaAdminClient - client API
  2. Shell commands and Ruby wrappers

Tests & Examples:

  1. TestKeymetaAdminImpl, TestManagedKeymeta - for usage patterns
  2. key_provider_keymeta_migration_test.rb - E2E migration steps

Note: The remaining files contain secondary changes (API updates, test helpers, configuration constants, etc.) that can be reviewed later or skipped for initial feedback.

haridsv added a commit to haridsv/hbase that referenced this pull request Oct 30, 2025

haridsv commented Oct 30, 2025

Fixing misc. issues flagged in the PR validation build via PR #7423

@haridsv haridsv marked this pull request as ready for review November 10, 2025 13:48
@haridsv haridsv changed the title HBASE-29368: Key management for encryption at rest (phase 1 and 2 changes) HBASE-29368: Key management for encryption at rest (MVP changes) Nov 28, 2025
@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 55s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | buf | 0m 0s | | buf was not available. |
| +0 🆗 | buf | 0m 0s | | buf was not available. |
| +0 🆗 | yamllint | 0m 0s | | yamllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 36s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 28s | | master passed |
| +1 💚 | compile | 10m 48s | | master passed |
| +1 💚 | checkstyle | 2m 47s | | master passed |
| +1 💚 | spotbugs | 17m 19s | | master passed |
| +1 💚 | spotless | 1m 7s | | branch has no errors when running spotless:check. |
| -0 ⚠️ | patch | 2m 7s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 49s | | the patch passed |
| +1 💚 | compile | 10m 36s | | the patch passed |
| +1 💚 | cc | 10m 36s | | the patch passed |
| -0 ⚠️ | javac | 10m 36s | /results-compile-javac-root.txt | root generated 2 new + 1896 unchanged - 0 fixed = 1898 total (was 1896) |
| +1 💚 | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 2m 32s | /results-checkstyle-root.txt | root: The patch generated 1 new + 102 unchanged - 1 fixed = 103 total (was 103) |
| -0 ⚠️ | rubocop | 0m 47s | /results-rubocop.txt | The patch generated 34 new + 488 unchanged - 17 fixed = 522 total (was 505) |
| +1 💚 | xmllint | 0m 1s | | No new issues. |
| +1 💚 | spotbugs | 20m 7s | | the patch passed |
| +1 💚 | hadoopcheck | 14m 29s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.1. |
| +1 💚 | hbaseprotoc | 6m 31s | | the patch passed |
| -1 ❌ | spotless | 0m 48s | | patch has 23 errors when running spotless:check, run spotless:apply to fix. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 1m 33s | | The patch does not generate ASF License warnings. |
| | | 110m 51s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7421/5/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #7421 |
| JIRA Issue | HBASE-29368 |
| Optional Tests | dupname asflicense codespell detsecrets spotless javac spotbugs checkstyle compile hadoopcheck hbaseanti cc buflint bufcompat hbaseprotoc xmllint rubocop yamllint |
| uname | Linux a74441611868 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / ffc1743 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| spotless | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7421/5/artifact/yetus-general-check/output/patch-spotless.txt |
| Max. process+thread count | 190 (vs. ulimit of 30000) |
| modules | C: hbase-protocol-shaded hbase-common hbase-client hbase-procedure hbase-server hbase-testing-util hbase-thrift hbase-shell . U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7421/5/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1 xmllint=20913 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Umeshkumar9414
Contributor

You can run mvn spotless:apply to fix the spotless failures.

haridsv added a commit to haridsv/hbase that referenced this pull request Dec 18, 2025
…ement feature

This commit prepares the codebase for the upcoming key management feature
(HBASE-29368) by introducing the necessary API definitions, protocol buffer
changes, and infrastructure refactoring. No functional changes are included;
all implementation will follow in the feature PR.

This precursor PR essentially extracts the API surface definitions and
infrastructure refactoring from the main feature PR (apache#7421) to facilitate
easier review.  By separating the ~15k line feature PR into a smaller precursor
containing interface definitions, protocol changes, and method signature
updates, the subsequent feature PR will focus purely on implementation logic.

API Surface Additions:
* New interfaces:
  - KeymetaAdmin: Admin API for key management operations
  - Server methods for cache management (getManagedKeyDataCache, getSystemKeyCache)

* Protocol buffer definitions:
  - ManagedKeys.proto: Definitions for managed key data and operations
  - Admin.proto: RPC methods for key management admin operations
  - Procedure.proto: Key rotation procedure support

Infrastructure Refactoring:
* Encryption context creation:
  - Moved createEncryptionContext from EncryptionUtil (client) to SecurityUtil (server)
    where it properly belongs, as it requires server-side resources
  - Added overloads to support future key encryption key (KEK) parameters

* Method signature updates:
  - Added ManagedKeyDataCache and SystemKeyCache parameters to encryption-related
    methods throughout HRegion, HStore, HStoreFile, and HFile classes
  - Updated constructors and factory methods to thread cache references
  - All cache parameters are currently null/unused, enabling gradual feature rollout

* New utility methods:
  - Encryption.encryptWithGivenKey() / decryptWithGivenKey(): Extract method
    refactoring to support both subject-based and KEK-based encryption
  - EncryptionUtil.wrapKey() / unwrapKey() overloads with KEK parameter
  - Bytes.add() 4-argument overload for concatenation

Stub Infrastructure:
* Blank placeholder shells for some public data classes such as
  ManagedKeyData and KeymetaAdminClient
* Stub implementations for key management services and caches that return null
  or throw UnsupportedOperationException, clearly documented as placeholders
* New package org.apache.hadoop.hbase.keymeta for key management classes
* Mock services updated to support new cache getter methods for testing

Code Organization:
* Procedure framework: Added support for region-level server name tracking
  to support future key rotation procedures
* Testing infrastructure updated to support new constructor signatures

All stub implementations clearly document they are placeholders for the
upcoming feature PR. Existing encryption functionality remains unchanged
and continues to work as before.

Testing:
* All existing tests pass (precursor introduces no functional changes)
* Build completes successfully with new API surface
* Backward compatibility maintained for non-key-management code paths

haridsv commented Jan 12, 2026

Closing as PR #7618 supersedes this PR.

@haridsv haridsv closed this Jan 12, 2026