Skip to content

Conversation

@haridsv
Copy link
Contributor

@haridsv haridsv commented Dec 30, 2025

This commit prepares the codebase for the upcoming key management feature (HBASE-29368) by introducing the necessary API definitions, protocol buffer changes, and infrastructure refactoring. No functional changes are included; all implementation will follow in the feature PR.

This precursor PR essentially extracts the API surface definitions and infrastructure refactoring from the main feature PR (#7421) to facilitate easier review. By separating the ~15k line feature PR into a smaller precursor containing interface definitions, protocol changes, and method signature updates, the subsequent feature PR will focus purely on implementation logic.

API Surface Additions:

  • New interfaces:

    • KeymetaAdmin: Admin API for key management operations
    • Server methods for cache management (getManagedKeyDataCache, getSystemKeyCache)
  • Protocol buffer definitions:

    • ManagedKeys.proto: Definitions for managed key data and operations
    • Admin.proto: RPC methods for key management admin operations
    • Procedure.proto: Key rotation procedure support

Infrastructure Refactoring:

  • Encryption context creation:

    • Moved createEncryptionContext from EncryptionUtil (client) to SecurityUtil (server) where it properly belongs, as it requires server-side resources
    • Added overloads to support future key encryption key (KEK) parameters
  • Method signature updates:

    • Added ManagedKeyDataCache and SystemKeyCache parameters to encryption-related methods throughout HRegion, HStore, HStoreFile, and HFile classes
    • Updated constructors and factory methods to thread cache references
    • All cache parameters are currently null/unused, enabling gradual feature rollout
  • New utility methods:

    • Encryption.encryptWithGivenKey() / decryptWithGivenKey(): Extract method refactoring to support both subject-based and KEK-based encryption
    • EncryptionUtil.wrapKey() / unwrapKey() overloads with KEK parameter
    • Bytes.add() 4-argument overload for concatenation

Stub Infrastructure:

  • Blank place holder shells for some public data classes such as ManagedKeyData and KeymetaAdminClient
  • Stub implementations for key management services and caches that return null or throw UnsupportedOperationException, clearly documented as placeholders
  • New package org.apache.hadoop.hbase.keymeta for key management classes
  • Mock services updated to support new cache getter methods for testing

Code Organization:

  • Procedure framework: Added support for region-level server name tracking to support future key rotation procedures
  • Testing infrastructure updated to support new constructor signatures

All stub implementations clearly document they are placeholders for the upcoming feature PR. Existing encryption functionality remains unchanged and continues to work as before.

Testing:

  • Build completes successfully with new API surface
  • All existing tests pass (precursor introduces no functional changes)

…ement feature

This commit prepares the codebase for the upcoming key management feature
(HBASE-29368) by introducing the necessary API definitions, protocol buffer
changes, and infrastructure refactoring. No functional changes are included;
all implementation will follow in the feature PR.

This precursor PR essentially extracts the API surface definitions and
infrastructure refactoring from the main feature PR (apache#7421) to facilitate
easier review.  By separating the ~15k line feature PR into a smaller precursor
containing interface definitions, protocol changes, and method signature
updates, the subsequent feature PR will focus purely on implementation logic.

API Surface Additions:
* New interfaces:
  - KeymetaAdmin: Admin API for key management operations
  - Server methods for cache management (getManagedKeyDataCache, getSystemKeyCache)

* Protocol buffer definitions:
  - ManagedKeys.proto: Definitions for managed key data and operations
  - Admin.proto: RPC methods for key management admin operations
  - Procedure.proto: Key rotation procedure support

Infrastructure Refactoring:
* Encryption context creation:
  - Moved createEncryptionContext from EncryptionUtil (client) to SecurityUtil (server)
    where it properly belongs, as it requires server-side resources
  - Added overloads to support future key encryption key (KEK) parameters

* Method signature updates:
  - Added ManagedKeyDataCache and SystemKeyCache parameters to encryption-related
    methods throughout HRegion, HStore, HStoreFile, and HFile classes
  - Updated constructors and factory methods to thread cache references
  - All cache parameters are currently null/unused, enabling gradual feature rollout

* New utility methods:
  - Encryption.encryptWithGivenKey() / decryptWithGivenKey(): Extract method
    refactoring to support both subject-based and KEK-based encryption
  - EncryptionUtil.wrapKey() / unwrapKey() overloads with KEK parameter
  - Bytes.add() 4-argument overload for concatenation

Stub Infrastructure:
* Blank place holder shells for some public data classes such as
  ManagedKeyData and KeymetaAdminClient
* Stub implementations for key management services and caches that return null
  or throw UnsupportedOperationException, clearly documented as placeholders
* New package org.apache.hadoop.hbase.keymeta for key management classes
* Mock services updated to support new cache getter methods for testing

Code Organization:
* Procedure framework: Added support for region-level server name tracking
  to support future key rotation procedures
* Testing infrastructure updated to support new constructor signatures

All stub implementations clearly document they are placeholders for the
upcoming feature PR. Existing encryption functionality remains unchanged
and continues to work as before.

Testing:
* All existing tests pass (precursor introduces no functional changes)
* Build completes successfully with new API surface
* Backward compatibility maintained for non-key-management code paths
@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@haridsv
Copy link
Contributor Author

haridsv commented Dec 30, 2025

There are 4 test failures in (TestGzipFilter,TestNamespacesInstanceResource,TestScannersWithFilters,TestSchemaResource) but I can't repro them locally so they appear to have flapped. The 4 checkstyle errors can be ignored, they will go away in my next PR. There are apparently 28 errors from spotless:check, but when I ran spotless:apply, only the 2 unused imports were removed nothing else, so CI seems to be overreporting it (I have observed this previously too).

@virajjasani virajjasani self-requested a review December 30, 2025 16:23
@virajjasani
Copy link
Contributor

There are apparently 28 errors from spotless:check, but when I ran spotless:apply, only the 2 unused imports were removed nothing else, so CI seems to be overreporting it (I have observed this previously too).

I also think sometimes it does overreport.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, so far looks good overall, left few comments

* @param d fourth fourth
* @return New array made from a, b, c, and d
*/
public static byte[] add(final byte[] a, final byte[] b, final byte[] c, final byte[] d) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding more byte[], how about we let core logic to keep using existing utilities? e.g. if we have to concatenate 4 byte[] (a, b, c, d), we can first concatenate c + d, and then concatenate (a + b + result of c+d)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is possible, in fact from this angle the current 3 param variant can also be done using the 2 param variant, but I would assume that a dedicated method was created to make it more efficient and produces less garbage, so I extended the same philosophy to the new 4 param variant. IMHO, the duplication is worth having from the point of squeezing out additional performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, for now we can add note in the javadoc that if any use case needs more than 4 bytes, divide and use any of the existing utilities. So that in future, no one keeps adding more variants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean in the javadoc of all the current variants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed to see another variant that takes an array of byte[] (currently unused) that should suffice, perhaps we an take a vararg there so I can revert this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* STUB IMPLEMENTATION - Feature not yet complete. This class represents encryption key data. Full
* implementation will be in HBASE-29368 feature PR.
*/
@InterfaceAudience.Public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting downstreamers to directly use ManagerKeyData class? If no, we should keep IA.Private. If yes, we should still add IS.Unstable with IA.Public for now.

These are not final consumable APIs anyway so we can't only keep IA.Public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ManagedKeyData is exposed to the hbase clients so I think it qualifies as a public interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add IS.Unstable per your suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't IS.Evolving a better fit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evolving is also fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* KeymetaAdmin is an interface for administrative functions related to managed keys. It handles the
* following methods:
*/
@InterfaceAudience.Public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, applies to all new interfaces and apis

Copy link
Contributor Author

@haridsv haridsv Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used outside the hbase-common module so I marked it as public, is that not a good enough reason to make it public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this too is exposed to the client applications so seems appropriate. I will double check any other usages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phoenix will directly use the class? If so, yes we can expose it but still we should mark it IS.Unstable because it is in dev phase and not stabilitized. We can also change signatures in future and after a couple years and few big releases, we can mark it stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +424 to +431
rpc RefreshSystemKeyCache(EmptyMsg)
returns(EmptyMsg);

rpc EjectManagedKeyDataCacheEntry(ManagedKeyEntryRequest)
returns(BooleanMsg);

rpc ClearManagedKeyDataCache(EmptyMsg)
returns(EmptyMsg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we expecting more apis in future?
We might want to consider having ManagedKeyAdmin interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a need to add more APIs. Adding these here made it simpler to take advantage of the existing async framework and the primitives for parallel execution (across all the RS) for the admin API. The higher level KeymetaAdmin RPC interface is blocking and makes use of these primitives.

Comment on lines 7693 to 7697
} catch (Throwable e) {
// Try the old signature for the sake of test code.
return createInstance(conf, ctorArgTypes.subList(0, ctorArgTypes.size() - 1),
ctorArgs.subList(0, ctorArgs.size() - 1));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fallback necessary? We can catching any type of Throwable here. Or is it going to be a lot of changes in tests to use new constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you look for the usage of HConstants.REGION_IMPL, you would see many test classes each with their own test/mock class so I wanted to reduce the change. However, I think I can be more specific on the exception types.

Copy link
Contributor Author

@haridsv haridsv Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the existing createInstance is already doing a catch all and convert it to IllegalStateException, so there is technically no difference between catching Throwable or more specific IllegalStateException in this catch block. I will still change it to be clear about the intention and avoid any potential confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, IllegalStateException will be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 7859 to 7862
public static Region openHRegion(final Region other, final CancelableProgressable reporter)
throws IOException {
return openHRegion((HRegion) other, reporter);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While HRegion is IA.Private, downstreamers with coproc still tend to use some utilities. It would be better to keep this. It is more generic Region instead of HRegion anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can restore this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

decryptWithGivenKey(key, out, in, outLen, alterCipher, iv);
} else {
throw new IOException(e);
throw e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to retain throw new IOException(e)? unless this is easier to catch later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, e is already of type IOException so I didn't think wrapping it again with another IOException is adding any value. Do you see any issue here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought that it should not be a problem, i was just curious if it breaks any exception catching and validation of the error message based on the current logic. If everything is fine, we are good.

@InterfaceStability.Evolving
public class SecurityUtil {
public final class SecurityUtil {
private static final Logger LOG = LoggerFactory.getLogger(SecurityUtil.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is being used in the feature PR.

@haridsv haridsv force-pushed the HBASE-29368-precursor branch from 3ab4b81 to fbacd7d Compare January 9, 2026 04:59
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 2m 58s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 buf 0m 0s buf was not available.
+0 🆗 buf 0m 0s buf was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for branch
+1 💚 mvninstall 3m 49s master passed
+1 💚 compile 12m 58s master passed
+1 💚 checkstyle 2m 42s master passed
+1 💚 spotbugs 17m 2s master passed
+1 💚 spotless 0m 58s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 30s the patch passed
+1 💚 compile 12m 49s the patch passed
+1 💚 cc 12m 49s the patch passed
+1 💚 javac 12m 49s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 2m 47s /results-checkstyle-root.txt root: The patch generated 2 new + 92 unchanged - 1 fixed = 94 total (was 93)
-0 ⚠️ rubocop 0m 10s /results-rubocop.txt The patch generated 6 new + 417 unchanged - 4 fixed = 423 total (was 421)
+1 💚 xmllint 0m 0s No new issues.
+1 💚 spotbugs 18m 7s the patch passed
+1 💚 hadoopcheck 13m 47s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 hbaseprotoc 5m 53s the patch passed
+1 💚 spotless 0m 54s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 1m 28s The patch does not generate ASF License warnings.
111m 0s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7584
Optional Tests dupname asflicense codespell detsecrets spotless javac spotbugs checkstyle compile hadoopcheck hbaseanti cc buflint bufcompat hbaseprotoc xmllint rubocop
uname Linux f3516b5bcdd1 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 3ab4b81
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 192 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-client hbase-procedure hbase-server hbase-testing-util hbase-shell . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1 xmllint=20913
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 3m 15s master passed
+1 💚 compile 2m 9s master passed
+1 💚 javadoc 3m 31s master passed
+1 💚 shadedjars 6m 3s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 2m 59s the patch passed
+1 💚 compile 2m 7s the patch passed
+1 💚 javac 2m 7s the patch passed
+1 💚 javadoc 3m 32s the patch passed
+1 💚 shadedjars 5m 59s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 298m 13s /patch-unit-root.txt root in the patch failed.
337m 19s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7584
Optional Tests javac javadoc unit compile shadedjars
uname Linux 9a6ebe025106 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 3ab4b81
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/2/testReport/
Max. process+thread count 6285 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-client hbase-procedure hbase-server hbase-testing-util hbase-shell . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 20s Maven dependency ordering for branch
+1 💚 mvninstall 3m 17s master passed
+1 💚 compile 2m 12s master passed
+1 💚 javadoc 3m 35s master passed
+1 💚 shadedjars 6m 1s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 2m 56s the patch passed
+1 💚 compile 2m 10s the patch passed
+1 💚 javac 2m 10s the patch passed
+1 💚 javadoc 3m 32s the patch passed
+1 💚 shadedjars 5m 58s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 254m 9s /patch-unit-root.txt root in the patch failed.
290m 43s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7584
Optional Tests javac javadoc unit compile shadedjars
uname Linux 07b17627adca 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fbacd7d
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/3/testReport/
Max. process+thread count 4582 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-client hbase-procedure hbase-server hbase-testing-util hbase-shell . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7584/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@virajjasani virajjasani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@virajjasani virajjasani changed the title HBASE-29368 [Precursor] Add API surface and refactoring for key management feature HBASE-29822 Add API surface and refactoring for key management feature (HBASE-29368) Jan 11, 2026
@virajjasani virajjasani merged commit b54cb7d into apache:master Jan 11, 2026
1 check failed
haridsv added a commit to haridsv/hbase that referenced this pull request Jan 12, 2026
This PR implements the key management feature for HBase encryption at rest,
building on the API surface and refactoring introduced in the precursor PR (apache#7584).

Jira: [HBASE-29368](https://issues.apache.org/jira/browse/HBASE-29368)
Design doc: https://docs.google.com/document/d/1ToW_rveXHXUc1F6eFNQfu5LOeMAjzgq6FcYUDbdZrSM/edit?usp=sharing
Discussion thread: https://lists.apache.org/thread/q7g2rr2xcgl64rkn9j3mnokf6fvohp2y

Cumulative changes from feature branch corresponding to the following sub-tasks:
1. [Phase 1: Key caching and minimal service](https://issues.apache.org/jira/browse/HBASE-29402)
2. [Phase 2: Integrate key management with existing encryption](https://issues.apache.org/jira/browse/HBASE-29495)
3. [Phase 2: Migration path from current encryption to managed encryption](https://issues.apache.org/jira/browse/HBASE-29617)
4. [Phase 2: Admin API to trigger for System Key rotation detection as an alternative to failover.](https://issues.apache.org/jira/browse/HBASE-29643)
5. [Phase 3: Additional key management APIs](https://issues.apache.org/jira/browse/HBASE-29666)

This feature introduces a comprehensive key management system that extends HBase's existing encryption-at-rest capabilities. The implementation provides enterprise-grade key lifecycle management with support for key rotation, hierarchical namespace resolution for key lookup, key caching and improved integration with key management systems to handle key life cycles and external key changes.

**1. Managed Keys Infrastructure**

-   Introduction of `ManagedKeyProvider` interface for pluggable key provider implementations on the lines of the existing `KeyProvider` interface.
-   The new interface can also return Data Encryption Keys (DEKs) and a lot more details on the keys.
-   Comes with the default `ManagedKeyStoreKeyProvider` implementation using Java KeyStore, similar to the existing `KeyStoreKeyProvider`.
-   Enables logical key isolation for multi-tenant scenarios through custodian identifiers (future use cases) and the special default global custodian.
-   Hierarchical namespace resolution for DEKs with automatic fallback: explicit CF namespace attribute → constructed `table/family` namespace → table name → global namespace

**2. System Key (STK) Management**

-   Cluster-wide system key for wrapping data encryption keys (DEKs). This is equivalent to the existing master key, but better managed and operation friendly.
-   Secure storage in HDFS with support for automatic key rotation during boot up.
-   Admin API to trigger key rotation and propagation to all RegionServers without needing to do a rolling restart.
-   Preserves the current double-wrapping architecture: DEKs wrapped by STK, STK sourced from external KMS

**3. KeymetaAdmin API**

-   `enableKeyManagement(keyCust, keyNamespace)` - Enable key management for a custodian/namespace pair
-   `getManagedKeys(keyCust, keyNamespace)` - Query key status and metadata
-   `rotateSTK()` - Check for and propagate new system keys
-   `disableKeyManagement(keyCust, keyNamespace)` - Disable all the keys for a custodian/namespace
-   `disableManagedKey(keyCust, keyNamespace, keyMetadataHash)` - Disable a specific key
-   `rotateManagedKey(keyCust, keyNamespace)` - Rotate the active key
-   `refreshManagedKeys(keyCust, keyNamespace)` - Refresh from external KMS to validate all the keys.
-   Internal cache management operations for convenience and meeting SLAs.

**4. Persistent Key Metadata Storage**

-   New system table `hbase:keymeta` for storing key metadata and state which acts as an `L2` cache.
-   Tracks key lifecycle: `ACTIVE`, `INACTIVE`, `DISABLED`, `FAILED` states
-   Stores wrapped DEKs and metadata for key lookup without depending on external KMS.
-   Optimized for high-priority access with in-memory column families
-   Key metadata tracking with cryptographic hashes for integrity verification

**5. Multi-Layer Caching**

-   L1: In-memory Caffeine cache on RegionServers for hot key data
-   L2: Keymeta table for persistent key metadata that is shared across all RegionServers.
-   L3: Dynamic lookup from external KMS as fallback when not found in L2.
-   Cache invalidation mechanism for key rotation scenarios

**6. HBase Shell Integration**

-   `enable_key_management` - Enable key management for a custodian and namespace
-   `show_key_status` - Display key status and metadata
-   `rotate_stk` - Trigger system key rotation
-   `disable_key_management` - Disable key management for a custodian and namespace
-   `disable_managed_key` - Disable a specific key
-   `rotate_managed_key` - Rotate the active key
-   `refresh_managed_keys` - Refresh all keys for a custodian and namespace

-   **Backward Compatibility:** Changes are fully compatible with existing encryption-at-rest configuration
-   **Gradual step-by-step migration**: Well defined migration path from existing configuration to new configuration
-   **Performance:** Minimal overhead through efficient caching and lazy key loading
-   **Security:** Cryptographic verification of key metadata, secure key wrapping
-   **Operability:** Administrative tools for key life cycle and cache management
-   **Extensibility:** Plugin architecture for custom key provider implementations
-   **Testing:** Comprehensive unit and integration tests coverage

The implementation follows a layered architecture:

1.  **Provider Layer:** Pluggable `ManagedKeyProvider` for KMS integration
2.  **Management Layer:** `KeyMetaAdmin` API for administrative operations
3.  **Persistence Layer:** `KeymetaTableAccessor` for metadata storage
4.  **Cache Layer:** `ManagedKeyDataCache` and `SystemKeyCache` for performance
5.  **Service Layer:** Coprocessor endpoints for client-server communication

I would particularly appreciate feedback on:

1.  **API Design:** Is the `KeymetaAdmin` API intuitive and complete for common key management scenarios?
2.  **Security Model:** Does the double-wrapping architecture (DEK wrapped by STK, STK from KMS) provide appropriate security guarantees?
3.  **Performance:** Are there potential bottlenecks in the caching strategy or table access patterns?
4.  **Operational Aspects:** Are the administrative commands sufficient for the needs of operations and monitoring?
5.  **Testing Coverage:** Are there additional test scenarios we should cover?
6.  **Documentation:** Is the design document clear? What additional documentation would be helpful?
7.  **Compatibility:** Any concerns about interaction with existing HBase features?

After incorporating community feedback, I plan to:

1.  Address any issues identified during review
2.  Implement the work identified for future phases
3.  Add additional documentation to the reference guide

This PR introduces changes across multiple modules, so I recommend focusing on these **core components** first:

**Core Architecture:**

1.  Design document (linked above) - architectural overview
2.  `ManagedKeyProvider`, `KeymetaAdmin`, `ManagedKeyData` interfaces (hbase-common)
3.  `ManagedKeys.proto` - protocol definitions
4.  `HMaster` and misc. procedure changes - initialization of `keymeta` in a predictable order
5.  `FixedFileTrailer` + reader/writer changes - encode/decode additional encryption key in store files

**Key Implementation:**

1.  `KeymetaAdminImpl`, `KeymetaTableAccessor`, `ManagedKeyUtils`, `SystemKeyManager`, `SystemKeyAccessor` - admin operations and persistence
2.  `ManagedKeyDataCache`, `SystemKeyCache` - caching layer
3.  `SecurityUtil` - encryption context creation

**Client & Shell:**

1.  `KeymetaAdminClient` - client API
2.  Shell commands and Ruby wrappers

**Tests & Examples:**

1.  `TestKeymetaAdminImpl`, `TestManagedKeymeta` - for usage patterns
2.  `key_provider_keymeta_migration_test.rb` - E2E migration steps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants