Skip to content

[fix](cloud) add fe to ms rpc metrics#60574

Open
mymeiyi wants to merge 1 commit intoapache:masterfrom
mymeiyi:add-fe-rpc-to-ms-metric
Open

[fix](cloud) add fe to ms rpc metrics#60574
mymeiyi wants to merge 1 commit intoapache:masterfrom
mymeiyi:add-fe-rpc-to-ms-metric

Conversation

@mymeiyi
Copy link
Contributor

@mymeiyi mymeiyi commented Feb 6, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings February 6, 2026 09:34
@Thearas
Copy link
Contributor

Thearas commented Feb 6, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds comprehensive metrics tracking for FE (Frontend) to MS (Meta Service) RPC calls in cloud mode. The changes introduce per-method and aggregate RPC metrics including total calls, failures, retries, latency, and requests-per-second calculations.

Changes:

  • Added new metrics infrastructure in CloudMetrics for tracking meta-service RPC calls (per-method and aggregate)
  • Instrumented all RPC methods in MetaServiceProxy with metrics tracking using a new executeWithMetrics helper
  • Added metric calculation logic in MetricCalculator for computing RPS rates from counter deltas
  • Updated test signatures to match the new executeRequest method signature that includes methodName

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
fe/fe-core/src/main/java/org/apache/doris/metric/CloudMetrics.java Defines per-method and aggregate meta-service RPC metrics (total, failed, retry, latency, RPS)
fe/fe-core/src/main/java/org/apache/doris/metric/MetricCalculator.java Adds initialization and interval-based calculation logic for meta-service RPC rate metrics
fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java Adds updateMetaServiceRpcPerSecond helper method for updating per-method RPS gauges
fe/fe-core/src/main/java/org/apache/doris/cloud/rpc/MetaServiceProxy.java Instruments all RPC methods with metrics tracking, adds executeWithMetrics helper, updates executeRequest signature
fe/fe-core/src/test/java/org/apache/doris/cloud/rpc/MetaServiceProxyTest.java Updates test calls to match new executeRequest signature with methodName parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 98 to 124
public Cloud.GetInstanceResponse getInstance(Cloud.GetInstanceRequest request)
throws RpcException {
long startTime = System.currentTimeMillis();
String methodName = "getInstance";
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_ALL_TOTAL.increase(1L);
CloudMetrics.META_SERVICE_RPC_TOTAL.getOrAdd(methodName).increase(1L);
}

try {
final MetaServiceClient client = getProxy();
return client.getInstance(request);
Cloud.GetInstanceResponse response = client.getInstance(request);
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_LATENCY.getOrAdd(methodName)
.update(System.currentTimeMillis() - startTime);
}
return response;
} catch (Exception e) {
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_ALL_FAILED.increase(1L);
CloudMetrics.META_SERVICE_RPC_FAILED.getOrAdd(methodName).increase(1L);
CloudMetrics.META_SERVICE_RPC_LATENCY.getOrAdd(methodName)
.update(System.currentTimeMillis() - startTime);
}
throw new RpcException("", e.getMessage(), e);
}
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getInstance method duplicates the metrics tracking logic that exists in executeWithMetrics. This method should use executeWithMetrics like other RPC methods to avoid code duplication and ensure consistent metrics tracking.

Additionally, getInstance bypasses the retry logic in MetaServiceClientWrapper.executeRequest by directly calling getProxy() and client.getInstance(request), while all other methods benefit from the retry mechanism. This inconsistency could lead to different reliability characteristics for this method compared to others.

Copilot uses AI. Check for mistakes.
Comment on lines 286 to 341
@@ -242,10 +305,20 @@ public Future<Cloud.GetVersionResponse> getVisibleVersionAsync(Cloud.GetVersionR
new com.google.common.util.concurrent.FutureCallback<Cloud.GetVersionResponse>() {
@Override
public void onSuccess(Cloud.GetVersionResponse result) {
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_LATENCY.getOrAdd(methodName)
.update(System.currentTimeMillis() - startTime);
}
}

@Override
public void onFailure(Throwable t) {
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_ALL_FAILED.increase(1L);
CloudMetrics.META_SERVICE_RPC_FAILED.getOrAdd(methodName).increase(1L);
CloudMetrics.META_SERVICE_RPC_LATENCY.getOrAdd(methodName)
.update(System.currentTimeMillis() - startTime);
}
if (finalClient != null) {
finalClient.shutdown(true);
}
@@ -254,6 +327,12 @@ public void onFailure(Throwable t) {
}
return future;
} catch (Exception e) {
if (MetricRepo.isInit && Config.isCloudMode()) {
CloudMetrics.META_SERVICE_RPC_ALL_FAILED.increase(1L);
CloudMetrics.META_SERVICE_RPC_FAILED.getOrAdd(methodName).increase(1L);
CloudMetrics.META_SERVICE_RPC_LATENCY.getOrAdd(methodName)
.update(System.currentTimeMillis() - startTime);
}
if (client != null) {
client.shutdown(true);
}
@@ -262,162 +341,164 @@ public void onFailure(Throwable t) {
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getVisibleVersionAsync method also duplicates the metrics tracking logic instead of using a consistent pattern. Unlike the other synchronous RPC methods that now use executeWithMetrics, this async method manually tracks metrics in multiple places (before the call, in onSuccess, in onFailure, and in the catch block).

Consider creating an async-specific metrics wrapper method similar to executeWithMetrics to reduce code duplication and ensure consistent metrics tracking across both synchronous and asynchronous RPC calls.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants