@chenhuansome chenhuansome commented Sep 28, 2025

What problem does this PR solve?

  1. Switched the OTel server: Replaced the Jaeger exporter with an OTLP gRPC exporter, connecting to the local ts-trace service
  2. Enhanced trace integration tests: Added a complete trace data verification mechanism
  3. Added fault tolerance tests: Introduced trace system failure tolerance tests
  4. Improved test workflow: Integrated ts-trace service startup into the CI workflow

Core Design

  1. Interceptor pattern: Automatically adds tracing logic before and after client operations via OtelInterceptor
  2. Asynchronous batch processing: Uses BatchSpanProcessor to improve performance and minimize impact on the main flow
  3. Resource cleanup: Automatically cleans up trace data before and after each test case to ensure test isolation
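The interceptor idea from item 1 can be sketched in plain Java without any OpenTelemetry dependency. Everything below is hypothetical and for illustration only: `Interceptor`, `RecordingInterceptor`, and `Client` are stand-in names, not the client's real API, and the recording interceptor keeps events in memory where an OtelInterceptor would presumably start and end spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names throughout; this is not the client's real interceptor API.
class InterceptorSketch {

    /** Callbacks fired around every client operation. */
    interface Interceptor {
        void before(String command);
        void after(String command, Throwable error);
    }

    /** Stand-in for a tracing interceptor: records start/end events in memory. */
    static final class RecordingInterceptor implements Interceptor {
        final List<String> events = new ArrayList<>();

        @Override
        public void before(String command) {
            events.add("start:" + command);
        }

        @Override
        public void after(String command, Throwable error) {
            events.add((error == null ? "end:" : "error:") + command);
        }
    }

    /** Minimal client that wraps each operation with the registered interceptors. */
    static final class Client {
        private final List<Interceptor> interceptors = new ArrayList<>();

        void addInterceptor(Interceptor interceptor) {
            interceptors.add(interceptor);
        }

        <T> T execute(String command, Supplier<T> operation) {
            interceptors.forEach(i -> i.before(command));
            Throwable error = null;
            try {
                return operation.get();
            } catch (RuntimeException e) {
                error = e;
                throw e;
            } finally {
                // Runs on success and on failure, so every start has a matching end.
                final Throwable observed = error;
                interceptors.forEach(i -> i.after(command, observed));
            }
        }
    }

    public static void main(String[] args) {
        RecordingInterceptor tracing = new RecordingInterceptor();
        Client client = new Client();
        client.addInterceptor(tracing);
        String result = client.execute("SHOW DATABASES", () -> "ok");
        System.out.println(result + " " + tracing.events);
    }
}
```

Because the hooks live in `execute`, callers do not add tracing code to individual operations; registering the interceptor once is enough.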

Key Implementation

// Trace verification mechanism
private void checkLastTraceCommand(String command) {
    String queryTraceCommand = "SELECT command FROM \"%s\"".formatted(measurementName);
    // Query and verify trace data
}

// Fault tolerance design
private Tracer getErrTracer() {
    // Use an incorrect endpoint to test failure tolerance
    .setEndpoint("http://127.0.0.1:38086")
}
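Why an unreachable endpoint need not break the client can be illustrated with a simplified, dependency-free model of batched export. This is a sketch under stated assumptions, not the PR's code and not how BatchSpanProcessor is implemented internally: `Exporter`, `BatchProcessor`, and `runCommand` are hypothetical names, and the flush is triggered explicitly here rather than by a scheduled background worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for a batching span processor; all names are hypothetical.
class BatchExportSketch {

    /** Sends a batch of span names somewhere; may throw if the backend is down. */
    interface Exporter {
        void export(List<String> batch) throws Exception;
    }

    static final class BatchProcessor {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        private final AtomicInteger failedBatches = new AtomicInteger();
        private final Exporter exporter;
        private final int maxBatchSize;

        BatchProcessor(Exporter exporter, int maxBatchSize) {
            this.exporter = exporter;
            this.maxBatchSize = maxBatchSize;
        }

        /** Called on the client's thread: never blocks, drops the span if the queue is full. */
        void onSpanEnd(String spanName) {
            queue.offer(spanName);
        }

        /** Drains queued spans in batches; export errors are counted, not rethrown. */
        void flush() {
            List<String> batch = new ArrayList<>();
            while (queue.drainTo(batch, maxBatchSize) > 0) {
                try {
                    exporter.export(batch);
                } catch (Exception e) {
                    failedBatches.incrementAndGet();
                }
                batch = new ArrayList<>();
            }
        }

        int failedBatches() {
            return failedBatches.get();
        }
    }

    /** Client operation that emits a span and still returns its result. */
    static String runCommand(BatchProcessor processor, String command) {
        String result = "result of " + command;
        processor.onSpanEnd(command);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates an endpoint where no trace service is reachable.
        Exporter unreachable = batch -> { throw new java.io.IOException("connection refused"); };
        BatchProcessor processor = new BatchProcessor(unreachable, 2);
        System.out.println(runCommand(processor, "SHOW DATABASES"));
        Thread worker = new Thread(processor::flush); // export happens off the caller's thread
        worker.start();
        worker.join();
        System.out.println("failed batches: " + processor.failedBatches());
    }
}
```

The design point is that the command path only enqueues; the export error surfaces on the flushing thread and is recorded there, so the command's return value is unaffected.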

Test Coverage

  • Basic query operation tracing: Validates tracing for commands like SHOW DATABASES
  • Write operation tracing: Validates tracing for Line Protocol writes
  • Integration tests: Complete tracing for database creation, write, and query workflows
  • Fault tolerance tests: Client behavior when the OTel service is unavailable

CI Configuration Update
Automatically starts the ts-trace service before unit tests

- name: setup ts-trace
  run: go install github.com/openGemini/observability/trace/cmd/ts-trace@latest && ts-trace &

Achieved Results

  • [✓] Integrated OpenTelemetry with the openGemini client
  • [✓] Provided end-to-end trace data verification
  • [✓] Ensured the client still functions normally when the OTel service is unavailable
  • [✓] Improved unit test coverage and reliability

Related documents:
openGemini/openGemini.github.io#163

dosubot bot commented Sep 28, 2025

Documentation Updates

Checked 2 published document(s). No updates required.

@chenhuansome changed the title from "Opentelemetry observability" to "feat: integrate Opentelemetry observability" on Sep 28, 2025
@chenhuansome force-pushed the opentelemetry-observability branch from 6e503b9 to 0b8da23 on September 29, 2025 13:34
@chenhuansome force-pushed the opentelemetry-observability branch 2 times, most recently from 211eccd to 184a1d7, on October 25, 2025 11:47
@chenhuansome changed the title from "feat: integrate Opentelemetry observability" to "feat: switch the otel server for unit test integration" on Oct 25, 2025
@chenhuansome force-pushed the opentelemetry-observability branch 3 times, most recently from 79c897c to 4eb59f0, on October 26, 2025 11:13
Signed-off-by: chenhuan <xiangyuyu_2024@qq.com>
@chenhuansome force-pushed the opentelemetry-observability branch from 4eb59f0 to 1749543 on October 26, 2025 11:21
@weiping-code merged commit f0cc51c into openGemini:main on Oct 26, 2025
13 checks passed