-
Notifications
You must be signed in to change notification settings - Fork 150
DO NOT MERGE: add metrics exporter for ee #4014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge QueueAdd the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
408d468 to
92a7f8a
Compare
65c7cbd to
b30c70d
Compare
PR Review: Metrics Exporter for EESummary: This PR adds Kafka-based metrics export functionality for enterprise edition, allowing usage metrics to be exported periodically from the namespace to Kafka. Code Quality & Best PracticesStrengths:
Issues Found:
Performance ConsiderationsGood:
Concerns:
Security Concerns
Test CoverageCritical Missing Tests:
Recommended test coverage:
Architecture & DesignConcerns:
Specific Recommendations
Additional Notes
Verdict: This PR has good structure but needs significant work before merge:
Please address the TODO, add tests, and consider the architecture feedback before removing "DO NOT MERGE". |
PR Review: Metrics Exporter for EEThis PR adds a Kafka-based metrics exporter for enterprise edition. Overall, the implementation is well-structured, but there are several issues to address before merging. Critical Issues1. TODO placeholder in production code (engine/packages/namespace/src/workflows/metrics_exporter.rs:122)project: "TODO".to_string(),The
2. Missing
|
Code Review: Metrics Exporter for Enterprise EditionOverviewThis PR adds a Kafka-based metrics exporter for collecting and transmitting usage metrics. The implementation follows good architectural patterns but has several areas that need attention before merging. Critical Issues1. Hardcoded TODO Value (engine/packages/namespace/src/workflows/metrics_exporter.rs:127)project: "TODO".to_string(),Issue: The Recommendation: Determine where the project identifier should come from (config, namespace metadata, etc.) and implement proper resolution. This should be resolved before merge. Security Concerns2. Sensitive Credentials in LogsLocation: engine/packages/pools/src/db/kafka.rs:7 The tracing statement Recommendation: Verify error handling doesn't leak credentials and consider wrapping the connection setup with explicit error sanitization. Code Quality Issues3. Missing Display Trait DerivationLocation: engine/packages/namespace/src/keys/metric.rs:78
Current: #[derive(strum::FromRepr)]
pub enum MetricVariant { ... }
impl std::fmt::Display for MetricVariant { ... }Recommendation: #[derive(strum::FromRepr, strum::Display)]
#[strum(serialize_all = "snake_case")]
pub enum MetricVariant { ... }This would eliminate the manual Display implementation and ensure consistency. 4. Inconsistent Error HandlingLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:156 The Kafka send error handling discards the message: .map_err(|(err, _)| anyhow::Error::from(err))Issue: When a Kafka send fails, the message is lost without any retry mechanism or dead letter queue. Recommendation: Consider:
5. Transaction Timeout LogicLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:82-84 if start.elapsed() > EARLY_TXN_TIMEOUT {
tracing::warn\!("timed out processing pending actors metrics");
break;
}Issue: When the timeout occurs, partial metrics are exported and the continuation logic via Recommendation: Add more context to the warning log (how many metrics were processed, what the last key was) and consider if this timeout is appropriate or if chunking should be handled differently. Performance Considerations6. String Allocations in Hot PathLocation: engine/packages/namespace/src/keys/metric.rs:46-75 The Recommendation: Consider:
7. Unbounded Metrics CollectionLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:48-112 The outer loop collects all metrics into a single Current behavior: Chunks DB reads via timeout, but accumulates all results in memory. Recommendation: Consider streaming metrics directly to Kafka instead of accumulating them all in memory first. Process and send each chunk before fetching the next. 8. Buffer Size ConfigurationLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:159 .buffer_unordered(1024)The buffer size of 1024 concurrent Kafka sends matches the chunk size, which could create significant memory pressure and network load spikes. Recommendation: Consider making this configurable or using a smaller buffer (e.g., 32-64) for more controlled resource usage. Best Practices & Style9. Magic NumbersSeveral magic numbers lack explanation:
Recommendation: Extract to named constants with documentation. 10. Logging StandardsLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:83 tracing::warn\!("timed out processing pending actors metrics");According to CLAUDE.md, log messages should use structured logging parameters, not formatted strings. Recommendation: tracing::warn\!(
elapsed_ms = ?start.elapsed().as_millis(),
"timed out processing pending actors metrics"
);11. Comment QualityLocation: engine/packages/namespace/src/workflows/metrics_exporter.rs:118 // Chunk metrics into 1024 rowsPer CLAUDE.md guidelines, comments should be complete sentences. Recommendation: "Chunk metrics into batches of 1024 rows for Kafka transmission." Testing Gaps12. Missing Test CoverageNo tests are included for:
Recommendation: Add unit tests for:
Documentation13. Missing DocumentationThe metrics exporter workflow and activity lack doc comments explaining:
Recommendation: Add module-level and function-level doc comments, especially for public APIs like Architecture & Design14. Kafka Producer LifecycleLocation: engine/packages/pools/src/pools.rs:87 The Kafka producer is created once at pool initialization but may fail silently if Kafka is unavailable at startup. Recommendation: Consider health checks and reconnection logic, or document the fail-fast behavior. 15. Workflow Initialization TimingLocation: engine/packages/pegboard-gateway/src/lib.rs:167, 297 The metrics exporter workflow is initialized on every HTTP and WebSocket request via Concerns:
Recommendation: Consider moving workflow initialization to namespace creation time or document why per-request initialization is preferred. Minor Issues16. Workspace Dependency FormattingLocation: Cargo.toml:138-140 [workspace.dependencies.rdkafka]
version = "0.38.0"
features = [ "ssl" ]Issue: Inconsistent indentation (4 spaces vs 2 spaces used elsewhere in the file). Recommendation: Use 2-space indentation to match the file's style. 17. Unnecessary DependencyLocation: Cargo.lock:2533 dependencies = [
"cc",
+ "libc",
"pkg-config",Verify if this SummarySeverity Breakdown:
Recommendation: Address the critical TODO issue and error handling concerns before merging. Consider the performance implications for high-traffic namespaces. Positive Notes:
|
92a7f8a to
91e86f9
Compare
b30c70d to
bd52f31
Compare
91e86f9 to
58af8e5
Compare
bd52f31 to
22c21f0
Compare
58af8e5 to
399c5f7
Compare

No description provided.