-
-
Notifications
You must be signed in to change notification settings - Fork 158
add metrics collection for logs, traces and metrics separately #1504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metrics collection for logs, traces and metrics separately #1504
Conversation
WalkthroughTelemetryType was threaded through ingestion and event construction: Event gained a public Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Ingest as Ingestion Handlers
participant OTEL as OTEL Router/Parser
participant Proc as Flatten/Push Processor
participant Storage as Event Storage
participant Metrics as Metrics System
Client->>Ingest: POST /ingest (payload)
activate Ingest
Ingest->>OTEL: process_otel_content(body, telemetry_type)
activate OTEL
note right of OTEL: Determine TelemetryType (Logs/Metrics/Traces)
OTEL-->>Proc: flatten_and_push_logs(..., telemetry_type)
deactivate OTEL
activate Proc
note right of Proc: Build Event(s) with telemetry_type
Proc->>Storage: push events (include telemetry_type)
note right of Proc: increment_[logs|metrics|traces]_collected_by_date(count, date)
deactivate Proc
activate Storage
Storage->>Metrics: increment_events_ingested_size_by_date(size, date, telemetry_type)
deactivate Storage
activate Metrics
note right of Metrics: Route increments to type-specific\nTOTAL_{LOGS|METRICS|TRACES}_COLLECTED[_SIZE]_BY_DATE
deactivate Metrics
deactivate Ingest
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/metrics/mod.rs (1)
379-449: Clarify the purpose of the "team" label with hardcoded "all" value.All new metrics use
["team", "date"]labels, but the helper functions always pass"all"as the team value (e.g., line 738, 744, 750). This pattern suggests either:
- The "team" dimension is intended for future multi-tenancy but currently unused
- The label name might be more appropriately named (e.g., "scope" or "aggregate")
If "team" is a placeholder for future functionality, consider adding a brief comment. If it's for aggregation purposes, the label name could be misleading. Based on learnings, similar global metrics use
["format", "date"]labels for aggregation.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/event/format/json.rssrc/event/format/mod.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/handlers/http/ingest.rssrc/handlers/http/modal/utils/ingest_utils.rssrc/metrics/mod.rssrc/otel/logs.rssrc/otel/metrics.rssrc/otel/traces.rssrc/storage/field_stats.rs
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.
Applied to files:
src/storage/field_stats.rssrc/otel/logs.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/field_stats.rssrc/otel/logs.rssrc/handlers/http/modal/utils/ingest_utils.rssrc/otel/metrics.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/otel/traces.rssrc/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/field_stats.rssrc/otel/logs.rssrc/otel/metrics.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/otel/traces.rssrc/metrics/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
Applied to files:
src/storage/field_stats.rssrc/otel/logs.rssrc/otel/metrics.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/otel/traces.rssrc/handlers/http/ingest.rssrc/metrics/mod.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.
Applied to files:
src/otel/logs.rssrc/otel/metrics.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/metrics/mod.rs
📚 Learning: 2025-08-18T17:59:31.642Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:149-156
Timestamp: 2025-08-18T17:59:31.642Z
Learning: The time_partition parameter in push_logs() function in src/handlers/http/modal/utils/ingest_utils.rs is determined by the caller (flatten_and_push_logs). OSS callers pass None, enterprise callers pass the appropriate value (None or Some<>), and OTEL callers always pass None. The push_logs() function should not add additional time partition logic since it's already handled at the caller level.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rssrc/handlers/http/ingest.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T14:48:46.324Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:256-258
Timestamp: 2025-08-18T14:48:46.324Z
Learning: OTEL streams (OtelLogs, OtelMetrics, OtelTraces) in Parseable are intentionally designed to not support time partition functionality. The time_partition parameter should always be None for OTEL ingestion paths in flatten_and_push_logs calls.
Applied to files:
src/event/format/json.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/event/mod.rs
📚 Learning: 2025-09-25T07:12:33.401Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:727-735
Timestamp: 2025-09-25T07:12:33.401Z
Learning: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted. The increment_object_store_calls_by_date() calls should be placed after the await? to ensure they only execute on successful operations.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:40.189Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/gcs.rs:625-633
Timestamp: 2025-09-25T07:12:40.189Z
Learning: In Parseable's object storage metrics system, metrics should only be captured when API calls succeed, not when they error out. The current pattern of calling increment_object_store_calls_by_date and related metrics functions after the `?` operator is the correct approach.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:27.407Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:693-700
Timestamp: 2025-09-25T07:12:27.407Z
Learning: In the Parseable codebase, object store metrics (increment_object_store_calls_by_date, increment_files_scanned_in_object_store_calls_by_date, etc.) should only be recorded for successful operations, not for failed attempts. The metric calls should be placed after error handling operators like `?` to ensure they only execute when operations succeed.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.
Applied to files:
src/handlers/http/ingest.rs
🧬 Code graph analysis (7)
src/otel/logs.rs (1)
src/metrics/mod.rs (1)
increment_logs_collected_by_date(742-746)
src/otel/metrics.rs (1)
src/metrics/mod.rs (1)
increment_metrics_collected_by_date(736-740)
src/event/mod.rs (1)
src/metrics/mod.rs (1)
increment_events_ingested_size_by_date(634-659)
src/handlers/http/cluster/mod.rs (2)
src/alerts/alerts_utils.rs (2)
value(472-472)value(474-474)src/alerts/mod.rs (5)
value(414-414)value(422-422)value(430-430)value(443-443)value(454-454)
src/otel/traces.rs (1)
src/metrics/mod.rs (1)
increment_traces_collected_by_date(748-752)
src/handlers/http/ingest.rs (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
flatten_and_push_logs(51-137)
src/metrics/mod.rs (1)
src/handlers/http/mod.rs (1)
metrics_path(64-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (21)
src/event/format/mod.rs (1)
34-34: LGTM! Trait signature correctly extended with TelemetryType parameter.The addition of
telemetry_type: TelemetryTypeto theinto_eventtrait method enables telemetry-aware metrics tracking throughout the event pipeline.Also applies to: 224-225
src/storage/field_stats.rs (1)
134-134: LGTM! Internal stats correctly tagged with TelemetryType::Logs.The field statistics stream is appropriately classified as
TelemetryType::Logssince it represents internal system logging about dataset statistics.Also applies to: 164-164
src/otel/traces.rs (1)
78-80: Metrics incremented during flattening - verify this is the intended behavior.The trace count metric is incremented during the flattening phase using
Utc::now()for the date. This counts spans as they're being processed, regardless of whether downstream ingestion succeeds. This is consistent with the pattern used insrc/otel/logs.rsandsrc/otel/metrics.rs.Confirm this aligns with your billing requirements - should traces be counted at processing time or only after successful ingestion?
src/otel/logs.rs (1)
150-152: LGTM! Log count metric implementation is consistent with the traces pattern.The metric increment correctly counts log records per scope and uses
Utc::now()for date labeling, which is consistent with the UTC-normalized approach mentioned in the learnings.src/event/mod.rs (1)
56-56: LGTM! Event struct and size metric dispatch correctly updated.The
telemetry_typefield is properly threaded through the Event struct, andincrement_events_ingested_size_by_datenow routes size metrics to the appropriate per-telemetry-type counter. Based on the relevant snippet fromsrc/metrics/mod.rs, this correctly dispatches toTOTAL_LOGS_COLLECTED_SIZE_BY_DATE,TOTAL_METRICS_COLLECTED_SIZE_BY_DATE, orTOTAL_TRACES_COLLECTED_SIZE_BY_DATEdepending on the type.Also applies to: 97-97
src/otel/metrics.rs (1)
545-547: LGTM! Metrics count implementation follows the established pattern.The metric increment is correctly placed within the scope metrics processing loop, counting
metrics.len()for each scope. This is consistent with the logs and traces implementations.src/handlers/http/cluster/mod.rs (4)
103-108: LGTM! BillingMetricsCollector struct properly extended with new metric fields.The new fields for size and count metrics (logs, metrics, traces) are correctly added with
HashMap<String, u64>type, matching the existing pattern for date-based metrics. TheDefaultderive handles initialization.
210-244: LGTM! Billing event emission correctly extended for new metrics.The
add_simple_metriccalls properly emit events for all six new metric types following the established pattern.
1316-1320: LGTM! Metric name classification properly extended.All six new parseable metric names are correctly added to
is_simple_metricfor proper routing during sample processing.
1392-1416: LGTM! Sample processing correctly extended for new metrics.The match arms properly populate the corresponding collector fields for each new metric type, following the existing pattern.
src/event/format/json.rs (1)
34-36: LGTM! EventFormat trait implementation correctly updated.The
into_eventmethod properly accepts and passes through thetelemetry_typeparameter to the constructedEventstruct, implementing the updated trait signature.Also applies to: 152-152, 185-185
src/handlers/http/modal/utils/ingest_utils.rs (2)
51-137: LGTM! TelemetryType threading is consistent across all log source paths.The
telemetry_typeparameter is correctly propagated through all branches (Kinesis, OtelLogs, OtelTraces, OtelMetrics, and default) topush_logs, ensuring consistent telemetry tracking regardless of the log source.
139-184: LGTM!The
push_logsfunction correctly accepts and forwardstelemetry_typeto the event construction, enabling downstream metrics tracking.src/handlers/http/ingest.rs (5)
76-80: Verify intended behavior:ingest()accepts telemetry type from header butpost_event()defaults to Logs.In
ingest(),telemetry_typeis parsed from theTELEMETRY_TYPE_KEYheader, allowing clients to specify the type. However,post_event()(lines 417-425) hardcodesTelemetryType::Logs. This inconsistency could lead to incorrect metrics if clients expect to use the header-based approach for both endpoints.Is this intentional? If
post_eventshould also support telemetry type from headers for non-OTEL ingestion paths, consider extracting it similarly.
291-353: LGTM! OTEL handlers correctly map to their respective TelemetryTypes.Each OTEL handler consistently passes the appropriate
TelemetryTypeto bothsetup_otel_streamandprocess_otel_content:
- Logs →
TelemetryType::Logs- Metrics →
TelemetryType::Metrics- Traces →
TelemetryType::Traces
242-286: LGTM!The
process_otel_contentfunction correctly accepts and propagatestelemetry_typeto the downstreamflatten_and_push_logscall.
141-165: LGTM!Internal streams correctly use
TelemetryType::Logssince they represent Parseable's own internal logging data.
430-449: LGTM!The
push_logs_uncheckedfunction correctly initializes the event withTelemetryType::Logsas the default for unchecked batch ingestion.src/metrics/mod.rs (3)
634-658: LGTM! Telemetry-aware size metric routing is correctly implemented.The function properly routes size metrics to the appropriate counter based on
TelemetryType. The pattern matching is exhaustive, and treatingTelemetryType::Eventsthe same asTelemetryType::Logsis semantically appropriate.
553-567: LGTM!All five new metrics are properly registered with the Prometheus registry.
736-752: All three new count increment functions are already properly invoked from the OTEL processing paths.increment_logs_collected_by_dateis called insrc/otel/logs.rs:151,increment_metrics_collected_by_dateis called insrc/otel/metrics.rs:546, andincrement_traces_collected_by_dateis called insrc/otel/traces.rs:79.Likely an incorrect or invalid review comment.
add below metrics - 1. total logs collected by date 2. total size of logs collected by date 3. total metrics collected by date 4. total size of metrics collected by date 5. total traces collected by date 6. total size of traces collected by date add all of these to pbilling
b86629a to
18e455e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/connectors/kafka/processor.rs (1)
54-62: Consider usingTelemetryType::Logsexplicitly for stream creation.There's an inconsistency:
create_stream_if_not_existsusesTelemetryType::default()(line 60) whileinto_eventusesTelemetryType::Logsexplicitly (line 94). For consistency and clarity, consider usingTelemetryType::Logsin both places since Kafka ingestion is processing log data.🔎 Suggested change
PARSEABLE .create_stream_if_not_exists( stream_name, StreamType::UserDefined, None, vec![log_source_entry], - TelemetryType::default(), + TelemetryType::Logs, ) .await?;
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/connectors/kafka/processor.rssrc/event/format/json.rssrc/event/format/mod.rssrc/event/mod.rssrc/handlers/http/cluster/mod.rssrc/handlers/http/ingest.rssrc/handlers/http/modal/utils/ingest_utils.rssrc/metrics/mod.rssrc/otel/logs.rssrc/otel/metrics.rssrc/otel/traces.rssrc/storage/field_stats.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/otel/logs.rs
- src/otel/traces.rs
- src/event/format/mod.rs
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/s3.rs:837-844
Timestamp: 2025-09-25T07:13:58.909Z
Learning: In the Parseable codebase, the existing pattern of placing metrics calls after the `?` operator in S3 list_hours method is correct and should not be changed. The metrics are properly recorded only on successful operations.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
📚 Learning: 2025-08-18T14:48:46.324Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:256-258
Timestamp: 2025-08-18T14:48:46.324Z
Learning: OTEL streams (OtelLogs, OtelMetrics, OtelTraces) in Parseable are intentionally designed to not support time partition functionality. The time_partition parameter should always be None for OTEL ingestion paths in flatten_and_push_logs calls.
Applied to files:
src/event/format/json.rs
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.
Applied to files:
src/storage/field_stats.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/field_stats.rssrc/otel/metrics.rssrc/event/mod.rssrc/handlers/http/modal/utils/ingest_utils.rssrc/metrics/mod.rssrc/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/field_stats.rssrc/otel/metrics.rssrc/event/mod.rssrc/metrics/mod.rssrc/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
Applied to files:
src/otel/metrics.rssrc/event/mod.rssrc/connectors/kafka/processor.rssrc/metrics/mod.rssrc/handlers/http/cluster/mod.rssrc/handlers/http/ingest.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.
Applied to files:
src/otel/metrics.rssrc/event/mod.rssrc/metrics/mod.rssrc/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/event/mod.rs
📚 Learning: 2025-08-18T17:59:31.642Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:149-156
Timestamp: 2025-08-18T17:59:31.642Z
Learning: The time_partition parameter in push_logs() function in src/handlers/http/modal/utils/ingest_utils.rs is determined by the caller (flatten_and_push_logs). OSS callers pass None, enterprise callers pass the appropriate value (None or Some<>), and OTEL callers always pass None. The push_logs() function should not add additional time partition logic since it's already handled at the caller level.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rssrc/handlers/http/ingest.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.
Applied to files:
src/handlers/http/modal/utils/ingest_utils.rs
📚 Learning: 2025-09-25T07:12:33.401Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:727-735
Timestamp: 2025-09-25T07:12:33.401Z
Learning: In Parseable's object storage implementations, metrics should only be captured when operations succeed, not when they are attempted. The increment_object_store_calls_by_date() calls should be placed after the await? to ensure they only execute on successful operations.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:40.189Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/gcs.rs:625-633
Timestamp: 2025-09-25T07:12:40.189Z
Learning: In Parseable's object storage metrics system, metrics should only be captured when API calls succeed, not when they error out. The current pattern of calling increment_object_store_calls_by_date and related metrics functions after the `?` operator is the correct approach.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-09-25T07:12:27.407Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1441
File: src/storage/azure_blob.rs:693-700
Timestamp: 2025-09-25T07:12:27.407Z
Learning: In the Parseable codebase, object store metrics (increment_object_store_calls_by_date, increment_files_scanned_in_object_store_calls_by_date, etc.) should only be recorded for successful operations, not for failed attempts. The metric calls should be placed after error handling operators like `?` to ensure they only execute when operations succeed.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.
Applied to files:
src/handlers/http/cluster/mod.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.
Applied to files:
src/handlers/http/ingest.rs
🧬 Code graph analysis (4)
src/otel/metrics.rs (1)
src/metrics/mod.rs (1)
increment_metrics_collected_by_date(736-740)
src/event/mod.rs (1)
src/metrics/mod.rs (1)
increment_events_ingested_size_by_date(634-659)
src/metrics/mod.rs (1)
src/handlers/http/mod.rs (1)
metrics_path(64-66)
src/handlers/http/ingest.rs (1)
src/handlers/http/modal/utils/ingest_utils.rs (1)
flatten_and_push_logs(51-137)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (29)
src/storage/field_stats.rs (1)
164-164: LGTM!The addition of
TelemetryType::Logsfor internal dataset statistics events is consistent with the broader telemetry propagation pattern. This correctly categorizes the internal stats events as log-type telemetry.src/connectors/kafka/processor.rs (1)
94-94: LGTM!The explicit
TelemetryType::Logsfor Kafka event processing is correct and aligns with the telemetry type propagation pattern.src/event/mod.rs (2)
56-56: LGTM!The
telemetry_typefield addition to theEventstruct is well-designed. It correctly propagates telemetry context through the event processing pipeline.
97-97: LGTM!The telemetry type is correctly passed to
increment_events_ingested_size_by_date, enabling per-type (Logs/Metrics/Traces) size tracking. Based on retrieved learnings, theparsed_timestampused here is correctly UTC-normalized.src/handlers/http/cluster/mod.rs (4)
103-108: LGTM!The new per-date billing metrics fields are well-structured and follow the established pattern for telemetry-type aware metrics collection. The Default derive ensures proper initialization.
210-244: LGTM!The emission logic for the new metrics (logs/traces/metrics collected and their sizes) correctly follows the established pattern. Each metric is only emitted when its map is non-empty, which is consistent with the existing billing metrics handling.
1316-1320: LGTM!The
is_simple_metricfunction correctly extended to recognize the new per-date telemetry metrics.
1392-1416: LGTM!The
process_simple_metricfunction correctly processes the new per-date telemetry metrics using.insert()to store counter values directly. Based on retrieved learnings, this is the intended behavior for billing metrics processing.src/otel/metrics.rs (1)
545-546: LGTM!The batch-level metrics counting is correctly implemented. The date is computed once per scope metric iteration, and
increment_metrics_collected_by_dateis called with the count of metrics in that scope. This approach is more efficient than per-metric incrementing.src/event/format/json.rs (2)
152-152: LGTM!The
into_eventsignature correctly extended to acceptTelemetryType, enabling telemetry context propagation through the event construction pipeline.
185-185: LGTM!The
telemetry_typefield is correctly set in the Event construction, completing the propagation chain.src/handlers/http/modal/utils/ingest_utils.rs (3)
51-57: LGTM!The
flatten_and_push_logsfunction signature correctly extended withtelemetry_typeparameter, enabling telemetry type propagation through all ingestion paths.
139-145: LGTM!The
push_logsfunction signature correctly extended to accept and forwardtelemetry_typeto the event construction.
179-179: LGTM!The
telemetry_typeis correctly passed tointo_event, completing the propagation chain from ingestion entry points to event construction.src/handlers/http/ingest.rs (9)
76-80: LGTM!The telemetry type is correctly extracted from the request header with a sensible default fallback. This allows clients to specify the telemetry type via the
TELEMETRY_TYPE_KEYheader.
128-136: LGTM!The
flatten_and_push_logscall correctly passes through the user-specifiedtelemetry_typefrom the request header.
160-160: LGTM!Internal stream events are correctly tagged with
TelemetryType::Logs.
247-247: LGTM!The
process_otel_contentfunction signature correctly extended to accepttelemetry_type, enabling per-OTEL-type metrics collection.
303-303: LGTM!OTEL logs ingestion correctly uses
TelemetryType::Logs.
323-330: LGTM!OTEL metrics ingestion correctly uses
TelemetryType::Metrics.
350-350: LGTM!OTEL traces ingestion correctly uses
TelemetryType::Traces.
417-425: LGTM!The
post_eventendpoint correctly defaults toTelemetryType::Logsfor log ingestion via the logstream endpoint.
444-444: LGTM!The
push_logs_uncheckedfunction correctly setsTelemetryType::Logsfor unchecked event processing.src/metrics/mod.rs (6)
20-23: LGTM!The imports are correctly added to support the new telemetry-type-aware metrics functionality.
391-449: LGTM!The new metrics are well-structured with clear naming, appropriate counter types for billing purposes, and consistent label structure across all telemetry types.
553-567: LGTM!All new metrics are properly registered following the existing pattern.
634-659: Verify TelemetryType::Events handling.In the match statement,
TelemetryType::Eventsis treated identically toTelemetryType::Logs, both incrementingTOTAL_LOGS_COLLECTED_SIZE_BY_DATE. Please confirm this is intentional. If Events and Logs should be tracked separately for billing purposes, they need distinct metrics.Additionally, verify that using
"all"as the team label value for global aggregation aligns with your billing requirements.
736-752: LGTM!The helper functions are well-structured with consistent signatures and correctly use the
["all", date]label pattern for global aggregation metrics.
379-389: The metric label change from["date"]to["team", "date"]is confirmed, but verify external consumer impact.The
TOTAL_METRICS_COLLECTED_BY_DATEmetric was introduced in the previous commit with labels["date"]and modified in this commit to["team", "date"]. This label structure change will break any external Prometheus queries, dashboards, or alerts using the old format.However, since this metric was added only 1-2 days ago and no Prometheus/Grafana configurations are present in the repo, the immediate impact appears limited. All internal code usages have been consistently updated to provide both
["all", date]values. Before merging, confirm that no external monitoring systems are querying this metric with the old label schema.
add below metrics -
add all of these to pbilling
Summary by CodeRabbit
New Features
Refactor
Bug Fixes / Improvements
✏️ Tip: You can customize this high-level summary in your review settings.