feat: add infra for segments metrics (CM-708) #3728
base: main
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
355ef96 to bd87c2e
bd87c2e to 4e416ec
epipav
left a comment
Looks good 👍 added one comment and a nitpick
    FROM segments AS s
    LEFT JOIN
    cdp_segment_metrics_ds AS sa
NIT: AFAIK, it's better to keep the smaller table on the right-hand side of the join for performance. If we see bad performance, let's try switching places
At the moment it takes something like ~10 s. Would that be reasonable for now?
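For context on the nit: ClickHouse's default hash join builds its in-memory hash table from the right-hand table and streams the left-hand one, so memory use tracks the size of the right side. A minimal Python sketch of that build/probe split follows; the function name and the sample rows are hypothetical, and this is an illustration rather than how the engine is implemented.

```python
def hash_left_join(left_rows, right_rows, key):
    """LEFT JOIN two lists of dicts on `key`, hashing the right-hand side."""
    # Build phase: the whole right-hand input is indexed in memory.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    # Probe phase: the left-hand input is streamed row by row.
    joined = []
    for row in left_rows:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})
        else:
            # LEFT JOIN keeps unmatched left rows.
            joined.append(dict(row))
    return joined


# Hypothetical sample data shaped like the join in the snippet above.
segments = [{"segmentId": "s1", "name": "A"}, {"segmentId": "s2", "name": "B"}]
metrics = [{"segmentId": "s1", "members": 5}]
joined = hash_left_join(segments, metrics, "segmentId")
print(joined)
```

Only the right-hand input is held in memory in full, which is why swapping the two tables can change both memory use and runtime.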
Introduce Segment Aggregate States for Scalable CDP Metrics
Summary
This PR introduces a new architecture for computing CDP dashboard metrics that is designed to be scalable, predictable, and safe at large data volumes.
The main change is the introduction of a segment-level aggregate states datasource, built daily via a COPY pipe, and three lightweight sinks that derive metrics for subprojects, projects, and project groups.
Heavy aggregation and DISTINCT logic are moved out of the sinks and into a single batch pipeline.
New Components
1. Datasource
cdp_segment_metrics_agg_states_ds
A new datasource that stores one row per subproject segment per daily snapshot, containing:
- aggregate states for distinct counts (uniqCombined)
- hierarchy identifiers (segmentId, parentId, grandparentId)
This datasource uses AggregatingMergeTree and is optimized for hierarchical rollups via *Merge() functions.
2. COPY Pipe (Core of the Change)
cdp_segment_metrics_agg_states_copy.pipe
This is the most complex and important component introduced in this PR.
Responsibilities involve the following datasources:
- cdp_member_segment_aggregates_ds
- cdp_organization_segment_aggregates_ds
- segments
- cdp_segment_metrics_agg_states_ds
This pipe runs once per day and centralizes all heavy computation and DISTINCT logic.
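The idea of computing mergeable states once per segment can be sketched in Python. This is an illustration under assumptions, not the pipe's SQL: Python sets stand in for uniqCombined states (which in ClickHouse are approximate sketches, not exact sets), and the row shape and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows: (segmentId, memberId, activityCount).
rows = [
    ("sub-a", "m1", 3),
    ("sub-a", "m2", 1),
    ("sub-b", "m1", 2),
    ("sub-a", "m1", 4),
]


def build_segment_states(rows):
    """Build one mergeable state per segment in a single pass over the input."""
    states = defaultdict(lambda: {"activities": 0, "members": set()})
    for segment_id, member_id, activity_count in rows:
        state = states[segment_id]
        state["activities"] += activity_count  # stand-in for a count state
        state["members"].add(member_id)        # stand-in for a uniqCombined state
    return dict(states)


states = build_segment_states(rows)
for segment_id, state in sorted(states.items()):
    print(segment_id, state["activities"], len(state["members"]))
```

Because each state keeps enough information to be merged later, downstream consumers never need to rescan the raw rows.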
3. Sinks
All sinks are intentionally simple and fast.
They only merge and finalize the precomputed states.
cdp_dashboard_metrics_subproject_sink.pipe
Publishes metrics at subproject level by finalizing segment-level states.
cdp_dashboard_metrics_project_sink.pipe
Rolls up subproject states to project level using parentId and state merging.
cdp_dashboard_metrics_project_group_sink.pipe
Rolls up subproject states to project group
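To illustrate why the rollups merge states rather than add up finalized numbers, here is a small Python sketch. Sets stand in for uniqCombined states, the sample data and function names are hypothetical, and only the parentId/grandparentId hierarchy comes from the PR.

```python
# Hypothetical per-subproject states; sets stand in for uniqCombined states.
subproject_states = [
    {"segmentId": "sub-a", "parentId": "proj-1", "grandparentId": "group-1",
     "members": {"m1", "m2"}},
    {"segmentId": "sub-b", "parentId": "proj-1", "grandparentId": "group-1",
     "members": {"m2", "m3"}},
    {"segmentId": "sub-c", "parentId": "proj-2", "grandparentId": "group-1",
     "members": {"m3"}},
]


def rollup_by_merging(states, key):
    """Merge states per parent, then finalize (analogous to a *Merge() call)."""
    merged = {}
    for state in states:
        merged.setdefault(state[key], set()).update(state["members"])
    return {parent: len(members) for parent, members in merged.items()}


def rollup_by_summing(states, key):
    """Sum already-finalized distinct counts; overcounts shared members."""
    totals = {}
    for state in states:
        totals[state[key]] = totals.get(state[key], 0) + len(state["members"])
    return totals


print(rollup_by_merging(subproject_states, "parentId"))
print(rollup_by_summing(subproject_states, "parentId"))
print(rollup_by_merging(subproject_states, "grandparentId"))
```

Member m2 belongs to both sub-a and sub-b, so summing finalized counts reports four members for proj-1, while merging the states first reports three.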
Note
Introduces a scalable, state-based pipeline for CDP segment metrics.
- cdp_segment_metrics_ds (AggregatingMergeTree) storing per-segment daily states: count (total/last-30) and uniqCombined (members/orgs), with hierarchy IDs
- cdp_segment_metrics_copy_pipe computes states from existing member/org aggregates and latest activities snapshot; restricts to valid segments, computes once per latest snapshot, reuses empty states; writes full daily snapshot
- cdp_dashboard_metrics_per_segment_sink): cdp_segment_metrics_subproject_sink, cdp_segment_metrics_project_sink, cdp_segment_metrics_project_group_sink (rollups via parentId/grandparentId)
Written by Cursor Bugbot for commit 87dac21.