
@harp-intel

Summary

Fixes #569 - Prevents skewed mean metric values when collecting metrics for workloads.

When collecting metrics with a workload (e.g., `perfspect metrics -- stress-ng --cpu 0 --cpu-load 60 --timeout 30`), the final sample may be collected during or after workload completion, so it captures the system transitioning from a loaded to an idle state. With short collections (e.g., 30 seconds at a 5-second interval yields only 6 samples), this single anomalous sample can significantly skew the summary statistics.

Changes

  • Added `WithWorkload` field to `Metadata` struct - Tracks whether metrics were collected with a user-provided workload application
  • Set workload context - `WithWorkload` is set to `true` when workload arguments are provided after `--`
  • Implemented `excludeFinalSample()` method - Removes the final timestamp's rows from all metric groups before computing summary statistics
  • Optimized logging - Checks only the first group and logs once per collection to avoid log spam on systems with many sockets/CPUs
  • Added comprehensive tests - Unit tests cover single samples, multiple groups, and other edge cases
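Below is a minimal Go sketch of how the workload context might be derived from the command line. It assumes the workload command is everything after the first `--`. Both `splitWorkload` and the trimmed `Metadata` type are hypothetical illustrations. perfspect's real struct has many more fields, and its CLI library may already split arguments at `--` on its own.

```go
package main

import "fmt"

// Metadata is a trimmed, hypothetical stand-in for perfspect's metadata;
// only the field added in this PR is shown.
type Metadata struct {
	WithWorkload bool // true when a workload command was given after "--"
}

// splitWorkload (hypothetical) returns the arguments that follow the first
// "--", or nil when no workload was provided.
func splitWorkload(args []string) []string {
	for i, a := range args {
		if a == "--" {
			return args[i+1:]
		}
	}
	return nil
}

func main() {
	// Arguments as they might arrive after the program name.
	args := []string{"metrics", "--", "stress-ng", "--cpu", "0", "--cpu-load", "60", "--timeout", "30"}
	workload := splitWorkload(args)
	md := Metadata{WithWorkload: len(workload) > 0}
	fmt.Println(md.WithWorkload, workload)
}
```

Flags that belong to the workload itself, such as `--cpu`, are never confused with the separator because only an exact `--` token matches.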

Solution Approach

The implementation excludes the final timestamp's samples from summary statistics when `metadata.WithWorkload` is `true`. The full CSV with all samples is still preserved for users who want to perform advanced analysis.
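The following Go sketch shows the exclusion step under stated assumptions. Metric data is modeled as hypothetical `metricGroup` values holding timestamped rows, and those rows are sorted by time. Several rows may share a timestamp. The function returns new slices, which means the original data (the source of the full CSV) is left untouched. As described above, it inspects only the first group to detect the single-sample case and logs once. The type and field names are illustrative. They are not perfspect's actual types.

```go
package main

import (
	"fmt"
	"log"
)

// row is one sample of a metric group (hypothetical layout).
type row struct {
	Timestamp float64
	Values    []float64
}

// metricGroup holds the rows for one socket, CPU, etc. (hypothetical).
type metricGroup struct {
	Name string
	Rows []row
}

// excludeFinalSample returns copies of the groups without the rows that carry
// each group's latest timestamp. Rows are assumed sorted by timestamp. Only the
// first group is checked for the single-sample case, and one message is logged
// per call rather than one per group.
func excludeFinalSample(groups []metricGroup) []metricGroup {
	if len(groups) == 0 || len(groups[0].Rows) == 0 {
		return groups
	}
	first := groups[0].Rows
	last := first[len(first)-1].Timestamp
	if first[0].Timestamp == last {
		log.Printf("only one sample collected; keeping it for summary statistics")
		return groups
	}
	log.Printf("excluding final sample (timestamp %g) from summary statistics", last)
	out := make([]metricGroup, len(groups))
	for i, g := range groups {
		kept := make([]row, 0, len(g.Rows))
		if n := len(g.Rows); n > 0 {
			final := g.Rows[n-1].Timestamp
			for _, r := range g.Rows {
				if r.Timestamp != final {
					kept = append(kept, r)
				}
			}
		}
		out[i] = metricGroup{Name: g.Name, Rows: kept}
	}
	return out
}

func main() {
	groups := []metricGroup{
		{Name: "socket0", Rows: []row{{0, []float64{60}}, {5, []float64{61}}, {10, []float64{3}}}},
		{Name: "socket1", Rows: []row{{0, []float64{59}}, {5, []float64{60}}, {10, []float64{2}}}},
	}
	for _, g := range excludeFinalSample(groups) {
		fmt.Println(g.Name, len(g.Rows)) // rows that feed the summary statistics
	}
	fmt.Println(len(groups[0].Rows)) // original data still has every row
}
```

Returning copies rather than truncating in place keeps the "full CSV still contains all samples" guarantee independent of the order in which outputs are written.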

Testing

  • ✅ All new unit tests pass
  • ✅ All existing tests pass
  • ✅ `make check` passes (format, vet, staticcheck, lint, vulnerabilities, tests)

Test plan

  1. Run `perfspect metrics -- stress-ng --cpu 0 --cpu-load 60 --timeout 30`
  2. Verify that the summary statistics exclude the final sample and are not skewed by post-workload data
  3. Verify that the full CSV still contains all samples
  4. Test with various workloads and collection durations

When collecting metrics with a workload (e.g., `perfspect metrics -- stress-ng`),
the final sample may be collected during or after workload completion, capturing
the system's transition from a loaded to an idle state. With short collections,
this single anomalous sample can significantly skew summary statistics (mean,
min, max, stddev).

This change automatically excludes the final timestamp's samples from summary
statistics when metrics are collected with a workload. The full CSV with all
samples is still preserved for advanced analysis.

Implementation:
- Added `WithWorkload` field to `Metadata` struct to track workload context
- Set `WithWorkload = true` when workload application arguments are provided
- Added `excludeFinalSample()` method to remove final timestamp rows before
  computing summary statistics
- Optimized to check first group only and log once per collection

Fixes #569

Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
@harp-intel merged commit 5ec813c into main on Nov 25, 2025
7 of 8 checks passed
@harp-intel deleted the fix-workload-metrics-skew branch on November 25, 2025 at 20:56

Linked issue: collecting metrics for workload may result in skewed mean metric values (#569)
