Skip to content

Conversation

@morrisonlevi
Copy link
Collaborator

Description

Gathers time samples during allocation samples if interrupt_count > 0. This has two benefits:

  1. A slight performance improvement in latency, CPU, and memory.
  2. A slight accuracy improvement.

When I say a slight performance benefit, here's a comparison to v1.15.2:

library_version latency_p99 cpu% memory
1.15.2 115.45 ms 132.9% 122.32 MB
local 114.88 ms 131.5% 122.27 MB

The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

When allocation profiler takes a sample and interrupt_count > 0,
collect both allocation and time samples in a single stack walk.

Benefits:
- Eliminates redundant stack walks when samples coincide.
- Improves time sample accuracy (closer to timer expiry).
Let's do this in a dedicated PR.
@morrisonlevi morrisonlevi requested a review from a team as a code owner January 7, 2026 19:00
@github-actions github-actions bot added profiling Relates to the Continuous Profiler tracing labels Jan 7, 2026
// instead of waiting for an interrupt handler. This is slightly more
// accurate and efficient, win-win.
let interrupt_count = REQUEST_LOCALS
.try_with_borrow(|locals| locals.interrupt_count.swap(0, Ordering::SeqCst))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that SeqCst is probably too heavy-handed. Fixing this should be done in another PR as it affects everything which checks this, not just this combined sample.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine with postponing this to another PR, especially that benchmarks show we are slightly better anyway 🎉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes fix a warning you can get on local development about an unused use super::*;.

@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.64%. Comparing base (98a31cb) to head (5d79ecd).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3559      +/-   ##
==========================================
- Coverage   61.78%   61.64%   -0.14%     
==========================================
  Files         139      139              
  Lines       13051    13051              
  Branches     1712     1712              
==========================================
- Hits         8063     8045      -18     
- Misses       4225     4241      +16     
- Partials      763      765       +2     

see 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98a31cb...5d79ecd. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Jan 7, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-01-09 17:29:35

Comparing candidate commit 5d79ecd in PR branch levi/perf-time-alloc-piggyback with baseline commit 98a31cb in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics.


function allocate_large() {
// Large allocation to trigger sampling
$data = str_repeat("a", 1024 * 200_000); // ~200MB
Copy link
Member

@realFlowControl realFlowControl Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str_repeat() is not a frameless function, substr() or str_replace() are frameless function for example and do allocate memory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@morrisonlevi morrisonlevi requested a review from a team as a code owner January 7, 2026 22:27
Copy link
Member

@realFlowControl realFlowControl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?

Comment on lines 16 to 17
$s = str_repeat("x", 10_000_000);
strlen($s); // Use the string to prevent optimization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire JSON has nothing to do with how the JSON's look that the https://github.com/DataDog/prof-correctness action accepts. As such prof-correctness seems to not match anything against the pprof which means it does not fail and as such succeeds (which is something we should fix in the upstream action, it should fail if the JSON is in an invalid structure).

Have a look at the other examples, like https://github.com/DataDog/dd-trace-php/blob/4a3a98716843640b4a8d6894da8ef0072500c729/profiling/tests/correctness/timeline.json how the JSON is supposed to look.

Copy link
Collaborator Author

@morrisonlevi morrisonlevi Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sort of fixed it. How can I make this only run for PHP 8.4+? Because the % will be determined by whether it's frameless or not.

// instead of waiting for an interrupt handler. This is slightly more
// accurate and efficient, win-win.
let interrupt_count = REQUEST_LOCALS
.try_with_borrow(|locals| locals.interrupt_count.swap(0, Ordering::SeqCst))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine with postponing this to another PR, especially that benchmarks show we are slightly better anyway 🎉

unset DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE
mkdir -p profiling/tests/correctness/allocation_time_combined/
export DD_PROFILING_OUTPUT_PPROF=$PWD/profiling/tests/correctness/allocation_time_combined/test.pprof
export DD_PROFILING_ALLOCATION_ENABLED=yes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export DD_PROFILING_ALLOCATION_ENABLED=yes
export DD_PROFILING_ALLOCATION_ENABLED=yes
export DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1

Just an idea, but by setting the allocation sampling distance to 1, we make sure to take enough samples and have a higher chance of seeing combined samples.

export DD_PROFILING_ENDPOINT_COLLECTION_ENABLED=no
export EXECUTION_TIME=5
php -d extension=$PWD/target/profiler-release/libdatadog_php_profiling.so profiling/tests/correctness/allocation_time_combined.php
unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME
unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE

Just if you accept the change above

@realFlowControl
Copy link
Member

I've created an upstream PR to fix the empty/malformed expected profile JSON [...]

This got merged and I re-run the correctness tests, which are now failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants