-
Notifications
You must be signed in to change notification settings - Fork 166
perf(prof): gather time samples during allocation samples #3559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When allocation profiler takes a sample and interrupt_count > 0, collect both allocation and time samples in a single stack walk. Benefits: - Eliminates redundant stack walks when samples coincide. - Improves time sample accuracy (closer to timer expiry).
Let's do this in a dedicated PR.
| // instead of waiting for an interrupt handler. This is slightly more | ||
| // accurate and efficient, win-win. | ||
| let interrupt_count = REQUEST_LOCALS | ||
| .try_with_borrow(|locals| locals.interrupt_count.swap(0, Ordering::SeqCst)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that SeqCst is probably too heavy-handed. Fixing this should be done in another PR as it affects everything which checks this, not just this combined sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fine with postponing this to another PR, especially that benchmarks show we are slightly better anyway 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes fix a warning you can get on local development about an unused use super::*;.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3559 +/- ##
==========================================
- Coverage 61.78% 61.64% -0.14%
==========================================
Files 139 139
Lines 13051 13051
Branches 1712 1712
==========================================
- Hits 8063 8045 -18
- Misses 4225 4241 +16
- Partials 763 765 +2 see 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ profiler ]Benchmark execution time: 2026-01-09 17:29:35 Comparing candidate commit 5d79ecd in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics. |
|
|
||
| function allocate_large() { | ||
| // Large allocation to trigger sampling | ||
| $data = str_repeat("a", 1024 * 200_000); // ~200MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str_repeat() is not a frameless function, substr() or str_replace() are frameless function for example and do allocate memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
realFlowControl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?
| $s = str_repeat("x", 10_000_000); | ||
| strlen($s); // Use the string to prevent optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire JSON has nothing to do with how the JSON's look that the https://github.com/DataDog/prof-correctness action accepts. As such prof-correctness seems to not match anything against the pprof which means it does not fail and as such succeeds (which is something we should fix in the upstream action, it should fail if the JSON is in an invalid structure).
Have a look at the other examples, like https://github.com/DataDog/dd-trace-php/blob/4a3a98716843640b4a8d6894da8ef0072500c729/profiling/tests/correctness/timeline.json how the JSON is supposed to look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've sort of fixed it. How can I make this only run for PHP 8.4+? Because the % will be determined by whether it's frameless or not.
| // instead of waiting for an interrupt handler. This is slightly more | ||
| // accurate and efficient, win-win. | ||
| let interrupt_count = REQUEST_LOCALS | ||
| .try_with_borrow(|locals| locals.interrupt_count.swap(0, Ordering::SeqCst)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fine with postponing this to another PR, especially that benchmarks show we are slightly better anyway 🎉
| unset DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE | ||
| mkdir -p profiling/tests/correctness/allocation_time_combined/ | ||
| export DD_PROFILING_OUTPUT_PPROF=$PWD/profiling/tests/correctness/allocation_time_combined/test.pprof | ||
| export DD_PROFILING_ALLOCATION_ENABLED=yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| export DD_PROFILING_ALLOCATION_ENABLED=yes | |
| export DD_PROFILING_ALLOCATION_ENABLED=yes | |
| export DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 |
Just an idea, but by setting the allocation sampling distance to 1, we make sure to take enough samples and have a higher chance of seeing combined samples.
| export DD_PROFILING_ENDPOINT_COLLECTION_ENABLED=no | ||
| export EXECUTION_TIME=5 | ||
| php -d extension=$PWD/target/profiler-release/libdatadog_php_profiling.so profiling/tests/correctness/allocation_time_combined.php | ||
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME | |
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE |
Just if you accept the change above
This got merged and I re-run the correctness tests, which are now failing. |
Description
Gathers time samples during allocation samples if
interrupt_count > 0. This has two benefits:When I say a slight performance benefit, here's a comparison to v1.15.2:
The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.
Reviewer checklist