posts/2025/02/allocation_sampling.md (19 additions & 9 deletions)
@@ -92,7 +92,7 @@ As soon as an object doesn't fit into the nursery anymore, it will be collected.
The last section described how nursery allocation normally works. Now we'll talk about how we integrate the new allocation sampling approach into it.
To decide whether the GC should trigger a sample, the sampling logic is integrated into the bump pointer allocation logic. Usually, when there is not enough space left in the nursery to fulfill an allocation request, the nursery will be collected and the allocation will be done afterwards. We reuse that mechanism for sampling by introducing a new pointer called `sample_point`, calculated as `sample_point = nursery_free + sample_n_bytes`, where `sample_n_bytes` is the number of bytes allocated before a sample is made (i.e. our sampling rate).
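Here is a small sketch, in plain Python, of how a bump-pointer allocator with such a sample point could work. It only illustrates the idea: `nursery_free`, `sample_point` and `sample_n_bytes` are the names used above, while `nursery_top`, `trigger_sample` and `minor_collection` are made up for the sketch; the real implementation in PyPy's RPython GC looks different.

```python
# Example numbers: a 2MB nursery, sampling every 512KB allocated.
NURSERY_SIZE = 2 * 1024 * 1024
sample_n_bytes = 512 * 1024

nursery_free = 0                                # bump pointer (addresses as plain ints)
nursery_top = NURSERY_SIZE                      # end of the nursery
sample_point = nursery_free + sample_n_bytes    # next address at which to take a sample

def trigger_sample(size):
    # stand-in for walking the stack and writing a sample to the profile file
    print("sample taken for an allocation of %d bytes" % size)

def minor_collection():
    # stand-in for a real nursery collection: just reset the bump pointer
    global nursery_free, sample_point
    nursery_free = 0
    sample_point = nursery_free + sample_n_bytes

def allocate(size):
    """Bump-allocate `size` bytes, sampling every `sample_n_bytes` allocated."""
    global nursery_free, sample_point
    if nursery_free + size > sample_point:
        # slow path: either the nursery is really full, or we merely crossed
        # a sample point and only need to take a sample
        if nursery_free + size > nursery_top:
            minor_collection()
        else:
            trigger_sample(size)
            sample_point = nursery_free + size + sample_n_bytes
    result = nursery_free
    nursery_free += size
    return result

# Allocating 2MB in 64KB chunks takes a sample whenever a sample point is crossed.
for _ in range(32):
    allocate(64 * 1024)
```

The point of the sketch is that the fast path stays a single pointer comparison; only when the limit is exceeded do we need to distinguish between a real nursery collection and a sample.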
Imagine we had a nursery of 2MB and wanted to sample every 512KB allocated; then you could imagine our nursery looking like this:
@@ -181,39 +181,49 @@ For testing and benchmarking, we usually started with a sampling rate of 128Kb a
Now let us take a look at the allocation sampling overhead by profiling some benchmarks.
The x-axis shows the sampling rate, while the y-axis shows the overhead, which is computed as `runtime_with_sampling / runtime_without_sampling`.
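For example (with made-up numbers), a benchmark that takes 6.0 seconds with sampling enabled and 5.0 seconds without has an overhead of `6.0 / 5.0 = 1.2`.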
All benchmarks were executed five times on a PyPy with JIT and native profiling enabled, so that every dot in the plot is one run of a benchmark.
As you probably expected, the overhead drops with higher allocation sampling rates, ranging from as high as ~390% for 32KB allocation sampling to as low as less than 10% for 32MB.
Let me give one concrete example: one run of the microbenchmark at 32KB sampling took 15.596 seconds and triggered 822050 samples.
That is a ridiculous `822050 / 15.596 = ~52709` samples per second.
There is probably no need for that many samples per second, so for profiling 'real' applications a much higher sampling rate would be sufficient.
Let us compare that to time sampling.
This time we ran those benchmarks with 100, 1000, and 2000 samples per second.
The overhead varies with the sampling rate. Both with allocation and time sampling, you can reach any amount of overhead and any level of profiling precision you want. The best approach probably is to just try out a sampling rate and choose what gives you the right tradeoff between precision and overhead (and disk usage).