C++: Range analysis measure bounds#20645
Conversation
ab09ae5 to
4864e82
Compare
0ac00f8 to
ab836bb
Compare
ab836bb to
9502d83
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR addresses performance issues in the C++ simple range analysis library by implementing a pre-analysis to estimate bounds growth and applying widening when the estimate exceeds a threshold. The solution prevents combinatorial explosions that could cause analysis timeouts.
Key changes:
- Adds bounds estimation logic (
BoundsEstimatemodule) that estimates potential bounds count before running full analysis - Implements selective widening based on bounds estimates to prevent performance issues
- Updates test cases to reflect new analysis behavior with widening applied to expressions with many estimated bounds
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| SimpleRangeAnalysis.qll | Core implementation adding bounds estimation module and widening logic |
| test.c | New test cases demonstrating combinatorial explosion scenarios |
| upperBound.expected | Updated expected results reflecting widening behavior |
| lowerBound.expected | Updated expected results reflecting widening behavior |
| ternaryUpper.expected | Updated expected results for ternary expressions |
| ternaryLower.expected | Updated expected results for ternary expressions |
| nrOfBounds.ql | New test query for bounds estimation debugging |
Comments suppressed due to low confidence (1)
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll:1
- Corrected spelling of 'anncuracies' to 'inaccuracies'.
/**
geoffw0
left a comment
There was a problem hiding this comment.
Strategy seems sensible to me (but @MathiasVP has spent much more time working with this library). I will review the (second) DCA run when it finishes.
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/test/library-tests/rangeanalysis/SimpleRangeAnalysis/lowerBound.expected
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
|
Thanks for the review @geoffw0 with some great catches 👍. I've applied your suggestions. |
MathiasVP
left a comment
There was a problem hiding this comment.
A few minor comments. Thanks a lot for this, Simon!
I think it would be good to add an inline expectations test which shows the number of bounds for a given expression. This would also make it easier to spot problems with non-functionality of nrOfBoundsExpr. Would you mind adding such an inline expectations test while you're here?
| float getBoundsLimit() { | ||
| // This limit is arbitrary, but low enough that it prevents timeouts on | ||
| // specific observed customer databases (and the in the tests). | ||
| result = 2.0.pow(40) |
There was a problem hiding this comment.
I think this is a perfectly fine threshold to start with. FWIW when I introduce an "arbitrary threshold" like this I like to do a small amount of investigation into the underlying distribution. See for example what I did in this PR from last year where I plotted "number of nested bitwise operations" for each bitwise operation in a database. It would be interesting to see a similar plot for "number of bounds" for each expression on a database or two.
... but as I said: I think this very high arbitrary threshold is perfectly fine as a start.
There was a problem hiding this comment.
That's a good point. @andersfugmann also suggested that we could create a statistics query and potentially get telemetry for this to make a more quantified upper bound.
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll
Outdated
Show resolved
Hide resolved
| or | ||
| exists(ConditionalExpr condExpr | | ||
| e = condExpr and | ||
| result = nrOfBoundsExpr(condExpr.getThen()) * nrOfBoundsExpr(condExpr.getElse()) |
There was a problem hiding this comment.
This likely doesn't give us the best possible join ordering since this recursion is non-linear, but we've never bothered to actually fix this in the main recursion of SimpleRangeAnalysis itself so it's probably all fine.
|
I've kicked off another DCA run for good measure, but assuming that one doesn't show anything then I think we're good to merge? There is the option of doing a QA run as well. But the change seems low enough of a risk that that's worth it? Thoughts? |
|
There's some failures in DCA now and retrying didn't make them go away. Looking at the errors, they don't really look like they're caused by the QL changes. |
Those errors were fixed late yesterday afternoon. Could you re-run? |
|
Thanks @jketema. I triggered retries 2 hours ago, should I start a new DCA run or just do the retry commands again? |
|
I don't think retries work in this case: you'll need to start a new experiment. |
|
I think this is ready to merge now. Please double-check that y'all agree with my assessment of the DCA report. |
geoffw0
left a comment
There was a problem hiding this comment.
DCA LGTM.
@MathiasVP were all your questions addressed?
The only thing I am missing is this part of my review:
|
|
I agree that inline expectations would be nice (perhaps even more so for the lower/upper bounds themselves). But, is it ok if we don't do that for now? |
geoffw0
left a comment
There was a problem hiding this comment.
The additional test can be done as follow-up.
…-analysis-measure Revert "Merge pull request #20645 from paldepind/cpp/range-analysis-m…
…lysis-measure" This reverts commit e7c029a.
…lysis-measure" This reverts commit e7c029a.
…lysis-measure" This reverts commit e7c029a.
This PR intends to address (some) performance issues in the simple range analysis library.
The basic idea is rather simpler:
e1 + e2is number the number of bounds fore1times the number of bounds fore2.Note to reviewers:
nrOfBoundsExpris a good place to start.