Commit c0fdcf3

Merge pull request #2094 from rdmarsh2/rdmarsh/docs/cpp/advanced-library-guide
C++/Docs: Add guides to advanced AST libraries
2 parents defe995 + fc7dbeb commit c0fdcf3

4 files changed: +245 -0 lines changed

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
Using the guards library in C and C++
=====================================

Overview
--------
The guards library (defined in ``semmle.code.cpp.controlflow.Guards``) provides a class `GuardCondition <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/controlflow/Guards.qll/type.Guards$GuardCondition.html>`__ representing Boolean values that are used to make control flow decisions.
A ``GuardCondition`` is considered to guard a basic block if the block can only be reached when the ``GuardCondition`` evaluates to a particular value. For instance, in the following code, ``x < 10`` is a ``GuardCondition``, and it guards the calls to ``f``, ``g``, and ``h``: each is reached only when ``x < 10`` evaluates a particular way. The return statement is reached either way, so it is not guarded.

.. code-block:: cpp

   if(x < 10) {
     f(x);
   } else if (x < 20) {
     g(x);
   } else {
     h(x);
   }
   return 0;

The ``controls`` predicate
--------------------------
The ``controls`` predicate helps determine which blocks are only run when the ``GuardCondition`` evaluates a certain way. ``guard.controls(block, testIsTrue)`` holds if ``block`` is only entered if the value of this condition is ``testIsTrue``.

In the following code sample, the call to ``isValid`` controls the calls to ``performAction`` and ``logFailure`` but not the return statement.

.. code-block:: cpp

   if(isValid(accessToken)) {
     performAction();
     succeeded = 1;
   } else {
     logFailure();
     succeeded = 0;
   }
   return succeeded;

In the following code sample, the call to ``isValid`` controls the body of the ``if`` statement and also the code after it, because the body always returns.

.. code-block:: cpp

   if(!isValid(accessToken)) {
     logFailure();
     return 0;
   }
   performAction();
   return succeeded;
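
As a rough sketch of how ``controls`` might be used in a query, the following looks for calls to ``performAction`` that are not in a block controlled by a successful call to ``isValid``. The function names are taken from the samples above, and treating every such call as worth reporting is an assumption made purely for illustration.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.controlflow.Guards

   from FunctionCall action
   where
     action.getTarget().hasName("performAction") and
     // there is no call to isValid that must have evaluated to true
     // for the block containing the call to be entered
     not exists(GuardCondition guard |
       guard.(FunctionCall).getTarget().hasName("isValid") and
       guard.controls(action.getBasicBlock(), true)
     )
   select action, "This call to performAction is not guarded by a successful isValid check."

Given the behavior described above, neither sample should be reported, since in both of them ``performAction`` is called in a block that ``isValid`` controls with the value ``true``.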

The ``ensuresEq`` and ``ensuresLt`` predicates
----------------------------------------------
The ``ensuresEq`` and ``ensuresLt`` predicates are the main way of determining what, if any, guarantees the ``GuardCondition`` provides for a given basic block.

The ``ensuresEq`` predicate
***************************
When ``ensuresEq(left, right, k, block, true)`` holds, then ``block`` is only executed if ``left`` was equal to ``right + k`` at their last evaluation. When ``ensuresEq(left, right, k, block, false)`` holds, then ``block`` is only executed if ``left`` was not equal to ``right + k`` at their last evaluation.

The ``ensuresLt`` predicate
***************************
When ``ensuresLt(left, right, k, block, true)`` holds, then ``block`` is only executed if ``left`` was strictly less than ``right + k`` at their last evaluation. When ``ensuresLt(left, right, k, block, false)`` holds, then ``block`` is only executed if ``left`` was greater than or equal to ``right + k`` at their last evaluation.

In the following code sample, the comparison on the first line ensures that ``index`` is less than ``size`` in the "then" block, and that ``index`` is greater than or equal to ``size`` in the "else" block.

.. code-block:: cpp

   if(index < size) {
     ret = array[index];
   } else {
     ret = nullptr;
   }
   return ret;
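
The following is a minimal sketch of a query built on ``ensuresLt``. It lists array accesses whose index variable is known, in the block containing the access, to be less than some bound plus an offset ``k``. Matching the guarded expression to the index by variable is a simplification chosen for this example: it ignores any reassignment of the variable between the guard and the access.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.controlflow.Guards

   from ArrayExpr access, Variable v, GuardCondition guard, Expr left, Expr bound, int k
   where
     // the array is indexed directly by the variable v
     access.getArrayOffset() = v.getAnAccess() and
     // the guard compares some access to the same variable against a bound
     left = v.getAnAccess() and
     // the access is only reached if left < bound + k
     guard.ensuresLt(left, bound, k, access.getBasicBlock(), true)
   select access, guard, bound, k

For the sample above, ``array[index]`` would be expected to appear with ``index < size`` as the guard and ``size`` as the bound.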

The ``comparesEq`` and ``comparesLt`` predicates
------------------------------------------------
The ``comparesEq`` and ``comparesLt`` predicates relate the value that the ``GuardCondition`` evaluates to, to an equality or ordering relation between two expressions that it compares.

The ``comparesEq`` predicate
****************************
``comparesEq(left, right, k, true, testIsTrue)`` holds if ``left`` equals ``right + k`` when the expression evaluates to ``testIsTrue``.

The ``comparesLt`` predicate
****************************
``comparesLt(left, right, k, isLessThan, testIsTrue)`` holds if ``left < right + k`` evaluates to ``isLessThan`` when the expression evaluates to ``testIsTrue``.
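
One way to explore these predicates is to list, for every guard, the ordering facts that follow from each of its outcomes. The query below is an exploratory sketch rather than a finished check: it selects the raw tuples so that the meaning of ``k`` and ``testIsTrue`` can be inspected against real code.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.controlflow.Guards

   from GuardCondition guard, Expr left, Expr right, int k, boolean testIsTrue
   where
     // when guard evaluates to testIsTrue, left < right + k holds
     guard.comparesLt(left, right, k, true, testIsTrue)
   select guard, left, right, k, testIsTrue

Replacing ``comparesLt`` with ``comparesEq`` in the ``where`` clause gives the corresponding listing for equalities.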

docs/language/learn-ql/cpp/ql-for-cpp.rst

Lines changed: 17 additions & 0 deletions
@@ -31,6 +31,23 @@ These topics provide an overview of the QL C/C++ standard libraries and show exa

- :doc:`Example: Checking for allocations equal to strlen(string) without space for a null terminator <zero-space-terminator>` shows how a query to detect this particular buffer issue was developed.

Advanced libraries
------------------

.. toctree::
   :hidden:

   guards
   range-analysis
   value-numbering-hash-cons

- :doc:`Using the guards library in C and C++ <guards>` demonstrates how to identify conditional expressions that control the execution of other code and what guarantees they provide.

- :doc:`Using range analysis for C and C++ <range-analysis>` demonstrates how to determine constant upper and lower bounds and possible overflow or underflow of expressions.

- :doc:`Using hash consing and value numbering for C and C++ <value-numbering-hash-cons>` demonstrates how to recognize expressions that are syntactically identical or compute the same value at runtime.


Other resources
---------------

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
Using range analysis for C and C++
==================================

Overview
--------
Range analysis determines upper and lower bounds for an expression.

The range analysis library (defined in ``semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis``) provides a set of predicates for determining constant upper and lower bounds on expressions, as well as recognizing integer overflows. For performance, the library performs automatic widening and therefore may not provide the tightest possible bounds.

Bounds predicates
-----------------
The ``upperBound`` and ``lowerBound`` predicates provide constant bounds on expressions. No conversions of the argument are included in the bound. In the common case that your query needs to take conversions into account, call them on the converted form, such as ``upperBound(expr.getFullyConverted())``.
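
As an illustration of the bounds predicates, the sketch below reports array accesses whose index might be negative according to ``lowerBound``. The choice of array indexing as the thing to check is an assumption made for this example, and because of widening the query may report accesses that are safe in practice.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis

   from ArrayExpr access, Expr offset
   where
     // take conversions of the index into account
     offset = access.getArrayOffset().getFullyConverted() and
     lowerBound(offset) < 0
   select access, "The index may be as low as " + lowerBound(offset).toString() + "."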

Overflow predicates
-------------------
``exprMightOverflow`` and related predicates hold if the relevant expression might overflow, as determined by the range analysis library. The ``convertedExprMightOverflow`` family of predicates will take conversions into account.
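
A small sketch of an overflow query follows. It uses ``exprMightOverflowPositively``, one of the related predicates, and restricts attention to additions; that restriction is only there to keep the example short.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis

   from AddExpr add
   where
     // the computed range of the addition exceeds the maximum of its type
     exprMightOverflowPositively(add)
   select add, "This addition might overflow."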

Example
-------
This query uses ``upperBound`` to determine whether the result of ``snprintf`` is checked when used in a loop.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.dataflow.TaintTracking
   import semmle.code.cpp.rangeanalysis.RangeAnalysisUtils
   import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis

   from FunctionCall call, DataFlow::Node source, DataFlow::Node sink, Expr convSink
   where
     // the call is an snprintf with a string format argument
     call.getTarget().getName() = "snprintf" and
     call.getArgument(2).getValue().regexpMatch(".*%s.*") and

     // the result of the call influences its size argument in later iterations
     TaintTracking::localTaint(source, sink) and
     source.asExpr() = call and
     sink.asExpr() = call.getArgument(1) and

     // there is no fixed bound on the snprintf's size argument
     upperBound(convSink) = typeUpperBound(convSink.getType().getUnspecifiedType()) and
     convSink = call.getArgument(1).getFullyConverted()

   select call, upperBound(call.getArgument(1).getFullyConverted())
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
Hash consing and value numbering
================================

Overview
--------
In C and C++ QL databases, each node in the abstract syntax tree is represented by a separate object. This allows both analysis and results display to refer to specific appearances of a piece of syntax. However, it is frequently useful to determine whether two expressions are equivalent, either syntactically or semantically.

The `hash consing <https://en.wikipedia.org/wiki/Hash_consing>`__ library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. The `global value numbering <https://en.wikipedia.org/wiki/Value_numbering>`__ library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime.

Both libraries partition the expressions in each function into equivalence classes represented by objects. Each ``HashCons`` object represents a set of expressions with identical parse trees, while ``GVN`` objects represent sets of expressions that will always compute the same value.


Example C code
--------------

In the following C program, ``x + y`` and ``x + z`` will be assigned the same value number but different hash conses.

.. code-block:: c

   int x = 1;
   int y = 2;
   int z = y;
   if(x + y == x + z) {
     ...
   }

However, in the next example, the uses of ``x + y`` will have different value numbers but the same hash cons.

.. code-block:: c

   int x = 1;
   int y = 2;
   if(x + y) {
     ...
   }

   x = 2;

   if(x + y) {
     ...
   }
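
The difference between the two libraries can be explored with a query along the following lines. This is a sketch that assumes both libraries can be imported into the same query. It lists pairs of expressions in the same function that are syntactically identical but are not known to compute the same value, and it reports each pair in both orders.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.valuenumbering.GlobalValueNumbering
   import semmle.code.cpp.valuenumbering.HashCons

   from Expr e1, Expr e2
   where
     // same syntactic structure
     hashCons(e1) = hashCons(e2) and
     // but not known to compute the same value
     globalValueNumber(e1) != globalValueNumber(e2) and
     e1.getEnclosingFunction() = e2.getEnclosingFunction()
   select e1, "This expression has the same syntax as $@ but may compute a different value.", e2,
     e2.toString()

In the second program above, the two uses of ``x + y`` would be expected to appear as such a pair.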

Value numbering
---------------
The value numbering library (defined in ``semmle.code.cpp.valuenumbering.GlobalValueNumbering``) provides a mechanism for identifying expressions that compute the same value at runtime. Value numbering is useful when your primary concern is with the values being produced or the eventual machine code being run. For instance, value numbering might be used to determine whether a check is being done against the same value as the operation it is guarding.

The value numbering API
~~~~~~~~~~~~~~~~~~~~~~~
The value numbering library exposes its interface primarily through the ``GVN`` class. Each instance of ``GVN`` represents a set of expressions that will always evaluate to the same value. To get an expression in the set represented by a particular ``GVN``, use the ``getAnExpr()`` member predicate.

To get the ``GVN`` of an ``Expr``, use the ``globalValueNumber`` predicate.

.. note::

   While the ``GVN`` class has ``toString`` and ``getLocation`` methods, these are only provided as debugging aids. They give the ``toString`` and ``getLocation`` of an arbitrary ``Expr`` within the set.
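
A minimal sketch of the API in use: the query below finds comparisons whose two operands have the same value number, which means the comparison always has the same outcome. Whether such comparisons are worth reporting is an assumption made for the sake of the example.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.valuenumbering.GlobalValueNumbering

   from ComparisonOperation cmp
   where
     // both operands belong to the same GVN equivalence class
     globalValueNumber(cmp.getLeftOperand()) = globalValueNumber(cmp.getRightOperand())
   select cmp, "Both operands of this comparison always have the same value."

In the first C program shown earlier, the comparison ``x + y == x + z`` would be expected to match.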

Why not a predicate?
~~~~~~~~~~~~~~~~~~~~
The obvious interface for this library would be a predicate ``equivalent(Expr e1, Expr e2)``. However, this predicate would be very large, with a quadratic number of rows for each set of equivalent expressions: a set of ``n`` equivalent expressions would contribute ``n * n`` rows. By using a class as an intermediate step, the number of rows can be kept linear in the number of expressions, which makes the result practical to cache.
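
If a pairwise relation is needed for a specific query, it can be recovered by joining on the ``GVN``. The predicate below is a hypothetical helper written for this example, not part of the library; in practice it would usually be combined with other conditions that restrict ``e1`` and ``e2`` so that the full quadratic relation is never computed.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.valuenumbering.GlobalValueNumbering

   // Hypothetical helper, not part of the library: holds if e1 and e2
   // are in the same value-numbering equivalence class.
   predicate equivalent(Expr e1, Expr e2) {
     globalValueNumber(e1) = globalValueNumber(e2)
   }

   from Expr e1, Expr e2
   where
     equivalent(e1, e2) and
     e1 != e2
   select e1, e2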

Example Queries
~~~~~~~~~~~~~~~

This query uses the ``GVN`` class to identify calls to ``strncpy`` where the size argument is derived from the source rather than the destination.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.valuenumbering.GlobalValueNumbering

   from FunctionCall strncpy, FunctionCall strlen
   where
     strncpy.getTarget().hasGlobalName("strncpy") and
     strlen.getTarget().hasGlobalName("strlen") and
     globalValueNumber(strncpy.getArgument(1)) = globalValueNumber(strlen.getArgument(0)) and
     strlen = strncpy.getArgument(2)
   select strncpy, "This call to strncpy is bounded by the size of the source rather than the destination"

.. TODO: a second example

Hash consing
------------
The hash consing library (defined in ``semmle.code.cpp.valuenumbering.HashCons``) provides a mechanism for identifying expressions that have the same syntactic structure. Hash consing is useful when your primary concern is with the text of the code. For instance, hash consing might be used to detect duplicate code within a function.

The hash consing API
~~~~~~~~~~~~~~~~~~~~
The hash consing library exposes its interface primarily through the ``HashCons`` class. Each instance of ``HashCons`` represents a set of expressions within one function that have the same syntax (including referring to the same variables). To get an expression in the set represented by a particular ``HashCons``, use the ``getAnExpr()`` member predicate.

.. note::

   While the ``HashCons`` class has ``toString`` and ``getLocation`` methods, these are only provided as debugging aids. They give the ``toString`` and ``getLocation`` of an arbitrary ``Expr`` within the set.

To get the ``HashCons`` of an ``Expr``, use the ``hashCons`` predicate.

Examples
~~~~~~~~

This query uses the ``hashCons`` predicate to find ``if`` statements in an ``else if`` chain whose condition is syntactically identical to the condition of an earlier ``if`` statement in the same chain.

.. code-block:: ql

   import cpp
   import semmle.code.cpp.valuenumbering.HashCons

   from IfStmt outer, IfStmt inner
   where
     outer.getElse+() = inner and
     hashCons(outer.getCondition()) = hashCons(inner.getCondition())
   select inner.getCondition(), "The condition of this if statement duplicates the condition of $@",
     outer.getCondition(), "an enclosing if statement"
