docs/language/ql-training-rst/cpp/bad-overflow-guard.rst
+6 -6 (6 additions & 6 deletions)
@@ -29,7 +29,7 @@ More resources:
29
29
30
30
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
31
31
32
-
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
32
+
Note that results generated in the query console are likely to differ from those generated in the QL plugin. LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
33
33
34
34
35
35
Checking for overflow in C
@@ -53,7 +53,7 @@ Where might this go wrong?
53
53
- In C/C++ we often need to check whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
54
54
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
55
55
- When an operation overflows, the value “wraps” around.
56
-
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the arguments - i.e. the result has “wrapped”.
56
+
- A typical way to check for overflow of an addition, therefore, is to test whether the result is less than one of the arguments - that is, whether the result has **wrapped**.
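
A minimal C++ sketch of the wrap-around check described in the list above. The function name ``addition_wraps`` is hypothetical. The check is shown for ``unsigned int``, where the language guarantees that arithmetic wraps rather than being undefined:

```cpp
#include <climits>

// Hypothetical helper: returns true when a + b does not fit in unsigned int.
bool addition_wraps(unsigned int a, unsigned int b) {
    unsigned int sum = a + b;  // unsigned arithmetic wraps modulo 2^N
    return sum < a;            // a wrapped sum is smaller than either operand
}
```

Comparing against either operand works equally well, because a wrapped sum is smaller than both of them.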
57
57
58
58
Integer promotion
59
59
=================
@@ -174,7 +174,7 @@ We can get the size (in bytes) of a type using the ``getSize()`` method.
174
174
175
175
- An important part of the query is to determine whether a given expression has a “small” type that is going to trigger integer promotion.
176
176
- We therefore write a helper predicate for small expressions.
177
-
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, i.e. less than 32 bits.
177
+
- This predicate effectively represents the set of all expressions in the database where the size of the type of the expression is less than 4 bytes, that is, less than 32 bits.
178
178
179
179
QL query: bad overflow guards
180
180
=============================
@@ -191,7 +191,7 @@ Now our query becomes:
191
191
.. note::
192
192
193
193
- Recall from earlier that what makes an overflow check a “bad” check is that all the arguments to the addition are integers smaller than 32 bits.
194
-
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (i.e. under 32 bits):
194
+
- We could write this by using our helper predicate ``isSmall`` to specify that each individual operand to the addition ``isSmall`` (that is, under 32 bits):
195
195
196
196
.. code-block:: ql
197
197
@@ -206,12 +206,12 @@ Now our query becomes:
206
206
- In our case:
207
207
- The declaration introduces a variable for Expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
208
208
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
209
-
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - i.e. both the arguments to the addition
209
+
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - that is, both the arguments to the addition.
210
210
211
211
QL query: bad overflow guards
212
212
=============================
213
213
214
-
In some cases the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don’t want our query to flag these instances.
214
+
Sometimes the result of the addition is cast to a small type of size less than 4 bytes, preventing automatic widening. We don’t want our query to flag these instances.
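
To make the distinction concrete, here is a small C++ sketch. It assumes a typical platform where ``unsigned short`` is 16 bits and ``int`` is 32 bits; the function names are hypothetical. The first guard is the kind the query should flag, and the second is the cast form that it should not:

```cpp
// Assumes unsigned short is 16 bits and int is 32 bits (common, not guaranteed).

// Bad guard: both operands are promoted to int before the addition,
// so the sum cannot wrap and the comparison is always false.
bool bad_guard(unsigned short a, unsigned short b) {
    return a + b < a;
}

// Cast guard: the explicit conversion truncates the sum back to 16 bits,
// so a wrapped result really is smaller than the operand.
bool cast_guard(unsigned short a, unsigned short b) {
    return static_cast<unsigned short>(a + b) < a;
}
```

With ``a = 0xFFFF`` and ``b = 1``, ``bad_guard`` reports no overflow while ``cast_guard`` detects it.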
215
215
216
216
We can use predicate ``Expr.getExplicitlyConverted()`` to reason about casts that are applied to an expression, adding this restriction to our query:
docs/language/ql-training-rst/cpp/control-flow-cpp.rst
+6 -3 (6 additions & 3 deletions)
@@ -28,7 +28,7 @@ More resources:
28
28
29
29
Alternatively, you can query any project (including ChakraCore) in the `query console on LGTM.com <https://lgtm.com/query/project:2034240708/lang:cpp/>`__.
30
30
31
-
Note that results generated in the query console are likely to differ to those generated in the QL plugin as LGTM.com analyzes the most recent revisions of each project that has been added–the snapshot available to download above is based on an historical version of the code base.
31
+
Note that results generated in the query console are likely to differ from those generated in the QL plugin. LGTM.com analyzes the most recent revision of each project that has been added, whereas the snapshot available to download above is based on a historical version of the code base.
32
32
33
33
Agenda
34
34
======
@@ -79,7 +79,10 @@ Control flow graphs
79
79
80
80
.. note::
81
81
82
-
The control flow graph is a static over-approximation of possible control flow at runtime. Its nodes are program elements such as expressions and statements. If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node. Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
82
+
The control flow graph is a static over-approximation of possible control flow at runtime.
83
+
Its nodes are program elements such as expressions and statements.
84
+
If there is an edge from one node to another, then it means that the semantic operation corresponding to the first node may be immediately followed by the operation corresponding to the second node.
85
+
Some nodes (such as conditions of “if” statements or loop conditions) have more than one successor, representing conditional control flow at runtime.
83
86
84
87
Modeling control flow
85
88
=====================
@@ -101,7 +104,7 @@ The control-flow graph is *intra-procedural* - in other words, only models paths
101
104
102
105
The control flow graph is similar in concept to data flow graphs. In contrast to data flow, however, the AST nodes are directly control flow graph nodes.
103
106
104
-
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions, since a control flow node may have any number of predecessors and successors (including zero or more than one).
107
+
The predecessor/successor predicates are prime examples of member predicates with results that are used in functional syntax, but that are not actually functions. This is because a control flow node may have any number of predecessors and successors (including zero or more than one).
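
As an illustration only, the following C++ sketch models a tiny control flow graph for ``if (c) a(); else b(); d();``. The ``CfgNode`` type and both functions are hypothetical and are not how the QL library represents control flow. It shows a condition node with two successors, and a predecessor lookup that returns a set of nodes rather than a single value:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature control flow graph node.
struct CfgNode {
    std::string label;
    std::vector<std::size_t> successors;  // indices of nodes that may execute next
};

// Builds the graph for:  if (c) a(); else b();  d();
std::vector<CfgNode> build_example_cfg() {
    return {
        {"c",   {1, 2}},  // condition: two successors (then and else branch)
        {"a()", {3}},
        {"b()", {3}},
        {"d()", {}},      // last node: no successors
    };
}

// Predecessors are found by inverting the edges. The result may be empty
// or contain several nodes, which is why it is not a function of the node.
std::vector<std::size_t> predecessors(const std::vector<CfgNode>& cfg,
                                      std::size_t node) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
        for (std::size_t succ : cfg[i].successors) {
            if (succ == node) result.push_back(i);
        }
    }
    return result;
}
```

Here the node for ``c`` has no predecessors and two successors, while the node for ``d()`` has two predecessors and no successors.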
docs/language/ql-training-rst/cpp/data-flow-cpp.rst
+10 -10 (10 additions & 10 deletions)
@@ -44,7 +44,7 @@ Agenda
44
44
Motivation
45
45
==========
46
46
47
-
Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ “Use of externally controlled format string”.
47
+
Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/data/definitions/134.html>`__ **Use of externally controlled format string**.
48
48
49
49
.. code-block:: cpp
50
50
@@ -60,7 +60,7 @@ Let’s write a query to identify instances of `CWE-134 <https://cwe.mitre.org/d
60
60
61
61
printf("Name: %s, Age: %d", "Freddie", 2);
62
62
63
-
would produce the output “Name: Freddie, Age: 2”. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers, and the number of arguments. For example:
63
+
would produce the output ``"Name: Freddie, Age: 2"``. So far, so good. However, problems arise if there is a mismatch between the number of formatting specifiers and the number of arguments. For example:
64
64
65
65
.. code-block:: cpp
66
66
@@ -123,14 +123,14 @@ Data flow analysis
123
123
124
124
- Models flow of data through the program.
125
125
- Implemented in the module ``semmle.code.cpp.dataflow.DataFlow``.
126
-
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and fucntion parameters.
126
+
- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.
127
127
- Nodes of the data flow graph.
128
128
- Various predicates represent flow between these nodes.
129
129
Edges of the data flow graph.
130
130
131
131
.. note::
132
132
133
-
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answers questions like “does this expression ever hold a value that originates from a particular other place in the program”.
133
+
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answer questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
134
134
135
135
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.
136
136
@@ -225,7 +225,7 @@ So all references will need to be qualified (that is ``DataFlow::Node``)
225
225
226
226
A **query library** is a file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates. For example, the `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__ is contained in the ``semmle/code/cpp/dataflow/DataFlow.qll`` QLL file, and can be imported as shown above.
227
227
228
-
A **module** is a way of organizing QL code by grouping together related predicates, classes and (sub-)modules; either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
228
+
A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
229
229
230
230
For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
231
231
@@ -250,7 +250,7 @@ Data flow graph
250
250
251
251
``localFlowStep`` is the “single step” flow relation, that is, it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation. In other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
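
The relationship between a single-step relation and its transitive closure can be sketched in C++ as a reachability search. The ``StepRelation`` type and ``flows_to`` function are hypothetical stand-ins and not part of any QL library. The sketch also assumes that a node trivially reaches itself:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical adjacency list: step[n] lists the nodes reachable from
// node n in exactly one step (the analogue of a single-step relation).
using StepRelation = std::vector<std::vector<std::size_t>>;

// Returns true if sink is reachable from source in zero or more steps
// (the analogue of the closure of the single-step relation).
bool flows_to(const StepRelation& step, std::size_t source, std::size_t sink) {
    std::vector<bool> seen(step.size(), false);
    std::queue<std::size_t> work;
    seen[source] = true;
    work.push(source);
    while (!work.empty()) {
        std::size_t node = work.front();
        work.pop();
        if (node == sink) return true;
        for (std::size_t next : step[node]) {
            if (!seen[next]) {
                seen[next] = true;  // visit each node at most once
                work.push(next);
            }
        }
    }
    return false;
}
```

For the edges ``0 -> 1``, ``1 -> 2`` and ``3 -> 0``, node ``2`` is reachable from nodes ``0``, ``1`` and ``3``, but nothing is reachable from node ``2`` except itself.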
252
252
253
-
The data flow graph is completely separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (e.g. ``Variable``) classes.
253
+
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
254
254
255
255
Taint-tracking
256
256
==============
@@ -270,9 +270,9 @@ Taint-tracking
270
270
271
271
Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one node influences or *taints* another.
272
272
273
-
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow; all we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
273
+
The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow. All we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
274
274
275
-
Exercise: Source Nodes
275
+
Exercise: source nodes
276
276
======================
277
277
278
278
Define a subclass of ``DataFlow::Node`` representing “source” nodes, that is, nodes without a (local) data flow predecessor.
@@ -329,5 +329,5 @@ Beyond local data flow
329
329
330
330
- Results are still underwhelming.
331
331
- Dealing with parameter passing becomes cumbersome.
332
-
- Instead, let’s turn the problem around and find user-controlled data that flows into a printf format argument, potentially through calls.
333
-
- This needs global data flow.
332
+
- Instead, let’s turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
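
To show why calls matter, here is a hedged C++ sketch in which the value reaches the format argument only after crossing a function boundary. All function names are hypothetical, and ``snprintf`` is used in place of ``printf`` so that the output can be captured:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sink: forwards its argument as the format string.
// If message is user-controlled, this is an instance of CWE-134.
std::string render_unsafe(const char* message) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, message);
    return buffer;
}

// Safer variant: the format string is a constant, the input is only data.
std::string render_safe(const char* message) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%s", message);
    return buffer;
}

// Hypothetical caller: user_input stands in for something like argv[1].
// The value only reaches the format argument through this call, so an
// analysis limited to a single function would not connect the two.
std::string handle_request(const char* user_input) {
    return render_unsafe(user_input);
}
```

Within ``render_unsafe`` alone, the parameter looks like any other string. Its origin becomes visible only when the call from ``handle_request`` is followed.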
docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst
+11 -10 (11 additions & 10 deletions)
@@ -59,20 +59,20 @@ Global data flow and taint tracking
59
59
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
60
60
61
61
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
62
-
- Typically, this involves specifying the “source” and “sink”.
62
+
- Typically, this involves specifying the *source* and *sink*.
63
63
64
64
.. note::
65
65
66
-
As we mentioned in the previous slide deck, while local dataflow is feasible to compute for all functions in a snapshot, global dataflow is not. This is because the number of paths becomes exponentially larger for global dataflow.
66
+
As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
67
67
68
-
The global dataflow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
68
+
Global data flow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
69
69
70
70
Global taint tracking library
71
71
=============================
72
72
73
-
The semmle.code.cpp.dataflow.TaintTracking library provides a framework for implementing solvers for global taint tracking problems:
73
+
The ``semmle.code.cpp.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
74
74
75
-
#. Subclass TaintTracking::Configuration following this template:
75
+
#. Subclass ``TaintTracking::Configuration`` following this template:
76
76
77
77
.. code-block:: ql
78
78
@@ -82,7 +82,7 @@ The semmle.code.cpp.dataflow.TaintTracking library provides a framework for impl
#. Use Config.hasFlow(source, sink) to find inter-procedural paths.
85
+
#. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
86
86
87
87
.. note::
88
88
@@ -96,7 +96,7 @@ Finding tainted format strings (outline)
96
96
97
97
.. note::
98
98
99
-
Here’s the outline for a inter-procedural (i.e. “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
99
+
Here’s the outline for an inter-procedural (that is, “global”) version of the tainted formatting strings query we saw in the previous slide deck. The same template will be applicable for most taint tracking problems.
100
100
101
101
Defining sources
102
102
================
@@ -118,7 +118,7 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
118
118
119
119
.. note::
120
120
121
-
We first define what it means to be a ``source`` of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the pre-defined set of “user inputs”, which includes arguments provided to command line applications.
121
+
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
122
122
123
123
124
124
Defining sinks (exercise)
@@ -167,7 +167,8 @@ Use the ``FormattingFunction`` class to fill in the definition of “isSink”
167
167
Path queries
168
168
============
169
169
170
-
Provide information about the identified paths from sources to sinks; can be examined in Path Explorer view.
170
+
Path queries provide information about the identified paths from sources to sinks. Paths can be examined in Path Explorer view.
171
+
171
172
Use this template:
172
173
173
174
.. code-block:: ql
@@ -186,7 +187,7 @@ Use this template:
186
187
187
188
.. note::
188
189
189
-
In order to see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
190
+
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).