
Commit a35241e

docs: 2nd round of suggestions

Author: james | Parent: feb4d26


9 files changed: 85 additions, 45 deletions

docs/language/ql-training-rst/cpp/bad-overflow-guard.rst

Lines changed: 5 additions & 5 deletions
@@ -43,7 +43,7 @@ Where might this go wrong?
4343
- In C/C++ we often need to check whether an operation `overflows <https://en.wikipedia.org/wiki/Integer_overflow>`__.
4444
- An overflow is when an arithmetic operation, such as an addition, results in a number which is too large to be stored in the type.
4545
- When an operation overflows, the value “wraps” around.
46-
- A typical way to check for overflow of an addition, therefore, is whether the result is less than one of the arguments - that is the result has **wrapped**.
46+
- A typical way to check for overflow of an addition, therefore, is to test whether the result is less than one of the arguments, that is, whether the result has **wrapped**.
4747

4848
Integer promotion
4949
=================
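The wrap-around check described above, and the integer-promotion pitfall that makes it unreliable for small types, can be sketched in C++. This is a minimal illustration, not code from the training material: ``add_overflows`` and ``bad_add_overflows`` are hypothetical names, and the second function assumes a platform where ``int`` is wider than ``unsigned short`` (true on common desktop and server targets).

```cpp
#include <cassert>

// Overflow guard on unsigned int: unsigned arithmetic wraps modulo 2^N,
// so if a + b overflows, the wrapped result is smaller than a.
bool add_overflows(unsigned int a, unsigned int b) {
    return a + b < a;
}

// The same guard on "small" operands is a bad overflow guard: both
// operands are promoted to int before the addition, so the sum cannot
// wrap and the comparison is always false (assuming int is wider than
// unsigned short).
bool bad_add_overflows(unsigned short a, unsigned short b) {
    return a + b < a;
}
```

On such a platform ``bad_add_overflows`` returns ``false`` for every input, which is the kind of guard the query in this file sets out to find.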
@@ -139,13 +139,13 @@ Let’s look for overflow guards of the form ``v + b < v``, using the classes
139139

140140
- When performing `variant analysis <https://semmle.com/variant-analysis>`__, it is usually helpful to write a simple query that finds the basic syntactic pattern before refining it to describe the cases where it goes wrong.
141141
- In this case, we start by looking for all the *overflow* checks, before trying to refine the query to find all *bad overflow* checks.
142-
- The select clause defines what this query is looking for:
142+
- The ``select`` clause defines what this query is looking for:
143143

144144
- an ``AddExpr``: the expression that is being checked for overflow.
145145
- a ``RelationalOperation``: the overflow comparison check.
146146
- a ``Variable``: used as an argument to both the addition and comparison.
147147

148-
- The where part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
148+
- The ``where`` part of the query ties these three QL variables together using `predicates <https://help.semmle.com/QL/ql-handbook/predicates.html>`__ defined in the `standard QL for C/C++ library <https://help.semmle.com/qldoc/cpp/>`__.
149149

150150
QL query: bad overflow guards
151151
=============================
@@ -197,14 +197,14 @@ Now our query becomes:
197197
- However, this is a little repetitive. What we really want to say is that all the operands of the addition are small. Fortunately, QL provides a ``forall`` formula that we can use in these circumstances.
198198
- A ``forall`` has three parts:
199199

200-
- A declaration part, where we can introduce variables.
200+
- A declaration part, where we can introduce variables.
201201
- A “range” part, which allows us to restrict those variables.
202202
- A “condition” part. The ``forall`` as a whole holds if the condition holds for each of the values in the range.
203203
- In our case:
204204

205205
- The declaration introduces a variable for expressions, called ``op``. At this stage, this variable represents all the expressions in the program.
206206
- The “range” part, ``op = a.getAnOperand()``, restricts ``op`` to being one of the two operands to the addition.
207-
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition - that the ``op`` is small - holds for everything in the range - that is both the arguments to the addition.
207+
- The “condition” part, ``isSmall(op)``, says that the ``forall`` holds only if the condition (that ``op`` is small) holds for everything in the range, that is, for both arguments to the addition.
208208

209209
QL query: bad overflow guards
210210
=============================
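For readers more familiar with C++ than QL, the ``forall`` formula described above behaves much like ``std::all_of``: the range selects the values to check, and the condition is tested on each of them. The sketch below is only an analogy under our own assumptions: ``Operand`` and ``is_small`` are hypothetical stand-ins for the QL ``Expr`` class and the ``isSmall`` predicate, and "small" is approximated as "narrower than ``int``".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a QL Expr: we only track the size of its type.
struct Operand {
    int size_in_bytes;
};

// Hypothetical stand-in for isSmall(op): the operand's type is narrower
// than int, so it is promoted to int before the addition.
bool is_small(const Operand& op) {
    return op.size_in_bytes < static_cast<int>(sizeof(int));
}

// Analogue of: forall(Expr op | op = a.getAnOperand() | isSmall(op))
//   declaration: each Operand op
//   range:       the operands of the addition (the vector)
//   condition:   is_small(op)
bool all_operands_small(const std::vector<Operand>& operands) {
    return std::all_of(operands.begin(), operands.end(), is_small);
}
```

As with ``std::all_of``, a ``forall`` whose range is empty holds trivially.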

docs/language/ql-training-rst/cpp/control-flow-cpp.rst

Lines changed: 20 additions & 5 deletions
@@ -79,7 +79,7 @@ Control flow graphs
7979
Modeling control flow
8080
=====================
8181

82-
The control flow is modelled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
82+
The control flow is modeled with a QL class, ``ControlFlowNode``. Examples of control flow nodes include statements and expressions.
8383

8484
``ControlFlowNode`` provides an API for traversing the control flow graph:
8585

@@ -88,7 +88,7 @@ The control flow is modelled with a QL class, ``ControlFlowNode``. Examples of c
8888
- ``ControlFlowNode ControlFlowNode.getATrueSuccessor()``
8989
- ``ControlFlowNode ControlFlowNode.getAFalseSuccessor()``
9090

91-
The control-flow graph is *intra-procedural* - in other words, only models paths within a function. To find the associated function, use
91+
The control-flow graph is *intra-procedural*; in other words, it only models paths within a function. To find the associated function, use
9292

9393
- ``Function ControlFlowNode.getControlFlowScope()``
9494
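To make the successor API concrete, here is a small C++ model of an intra-procedural control flow graph and the transitive closure of its successor relation (roughly what ``getASuccessor+()`` computes in QL). This is an illustrative sketch only: the ``Cfg`` type and the ``reachable`` function are hypothetical and are not part of the QL libraries.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// A toy intra-procedural CFG: node n's successors are succ[n]
// (loosely, the analogue of ControlFlowNode.getASuccessor()).
using Cfg = std::vector<std::vector<int>>;

// Every node reachable from `start` in one or more steps: the transitive
// closure of the successor relation, as getASuccessor+() would compute.
std::set<int> reachable(const Cfg& succ, int start) {
    assert(start >= 0 && start < static_cast<int>(succ.size()));
    std::set<int> seen;
    std::queue<int> work;
    work.push(start);
    while (!work.empty()) {
        int n = work.front();
        work.pop();
        for (int s : succ[n]) {
            if (seen.insert(s).second) work.push(s);  // first visit only
        }
    }
    return seen;
}
```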

@@ -101,7 +101,7 @@ The control-flow graph is *intra-procedural* - in other words, only models paths
101101
Example: malloc/free pairs
102102
==========================
103103

104-
Find calls to free that are reachable from an allocation on the same variable:
104+
Find calls to ``free`` that are reachable from an allocation on the same variable:
105105

106106
.. literalinclude:: ../query-examples/cpp/control-flow-cpp-1.ql
107107
:language: ql
@@ -127,7 +127,7 @@ Based on this query, write a query that finds accesses to the variable that occu
127127
Utilizing recursion
128128
===================
129129

130-
The main problem we observed in the previous exercise was that the successors relation is unaware of changes to the variable that would invalidate our results.
130+
The main problem we observed in the previous exercise was that the successor relation is unaware of changes to the variable that would invalidate our results.
131131

132132
We can fix this by writing our own successor predicate that stops traversing the CFG if the variable is re-defined.
133133
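The idea of a successor relation that stops at redefinitions can be sketched outside QL as a graph search with barriers. In the C++ sketch below, which is our own illustration rather than the training query, each hypothetical ``Node`` records whether it redefines the tracked variable; the search may reach such a node but does not continue past it.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical CFG node: its successors, and whether it (re)defines the
// variable we are tracking (for example, p = malloc(...)).
struct Node {
    std::vector<int> succ;
    bool redefines_var;
};

// Nodes reachable from `start` without passing through a redefinition.
// A redefining node can itself be reached, but the search stops there,
// because after it the variable no longer holds the value from `start`.
std::set<int> reachable_until_redefined(const std::vector<Node>& cfg, int start) {
    assert(start >= 0 && start < static_cast<int>(cfg.size()));
    std::set<int> seen;
    std::vector<int> stack(cfg[start].succ.begin(), cfg[start].succ.end());
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        if (!seen.insert(n).second) continue;  // already visited
        if (cfg[n].redefines_var) continue;    // barrier: do not go further
        for (int s : cfg[n].succ) stack.push_back(s);
    }
    return seen;
}
```

For a ``free(p)`` at node 0 followed by ``p = malloc(...)`` at node 1 and a use of ``p`` at node 2, the use is not reported as reachable from the ``free``.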

@@ -200,6 +200,21 @@ Write a query to find unreachable basic blocks.
200200

201201
This query has a good number of false positives on Chakra, many of them to do with templating and macros.
202202

203+
Guard conditions
204+
================
205+
206+
A ``GuardCondition`` is a Boolean condition that controls one or more basic blocks, in the sense that it is known to be true or false at the entry of those blocks.
207+
208+
- ``GuardCondition.controls(BasicBlock bb, boolean outcome)``: the entry of ``bb`` can only be reached if the guard evaluates to ``outcome``
209+
210+
- ``GuardCondition.comparesLt``, ``GuardCondition.ensuresLt``, ``GuardCondition.comparesEq``: auxiliary predicates to identify conditions that guarantee that one expression is less than another, or equal to it
211+
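To make the terminology concrete, here is a small C++ function of our own (``checked_read`` is a hypothetical name). The condition ``i < len`` is a guard condition: the block containing ``return buf[i];`` can only be entered when it evaluates to ``true``, and the block containing ``return -1;`` only when it evaluates to ``false``. A comparison such as ``i < len`` is also the kind of fact that the ``comparesLt`` family of predicates is intended to expose.

```cpp
#include <cassert>
#include <cstddef>

// Returns buf[i] if i is in bounds, otherwise -1.
int checked_read(const int* buf, std::size_t len, std::size_t i) {
    if (i < len) {      // guard condition
        return buf[i];  // entered only if the guard is true: i < len holds here
    }
    return -1;          // reached only if the guard is false
}
```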
212+
Further materials
213+
=================
214+
215+
- QL for C/C++: https://help.semmle.com/QL/learn-ql/ql/cpp/ql-for-cpp.html
216+
- API reference: https://help.semmle.com/qldoc/cpp
217+
203218
.. rst-class:: end-slide
204219

205220
Extra slides
@@ -274,7 +289,7 @@ Create a subclass of ``ExprCall`` that uses your query to implement ``getTarget`
274289
}
275290
}
276291
277-
Control flow graph customizations
292+
Control-flow graph customizations
278293
=================================
279294

280295
The default control-flow graph implementation recognizes a few common patterns for non-returning functions, but sometimes it fails to spot them, which can cause imprecision.

docs/language/ql-training-rst/cpp/data-flow-cpp.rst

Lines changed: 27 additions & 12 deletions
@@ -24,8 +24,8 @@ Alternatively, you can query the project in `the query console <https://lgtm.com
2424

2525
Note that results generated in the query console are likely to differ from those generated in the QL plugin, as LGTM.com analyzes the most recent revisions of each project that has been added; the snapshot available to download above is based on a historical version of the code base.
2626

27-
Agenda
28-
======
27+
Overview
28+
========
2929

3030
- Non-constant format string
3131
- Data flow
@@ -108,7 +108,7 @@ We need something better.
108108
109109
Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers.
110110

111-
What we need is a way to determine whether the format argument is ever set to something that is, not constant.
111+
What we need is a way to determine whether the format argument is ever set to something that is not constant.
112112

113113
Data flow analysis
114114
==================
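The situation described above, where a format argument is not a string literal but can only ever hold one of two constants, can be reproduced in a few lines of C++. In this hypothetical sketch (``format_value`` is our own name, and a direct ``snprintf`` call stands in for the ``DMLOut``/``ExtOut`` macros), a purely syntactic check would flag the call, while following the flow of values back from the format argument shows that it always comes from a literal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// The format argument of snprintf is a variable, not a string literal,
// but every value that can flow into it is one of two constants, each
// with exactly one %d specifier.
int format_value(char* buf, std::size_t size, bool extended, int value) {
    const char* fmt = extended ? "ext=%d" : "dml=%d";  // two constant sources
    const char* chosen = fmt;                          // flows through a copy
    return std::snprintf(buf, size, chosen, value);    // format argument (sink)
}
```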
@@ -127,7 +127,7 @@ Data flow analysis
127127

128128
The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answer questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
129129

130-
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two edges.
130+
We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists between two nodes, then data flows from the first node to the second.
131131

132132
Data flow graphs
133133
================
@@ -246,19 +246,22 @@ Data flow graph
246246

247247
The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types: expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
248248

249-
Taint-tracking
249+
Taint tracking
250250
==============
251251

252252
- Usually, we want to generalize slightly by considering not only plain data flow but also “taint” propagation, that is, whether a value is influenced by or derived from another.
253253

254254
- Examples:
255255

256-
.. code-block:: cpp
256+
.. code-block:: cpp
257+
258+
sink = source; // source -> sink: data and taint
259+
strcat(sink, source); // source -> sink: taint, not data
257260
258-
sink = source; // source -> sink: data and taint
259-
strcat(sink, source); // source -> sink: taint, not data
261+
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint:
260262

261-
- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint; ``TaintTracking::localTaintStep`` represents one (local) taint step, ``TaintTracking::localTaint`` is its transitive closure.
263+
- ``TaintTracking::localTaintStep`` represents one (local) taint step
264+
- ``TaintTracking::localTaint`` is its transitive closure.
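The difference between the two lines in the ``sink``/``source`` example can be checked at run time. In the following sketch (the function names are our own), copying the pointer is a data step: ``sink`` holds the very same value as ``source``. After ``strcat``, ``sink`` still points to its own buffer, so no value has flowed, but its contents are now derived from ``source``, which is what a taint step records.

```cpp
#include <cassert>
#include <cstring>

// Data (and taint) step: the pointer value itself is copied.
const char* data_step(const char* source) {
    const char* sink = source;
    return sink;
}

// Taint step only: sink keeps its own address, but its contents now
// include characters copied from source. The caller must make sure
// sink has room for both strings.
void taint_step(char* sink, const char* source) {
    std::strcat(sink, source);
}
```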
262265

263266
.. note::
264267

@@ -304,8 +307,20 @@ Refine the query to find calls to ``printf``-like functions where the format arg
304307

305308
.. rst-class:: build
306309

307-
.. literalinclude:: ../query-examples/cpp/data-flow-cpp-2.ql
308-
:language: ql
310+
.. code-block:: ql
311+
312+
import cpp
313+
import semmle.code.cpp.dataflow.DataFlow
314+
import semmle.code.cpp.commons.Printf
315+
316+
class SourceNode extends DataFlow::Node { ... }
317+
318+
from FormattingFunction f, Call c, SourceNode src, DataFlow::Node arg
319+
where c.getTarget() = f and
320+
arg.asExpr() = c.getArgument(f.getFormatParameterIndex()) and
321+
DataFlow::localFlow(src, arg) and
322+
not src.asExpr() instanceof StringLiteral
323+
select arg, "Non-constant format string."
309324
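For reference, the C++ pattern this exercise targets, and the usual fix, look roughly like this. It is a hypothetical example of our own: in ``unsafe_format`` the format argument is a parameter rather than a literal, so any ``%`` directives in the caller's data would be interpreted; ``safe_format`` passes a constant format and treats the data as an ordinary argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Non-constant format string: msg is interpreted as a format, so a
// message containing "%d" or "%s" would read arguments that were never
// passed (undefined behavior).
int unsafe_format(char* buf, std::size_t size, const char* msg) {
    return std::snprintf(buf, size, msg);
}

// Fix: a constant format string, with the data passed as an argument.
int safe_format(char* buf, std::size_t size, const char* msg) {
    return std::snprintf(buf, size, "%s", msg);
}
```

With ``safe_format``, a message such as ``"100%d"`` is copied verbatim instead of being parsed as a directive.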
310325
Refinements (take home exercise)
311326
================================
@@ -325,4 +340,4 @@ Beyond local data flow
325340
- Results are still underwhelming.
326341
- Dealing with parameter passing becomes cumbersome.
327342
- Instead, let’s turn the problem around and find user-controlled data that flows into a ``printf`` format argument, potentially through calls.
328-
- This needs **global data flow**.
343+
- This needs :doc:`global data flow <global-data-flow-cpp>`.

docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst

Lines changed: 5 additions & 5 deletions
@@ -53,7 +53,7 @@ Global data flow and taint tracking
5353
- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
5454
- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
5555

56-
- For global data flow (and taint tracking), we must therefore provided restrictions to ensure the problem is tractable.
56+
- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
5757
- Typically, this involves specifying the *source* and *sink*.
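A small C++ illustration of why local flow is not enough (our own example; ``print_message`` and ``run`` are hypothetical names): the source, a command-line argument, is in one function, and the sink, the format argument of ``snprintf``, is in another. Neither function on its own shows the problem; only following the value across the call connects source and sink, which is what global data flow does once the source and sink are specified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Sink: the format argument. Within this function, local data flow only
// sees that the format comes from a parameter.
int print_message(char* buf, std::size_t size, const char* msg) {
    return std::snprintf(buf, size, msg);
}

// Source: argv[1] is user input (as in a command-line application).
// The value crosses a function call before it reaches the sink.
int run(int argc, char** argv, char* buf, std::size_t size) {
    if (argc < 2) return -1;
    return print_message(buf, size, argv[1]);
}
```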
5858

5959
.. note::
@@ -113,7 +113,7 @@ The library class ``SecurityOptions`` provides a (configurable) model of what co
113113
114114
.. note::
115115

116-
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modelling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
116+
We first define what it means to be a *source* of tainted data for this particular problem. In this case, what we care about is whether the format string can be provided by an external user to our application or service. As there are many such ways external data could be introduced into the system, the standard QL libraries for C/C++ include an extensible API for modeling user input. In this case, we will simply use the predefined set of *user inputs*, which includes arguments provided to command line applications.
117117

118118

119119
Defining sinks (exercise)
@@ -157,7 +157,7 @@ Use the ``FormattingFunction`` class to fill in the definition of “isSink”
157157
158158
.. note::
159159

160-
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive - a “real” result - because our query only reports the source and the sink, and not the path through the graph between the two.
160+
When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive (a “real” result) because our query only reports the source and the sink, and not the path through the graph between the two.
161161

162162
Path queries
163163
============
@@ -182,7 +182,7 @@ Use this template:
182182
183183
.. note::
184184

185-
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
185+
To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work: we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
186186

187187
Defining additional taint steps
188188
===============================
@@ -250,7 +250,7 @@ Extra slides
250250
Exercise: How not to do global data flow
251251
========================================
252252

253-
Implement a flowStep predicate extending localFlowStep with steps through function calls and returns. Why might we not want to use this?
253+
Implement a ``flowStep`` predicate extending ``localFlowStep`` with steps through function calls and returns. Why might we not want to use this?
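One reason not to use such a predicate is sketched below in C++ (our own example; ``identity`` and ``two_call_sites`` are hypothetical names). If argument-to-parameter and return-to-caller steps are simply added to ``localFlowStep`` and closed transitively, a value can enter ``identity`` at one call site and return to the other, so the literal assigned to ``b`` appears to depend on ``user_input``. The other reason is cost: as noted earlier in the slides, global flow is not feasible to compute for all functions in a snapshot.

```cpp
#include <cassert>
#include <cstring>

// One function, two call sites.
const char* identity(const char* s) {
    return s;
}

void two_call_sites(const char* user_input, const char*& a, const char*& b) {
    a = identity(user_input);  // call site 1: user-controlled argument
    b = identity("constant");  // call site 2: string literal argument
    // Steps "argument -> parameter" and "return -> any call site" would
    // also connect user_input to b, a path that cannot happen at run time.
}
```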
254254

255255
.. code-block:: ql
256256

docs/language/ql-training-rst/cpp/program-representation-cpp.rst

Lines changed: 3 additions & 3 deletions
@@ -60,14 +60,14 @@ The basic representation of an analyzed program is an *abstract syntax tree (AST
6060

6161
.. note::
6262

63-
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program - a tree structure where program elements are nested within other program elements.
63+
When writing queries in QL it is important to have in mind the underlying representation of the program which is stored in the database. Typically queries make use of the “AST” representation of the program: a tree structure where program elements are nested within other program elements.
6464

6565
The “Introducing the C/C++ libraries” help topic contains a more complete overview of important AST classes and the rest of the C++ QL libraries: https://help.semmle.com/QL/learn-ql/ql/cpp/introduce-libraries-cpp.html
6666

6767
Database representations of ASTs
6868
================================
6969

70-
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque - all one can do with them is to check their equality.
70+
AST nodes and other program elements are encoded in the database as *entity values*. Entities are implemented as integers, but in QL they are opaque: all one can do with them is check their equality.
7171

7272
Each entity belongs to an entity type. Entity types have names starting with “@” and are defined in the database schema (not in QL).
7373

@@ -136,7 +136,7 @@ Working with functions
136136

137137
Functions are represented by the ``Function`` QL class. Each declaration or definition of a function is represented by a ``FunctionDeclarationEntry``.
138138

139-
Calls to functions are modelled by QL class Call and its subclasses:
139+
Calls to functions are modeled by the QL class ``Call`` and its subclasses:
140140

141141
- ``Call.getTarget()`` gets the declared target of the call; undefined for calls through function pointers
142142
- ``Function.getACallToThisFunction()`` gets a call to this function
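The difference between a call with a declared target and a call through a function pointer can be seen in a short C++ example of our own (the function names are hypothetical). For the call in ``call_directly``, ``Call.getTarget()`` would give ``square``; for the call in ``call_through_pointer``, the callee is only known at run time, so ``getTarget()`` has no result there.

```cpp
#include <cassert>

int square(int x) {
    return x * x;
}

// Direct call: the declared target is square.
int call_directly(int v) {
    return square(v);
}

// Call through a function pointer: the target depends on the run-time
// value of fp, so there is no declared target.
int call_through_pointer(int (*fp)(int), int v) {
    return fp(v);
}
```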

docs/language/ql-training-rst/cpp/snprintf.rst

Lines changed: 7 additions & 7 deletions
@@ -52,13 +52,13 @@ RCE in rsyslog
5252

5353
- Vulnerable code looked similar to this (`original <https://github.com/rsyslog/librelp/blob/532aa362f0f7a8d037505b0a27a1df452f9bac9e/src/tcp.c#L1195-L1211>`__):
5454

55-
.. code-block:: cpp
56-
57-
char buf[1024];
58-
int pos = 0;
59-
for (int i = 0; i < n; i++) {
60-
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
61-
}
55+
.. code-block:: cpp
56+
57+
char buf[1024];
58+
int pos = 0;
59+
for (int i = 0; i < n; i++) {
60+
pos += snprintf(buf + pos, sizeof(buf) - pos, "%s", strs[i]);
61+
}
6262
6363
- Disclosed as `CVE-2018-1000140 <https://nvd.nist.gov/vuln/detail/CVE-2018-1000140>`__.
6464
- Blog post: `https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/ <https://blog.semmle.com/librelp-buffer-overflow-cve-2018-1000140/>`__.
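The root cause is that ``snprintf`` returns the number of characters it *would* have written had the buffer been large enough, not the number actually written. Once ``pos`` exceeds ``sizeof(buf)``, the unsigned expression ``sizeof(buf) - pos`` wraps around to a huge value and the next call can write past the end of ``buf``. A minimal sketch of a safer loop follows; it is our own illustration (``join_strings`` is a hypothetical name), not the upstream librelp fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Concatenates strs[0..n-1] into buf (size > 0), stopping once buf is full.
// Returns the number of characters actually stored (excluding the NUL).
std::size_t join_strings(char* buf, std::size_t size,
                         const char* const* strs, int n) {
    assert(size > 0);
    buf[0] = '\0';
    std::size_t pos = 0;
    // pos < size guarantees that size - pos is positive and cannot wrap.
    for (int i = 0; i < n && pos < size; i++) {
        int r = std::snprintf(buf + pos, size - pos, "%s", strs[i]);
        if (r < 0) break;  // output error
        // r is the length snprintf wanted to write; it may exceed the space left.
        pos += static_cast<std::size_t>(r);
    }
    // If the output was truncated, pos overshoots; buf then holds size - 1 chars.
    return pos < size ? pos : size - 1;
}
```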

docs/language/ql-training-rst/query-examples/java/empty-if-java.ql

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ from IfStmt ifstmt, Block block
44
where
55
block = ifstmt.getThen() and
66
block.getNumStmt() = 0
7-
select ifstmt
7+
select ifstmt, "This if-statement is redundant."

docs/language/ql-training-rst/slide-snippets/info.rst

Lines changed: 10 additions & 3 deletions
@@ -15,12 +15,19 @@ The QL queries included in the latest Semmle release are open source. View them
1515

1616
**Extra information**
1717

18-
- Pressing ``p`` toggles extra notes (if they're on the current slide)
19-
- Pressing ``f`` toggles full screen viewing
20-
- Pressing ``o`` toggles overview mode
18+
.. |arrow-l| unicode:: U+2190
19+
20+
.. |arrow-r| unicode:: U+2192
21+
22+
- Press |arrow-l| and |arrow-r| to navigate between slides
23+
- Pressing **p** toggles between the slide and any extra notes (where they're available)
24+
- Pressing **f** toggles full screen viewing on/off
2125

2226
.. note::
2327

2428
To run the queries featured in this training presentation, we recommend you download the free-to-use `QL for Eclipse plugin <https://help.semmle.com/ql-for-eclipse/Content/WebHelp/getting-started.html>`__.
29+
2530
This plugin allows you to locally access the latest features of QL, including the standard QL libraries and queries. It also provides standard IDE features such as syntax highlighting, jump-to-definition, and tab completion.
31+
32+
When you have set up QL for Eclipse, we recommend increasing the “Memory for running queries” from the default setting of 4096MB to 8192MB to ensure that all the queries complete quickly.
2633

docs/language/ql-training-rst/slide-snippets/intro-ql-general.rst

Lines changed: 7 additions & 4 deletions
@@ -103,15 +103,18 @@ Find all instances!
103103
Analysis overview
104104
=================
105105

106+
.. container:: image-box
107+
108+
.. image:: ../_static-training/analysis-overview.png
109+
106110
.. rst-class:: build
107111

112+
113+
108114
- The database schema is (source) language-specific, as are queries and libraries.
109-
- Multi-language code bases are analyzed one language at a time.
110115

111-
.. container:: image-box
116+
- Multi-language code bases are analyzed one language at a time.
112117

113-
.. image:: ../_static-training/analysis-overview.png
114-
115118

116119
.. note::
117120
