diff --git a/docs/language/ql-training-rst/cpp/data-flow-cpp.rst b/docs/language/ql-training-rst/cpp/data-flow-cpp.rst
Lines changed: 4 additions & 155 deletions
diff --git a/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst b/docs/language/ql-training-rst/cpp/global-data-flow-cpp.rst
Lines changed: 6 additions & 70 deletions
diff --git a/docs/language/ql-training-rst/java/global-data-flow-java.rst b/docs/language/ql-training-rst/java/global-data-flow-java.rst
Lines changed: 7 additions & 72 deletions
@@ -112,164 +112,13 @@ We need something better.
 
   Here, ``DMLOut`` and ``ExtOut`` are macros that expand to formatting calls. The format specifier is not constant, in the sense that the format argument is not a string literal. However, it is clearly one of two possible constants, both with the same number of format specifiers.
 
-  What we need is a way to determine whether the format argument is ever set to something that is not constant.
+  What we need is a way to determine whether the format argument is ever set to something that is, not constant.
 
-Data flow analysis
-==================
+.. include general data flow slides
 
-- Models flow of data through the program.
-- Implemented in the module ``semmle.code.cpp.dataflow.DataFlow``.
-- Class ``DataFlow::Node`` represents program elements that have a value, such as expressions and function parameters.
+.. include:: ../slide-snippets/local-data-flow.rst
 
-  - Nodes of the data flow graph.
-
-- Various predicated represent flow between these nodes.
-  
-  - Edges of the data flow graph.
-
-.. note::
-
-  The solution here is to use *data flow*. Data flow is, as the name suggests, about tracking the flow of data through the program. It helps answers questions like: *does this expression ever hold a value that originates from a particular other place in the program*?
-
-  We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in the program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two edges.
-
-Data flow graphs
-================
-
-.. container:: column-left
-
-   Example:
-
-   .. code-block:: cpp
-
-      int func(int, tainted) {
-         int x = tainted;
-         if (someCondition) {
-           int y = x;
-           callFoo(y);
-         } else {
-           return x;
-         }
-         return -1;
-      }
- 
-.. container:: column-right
-
-  Data flow graph:
-   
-      .. graphviz::
-         
-            digraph {
-            graph [ dpi = 1000 ]
-            node [shape=polygon,sides=4,color=blue4,style="filled,rounded",   fontname=consolas,fontcolor=white]
-            a [label=<tainted<BR /><FONT POINT-SIZE="10">ParameterNode</FONT>>]
-            b [label=<tainted<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
-            c [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
-            d [label=<x<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
-            e [label=<y<BR /><FONT POINT-SIZE="10">ExprNode</FONT>>]
-   
-            a -> b
-            b -> {c, d}
-            c -> e
-   
-         }
-
-Local vs global data flow
-=========================
-
-- Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
-- Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
-- Different APIs, so discussed separately
-- This slide deck focuses on the former.
-
-.. note::
-
-  For further information, see:
-
-  - `Introduction to data flow analysis in QL <https://help.semmle.com/QL/learn-ql/ql/intro-to-data-flow.html>`__
-  - `Analyzing data flow in C/C++ <https://help.semmle.com/QL/learn-ql/ql/cpp/dataflow.html>`__
-
-.. rst-class:: background2
-
-Local data flow
-===============
-
-Importing data flow
-===================
-
-To use the data flow library, add the following import:
-
-.. code-block:: ql
-
-   import semmle.code.cpp.dataflow.DataFlow
-
-**Note**: this library contains an explicit “module” declaration:
-
-.. code-block:: ql
-
-   module DataFlow {
-     class Node extends ... { ... }
-     predicate localFlow(Node source, Node sink) {
-               localFlowStep*(source, sink)
-            }
-     ... 
-   }
-
-So all references will need to be qualified (that is, ``DataFlow::Node``)
-
-.. note::
-
-  A **query library** is file with the extension ``.qll``. Query libraries do not contain a query clause, but may contain modules, classes, and predicates. For example, the `C/C++ data flow library <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/DataFlow.qll/module.DataFlow.html>`__ is contained in the ``semmle/code/cpp/dataflow/DataFlow.qll`` QLL file, and can be imported as shown above.
-
-  A **module** is a way of organizing QL code by grouping together related predicates, classes, and (sub-)modules. They can be either explicitly declared or implicit. A query library implicitly declares a module with the same name as the QLL file.
-
-  For further information on libraries and modules in QL, see the chapter on `Modules <https://help.semmle.com/QL/ql-handbook/modules.html>`__ in the QL language handbook.
-
-  For further information on importing QL libraries and modules, see the chapter on `Name resolution <https://help.semmle.com/QL/ql-handbook/name-resolution.html>`__ in the QL language handbook.
-
-Data flow graph
-===============
-
-- Class ``DataFlow::Node`` represents data flow graph nodes
-- Predicate ``DataFlow::localFlowStep`` represents local data flow graph edges, ``DataFlow::localFlow`` is its transitive closure
-- Data flow graph nodes are *not* AST nodes, but they correspond to AST nodes, and there are predicates for mapping between them:
-
-  - ``Expr Node.asExpr()``
-  - ``Parameter Node.asParameter()``
-  - ``DataFlow::Node DataFlow::exprNode(Expr e)``
-  - ``DataFlow::Node DataFlow::parameterNode(Parameter p)``
-  - ``etc.``
-
-.. note::
-
-  The ``DataFlow::Node`` class is shared between both the local and global data flow graphs–the primary difference is the edges, which in the “global” case can link different functions.
-
-  ``localFlowStep`` is the “single step” flow relation–that is, it describes single edges in the local data flow graph. ``localFlow`` represents the `transitive <https://help.semmle.com/QL/ql-handbook/recursion.html#transitive-closures>`__ closure of this relation–in other words, it contains every pair of nodes where the second node is reachable from the first in the data flow graph.
-
-  The data flow graph is separate from the `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`__, to allow for flexibility in how data flow is modeled. There are a small number of data flow node types–expression nodes, parameter nodes, uninitialized variable nodes, and definition by reference nodes. Each node provides mapping functions to and from the relevant AST (for example ``Expr``, ``Parameter`` etc.) or symbol table (for example ``Variable``) classes.
-
-Taint tracking
-==============
-
-- Usually, we want to generalise slightly by not only considering plain data flow, but also “taint” propagation, that is, whether a value is influenced by or derived from another.
-
-- Examples:
-
-  .. code-block:: cpp
-  
-    sink = source;        // source -> sink: data and taint
-    strcat(sink, source); // source -> sink: taint, not data
-
-- Library ``semmle.code.cpp.dataflow.TaintTracking`` provides predicates for tracking taint:
-
-  - ``TaintTracking::localTaintStep`` represents one (local) taint step 
-  - ``TaintTracking::localTaint`` is its transitive closure.
-
-.. note::
-
-  Taint tracking can be thought of as another type of data flow graph. It usually extends the standard data flow graph for a problem by adding edges between nodes where one one node influences or *taints* another.
-
-  The `API <https://help.semmle.com/qldoc/cpp/semmle/code/cpp/dataflow/TaintTracking.qll/module.TaintTracking.html>`__ is almost identical to that of the local data flow. All we need to do to switch to taint tracking is ``import semmle.code.cpp.dataflow.TaintTracking`` instead of ``semmle.code.cpp.dataflow.DataFlow``, and instead of using ``localFlow``, we use ``localTaint``.
+.. resume language-specific slides
 
 Exercise: source nodes
 ======================
 
@@ -36,57 +36,12 @@ Agenda
 - Path queries
 - Data flow models
 
-Information flow
-================
-
-- Many security problems can be phrased as an information flow problem:
-
-  Given a (problem-specific) set of sources and sinks, is there a path in the data flow graph from some source to some sink?
-
-- Some examples:
-
-  - SQL injection: sources are user-input, sinks are SQL queries
-  - Reflected XSS: sources are HTTP requests, sinks are HTTP responses
-
-- We can solve such problems using the data flow and taint tracking libraries.
-
-Global data flow and taint tracking
-===================================
-
-- Recap:
-
-  - Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
-  - Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
-
-- For global data flow (and taint tracking), we must therefore provide restrictions to ensure the problem is tractable.
-- Typically, this involves specifying the *source* and *sink*.
-
-.. note::
-
-  As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
-
-  The global data flow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
+.. insert common global data flow slides
 
-Global taint tracking library
-=============================
+.. include:: ../slide-snippets/global-data-flow.rst
 
-The ``semmle.code.cpp.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
+.. resume language-specific global data flow slides
 
-  #. Subclass ``TaintTracking::Configuration`` following this template:
-
-     .. code-block:: ql
-    
-       class Config extends TaintTracking::Configuration {
-         Config() { this = "<some unique identifier>" }
-         override predicate isSource(DataFlow::Node nd) { ... }
-         override predicate isSink(DataFlow::Node nd) { ... }
-       }
-
-  #. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
-
-.. note::
-
-  In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.cpp.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
 
 Finding tainted format strings (outline)
 ========================================
@@ -164,30 +119,11 @@ Use the ``FormattingFunction`` class, we can write the sink as:
 
   When we run this query, we should find a single result. However, it is tricky to determine whether this result is a true positive (a “real” result) because our query only reports the source and the sink, and not the path through the graph between the two.
 
-Path queries
-============
-
-Path queries provide information about the identified paths from sources to sinks. Paths can be examined in Path Explorer view.
+.. insert path queries slides
 
-Use this template:
-
-.. code-block:: ql
-
-   /**
-    * ... 
-    * @kind path-problem
-    */
-   
-   import semmle.code.cpp.dataflow.TaintTracking
-   import DataFlow::PathGraph
-   ...
-   from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
-   where cfg.hasFlowPath(source, sink)
-   select sink, source, sink, "<message>"
-
-.. note::
+.. include:: ../slide-snippets/path-queries.rst
 
-  To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work–we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
+.. resume language-specific global data flow slides
 
 Defining additional taint steps
 ===============================
 
@@ -36,59 +36,13 @@ Agenda
 - Path queries
 - Data flow models
 
-Information flow
-================
-
-- Many security problems can be phrased as an information flow problem:
-
-  Given a (problem-specific) set of sources and sinks, is there a path in the data flow graph from some source to some sink?
-
-- Some examples:
-
-  - SQL injection: sources are user-input, sinks are SQL queries
-  - Reflected XSS: sources are HTTP requests, sinks are HTTP responses
-
-- We can solve such problems using the data flow and taint tracking libraries.
-
-Global data flow and taint tracking
-===================================
-
-- Recap:
-
-  - Local (“intra-procedural”) data flow models flow within one function; feasible to compute for all functions in a snapshot
-  - Global (“inter-procedural”) data flow models flow across function calls; not feasible to compute for all functions in a snapshot
-
-- For global data flow (and taint tracking), we must therefore provided restrictions to ensure the problem is tractable.
-- Typically, this involves specifying the *source* and *sink*.
-
-.. note::
-
-  As we mentioned in the previous slide deck, while local data flow is feasible to compute for all functions in a snapshot, global data flow is not. This is because the number of paths becomes exponentially larger for global data flow.
-
-  The global data flow (and taint tracking) avoids this problem by requiring that the query author specifies which ``sources`` and ``sinks`` are applicable. This allows the implementation to compute paths between the restricted set of nodes, rather than the full graph.
+.. insert common global data flow slides
 
-Global taint-tracking library
-=============================
+.. include:: ../slide-snippets/global-data-flow.rst
 
-The ``semmle.code.java.dataflow.TaintTracking`` library provides a framework for implementing solvers for global taint tracking problems:
+.. resume language-specific global data flow slides
 
-  #. Subclass ``TaintTracking::Configuration`` following this template:
-
-     .. code-block:: ql
-    
-       class Config extends TaintTracking::Configuration {
-         Config() { this = "<some unique identifier>" }
-         override predicate isSource(DataFlow::Node nd) { ... }
-         override predicate isSink(DataFlow::Node nd) { ... }
-       }
-
-  #. Use ``Config.hasFlow(source, sink)`` to find inter-procedural paths.
-
-.. note::
-
-  In addition to the taint tracking configuration described here, there is also an equivalent *data flow* configuration in ``semmle.code.java.dataflow.DataFlow``, ``DataFlow::Configuration``. Data flow configurations are used to track whether the exact value produced by a source is used by a sink, whereas taint tracking configurations are used to determine whether the source may influence the value used at the sink. Whether you use taint tracking or data flow depends on the analysis problem you are trying to solve.
-
-Code injection in Apache Struts
+Code injection in Apache struts
 ===============================
 
 - In April 2018, Man Yue Mo, a security researcher at Semmle, reported 5 remote code execution (RCE) vulnerabilities (CVE-2018-11776) in Apache Struts.
@@ -181,30 +135,11 @@ Find a method access to ``compileAndExecute``, and mark the first argument.
     ...
   }
 
-Path queries
-============
+.. insert path queries slides
 
-Path queries provide information about the identified paths from sources to sinks. Paths can be examined in Path Explorer view.
-
-Use this template:
-
-.. code-block:: ql
-
-   /**
-    * ... 
-    * @kind path-problem
-    */
-   
-   import semmle.code.java.dataflow.TaintTracking
-   import DataFlow::PathGraph
-   ...
-   from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
-   where cfg.hasFlowPath(source, sink)
-   select sink, source, sink, "<message>"
-
-.. note::
+.. include:: ../slide-snippets/path-queries.rst
 
-  To see the paths between the source and the sinks, we can convert the query to a path problem query. There are a few minor changes that need to be made for this to work - we need an additional import, to specify ``PathNode`` rather than ``Node``, and to add the source/sink to the query output (so that we can automatically determine the paths).
+.. resume language-specific global data flow slides
 
 Defining sanitizers
 ===================