diff --git a/javascript/config/suites/javascript/flow-summaries b/javascript/config/suites/javascript/flow-summaries
Lines changed: 2 additions & 0 deletions
diff --git a/javascript/documentation/flow-summaries.rst b/javascript/documentation/flow-summaries.rst
Lines changed: 223 additions & 0 deletions
diff --git a/javascript/ql/src/Security/Summaries/AllConfigurations.qll b/javascript/ql/src/Security/Summaries/AllConfigurations.qll
Lines changed: 35 additions & 0 deletions
diff --git a/javascript/ql/src/Security/Summaries/ExtractFlowStepSummaries.ql b/javascript/ql/src/Security/Summaries/ExtractFlowStepSummaries.ql
Lines changed: 32 additions & 0 deletions
diff --git a/javascript/ql/src/Security/Summaries/ExtractSinkSummaries.ql b/javascript/ql/src/Security/Summaries/ExtractSinkSummaries.ql
Lines changed: 20 additions & 0 deletions
diff --git a/javascript/ql/src/Security/Summaries/ExtractSourceSummaries.ql b/javascript/ql/src/Security/Summaries/ExtractSourceSummaries.ql
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,2 @@
++ semmlecode-javascript-queries/Security/Summaries/ExtractSourceSummaries.ql
++ semmlecode-javascript-queries/Security/Summaries/ExtractSinkSummaries.ql
@@ -0,0 +1,223 @@
+Summary-based information flow analysis
+=======================================
+
+Overview
+--------
+
+This document presents an approach for running information flow analyses (such as the standard
+Semmle security queries) on an application that depends on one or more npm packages. Instead of
+installing the npm packages during the snapshot build and analyzing them together with application
+code, we analyze each package in isolation and compute *flow summaries* that record information
+about any sources, sinks and flow steps contributed by the package's API. These flow summaries
+are then imported when building a snapshot of the application (usually in the form of CSV files
+added as external data), and are picked up by the standard security queries, allowing them to reason
+about flow into, out of and through the npm packages as though they had been included as part of the
+build.
+
+Motivating example
+------------------
+
+Let us take the `mkdirp <https://www.npmjs.com/package/mkdirp>`_ package as an example. It exports
+a function that takes as its first argument a file system path, and creates a folder with that
+path, as well as any parent folders that do not exist yet. As further arguments, the function
+accepts an optional configuration object and a callback to invoke once the folder has been
+created.
+
+An application might use this package as follows:
+
+.. code-block:: js
+
+  const mkdirp = require('mkdirp');
+  // ...
+  mkdirp(p, opts, function cb(err) {
+    // ...
+  });
+
+If the value of ``p`` can be controlled by an untrusted user, this would allow them to create arbitrary
+folders, which may not be desirable.
+
+If the application code base were analyzed together with the source code for the ``mkdirp`` package,
+Semmle's default path injection analysis would be able to track taint through the call to ``mkdirp`` into its
+implementation, which ultimately uses built-in Node.js file system APIs to create the folder. Since
+the path injection analysis has built-in models of these APIs, it would then be able to spot and flag this
+vulnerability.
+
+However, analyzing ``mkdirp`` from scratch for every client application is wasteful. Moreover, in this
+case it would be undesirable for the alert to flag the location inside ``mkdirp`` where the folder is
+actually created: the developer of the client application did not write that code and hence will
+have a hard time understanding why it is being flagged.
+
+Both of these concerns can be addressed by treating the first argument to ``mkdirp`` as a path injection
+sink in its own right: the analysis no longer needs to track flow into the implementation of ``mkdirp``,
+so we would no longer need to include its source code in the analysis, and the alert would flag the call
+to ``mkdirp`` in application code, not its implementation in library code.
+
+The information that the first parameter of ``mkdirp`` is interpreted as a file system path and hence should
+be considered a path injection sink is an example of a *flow summary*, or more precisely a *sink summary*.
+Besides sink summaries, we also consider *source summaries* and *flow-step summaries*.
+
+In general, a sink summary states that some API interface point (such as a function parameter) should
+be considered a sink for a certain analysis, so if data from a known source reaches this point without
+undergoing appropriate sanitization, it should be flagged with an alert. A sink summary may also
+specify which taint kind the data needs to have in order for the sink to be problematic.
+
+Conversely, a source summary identifies some API (such as the return value of a function) as a source
+of tainted data for a certain analysis, again optionally specifying a taint kind.
+
+Finally, a flow-step summary records the fact that data that flows into the package at some point
+may propagate to another point (for example, from a function parameter to its return value).
+In this case, there are two relevant taint kinds: one describing the taint kind of the data that
+enters, and one describing the taint kind of the data that emerges. Like sources and sinks, flow steps
+are in general analysis-specific, since whether data propagates without being sanitized depends on the
+sanitizers of the particular analysis.
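To make the idea concrete, here is a minimal sketch of the kind of API a flow-step summary describes. The package and its ``normalizePath`` function are hypothetical, invented purely for illustration: whatever string enters through the first parameter re-emerges, rewritten, from the return value, so a summary for this API would record a step from that parameter to the function's return value.

```javascript
// Hypothetical package API (illustration only): the path passed in as the
// first parameter flows, in modified form, to the return value. A flow-step
// summary would record this parameter-to-return propagation, so that client
// analyses do not need to look at this implementation.
function normalizePath(p) {
  // Replace backslashes with forward slashes, then collapse repeated slashes.
  return p.replace(/\\/g, "/").replace(/\/+/g, "/");
}

module.exports = normalizePath;
```

Because ``String.prototype.replace`` produces a new string derived from its input, an analysis would typically treat the result as tainted rather than as the very same data, which is where the distinction between the incoming and outgoing taint kinds matters.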
+
+In what follows we will first discuss how summaries are generated from a snapshot of an npm package,
+and then how they are imported when analyzing client code. Finally, we will discuss the format in which
+flow summaries are stored.
+
+Note that flow summaries are considered an experimental feature at this point. Using them involves
+some manual configuration, and we make no guarantee that the API will remain stable.
+
+Generating summaries
+--------------------
+
+Flow summaries of an npm package can be generated by running special summary extraction queries
+either on a snapshot of the package itself, or on a snapshot of a hand-written model of the
+package. (Note that this requires a working installation of Semmle Core.)
+
+There are three default summary extraction queries:
+
+- Extract flow step summaries (``js/step-summary-extraction``,
+  ``Security/Summaries/ExtractFlowStepSummaries.ql``)
+- Extract sink summaries (``js/sink-summary-extraction``,
+  ``Security/Summaries/ExtractSinkSummaries.ql``)
+- Extract source summaries (``js/source-summary-extraction``,
+  ``Security/Summaries/ExtractSourceSummaries.ql``)
+
+You can use ``odasa runQuery`` to run these queries individually against a snapshot of the npm package
+for which you want to create flow summaries, and store the output as CSV files named
+``additional-steps.csv``, ``additional-sinks.csv`` and ``additional-sources.csv``, respectively.
+
+For example, assuming that folder ``mkdirp-snapshot`` contains a snapshot of the ``mkdirp``
+project, we can extract sink summaries using the following command:
+
+.. code-block:: bash
+
+  odasa runQuery \
+        --query $SEMMLE_DIST/queries/semmlecode-javascript-queries/Security/Summaries/ExtractSinkSummaries.ql \
+        --output-file additional-sinks.csv --snapshot mkdirp-snapshot
+
+
+Instead of generating summaries directly from the package source code, you can also generate
+them from a hand-written model of the package. The model should contain a ``package.json`` file
+giving the correct package name, and models for the relevant API entry points. The models are
+plain JavaScript with special comments annotating certain expressions as sources or sinks.
+
+For example, a model of ``mkdirp`` might look like this:
+
+.. code-block:: js
+
+  module.exports = function mkdirp(path) {
+    path /* Semmle: sink: taint, TaintedPath */
+  };
+
+Annotation comments start with ``Semmle:`` and contain ``source`` or ``sink`` specifications.
+Each such specification lists a flow label (in this case, ``taint``) and a configuration to which
+the specification applies (in this case, ``TaintedPath``).
+
+A source specification annotates an expression as being a source of flow with the given label
+for the purposes of the given configuration, and similarly for sinks. Annotation comments apply to
+any expression (and more generally any data flow node) whose source location ends on the line
+where the comment starts.
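The annotation syntax can be illustrated with a small parser. This is a sketch only: ``parseAnnotation`` is a hypothetical helper, not the extractor's implementation. It takes the comment text without its ``/*`` and ``*/`` delimiters, and it assumes every specification has the form ``<source|sink>: <label>, <configuration>``, as in the ``mkdirp`` model.

```javascript
// Hypothetical helper: extracts source/sink specifications from the text of an
// annotation comment such as "Semmle: sink: taint, TaintedPath".
// Assumption: each specification has the form "<source|sink>: <label>, <config>";
// the exact grammar accepted by the real extractor may differ.
function parseAnnotation(commentText) {
  const text = commentText.trim();
  if (!text.startsWith("Semmle:")) return []; // not an annotation comment
  const body = text.slice("Semmle:".length);
  const specPattern = /\b(source|sink):\s*([^\s,]+)\s*,\s*([^\s,]+)/g;
  const specs = [];
  for (const m of body.matchAll(specPattern)) {
    specs.push({ kind: m[1], label: m[2], configuration: m[3] });
  }
  return specs;
}
```

For the ``mkdirp`` model, ``parseAnnotation(" Semmle: sink: taint, TaintedPath ")`` yields a single sink specification with label ``taint`` for configuration ``TaintedPath``. The sketch does not model the line-based rule for attaching a specification to data flow nodes.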
+
+Using summaries
+---------------
+
+Once you have created summaries using the approach outlined above, you have two options for
+including them in the analysis of a client application.
+
+External data
+:::::::::::::
+
+Firstly, you can include the CSV files generated by running the extraction queries as external
+data when building a snapshot of the client application by copying them into the
+``${snapshot}/external/data`` folder. This is typically done by including a command like this
+in your ``project`` file:
+
+.. code-block:: xml
+
+  <build>cp /path/to/additional-sinks.csv ${snapshot}/external/data</build>
+
+If you want to include summaries for multiple libraries, you have to concatenate the
+corresponding CSV files before copying them into the external data folder.
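Concatenation can be done with any tool; as a sketch, it can also be scripted in JavaScript for Node.js. ``concatCsvFiles`` is a hypothetical helper, and it assumes that plain concatenation is sufficient, that is, that the generated files carry no header rows that would need to be dropped.

```javascript
// Sketch: concatenate several summary CSV files (for example, the
// additional-sinks.csv files generated for different packages) into one file
// before copying it into the snapshot's external data folder.
// Assumption: the files contain plain data rows with no header line.
const fs = require("fs");

function concatCsvFiles(inputPaths, outputPath) {
  const parts = inputPaths.map((p) => {
    const text = fs.readFileSync(p, "utf8");
    // Make sure every non-empty file ends with a newline so rows are not glued together.
    return text === "" || text.endsWith("\n") ? text : text + "\n";
  });
  fs.writeFileSync(outputPath, parts.join(""));
}
```

A call such as ``concatCsvFiles(["mkdirp/additional-sinks.csv", "other/additional-sinks.csv"], "additional-sinks.csv")`` (paths hypothetical) produces a single ``additional-sinks.csv`` that can then be copied as shown above.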
+
+Additionally, you need to import the library ``Security.Summaries.ImportFromCsv`` in your
+``javascript.qll``, which will pick up the summaries from external data and interpret them
+as additional sources, sinks and flow steps:
+
+.. code-block:: ql
+
+  import Security.Summaries.ImportFromCsv
+
+After these preparatory steps, you can run your analysis without any further changes.
+
+External predicates
+:::::::::::::::::::
+
+The second method for including flow summaries is to import the
+``Security.Summaries.ImportFromExternalPredicates`` library in your analysis. This library declares
+three external predicates, ``additionalSteps``, ``additionalSinks`` and ``additionalSources``, that
+need to be populated with the flow summary CSV data.
+
+This is most easily done in QL for Eclipse, which will prompt you for CSV files to populate
+the three predicates.
+
+This approach has the advantage that you do not need to include the CSV files during the
+snapshot build, so you can use an existing snapshot, for example as downloaded from LGTM.com.
+
+Summary format
+--------------
+
+Source and sink summaries are specified as tuples of the form ``(portal, kind, configuration)``,
+where ``portal`` is a description of the API element being marked as a source or sink, ``kind``
+is a flow label (also known as "taint kind") describing the kind of information being generated
+or consumed, and ``configuration`` specifies which flow configuration the summary applies to.
+
+If ``kind`` is empty, it defaults to ``data`` for sources and either ``data`` or ``taint`` for sinks.
+If ``configuration`` is empty, the specification applies to all configurations.
+The default extraction queries never produce empty ``kind`` or ``configuration`` columns.
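The defaulting rules can be summarized in a few lines of code. In this sketch, ``effectiveKinds`` and ``appliesTo`` are hypothetical names, and each CSV row is assumed to have already been split into an object with ``portal``, ``kind`` and ``configuration`` fields.

```javascript
// Sketch of the defaulting rules for source and sink summaries.
// An empty kind means "data" for sources and "data" or "taint" for sinks.
function effectiveKinds(summary, isSink) {
  if (summary.kind !== "") return [summary.kind];
  return isSink ? ["data", "taint"] : ["data"];
}

// An empty configuration means the summary applies to every configuration.
function appliesTo(summary, configuration) {
  return summary.configuration === "" || summary.configuration === configuration;
}
```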
+
+Similarly, step summaries are tuples of the form
+``(inPortal, inKind, outPortal, outKind, configuration)``, stating that information with label
+``inKind`` that flows into ``inPortal`` resurfaces from ``outPortal``, now having kind ``outKind``.
+As before, ``configuration`` specifies which configuration this information applies to.
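Read operationally, a step summary maps an incoming pair of portal and label to an outgoing one. The hypothetical helper ``stepsFrom`` sketches that reading over already-parsed rows; comparing portals as plain strings, and treating an empty configuration as matching every configuration (mirroring the rule for sources and sinks), are assumptions of the sketch.

```javascript
// Sketch: given parsed step summaries, find where data labelled `kind` that
// enters `portal` resurfaces, and with which label, for `configuration`.
// Assumptions: portals are compared as strings; an empty configuration
// matches every configuration.
function stepsFrom(stepSummaries, portal, kind, configuration) {
  return stepSummaries
    .filter((s) => s.inPortal === portal && s.inKind === kind)
    .filter((s) => s.configuration === "" || s.configuration === configuration)
    .map((s) => ({ portal: s.outPortal, kind: s.outKind }));
}
```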
+
+In all of the above, ``portal`` is an S-expression that abstractly describes a *portal*, that is,
+an API interface point by which data may enter or leave the npm package being analyzed.
+
+Currently, we model five kinds of portals:
+
+- ``(root <uri>)``, representing the ``module`` object of the main module of the npm package
+  described by ``<uri>``, which is a URL of the form ``https://www.npmjs.com/package/<pkg>``;
+- ``(member <name> <base>)``, representing property ``<name>`` of an object described by
+  portal ``<base>``;
+- ``(instance <base>)``, representing an instance of a (constructor) function or class
+  described by portal ``<base>``;
+- ``(parameter <i> <base>)``, representing the ``<i>``-th parameter (counting from 0) of a function
+  described by portal ``<base>``;
+- ``(return <base>)``, representing the return value of a function described by portal ``<base>``.
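Because portals compose, they are naturally built bottom-up from a root. The helpers below are a hypothetical JavaScript sketch, not part of the QL library, that render each of the five portal kinds as an S-expression, using the argument order given in the list above (name or index first, base portal last).

```javascript
// Sketch: one constructor per portal kind, each producing the S-expression
// text used in summary files. Argument order follows the list of portal kinds.
const root = (uri) => `(root ${uri})`;
const member = (name, base) => `(member ${name} ${base})`;
const instance = (base) => `(instance ${base})`;
const parameter = (i, base) => `(parameter ${i} ${base})`;
const ret = (base) => `(return ${base})`; // named `ret` because `return` is reserved in JS

// First parameter of the default export of mkdirp.
const mkdirpRoot = root("https://www.npmjs.com/package/mkdirp");
const mkdirpParam0 = parameter(0, member("default", mkdirpRoot));
```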
+
+In our example above, the first parameter of the default export of package ``mkdirp`` is
+described by the portal
+
+.. code-block:: lisp
+
+  (parameter 0 (member default (root https://www.npmjs.com/package/mkdirp)))
+
+As a more complicated example,
+
+.. code-block:: lisp
+
+  (parameter 0 (parameter 1 (member then (instance (member Promise (root https://www.npmjs.com/package/bluebird))))))
+
+describes the first parameter of a function passed as the second argument to the ``then`` method of
+instances of the ``Promise`` class exported by package ``bluebird``.
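Going the other way, a tool consuming summary files may want to take a portal apart. The hypothetical ``parsePortal`` function below is a minimal recursive-descent parser for the S-expression syntax; it assumes that atoms (names, indices and URIs) never contain whitespace or parentheses.

```javascript
// Sketch: parse a portal S-expression into nested { kind, args } objects.
// Assumption: atoms contain no whitespace or parentheses.
function parsePortal(text) {
  const tokens = text.match(/[()]|[^\s()]+/g) || [];
  let pos = 0;

  function parseExpr() {
    const tok = tokens[pos++];
    if (tok !== "(") throw new Error(`expected "(" but found ${tok}`);
    const kind = tokens[pos++]; // root, member, instance, parameter or return
    const args = [];
    while (tokens[pos] !== ")") {
      if (pos >= tokens.length) throw new Error("unbalanced parentheses");
      // Nested portals start with "("; anything else is an atom.
      args.push(tokens[pos] === "(" ? parseExpr() : tokens[pos++]);
    }
    pos++; // consume the closing ")"
    return { kind, args };
  }

  const result = parseExpr();
  if (pos !== tokens.length) throw new Error("trailing input after portal");
  return result;
}
```

Indices such as ``0`` are kept as strings; a caller that needs numbers can convert them with ``Number``.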
@@ -0,0 +1,35 @@
+/**
+ * Imports the standard library and all taint-tracking configuration classes from the security queries.
+ */
+
+import javascript
+import semmle.javascript.security.dataflow.BrokenCryptoAlgorithm
+import semmle.javascript.security.dataflow.CleartextLogging
+import semmle.javascript.security.dataflow.CleartextStorage
+import semmle.javascript.security.dataflow.ClientSideUrlRedirect
+import semmle.javascript.security.dataflow.CodeInjection
+import semmle.javascript.security.dataflow.CommandInjection
+import semmle.javascript.security.dataflow.ConditionalBypass
+import semmle.javascript.security.dataflow.CorsMisconfigurationForCredentials
+import semmle.javascript.security.dataflow.DifferentKindsComparisonBypass
+import semmle.javascript.security.dataflow.DomBasedXss as DomBasedXss
+import semmle.javascript.security.dataflow.FileAccessToHttp
+import semmle.javascript.security.dataflow.HardcodedCredentials
+import semmle.javascript.security.dataflow.InsecureRandomness
+import semmle.javascript.security.dataflow.InsufficientPasswordHash
+import semmle.javascript.security.dataflow.NosqlInjection
+import semmle.javascript.security.dataflow.ReflectedXss as ReflectedXss
+import semmle.javascript.security.dataflow.RegExpInjection
+import semmle.javascript.security.dataflow.RemotePropertyInjection
+import semmle.javascript.security.dataflow.RequestForgery
+import semmle.javascript.security.dataflow.ServerSideUrlRedirect
+import semmle.javascript.security.dataflow.SqlInjection
+import semmle.javascript.security.dataflow.StackTraceExposure
+import semmle.javascript.security.dataflow.StoredXss as StoredXss
+import semmle.javascript.security.dataflow.TaintedFormatString
+import semmle.javascript.security.dataflow.TaintedPath
+import semmle.javascript.security.dataflow.TypeConfusionThroughParameterTampering
+import semmle.javascript.security.dataflow.UnsafeDeserialization
+import semmle.javascript.security.dataflow.XmlBomb
+import semmle.javascript.security.dataflow.XpathInjection
+import semmle.javascript.security.dataflow.Xxe
@@ -0,0 +1,32 @@
+/**
+ * @name Extract flow step summaries
+ * @description Extracts flow step summaries, that is, tuples `(p1, lbl1, p2, lbl2, cfg)`
+ *              representing the fact that data with flow label `lbl1` may flow from a
+ *              user-controlled exit node of portal `p1` to an escaping entry node of portal `p2`,
+ *              and have label `lbl2` at that point. Moreover, the path from `p1` to `p2` contains
+ *              no sanitizers specified by configuration `cfg`.
+ * @kind flow-step-summary
+ * @id js/step-summary-extraction
+ */
+
+import AllConfigurations
+import PortalExitSource
+import PortalEntrySink
+
+from
+  TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p1,
+  Portal p2, DataFlow::FlowLabel lbl1, DataFlow::FlowLabel lbl2
+where
+  cfg.hasFlowPath(source, sink) and
+  p1 = source.getNode().(PortalExitSource).getPortal() and
+  p2 = sink.getNode().(PortalEntrySink).getPortal() and
+  lbl1 = sink.getPathSummary().getStartLabel() and
+  lbl2 = sink.getPathSummary().getEndLabel() and
+  // avoid constructing infeasible paths
+  sink.getPathSummary().hasCall() = false and
+  sink.getPathSummary().hasReturn() = false and
+  // restrict to flow steps from function parameters to return values of the same function
+  p1.(ParameterPortal).getBasePortal() = p2.(ReturnPortal).getBasePortal() and
+  // restrict to data/taint flow
+  lbl1 instanceof DataFlow::StandardFlowLabel
+select p1.toString(), lbl1.toString(), p2.toString(), lbl2.toString(), cfg.toString()
@@ -0,0 +1,20 @@
+/**
+ * @name Extract sink summaries
+ * @description Extracts sink summaries, that is, tuples `(p, lbl, cfg)` representing the fact
+ *              that data with flow label `lbl` may flow from a user-controlled exit node of portal
+ *              `p` to a known sink for configuration `cfg`.
+ * @kind sink-summary
+ * @id js/sink-summary-extraction
+ */
+
+import AllConfigurations
+import PortalExitSource
+import SinkFromAnnotation
+
+from TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p
+where
+  cfg.hasFlowPath(source, sink) and
+  p = source.getNode().(PortalExitSource).getPortal() and
+  // avoid constructing infeasible paths
+  sink.getPathSummary().hasReturn() = false
+select p.toString(), source.getPathSummary().getStartLabel().toString(), cfg.toString()
@@ -0,0 +1,20 @@
+/**
+ * @name Extract source summaries
+ * @description Extracts source summaries, that is, tuples `(p, lbl, cfg)` representing the fact
+ *              that data may flow from a known source for configuration `cfg` to an escaping entry
+ *              node of portal `p`, and have flow label `lbl` at that point.
+ * @kind source-summary
+ * @id js/source-summary-extraction
+ */
+
+import AllConfigurations
+import PortalEntrySink
+import SourceFromAnnotation
+
+from TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p
+where
+  cfg.hasFlowPath(source, sink) and
+  p = sink.getNode().(PortalEntrySink).getPortal() and
+  // avoid constructing infeasible paths
+  sink.getPathSummary().hasCall() = false
+select p.toString(), sink.getPathSummary().getEndLabel().toString(), cfg.toString()