Commit b0dd3df

Merge pull request #502 from xiemaisi/js/summaries
Approved by asger-semmle
2 parents 28261d6 + 583734a commit b0dd3df

47 files changed: +4571 -6 lines changed
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+ semmlecode-javascript-queries/Security/Summaries/ExtractSourceSummaries.ql
+ semmlecode-javascript-queries/Security/Summaries/ExtractSinkSummaries.ql
Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
Summary-based information flow analysis
=======================================

Overview
--------

This document presents an approach for running information flow analyses (such as the standard
Semmle security queries) on an application that depends on one or more npm packages. Instead of
installing the npm packages during the snapshot build and analyzing them together with application
code, we analyze each package in isolation and compute *flow summaries* that record information
about any sources, sinks and flow steps contributed by the package's API. These flow summaries
are then imported when building a snapshot of the application (usually in the form of CSV files
added as external data), and are picked up by the standard security queries, allowing them to reason
about flow into, out of and through the npm packages as though they had been included as part of the
build.

Motivating example
------------------

Let us take the `mkdirp <https://www.npmjs.com/package/mkdirp>`_ package as an example. It exports
a function that takes as its first argument a file system path, and creates a folder with that
path, as well as any parent folders that do not exist yet. As further arguments, the function
accepts an optional configuration object and a callback to invoke once the folder has been
created.

An application might use this package as follows:

.. code-block:: js

  const mkdirp = require('mkdirp');
  // ...
  mkdirp(p, opts, function cb(err) {
    // ...
  });

If the value of ``p`` can be controlled by an untrusted user, this would allow them to create arbitrary
folders, which may not be desirable.

By analyzing the application code base together with the source code for the ``mkdirp`` package,
Semmle's default path injection analysis would be able to track taint through the call to ``mkdirp`` into its
implementation, which ultimately uses built-in Node.js file system APIs to create the folder. Since
the path injection analysis has built-in models of these APIs it would then be able to spot and flag this
vulnerability.

However, analyzing ``mkdirp`` from scratch for every client application is wasteful. Moreover, it would
in this case be undesirable to flag the location inside ``mkdirp`` where the folder is actually created
as part of the alert: the developer of the client application did not write that code and hence will
have a hard time understanding why it is being flagged.

Both of these concerns can be addressed by treating the first argument to ``mkdirp`` as a path injection
sink in its own right: the analysis no longer needs to track flow into the implementation of ``mkdirp``,
so we would no longer need to include its source code in the analysis, and the alert would flag the call
to ``mkdirp`` in application code, not its implementation in library code.

The information that the first parameter of ``mkdirp`` is interpreted as a file system path and hence should
be considered a path injection sink is an example of a *flow summary*, or more precisely a *sink summary*.
Besides sink summaries, we also consider *source summaries* and *flow-step summaries*.

In general, a sink summary states that some API interface point (such as a function parameter) should
be considered a sink for a certain analysis, so if data from a known source reaches this point without
undergoing appropriate sanitization, it should be flagged with an alert. A sink summary may also
specify which taint kind the data needs to have in order for the sink to be problematic.

Conversely, a source summary identifies some API interface point (such as the return value of a
function) as a source of tainted data for a certain analysis, again optionally specifying a taint kind.

Finally, a flow-step summary records the fact that data that flows into the package at some point
may propagate to another point (for example, from a function parameter to its return value).
In this case, there are two relevant taint kinds: one describing the taint of the data that
enters, and one describing the taint of the data that emerges. In general, flow steps (like sources
and sinks) are analysis-specific, since the sanitizers that can block a step depend on the analysis.
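
As a small illustration of the flow-step case, the sketch below shows a made-up library function
whose argument propagates to its return value. A flow-step summary for it would connect the
function's first parameter to its return value. The function name ``normalizeSlashes`` and the
input string are hypothetical and serve only as an example.

```javascript
// Hypothetical library function (not part of any real package): the value of
// `p` propagates to the return value, so a flow-step summary would record
// flow from parameter 0 to the return value.
function normalizeSlashes(p) {
  return String(p).replace(/\\/g, '/');
}

// Client code: taint that enters through the argument re-emerges from the call.
const untrusted = 'uploads\\..\\secret'; // stands in for user-controlled data
const result = normalizeSlashes(untrusted);
console.log(result); // prints "uploads/../secret"
```

With such a summary in place, an analysis of the client code can treat the call as propagating
taint without looking at the body of ``normalizeSlashes``.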

In what follows we will first discuss how summaries are generated from a snapshot of an npm package,
and then how they are imported when analyzing client code. Finally, we will discuss the format in which
flow summaries are stored.

Note that flow summaries are considered an experimental feature at this point. Using them involves
some manual configuration, and we make no guarantee that the API will remain stable.

Generating summaries
--------------------

Flow summaries of an npm package can be generated by running special summary extraction queries
either on a snapshot of the package itself, or on a snapshot of a hand-written model of the
package. (Note that this requires a working installation of Semmle Core.)

There are three default summary extraction queries:

- Extract flow step summaries (``js/step-summary-extraction``,
  ``Security/Summaries/ExtractFlowStepSummaries.ql``)
- Extract sink summaries (``js/sink-summary-extraction``,
  ``Security/Summaries/ExtractSinkSummaries.ql``)
- Extract source summaries (``js/source-summary-extraction``,
  ``Security/Summaries/ExtractSourceSummaries.ql``)

You can run these queries individually against a snapshot of the npm package you want to create
flow summaries for using ``odasa runQuery``, and store the output as CSV files named
``additional-steps.csv``, ``additional-sinks.csv`` and ``additional-sources.csv``, respectively.

For example, assuming that folder ``mkdirp-snapshot`` contains a snapshot of the ``mkdirp``
project, we can extract sink summaries using the command

.. code-block:: bash

  odasa runQuery \
    --query $SEMMLE_DIST/queries/semmlecode-javascript-queries/Security/Summaries/ExtractSinkSummaries.ql \
    --output-file additional-sinks.csv --snapshot mkdirp-snapshot


Instead of generating summaries directly from the package source code, you can also generate
them from a hand-written model of the package. The model should contain a ``package.json`` file
giving the correct package name, and models for the relevant API entry points. The models are
plain JavaScript with special comments annotating certain expressions as sources or sinks.

For example, a model of ``mkdirp`` might look like this:

.. code-block:: js

  module.exports = function mkdirp(path) {
    path /* Semmle: sink: taint, TaintedPath */
  };

Annotation comments start with ``Semmle:``, and contain ``source`` and ``sink`` specifications.
Each such specification lists a flow label (in this case, ``taint``) and a configuration to which
the specification applies (in this case, ``TaintedPath``).

A source specification annotates an expression as being a source of flow with the given label
for the purposes of the given configuration, and similarly for sinks. Annotation comments apply to
any expression (and more generally any data flow node) whose source location ends on the line
where the comment starts.

Using summaries
---------------

Once you have created summaries using the approach outlined above, you have two options for
including them in the analysis of a client application.

External data
:::::::::::::

Firstly, you can include the CSV files generated by running the extraction queries as external
data when building a snapshot of the client application by copying them into the
``$snapshot/external/data`` folder. This is typically done by including a command like this
in your ``project`` file:

.. code-block:: xml

  <build>cp /path/to/additional-sinks.csv ${snapshot}/external/data</build>

If you want to include summaries for multiple libraries, you have to concatenate the
corresponding CSV files before copying them into the external data folder.
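
The concatenation step can be scripted. The following is a minimal sketch that merges the text of
several summary files into one. It assumes that the files contain one summary per line without a
header row, and it additionally drops exact duplicate lines, which the text above does not
require. The function name ``mergeSummaryCsv``, the package name ``example`` and the sample rows
are hypothetical.

```javascript
// Merge the contents of several summary CSV files into a single CSV text.
// Assumes one summary per line and no header row.
function mergeSummaryCsv(csvTexts) {
  const seen = new Set();
  const rows = [];
  for (const text of csvTexts) {
    for (const line of text.split(/\r?\n/)) {
      // Skip blank lines and rows that were already emitted.
      if (line.trim() === '' || seen.has(line)) continue;
      seen.add(line);
      rows.push(line);
    }
  }
  return rows.length > 0 ? rows.join('\n') + '\n' : '';
}

// Illustrative rows only; the exact quoting of real extraction output may differ.
const mkdirpSinks =
  '(parameter (member (root https://www.npmjs.com/package/mkdirp) default) 0),taint,TaintedPath\n';
const otherSinks =
  '(parameter (member (root https://www.npmjs.com/package/example) write) 0),taint,TaintedPath\n';
const merged = mergeSummaryCsv([mkdirpSinks, otherSinks, mkdirpSinks]);
console.log(merged.split('\n').filter(Boolean).length); // prints 2
```

In a real build, the input strings would be read from the per-package files and the result
written to the file that is copied into the external data folder.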

Additionally, you need to import the library ``Security.Summaries.ImportFromCsv`` in your
``javascript.qll``, which will pick up the summaries from external data and interpret them
as additional sources, sinks and flow steps:

.. code-block:: ql

  import Security.Summaries.ImportFromCsv

After these preparatory steps, you can run your analysis without any further changes.

External predicates
:::::::::::::::::::

The second method for including flow summaries is by including the
``Security.Summaries.ImportFromExternalPredicates`` library in your analysis, which declares
three external predicates ``additionalSteps``, ``additionalSinks`` and ``additionalSources`` that
need to be instantiated with the flow summary CSV data.

This is most easily done in QL for Eclipse, which will prompt you for CSV files to populate
the three predicates.

This approach has the advantage that you do not need to include the CSV files during the
snapshot build, so you can use an existing snapshot, for example as downloaded from LGTM.com.

Summary format
--------------

Source and sink summaries are specified as tuples of the form ``(portal, kind, configuration)``,
where ``portal`` is a description of the API element being marked as a source or sink, ``kind``
is a flow label (also known as "taint kind") describing the kind of information being generated
or consumed, and ``configuration`` specifies which flow configuration the summary applies to.

If ``kind`` is empty, it defaults to ``data`` for sources and either ``data`` or ``taint`` for sinks.
If ``configuration`` is empty, the specification applies to all configurations.
The default extraction queries never produce empty ``kind`` or ``configuration`` columns.
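
The defaulting rules can be sketched as a small normalization function. This is an illustration
of the rules stated above, not the code used by the import libraries. The function name
``normalizeSummary``, the ``role`` argument and the use of ``null`` to mean "all configurations"
are assumptions.

```javascript
// Apply the defaults described above to a (portal, kind, configuration) tuple.
// `role` is either 'source' or 'sink'.
function normalizeSummary([portal, kind, configuration], role) {
  let kinds;
  if (kind !== '') {
    kinds = [kind];
  } else if (role === 'source') {
    kinds = ['data']; // an empty kind on a source defaults to `data`
  } else {
    kinds = ['data', 'taint']; // an empty kind on a sink matches either label
  }
  return {
    portal,
    kinds,
    // An empty configuration means the summary applies to all configurations.
    configuration: configuration === '' ? null : configuration,
  };
}

const mkdirpPortal =
  '(parameter (member (root https://www.npmjs.com/package/mkdirp) default) 0)';
console.log(normalizeSummary([mkdirpPortal, '', ''], 'sink').kinds.join(',')); // prints data,taint
console.log(normalizeSummary([mkdirpPortal, 'taint', 'TaintedPath'], 'sink').configuration); // prints TaintedPath
```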

Similarly, step summaries are tuples of the form
``(inPortal, inKind, outPortal, outKind, configuration)``, stating that information with label
``inKind`` that flows into ``inPortal`` resurfaces from ``outPortal``, now having kind ``outKind``.
As before, ``configuration`` specifies which configuration this information applies to.

In all of the above, ``portal`` is an S-expression that abstractly describes a *portal*, that is,
an API interface point by which data may enter or leave the npm package being analyzed.

Currently, we model five kinds of portals:

- ``(root <uri>)``, representing the ``module`` object of the main module of the npm package
  described by ``<uri>``, which is a URL of the form ``https://www.npmjs.com/package/<pkg>``;
- ``(member <base> <name>)``, representing property ``<name>`` of an object described by
  portal ``<base>``;
- ``(instance <base>)``, representing an instance of a (constructor) function or class
  described by portal ``<base>``;
- ``(parameter <base> <i>)``, representing the ``<i>``-th parameter (counting from zero) of a
  function described by portal ``<base>``;
- ``(return <base>)``, representing the return value of a function described by portal ``<base>``.

In our example above, the first parameter of the default export of package ``mkdirp`` is
described by the portal

.. code-block:: lisp

  (parameter (member (root https://www.npmjs.com/package/mkdirp) default) 0)

As a more complicated example,

.. code-block:: lisp

  (parameter (parameter (member (instance (member (root https://www.npmjs.com/package/bluebird) Promise)) then) 1) 0)

describes the first parameter of a function passed as second argument to the ``then`` method of
an instance of the ``Promise`` constructor exported by package ``bluebird``.
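
To illustrate how such descriptions decompose, here is a minimal sketch of a parser that turns a
portal S-expression into a nested object. It follows the argument order used in the two examples
above (base portal first, then the property name or parameter index) and assumes that names and
URIs contain neither whitespace nor parentheses. ``parsePortal`` is a hypothetical helper, not
part of the Semmle libraries.

```javascript
// Parse a portal S-expression into a nested object describing the portal.
function parsePortal(text) {
  // Tokens are parentheses or maximal runs of other non-whitespace characters.
  const tokens = text.match(/[()]|[^\s()]+/g) || [];
  let pos = 0;

  function expect(token) {
    if (tokens[pos] !== token) {
      throw new SyntaxError(`expected "${token}" at token ${pos}`);
    }
    pos++;
  }

  function parse() {
    expect('(');
    const kind = tokens[pos++];
    let node;
    switch (kind) {
      case 'root':
        node = { kind, uri: tokens[pos++] };
        break;
      case 'instance':
      case 'return':
        node = { kind, base: parse() };
        break;
      case 'member': {
        const base = parse();
        node = { kind, base, name: tokens[pos++] };
        break;
      }
      case 'parameter': {
        const base = parse();
        const index = Number(tokens[pos++]);
        if (!Number.isInteger(index)) throw new SyntaxError('parameter index must be an integer');
        node = { kind, base, index };
        break;
      }
      default:
        throw new SyntaxError(`unknown portal kind: ${kind}`);
    }
    expect(')');
    return node;
  }

  const result = parse();
  if (pos !== tokens.length) throw new SyntaxError('unexpected trailing input');
  return result;
}

const parsed = parsePortal(
  '(parameter (member (root https://www.npmjs.com/package/mkdirp) default) 0)'
);
console.log(parsed.kind, parsed.index, parsed.base.name); // prints "parameter 0 default"
```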
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
/**
 * Imports the standard library and all taint-tracking configuration classes from the security queries.
 */

import javascript
import semmle.javascript.security.dataflow.BrokenCryptoAlgorithm
import semmle.javascript.security.dataflow.CleartextLogging
import semmle.javascript.security.dataflow.CleartextStorage
import semmle.javascript.security.dataflow.ClientSideUrlRedirect
import semmle.javascript.security.dataflow.CodeInjection
import semmle.javascript.security.dataflow.CommandInjection
import semmle.javascript.security.dataflow.ConditionalBypass
import semmle.javascript.security.dataflow.CorsMisconfigurationForCredentials
import semmle.javascript.security.dataflow.DifferentKindsComparisonBypass
import semmle.javascript.security.dataflow.DomBasedXss as DomBasedXss
import semmle.javascript.security.dataflow.FileAccessToHttp
import semmle.javascript.security.dataflow.HardcodedCredentials
import semmle.javascript.security.dataflow.InsecureRandomness
import semmle.javascript.security.dataflow.InsufficientPasswordHash
import semmle.javascript.security.dataflow.NosqlInjection
import semmle.javascript.security.dataflow.ReflectedXss as ReflectedXss
import semmle.javascript.security.dataflow.RegExpInjection
import semmle.javascript.security.dataflow.RemotePropertyInjection
import semmle.javascript.security.dataflow.RequestForgery
import semmle.javascript.security.dataflow.ServerSideUrlRedirect
import semmle.javascript.security.dataflow.SqlInjection
import semmle.javascript.security.dataflow.StackTraceExposure
import semmle.javascript.security.dataflow.StoredXss as StoredXss
import semmle.javascript.security.dataflow.TaintedFormatString
import semmle.javascript.security.dataflow.TaintedPath
import semmle.javascript.security.dataflow.TypeConfusionThroughParameterTampering
import semmle.javascript.security.dataflow.UnsafeDeserialization
import semmle.javascript.security.dataflow.XmlBomb
import semmle.javascript.security.dataflow.XpathInjection
import semmle.javascript.security.dataflow.Xxe
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
/**
 * @name Extract flow step summaries
 * @description Extracts flow step summaries, that is, tuples `(p1, lbl1, p2, lbl2, cfg)`
 *              representing the fact that data with flow label `lbl1` may flow from a
 *              user-controlled exit node of portal `p1` to an escaping entry node of portal `p2`,
 *              and have label `lbl2` at that point. Moreover, the path from `p1` to `p2` contains
 *              no sanitizers specified by configuration `cfg`.
 * @kind flow-step-summary
 * @id js/step-summary-extraction
 */

import AllConfigurations
import PortalExitSource
import PortalEntrySink

from
  TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p1,
  Portal p2, DataFlow::FlowLabel lbl1, DataFlow::FlowLabel lbl2
where
  cfg.hasFlowPath(source, sink) and
  p1 = source.getNode().(PortalExitSource).getPortal() and
  p2 = sink.getNode().(PortalEntrySink).getPortal() and
  lbl1 = sink.getPathSummary().getStartLabel() and
  lbl2 = sink.getPathSummary().getEndLabel() and
  // avoid constructing infeasible paths
  sink.getPathSummary().hasCall() = false and
  sink.getPathSummary().hasReturn() = false and
  // restrict to steps from function parameters to returns
  p1.(ParameterPortal).getBasePortal() = p2.(ReturnPortal).getBasePortal() and
  // restrict to data/taint flow
  lbl1 instanceof DataFlow::StandardFlowLabel
select p1.toString(), lbl1.toString(), p2.toString(), lbl2.toString(), cfg.toString()
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
/**
 * @name Extract sink summaries
 * @description Extracts sink summaries, that is, tuples `(p, lbl, cfg)` representing the fact
 *              that data with flow label `lbl` may flow from a user-controlled exit node of portal
 *              `p` to a known sink for configuration `cfg`.
 * @kind sink-summary
 * @id js/sink-summary-extraction
 */

import AllConfigurations
import PortalExitSource
import SinkFromAnnotation

from TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p
where
  cfg.hasFlowPath(source, sink) and
  p = source.getNode().(PortalExitSource).getPortal() and
  // avoid constructing infeasible paths
  sink.getPathSummary().hasReturn() = false
select p.toString(), source.getPathSummary().getStartLabel().toString(), cfg.toString()
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
/**
 * @name Extract source summaries
 * @description Extracts source summaries, that is, tuples `(p, lbl, cfg)` representing the fact
 *              that data may flow from a known source for configuration `cfg` to an escaping entry
 *              node of portal `p`, and have flow label `lbl` at that point.
 * @kind source-summary
 * @id js/source-summary-extraction
 */

import AllConfigurations
import PortalEntrySink
import SourceFromAnnotation

from TaintTracking::Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink, Portal p
where
  cfg.hasFlowPath(source, sink) and
  p = sink.getNode().(PortalEntrySink).getPortal() and
  // avoid constructing infeasible paths
  sink.getPathSummary().hasCall() = false
select p.toString(), sink.getPathSummary().getEndLabel().toString(), cfg.toString()
