Commit e92a1c3

Max Schaefer and jf205 committed
JavaScript: Apply suggestions from code review
Co-Authored-By: jf205 <42464962+jf205@users.noreply.github.com>
1 parent 967a578 commit e92a1c3

1 file changed (+71, -66 lines)

javascript/documentation/library-customization.rst

Lines changed: 71 additions & 66 deletions

    @@ -9,40 +9,39 @@ Customization mechanisms

     The two mechanisms used for customization are subclassing and overriding.

    -By subclassing an abstract class used by the JavaScript analysis and implementing its abstract
    -member predicates we can teach the analysis to handle further instances of abstract concepts it
    -already understands. For example, the standard library defines an abstract class
    -``SystemCommandExecution`` that covers various APIs for executing operating-system commands. This
    -class is used by the command-injection analysis to identify problematic flows where input from a
    -potentially malicious user is interpreted as the name of a system command to execute. By defining
    -additional subclasses of ``SystemCommandExecution``, we can make this analysis more powerful without
    -touching its implementation.
    +We can teach the JavaScript analysis to handle further instances of abstract concepts it already
    +understands by subclassing abstract classes and implementing their member predicates. For example,
    +the standard library defines an abstract class ``SystemCommandExecution`` that covers various APIs
    +for executing operating-system commands. This class is used by the command-injection analysis to
    +identify problematic flows where input from a potentially malicious user is interpreted as the name
    +of a system command to execute. By defining additional subclasses of ``SystemCommandExecution``, we
    +can make this analysis more powerful without touching its implementation.

     By overriding a member predicate defined in the library, we can change its behavior either for all
     its receivers or only a subset. For example, the standard library predicate
     ``ControlFlowNode::getASuccessor`` implements the basic control-flow graph on which many further
    -analyses are based. By overriding it, we can add, suppress or modify control-flow graph edges.
    +analyses are based. By overriding it, we can add, suppress, or modify control-flow graph edges.

     Once a customization has been defined, it needs to be brought into scope so that the default
     analysis queries pick it up. This can be done by adding the customizing definitions to
     ``Customizations.qll``, an initially empty library file that is imported by the default library
     ``javascript.qll``.

    -Sometimes you may want to perform both kinds of customizations at the same time: subclass a base
    +Sometimes you may want to perform both kinds of customizations at the same time. That is, subclass a base
     class to provide new implementations of an API, and override some member predicates of the same base
     class to selectively change the implementation of the API. This is not always easy to do, since the
     former requires the base class to be abstract, while the latter requires it to be concrete.

    -To work around this, the JavaScript library uses the so-called `range pattern`: the base class
    +To work around this, the JavaScript library uses the so-called *range pattern*. In this pattern, the base class
     ``Base`` itself is concrete, but it has an abstract companion class called ``Base::Range`` covering
     the same set of values. To change the implementation of the API, subclass ``Base`` and override its
     member predicates. To provide new implementations of the API, subclass ``Base::Range`` and implement
     its abstract member predicates.

     For example, the class ``Base64::Encode`` in the standard library models base64-encoding libraries
    -using the range pattern. To add support for a new library, subclass ``Base64::Encode::Range`` and
    -implement the member predicates ``getInput`` and ``getOutput``. (Subclasses for many popular base64
    -encoders are included in the standard library.) To customize the definition of ``getInput`` or
    +using the range pattern. It comes with subclasses corresponding to many popular base64 encoders. To
    +add support for a new library, subclass ``Base64::Encode::Range`` and implement the member
    +predicates ``getInput`` and ``getOutput``. To customize the definition of ``getInput`` or
     ``getOutput`` for a library that is already supported, extend ``Base64::Encode`` itself and override
     the predicate you want to customize.

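
The last paragraph of this hunk says that supporting a new base64 library means subclassing `Base64::Encode::Range` and implementing `getInput` and `getOutput`. The sketch below shows the kind of code such a subclass would target: a hypothetical in-house encoder, `toBase64`, that the built-in models would not recognize. It is written in TypeScript, is not part of any CodeQL library, and assumes Latin-1 input. A range subclass for it would presumably identify the call's argument as the input and the call's result as the output.

```typescript
// Hypothetical in-house encoder (illustrative only, not a real library API).
// Its argument is the "input" and its return value the "output" of the encoding.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function toBase64(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    // Collect up to three bytes; this sketch only accepts Latin-1 characters.
    const bytes: number[] = [];
    for (let j = i; j < Math.min(i + 3, input.length); j++) {
      const code = input.charCodeAt(j);
      if (code > 0xff) {
        throw new RangeError("toBase64 only handles Latin-1 input");
      }
      bytes.push(code);
    }
    // Pack the bytes into one 24-bit group, treating missing bytes as zero.
    const group = (bytes[0] << 16) | ((bytes[1] ?? 0) << 8) | (bytes[2] ?? 0);
    // Emit one character per 6 bits, using '=' padding for missing bytes.
    out += BASE64_ALPHABET[(group >> 18) & 63];
    out += BASE64_ALPHABET[(group >> 12) & 63];
    out += bytes.length > 1 ? BASE64_ALPHABET[(group >> 6) & 63] : "=";
    out += bytes.length > 2 ? BASE64_ALPHABET[group & 63] : "=";
  }
  return out;
}

console.log(toBase64("hello")); // prints "aGVsbG8="
```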

    @@ -57,57 +56,59 @@ The JavaScript analysis libraries have a layered structure with higher-level ana
     lower-level ones. Usually, classes and predicates in a lower layer should not depend on a higher
     layer to avoid performance problems and non-monotonic recursion.

    -We briefly survey the most important analysis layers here, starting from the lowest layer. Below we
    -will discuss the extension points offered by the individual layers.
    +In this section, we briefly introduce the most important analysis layers, starting from the lowest
    +layer. Below, we discuss the extension points offered by the individual layers.

    -AST
    -~~~
    +Abstract syntax tree
    +~~~~~~~~~~~~~~~~~~~~

    -The abstract syntax tree, implemented by class ``ASTNode`` and its subclasses, is the lowest layer
    -and more or less directly represents the information stored in the snapshot data base. It
    +The abstract syntax tree (AST), implemented by class ``ASTNode`` and its subclasses, is the lowest layer
    +and is a good representation of the information stored in the snapshot database. It
     corresponds closely to the syntactic structure of the program, only abstracting away from
     typographical details such as whitespace and indentation.

    -CFG
    -~~~
    +Control-flow graph
    +~~~~~~~~~~~~~~~~~~

    -The (intra-procedural) control-flow graph, implemented by class ``ControlFlowNode`` and its
    -subclasses, is the next higher level. It models flow of control inside functions and top-level
    -scripts, and is overlaid on top of the AST in that each AST node has a corresponding CFG node. There
    -are also synthetic CFG nodes that do not correspond to an AST node: entry and exit nodes
    -(``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end, respectively, of
    -the execution of a function or top-level, while guard nodes (``GuardControlFlowNode``) record the
    -fact that some condition is known to hold at some point in the program.
    +The (intra-procedural) control-flow graph (CFG), implemented by class ``ControlFlowNode`` and its
    +subclasses, is the next layer. It models flow of control inside functions and top-level scripts. The
    +CFG is overlaid on top of the AST, meaning that each AST node has a corresponding CFG node. There
    +are also synthetic CFG nodes that do not correspond to an AST node. For example, entry and exit
    +nodes (``ControlFlowEntryNode`` and ``ControlFlowExitNode``) mark the beginning and end,
    +respectively, of the execution of a function or top-level script, while guard nodes
    +(``GuardControlFlowNode``) record that some condition is known to hold at some point in the program.

     Basic blocks (class ``BasicBlock``) organize control-flow nodes into maximal sequences of
     straight-line code, which is vital for efficiently reasoning about control flow.

    -SSA
    -~~~
    +Static single-assignment form
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    -The static single-assignment representation (class ``SsaVariable`` and ``SsaDefinition``) uses
    +The static single-assignment (SSA) representation (class ``SsaVariable`` and ``SsaDefinition``) uses
     control-flow information to split up local variables into SSA variables that each only have a single
    -definition. In addition to regular definitions from assignments and increment/decrement expressions,
    -the SSA form also introduces pseudo-definitions such as `phi nodes` where multiple possible values
    -for a variable are merged and `refinement nodes` (also known as `pi nodes`) marking program points
    -where additional information about a variable becomes available that may restrict its possible set
    -of values.
    +definition.
    +
    +In addition to regular definitions corresponding to assignments and increment/decrement expressions,
    +the SSA form also introduces pseudo-definitions such as
    +
    +- `phi nodes` where multiple possible values for a variable are merged
    +- `refinement nodes` (also known as `pi nodes`) marking program points where additional information about a variable becomes available that may restrict its possible set of values.

     Local data flow
     ~~~~~~~~~~~~~~~

     The (intra-procedural) data-flow graph, implemented by class ``DataFlow::Node`` and its subclasses,
    -represents the flow of data within a function or top-level. Each expression has a corresponding
    -data-flow node. Additionally, there are data-flow nodes that do not correspond to syntactic
    -elements; for example, each SSA variable has a corresponding data-flow node. Note that flow between
    -functions (through arguments and return values) is not modelled in this layer, except for the
    -special case of immediately-invoked function expressions. Flow through object properties is also not
    -modelled.
    -
    -This layer also implements the widely-used source-node API: class ``DataFlow::SourceNode`` and its
    +represents the flow of data within a function or top-level scripts. Each expression has a
    +corresponding data-flow node. Additionally, there are data-flow nodes that do not correspond to
    +syntactic elements. For example, each SSA variable has a corresponding data-flow node. Note that
    +flow between functions (through arguments and return values) is not modeled in this layer, except
    +for the special case of immediately-invoked function expressions. Flow through object properties is
    +also not modeled.
    +
    +This layer also implements the widely-used source-node API. The class ``DataFlow::SourceNode`` and its
     subclasses represent data-flow nodes where new objects are created (such as object expressions), or
     where non-local data flow enters the intra-procedural data-flow graph (such as function parameters
    -or property reads). The source-node API provides convenience predicates for reasoning about these
    +or property reads). The source-node API provides convenient predicates for reasoning about these
     nodes without having to explicitly encode data-flow graph traversal.

     Type inference
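
The SSA paragraph in this hunk distinguishes regular definitions, phi nodes, and refinement nodes. The TypeScript function below is an illustrative input program, not CodeQL code. The SSA names `x1`, `x2`, and `x3` in its comments are made up for exposition and do not correspond to any library output. A `typeof` test like the one shown is one kind of program point where a refinement node could be introduced.

```typescript
// Illustrative input program (not CodeQL). Comments use made-up SSA names
// to mark regular definitions, a phi node, and a refinement point.
function describeValue(flag: boolean, input: unknown): string {
  let x: unknown;
  if (flag) {
    x = input;       // regular definition x1
  } else {
    x = "default";   // regular definition x2
  }
  // Control flow merges here, so two definitions of x reach this point:
  // a phi node x3 = phi(x1, x2) merges them into one SSA variable.
  if (typeof x === "string") {
    // Refinement (pi) point: on this branch x is known to be a string,
    // which restricts the set of values x3 may take.
    return "string of length " + x.length;
  }
  return "non-string";
}

console.log(describeValue(false, 42)); // prints "string of length 7"
```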

    @@ -121,7 +122,7 @@ Call graph
     ~~~~~~~~~~

     The call graph is implemented as a predicate ``getACallee`` on ``DataFlow::InvokeNode``, the class
    -of data-flow nodes representing function calls (with or wihout ``new``). It uses local data flow and
    +of data-flow nodes representing function calls (with or without ``new``). It uses local data flow and
     type information, as well as type annotations where available.

     Type tracking
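
The call-graph paragraph says that `getACallee` covers calls with or without `new` and draws on local data flow. The illustrative TypeScript program below (again an analyzed input, not CodeQL) contains a direct call, a call through a local variable, and a `new` call. The function and class names are arbitrary.

```typescript
// Illustrative input program (not CodeQL): invocations whose callees a
// call graph has to resolve.
function shout(s: string): string {
  return s.toUpperCase() + "!";
}

class Box {
  value: string;
  constructor(value: string) {
    this.value = value;
  }
}

// Local data flow shows that `alias` holds the same function value as
// `shout`, so the call through `alias` should also resolve to `shout`.
const alias = shout;

const direct = shout("hi");     // plain call
const indirect = alias("hey");  // call through a local variable
const box = new Box("x");       // call with `new`; the callee is Box's constructor

console.log(direct, indirect, box.value); // prints "HI! HEY! x"
```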

    @@ -135,8 +136,8 @@ Framework models
     ~~~~~~~~~~~~~~~~

     The libraries under ``semmle/javascript/frameworks`` model a broad range of popular JavaScript
    -libraries and frameworks, such as Express or Vue.js. Some framework modeling libraries are located
    -under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients`` and ``JsonParsers``.
    +libraries and frameworks, such as Express and Vue.js. Some framework modeling libraries are located
    +under ``semmle/javascript`` directly, for instance ``Base64``, ``EmailClients``, and ``JsonParsers``.

     Global data flow and taint tracking
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    @@ -147,13 +148,13 @@ information-flow analyses. Most of our security queries are based on this approa
     Extension points
     ----------------

    -Below we discuss the most important extension points for the individual analysis layers introduced
    +In this section, we discuss the most important extension points for the individual analysis layers introduced
     above.

     AST
     ~~~

    -This layer should not normally be customized. It is technically possible to override, say,
    +This layer should not normally be customized. It is technically possible to override, for instance,
     ``ASTNode.getChild`` to change the way the AST structure is represented, but this should normally be
     avoided in the interest of keeping a close correspondence between AST and concrete syntax.


    @@ -174,20 +175,20 @@ Local data flow

     The ``DataFlow::SourceNode`` class uses the range pattern, so new kinds of source nodes can be
     added by extending ``Dataflow::SourceNode::Range``. Some of its subclasses can similarly be
    -extended: ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
    -class definitions. The former provides default implementations covering CommonJS, AMD and ECMAScript
    -2015 modules, while the latter handles ECMAScript 2015 classes as well as traditional function-based
    +extended. For example, ``DataFlow::ModuleImportNode`` models module imports, and ``DataFlow::ClassNode`` models
    +class definitions. The former provides default implementations covering CommonJS, AMD, and ECMAScript
    +2015 modules, while the latter handles ECMAScript 2015 classes, as well as traditional function-based
     classes. You can extend their corresponding ``::Range`` classes to add support for other module or
     class systems.

     Type inference
     ~~~~~~~~~~~~~~

     You can override ``AnalyzedNode::getAValue`` to customize the type inference. Note that the type
    -inference is expected to be sound, that is (as far as practical) the abstract values inferred for a
    +inference is expected to be sound, that is (as far as practical), the abstract values inferred for a
     data-flow nodes should cover all possible concrete values this node may take on at runtime.

    -You can also extend the set of abstract values: to add individual abstract values that are
    +You can also extend the set of abstract values. To add individual abstract values that are
     independent of the program being analyzed, define a subclass of ``CustomAbstractValueTag``
     describing the new abstract value. There will then be a corresponding value of class
     ``CustomAbstractValue`` that you can use in overriding definitions of the ``getAValue`` predicate.
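
According to this hunk, `DataFlow::ClassNode` handles ECMAScript 2015 classes as well as traditional function-based classes by default. The illustrative TypeScript program below writes the same counter in both styles. The names `CounterClass`, `CounterFn`, `CounterLike`, and `CounterCtor` are arbitrary. The cast on `CounterFn` exists only to satisfy TypeScript's type checker and has no runtime effect.

```typescript
// Illustrative input program (not CodeQL): one class in two styles.

// 1. ECMAScript 2015 class syntax.
class CounterClass {
  count: number;
  constructor(start: number) {
    this.count = start;
  }
  increment(): number {
    return ++this.count;
  }
}

// 2. Traditional function-based class: a constructor function whose
//    methods are attached to its prototype object.
interface CounterLike {
  count: number;
  increment(): number;
}

function CounterFn(this: CounterLike, start: number): void {
  this.count = start;
}
CounterFn.prototype.increment = function (this: CounterLike): number {
  return ++this.count;
};

// TypeScript does not treat plain functions as constructors, hence the cast;
// plain JavaScript would simply write `new CounterFn(5)`.
const CounterCtor = CounterFn as unknown as new (start: number) => CounterLike;

const modern = new CounterClass(5);
const legacy = new CounterCtor(5);
console.log(modern.increment(), legacy.increment()); // prints "6 6"
```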

    @@ -196,7 +197,7 @@ Call graph
     ~~~~~~~~~~

     You can override ``DataFlow::InvokeNode::getACallee(int)`` to customize the call graph. Note that
    -overriding the zero-argument version ``getACallee()`` is not enough since higher layers use the
    +overriding the zero-argument version ``getACallee()`` is not enough, since higher layers use the
     one-argument version.

     Type tracking

    @@ -220,21 +221,25 @@ The ``semmle.javascript.frameworks.SQL`` module defines abstract classes for mod
     connector libraries, and the ``semmle.javascript.JsonParsers`` and
     ``semmle.javascript.frameworks.XML`` modules for modeling JSON and XML parsers, respectively.

    -The ``semmle.javascript.Concepts`` modules defines a few very broad concepts such as system-command
    +The ``semmle.javascript.Concepts`` module defines a small number of broad concepts such as system-command
     executions or file-system accesses, which are concretely instantiated in some of the existing
     framework libraries, but can of course be further extended to model additional frameworks.

     Global data flow and taint tracking
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    -Most security queries consist of one QL file defining the query, one configuration module defining
    -the taint-tracking configuration, and one customization module defining sources, sinks and
    -sanitizers. For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection
    -query. It imports module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
    -configuration class ``CommandInjection::Configuration``, and itself imports module
    -``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines sources, sinks
    -and sanitizers by means of three abstract classes ``CommandInjection::Source``,
    -``CommandInjetion::Sink`` and ``CommandInjection::Sanitizer``, respectively.
    +Most security queries consist of:
    +
    +- one QL file defining the query
    +- one configuration module defining the taint-tracking configuration
    +- one customization module defining sources, sinks and sanitizers
    +
    +For example, ``Security/CWE-078/CommandInjection.ql`` defines the command-injection query. It
    +imports the module ``semmle.javascript.security.dataflow.CommandInjection``, which defines the
    +configuration class ``CommandInjection::Configuration``. This module in turn imports
    +``semmle.javascript.security.dataflow.CommandInjectionCustomizations``, which defines three abstract
    +classes (``CommandInjection::Source``, ``CommandInjection::Sink``, and
    +``CommandInjection::Sanitizer``) that model sources, sinks, and sanitizers, respectively.

     To define additional sources, sinks or sanitizers for this or any other security query, import the
     customization module and extend these abstract classes with additional subclasses.
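
To relate the three abstract classes to concrete code, the TypeScript sketch below builds a shell command line from user input without executing it. In this sketch:

- The `userInput` parameter plays the role of a `CommandInjection::Source`.
- The finished command string is what would reach a command-execution API, that is, a `CommandInjection::Sink`.
- `shellQuote` is the kind of function one might model as a `CommandInjection::Sanitizer` by adding a subclass.

All names are hypothetical, and the quoting assumes a POSIX shell.

```typescript
// Hypothetical sanitizer: wraps an argument in single quotes for a POSIX
// shell, turning each embedded ' into '\'' (close quote, escaped quote, reopen).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical command builder. `userInput` stands for an untrusted source;
// the returned string is what would flow into a sink that runs a shell
// command. Nothing is executed here.
function buildListCommand(userInput: string, sanitize: boolean): string {
  const arg = sanitize ? shellQuote(userInput) : userInput;
  return "ls " + arg;
}

const payload = "docs; echo injected";
console.log(buildListCommand(payload, false)); // prints "ls docs; echo injected"
console.log(buildListCommand(payload, true));  // prints "ls 'docs; echo injected'"
```

Without the sanitizer, the `;` in the payload would let a shell run a second command. With it, the whole payload becomes a single quoted argument.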
