@@ -125,7 +125,7 @@ For further details about `PostUpdateNode` see [Field flow](#field-flow) below.
125125
126126Nodes corresponding to expressions and parameters are the most common for users
127127to interact with so a couple of convenience predicates are generally included:
128- ```
128+ ``` ql
129129DataFlowExpr Node::asExpr()
130130Parameter Node::asParameter()
131131ExprNode exprNode(DataFlowExpr n)
@@ -266,10 +266,23 @@ as described above.
266266## Field flow
267267
268268The library supports tracking flow through field stores and reads. In order to
269- support this, a class ` Content ` and two predicates
270- ` storeStep(Node node1, Content f, Node node2) ` and
271- ` readStep(Node node1, Content f, Node node2) ` must be defined. It generally
272- makes sense for stores to target ` PostUpdateNode ` s, but this is not a strict
269+ support this, two classes ` ContentSet ` and ` Content ` , and two predicates
270+ ` storeStep(Node node1, ContentSet f, Node node2) ` and
271+ ` readStep(Node node1, ContentSet f, Node node2) ` , must be defined. The interaction
272+ between ` ContentSet ` and ` Content ` is defined through
273+
274+ ``` ql
275+ Content ContentSet::getAStoreContent();
276+ Content ContentSet::getAReadContent();
277+ ```
278+
279+ which means that a ` storeStep(n1, cs, n2) ` will be interpreted as storing into _ any_
280+ of ` cs.getAStoreContent() ` , and dually that a ` readStep(n1, cs, n2) ` will be
281+ interpreted as reading from _ any_ of ` cs.getAReadContent() ` . In most cases, there
282+ will be a simple bijection between ` ContentSet ` and ` Content ` , but when modelling
283+ for example flow through arrays it can be more involved (see [ Example 4] ( #example-4 ) ).
284+
285+ It generally makes sense for stores to target ` PostUpdateNode ` s, but this is not a strict
273286requirement. Besides this, certain nodes must have associated
274287` PostUpdateNode ` s. The node associated with a ` PostUpdateNode ` should be
275288defined by ` PostUpdateNode::getPreUpdateNode() ` .
@@ -294,7 +307,7 @@ through a couple of examples.
294307### Example 1
295308
296309Consider the following setter and its call:
297- ```
310+ ``` java
298311setFoo(obj, x) {
299312 sink1(obj. foo);
300313 obj. foo = x;
@@ -335,12 +348,12 @@ call sites.
335348In the following two lines we would like flow from ` x ` to reach the
336349` PostUpdateNode ` of ` a ` through a sequence of two store steps, and this is
337350indeed handled automatically by the shared library.
338- ```
351+ ``` java
339352a. b. c = x;
340353a. getB(). c = x;
341354```
342355The only requirement for this to work is the existence of ` PostUpdateNode ` s.
343- For a specified read step (in ` readStep(Node n1, Content f , Node n2) ` ) the
356+ For a specified read step (in ` readStep(Node n1, ContentSet c , Node n2) ` ) the
344357shared library will generate a store step in the reverse direction between the
345358corresponding ` PostUpdateNode ` s. A similar store-through-reverse-read will be
346359generated for calls that can be summarized by the shared library as getters.
@@ -370,33 +383,132 @@ itself as this represents the value of the object after construction, that is
370383after the constructor has run. With this setup of ` ArgumentNode ` s and
371384` PostUpdateNode ` s we will achieve the desired flow from ` source ` to ` sink `
372385
386+ ### Example 4
387+
388+ Assume we want to track flow through arrays precisely:
389+
390+ ``` rb
391+ a[0 ] = tainted
392+ a[1 ] = not_tainted
393+ sink(a[0 ]) # bad
394+ sink(a[1 ]) # good
395+ sink(a[unknown]) # bad; unknown may be 0
396+
397+ b[unknown] = tainted
398+ sink(b[0 ]) # bad; unknown may be 0
399+
400+ c[unknown][0 ] = tainted
401+ c[unknown][1 ] = not_tainted
402+ sink(c[0 ][0 ]) # bad; unknown may be 0
403+ sink(c[0 ][1 ]) # good
404+ ```
405+
406+ This can be done by defining
407+
408+ ``` ql
409+ newtype TContent =
410+ TKnownArrayElementContent(int i) { i in [0 .. 10] } or
411+ TUnknownArrayElementContent()
412+
413+ class Content extends TContent {
414+ ...
415+ }
416+
417+ newtype TContentSet =
418+ TSingletonContent(Content c) or
419+ TAnyArrayElementContent()
420+
421+ class ContentSet extends TContentSet {
422+ Content getAStoreContent() {
423+ this = TSingletonContent(result)
424+ or
425+ // for reverse stores
426+ this = TAnyArrayElementContent() and
427+ result = TUnknownArrayElementContent()
428+ }
429+
430+ Content getAReadContent() {
431+ this = TSingletonContent(result)
432+ or
433+ this = TAnyArrayElementContent() and
434+ (result = TUnknownArrayElementContent() or result = TKnownArrayElementContent(_))
435+ }
436+ }
437+ ```
438+
439+ and we will have the following store/read steps
440+ ``` rb
441+ # storeStep(tainted, TSingletonContent(TKnownArrayElementContent(0)), [post update] a)
442+ a[0 ] = tainted
443+
444+ # storeStep(not_tainted, TSingletonContent(TKnownArrayElementContent(1)), [post update] a)
445+ a[1 ] = not_tainted
446+
447+ # readStep(a, TSingletonContent(TKnownArrayElementContent(0)), a[0])
448+ # readStep(a, TSingletonContent(TUnknownArrayElementContent()), a[0])
449+ sink(a[0 ]) # bad
450+
451+ # readStep(a, TSingletonContent(TKnownArrayElementContent(1)), a[1])
452+ # readStep(a, TSingletonContent(TUnknownArrayElementContent()), a[1])
453+ sink(a[1 ]) # good
454+
455+ # readStep(a, TAnyArrayElementContent(), a[unknown])
456+ sink(a[unknown]) # bad; unknown may be 0
457+
458+ # storeStep(tainted, TSingletonContent(TUnknownArrayElementContent()), [post update] b)
459+ b[unknown] = tainted
460+
461+ # readStep(b, TSingletonContent(TKnownArrayElementContent(0)), b[0])
462+ # readStep(b, TSingletonContent(TUnknownArrayElementContent()), b[0])
463+ sink(b[0 ]) # bad; unknown may be 0
464+
465+ # storeStep(tainted, TSingletonContent(TKnownArrayElementContent(0)), [post update] c[unknown])
466+ # storeStep(not_tainted, TSingletonContent(TKnownArrayElementContent(1)), [post update] c[unknown])
467+ # readStep(c, TAnyArrayElementContent(), c[unknown])
468+ # storeStep([post update] c[unknown], TAnyArrayElementContent(), [post update] c) # auto-generated reverse store (see Example 2)
469+ c[unknown][0 ] = tainted
470+ c[unknown][1 ] = not_tainted
471+
472+
473+ # readStep(c[0], TSingletonContent(TKnownArrayElementContent(0)), c[0][0])
474+ # readStep(c[0], TSingletonContent(TUnknownArrayElementContent()), c[0][0])
475+ # readStep(c[0], TSingletonContent(TKnownArrayElementContent(1)), c[0][1])
476+ # readStep(c[0], TSingletonContent(TUnknownArrayElementContent()), c[0][1])
477+ # readStep(c, TSingletonContent(TKnownArrayElementContent(0)), c[0])
478+ # readStep(c, TSingletonContent(TUnknownArrayElementContent()), c[0])
479+ sink(c[0 ][0 ]) # bad; unknown may be 0
480+ sink(c[0 ][1 ]) # good
481+ ```
482+
373483### Field flow barriers
374484
375485Consider this field flow example:
376- ```
486+ ``` java
377487obj. f = source;
378488obj. f = safeValue;
379489sink(obj. f);
380490```
381491or the similar case when field flow is used to model collection content:
382- ```
492+ ``` java
383493obj. add(source);
384494obj. clear();
385495sink(obj. get(key));
386496```
387497Clearing a field or content like this should act as a barrier, and this can be
388- achieved by marking the relevant ` Node, Content ` pair as a clear operation in
498+ achieved by marking the relevant ` Node, ContentSet ` pair as a clear operation in
389499the ` clearsContent ` predicate. A reasonable default implementation for fields
390500looks like this:
391501``` ql
392- predicate clearsContent(Node n, Content c) {
502+ predicate clearsContent(Node n, ContentSet c) {
393503 n = any(PostUpdateNode pun | storeStep(_, c, pun)).getPreUpdateNode()
394504}
395505```
396506However, this relies on the local step relation using the smallest possible
397507use-use steps. If local flow is implemented using def-use steps, then
398508` clearsContent ` might not be easy to use.
399509
510+ Note that ` clearsContent(n, cs) ` is interpreted using ` cs.getAReadContent() ` .
511+
400512## Type pruning
401513
402514The library supports pruning paths when a sequence of value-preserving steps
@@ -415,13 +527,13 @@ as a single entity (this improves performance). As an example, Java uses erased
415527types for this purpose and a single equivalence class for all numeric types.
416528
417529The type of a ` Node ` is given by the following predicate
418- ```
530+ ``` ql
419531DataFlowType getNodeType(Node n)
420532```
421533and every ` Node ` should have a type.
422534
423535One also needs to define the string representation of a ` DataFlowType ` :
424- ```
536+ ``` ql
425537string ppReprType(DataFlowType t)
426538```
427539The ` ppReprType ` predicate is used for printing a type in the labels of
@@ -499,7 +611,7 @@ predicate nodeIsHidden(Node n)
499611### Unreachable nodes
500612
501613Consider:
502- ```
614+ ``` java
503615foo(source1, false );
504616foo(source2, true );
505617
0 commit comments