Commit 24f407c

Felicity Chapman authored
Merge pull request #1689 from markshannon/python-modernize-learn-ql
Python docs: Modernize the learn-ql pages to use the Value API.
2 parents badfc23 + 5e0b263 commit 24f407c

2 files changed: +72 -65 lines changed

docs/language/learn-ql/python/functions.rst

Lines changed: 7 additions & 5 deletions
@@ -58,19 +58,21 @@ We can modify the query further to include only methods whose body consists of a
5858
Finding a call to a specific function
5959
-------------------------------------
6060

61-
This query uses ``Call`` and ``Name`` to find calls to the function ``input`` - which might potentially be a security hazard (in Python 2).
61+
This query uses ``Call`` and ``Name`` to find calls to the function ``eval`` - which might potentially be a security hazard.
6262

6363
.. code-block:: ql
6464
6565
import python
6666
6767
from Call call, Name name
68-
where call.getFunc() = name and name.getId() = "input"
69-
select call, "call to 'input'."
68+
where call.getFunc() = name and name.getId() = "eval"
69+
select call, "call to 'eval'."
7070
71-
➤ `See this in the query console <https://lgtm.com/query/686330029/>`__. Some of the demo projects on LGTM.com use this function.
71+
➤ `See this in the query console <https://lgtm.com/query/6718356557331218618/>`__. Some of the demo projects on LGTM.com use this function.
7272

73-
The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate gets the expression being called. ``Name.getId()`` gets the identifier (as a string) of the ``Name`` expression. Due to the dynamic nature of Python, this query will select any call of the form ``input(...)`` regardless of whether it is a call to the built-in function ``input`` or not. In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``input`` regardless of name of the variable called.
73+
The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate gets the expression being called. ``Name.getId()`` gets the identifier (as a string) of the ``Name`` expression.
74+
Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not.
75+
In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of the name of the variable called.
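
For readers more familiar with Python than QL, the same purely syntactic match can be sketched with Python's standard ``ast`` module. This is only an analogy and not how the QL library works internally: ``ast.Call.func`` plays roughly the role of ``Call.getFunc()`` and ``ast.Name.id`` the role of ``Name.getId()``. The helper name ``find_named_calls`` and the sample source are made up for illustration.

```python
import ast


def find_named_calls(source, name):
    """Return the line numbers of calls whose callee is the bare name `name`."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        # Match only calls of the form name(...), not obj.name(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Hypothetical input: a user-defined eval that shadows the builtin.
SAMPLE = (
    "def eval(text):\n"
    "    return text.upper()\n"
    "eval('shadowed')\n"
    "obj.eval('method call')\n"
)

matches = find_named_calls(SAMPLE, "eval")
```

Like the QL query, this sketch reports the call on line 3 even though it targets a user-defined function, because it looks only at the name being called.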
7476

7577
What next?
7678
----------

docs/language/learn-ql/python/pointsto-type-infer.rst

Lines changed: 65 additions & 60 deletions
@@ -3,21 +3,21 @@ Tutorial: Points-to analysis and type inference
33

44
This topic contains worked examples of how to write queries using the standard QL library classes for Python type inference.
55

6-
The ``Object`` class
6+
The ``Value`` class
77
--------------------
88

9-
The ``Object`` class and its subclasses ``FunctionObject``, ``ClassObject`` and ``ModuleObject`` represent the values an expression may hold at runtime.
9+
The ``Value`` class and its subclasses ``FunctionValue``, ``ClassValue`` and ``ModuleValue`` represent the values an expression may hold at runtime.
1010

1111
Summary
1212
~~~~~~~
1313

14-
Class hierarchy for ``Object``:
14+
Class hierarchy for ``Value``:
1515

16-
- `Object <https://help.semmle.com/qldoc/python/semmle/python/types/Object.qll/type.Object$Object.html>`__
16+
- `Value <https://help.semmle.com/qldoc/python/semmle/python/objects/ObjectAPI.qll/type.ObjectAPI$Value.html>`__
1717

18-
- ``ClassObject``
19-
- ``FunctionObject``
20-
- ``ModuleObject``
18+
- ``ClassValue``
19+
- ``FunctionValue``
20+
- ``ModuleValue``
2121

2222
Points-to analysis and type inference
2323
-------------------------------------
@@ -26,31 +26,32 @@ Points-to analysis, sometimes known as `pointer analysis <http://en.wikipedia.or
2626

2727
`Type inference <http://en.wikipedia.org/wiki/Type_inference>`__ allows us to infer what the types (classes) of an expression may be at runtime.
2828

29-
The predicate ``ControlFlowNode.refersTo(...)`` shows which object a control flow node may "refer to" at runtime.
29+
The predicate ``ControlFlowNode.pointsTo(...)`` shows which object a control flow node may "point to" at runtime.
3030

31-
``ControlFlowNode.refersTo(...)`` has three variants:
31+
``ControlFlowNode.pointsTo(...)`` has three variants:
3232

3333
.. code-block:: ql
3434
35-
predicate refersTo(Object object)
36-
predicate refersTo(Object object, ControlFlowNode origin)
37-
predicate refersTo(Object object, ClassObject cls, ControlFlowNode origin)
35+
predicate pointsTo(Value object)
36+
predicate pointsTo(Value object, ControlFlowNode origin)
37+
predicate pointsTo(Context context, Value object, ControlFlowNode origin)
3838
39-
``object`` is an object that the control flow node refers to, ``origin`` is where the object comes from, which is useful for displaying meaningful results, and ``cls`` is the inferred class of the ``object``.
39+
``object`` is an object that the control flow node refers to, and ``origin`` is where the object comes from, which is useful for displaying meaningful results.
40+
The third form includes the ``context`` in which the control flow node refers to the ``object``. This form can usually be ignored.
4041

4142
.. pull-quote::
4243

4344
Note
4445

45-
``ControlFlowNode.refersTo()`` cannot find all objects that a control flow node might point to as it impossible to be accurate and find all possible values. We prefer precision (no incorrect values) over recall (finding as many values as possible). We do this because queries based on points-to analysis have fewer false positives and are thus more useful.
46+
``ControlFlowNode.pointsTo()`` cannot find all objects that a control flow node might point to as it is impossible to be accurate *and* to find all possible values. We prefer precision (no incorrect values) over recall (finding as many values as possible). We do this so that queries based on points-to analysis have fewer false positive results and are thus more useful.
4647

47-
For complex data flow analyses, involving multiple stages, the ``ControlFlowNode`` version is more precise, but for simple use cases the ``Expr`` based version is easier to use. For convenience, the ``Expr`` class also has the same three predicates. ``Expr.refersTo(...)`` also has three variants:
48+
For complex data flow analyses, involving multiple stages, the ``ControlFlowNode`` version is more precise, but for simple use cases the ``Expr`` based version is easier to use. For convenience, the ``Expr`` class also has the same three predicates. ``Expr.pointsTo(...)`` also has three variants:
4849

4950
.. code-block:: ql
5051
51-
predicate refersTo(Object object)
52-
predicate refersTo(Object object, AstNode origin)
53-
predicate refersTo(Object object, ClassObject cls, AstNode origin)
52+
predicate pointsTo(Value object)
53+
predicate pointsTo(Value object, AstNode origin)
54+
predicate pointsTo(Context context, Value object, AstNode origin)
5455
5556
Using points-to analysis
5657
------------------------
@@ -84,19 +85,20 @@ The results of this query need to be filtered to return only results where ``ex1
8485

8586
.. code-block:: ql
8687
87-
exists(ClassObject cls1, ClassObject cls2 |
88-
ex1.getType().refersTo(cls1) and
89-
ex2.getType().refersTo(cls2) |
88+
exists(ClassValue cls1, ClassValue cls2 |
89+
ex1.getType().pointsTo(cls1) and
90+
ex2.getType().pointsTo(cls2) |
91+
not cls1 = cls2 and
9092
cls1 = cls2.getASuperType()
9193
)
9294
9395
The line:
9496

9597
::
9698

97-
ex1.getType().refersTo(cls1)
99+
ex1.getType().pointsTo(cls1)
98100

99-
ensures that ``cls1`` is a ``ClassObject`` that the ``except`` block would handle.
101+
ensures that ``cls1`` is a ``ClassValue`` that the ``except`` block would handle.
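
As an illustration of the pattern this condition targets, the following Python sketch shows a handler that can never run because an earlier handler catches a superclass. The function name ``parse`` is hypothetical, and the runtime check at the end is only an analogy for the QL condition.

```python
def parse(text):
    try:
        return int(text)
    except Exception:
        # Also handles ValueError, because ValueError subclasses Exception.
        return "general handler"
    except ValueError:
        # Unreachable: the handler above always wins.
        return "specific handler"


# The relationship between the two handler classes, expressed at runtime:
# the earlier class is a strict superclass of the later one.
earlier, later = Exception, ValueError
is_shadowed = earlier is not later and issubclass(later, earlier)
```

The ``is not`` check plays a part similar to the ``not cls1 = cls2`` condition, which appears intended to exclude pairs of handlers that name the same class.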
100102

101103
Combining the parts of the query we get this:
102104

@@ -112,9 +114,10 @@ Combining the parts of the query we get this:
112114
ex1 = t.getHandler(i) and ex2 = t.getHandler(j) and i < j
113115
)
114116
and
115-
exists(ClassObject cls1, ClassObject cls2 |
116-
ex1.getType().refersTo(cls1) and
117-
ex2.getType().refersTo(cls2) |
117+
exists(ClassValue cls1, ClassValue cls2 |
118+
ex1.getType().pointsTo(cls1) and
119+
ex2.getType().pointsTo(cls2) |
120+
not cls1 = cls2 and
118121
cls1 = cls2.getASuperType()
119122
)
120123
select t, ex1, ex2
@@ -136,48 +139,50 @@ First of all find what object is used in the ``for`` loop:
136139

137140
.. code-block:: ql
138141
139-
from For loop, Object iter
140-
where loop.getIter().refersTo(iter)
142+
from For loop, Value iter
143+
where loop.getIter().pointsTo(iter)
141144
select loop, iter
142145
143-
Then we need to determine if a ``ClassObject`` is iterable. ``ClassObject`` provides the predicate ``isIterable()`` which we can combine with the longer form of ``ControlFlowNode.refersTo()`` to get the class of the loop iterator, giving us this:
146+
Then we need to determine if the object ``iter`` is iterable. We can test ``ClassValue`` to see if it has the ``__iter__`` attribute.
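
The same kind of test can be tried at runtime in Python itself. This sketch is an analogy for the attribute lookup, not the QL implementation, and the names ``class_defines_iter`` and ``LegacySequence`` are hypothetical.

```python
def class_defines_iter(value):
    """Return True if the class of `value` provides an __iter__ attribute."""
    return hasattr(type(value), "__iter__")


class LegacySequence:
    """Iterable only through the older __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


list_has_iter = class_defines_iter([1, 2, 3])
none_has_iter = class_defines_iter(None)
legacy_has_iter = class_defines_iter(LegacySequence())
legacy_items = list(LegacySequence())
```

Note that ``LegacySequence`` has no ``__iter__`` attribute yet Python can still iterate over it, so a check based on ``__iter__`` alone may flag classes like this one.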
144147

145148
**Find non-iterable object used as a loop iterator**
146149

147150
.. code-block:: ql
148151
149-
import python
150-
151-
from For loop, Object iter, ClassObject cls
152-
where loop.getIter().refersTo(iter, cls, _)
153-
and not cls.isIterable()
154-
select loop, cls
152+
import python
155153
156-
➤ `See this in the query console <https://lgtm.com/query/670720182/>`__. Many projects use a non-iterable as a loop iterator.
154+
from For loop, Value iter, ClassValue cls
155+
where loop.getIter().getAFlowNode().pointsTo(iter) and
156+
cls = iter.getClass() and
157+
not exists(cls.lookup("__iter__"))
158+
select loop, cls
159+
160+
➤ `See this in the query console <https://lgtm.com/query/5636475906111506420/>`__. Many projects use a non-iterable as a loop iterator.
157161

158-
Many of the results shown will have ``cls`` as ``NoneType``. It is more informative to show where these ``None`` values may come from. To do this we use the final field of ``refersTo``, as follows:
162+
Many of the results shown will have ``cls`` as ``NoneType``. It is more informative to show where these ``None`` values may come from. To do this we use the final field of ``pointsTo``, as follows:
159163

160164
**Find non-iterable object used as a loop iterator 2**
161165

162166
.. code-block:: ql
163167
164168
import python
165169
166-
from For loop, Object iter, ClassObject cls, AstNode origin
167-
where loop.getIter().refersTo(iter, cls, origin)
168-
and not cls.isIterable()
170+
from For loop, Value iter, ClassValue cls, AstNode origin
171+
where loop.getIter().pointsTo(iter, origin) and
172+
cls = iter.getClass() and
173+
not cls.hasAttribute("__iter__")
169174
select loop, cls, origin
170175
171-
➤ `See this in the query console <https://lgtm.com/query/672230046/>`__. This reports the same results, but with a third column showing the source of the ``None`` values.
176+
➤ `See this in the query console <https://lgtm.com/query/6718356557331218618/>`__. This reports the same results, but with a third column showing the source of the ``None`` values.
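
A typical source of such results is a function that returns a list on one path and falls off the end on another. The following is a hypothetical example (the names ``find_matches`` and ``count_matches`` are made up) of code where knowing the origin of the ``None`` is more useful than knowing only the loop.

```python
def find_matches(items, wanted):
    matches = [item for item in items if item == wanted]
    if matches:
        return matches
    # No explicit return on this path, so the function returns None.


def count_matches(items, wanted):
    total = 0
    for _ in find_matches(items, wanted):  # may iterate over None
        total += 1
    return total
```

Calling ``count_matches([1], 5)`` raises ``TypeError`` because the loop iterates over ``None``.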
172177

173-
Finding calls to functions using call-graph analysis
178+
Finding calls using call-graph analysis
174179
----------------------------------------------------
175180

176-
The ``FunctionObject`` class is a subclass of ``Object`` and corresponds to function objects in Python, in much the same way as the ``ClassObject`` class corresponds to class objects in Python.
181+
The ``Value`` class has a method ``getACall()`` which allows us to find calls to a particular function (including builtin functions).
177182

178-
The ``FunctionObject`` class has a method ``getACall()`` which allows us to find calls to a particular function (including builtin functions).
183+
If we wish to restrict the callables to actual functions we can use the ``FunctionValue`` class, which is a subclass of ``Value`` and corresponds to function objects in Python, in much the same way as the ``ClassValue`` class corresponds to class objects in Python.
179184

180-
Returning to an example from :doc:`Tutorial: Functions <functions>`, we wish to find calls to the ``input`` function.
185+
Returning to an example from :doc:`Tutorial: Functions <functions>`, we wish to find calls to the ``eval`` function.
181186

182187
The original query looked like this:
183188

@@ -186,38 +191,38 @@ The original query looked this:
186191
import python
187192
188193
from Call call, Name name
189-
where call.getFunc() = name and name.getId() = "input"
190-
select call, "call to 'input'."
194+
where call.getFunc() = name and name.getId() = "eval"
195+
select call, "call to 'eval'."
191196
192-
➤ `See this in the query console <https://lgtm.com/query/690010037/>`__. Two of the demo projects on LGTM.com have calls that match this pattern.
197+
➤ `See this in the query console <https://lgtm.com/query/6718356557331218618/>`__. Some of the demo projects on LGTM.com have calls that match this pattern.
193198

194199
There are two problems with this query:
195200

196-
- It assumes that any call to something named "input" is a call to the builtin ``input`` function, which may result in some false positive results.
197-
- It assumes that ``input`` cannot be referred to by any other name, which may result in some false negative results.
201+
- It assumes that any call to something named "eval" is a call to the builtin ``eval`` function, which may result in some false positive results.
202+
- It assumes that ``eval`` cannot be referred to by any other name, which may result in some false negative results.
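
Both problems can be reproduced in a few lines of Python. The function names below are hypothetical and exist only to illustrate the two cases.

```python
import builtins


def shadowed_demo():
    # False positive: a local function that happens to be named eval.
    def eval(expression):
        return "custom:" + expression

    return eval("1 + 1")


def alias_demo():
    # False negative: the builtin eval reached through another name.
    run = builtins.eval
    return run("1 + 1")
```

A name-based query would match the call in ``shadowed_demo``, which never reaches the builtin, and would miss the call in ``alias_demo``, which does.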
198203

199-
We can get much more accurate results using call-graph analysis. First, we can precisely identify the ``FunctionObject`` for the ``input`` function, by using the ``builtin_object`` QL predicate as follows:
204+
We can get much more accurate results using call-graph analysis. First, we can precisely identify the ``FunctionValue`` for the ``eval`` function, by using the ``Value::named`` QL predicate as follows:
200205

201206
.. code-block:: ql
202207
203208
import python
204209
205-
from FunctionObject input
206-
where input = builtin_object("input")
207-
select input
210+
from Value eval
211+
where eval = Value::named("eval")
212+
select eval
208213
209-
Then we can use ``FunctionObject.getACall()`` to identify calls to the ``input`` function, as follows:
214+
Then we can use ``Value.getACall()`` to identify calls to the ``eval`` function, as follows:
210215

211216
.. code-block:: ql
212217
213218
import python
214219
215-
from ControlFlowNode call, FunctionObject input
216-
where input = builtin_object("input") and
217-
call = input.getACall()
218-
select call, "call to 'input'."
220+
from ControlFlowNode call, Value eval
221+
where eval = Value::named("eval") and
222+
call = eval.getACall()
223+
select call, "call to 'eval'."
219224
220-
➤ `See this in the query console <https://lgtm.com/query/670490037/>`__. This accurately identifies calls to the builtin ``input`` function even when they are referred to using an alternative name. Any false positive results with calls to other ``input`` functions, reported by the original query, have been eliminated. It finds one result in files referenced by the *saltstack/salt* project.
225+
➤ `See this in the query console <https://lgtm.com/query/535131812579637425/>`__. This accurately identifies calls to the builtin ``eval`` function even when they are referred to using an alternative name. Any false positive results with calls to other ``eval`` functions, reported by the original query, have been eliminated.
221226

222227
What next?
223228
----------
