Skip to content

Commit 0cc018f

Browse files
committed
Python: Taint tracking setup alá Go
\## TaintFlow sources The class `RemoteFlowSource` is very similarly defined as the other languages [C++](https://github.com/github/codeql/blob/ac22e7950c126a9407e02d4b468a4b690a9868e2/cpp/ql/src/semmle/code/cpp/security/FlowSources.qll), [Java](https://github.com/github/codeql/blob/6de612a56605eea2a2262f33e86d891f38033667/java/ql/src/semmle/code/java/dataflow/FlowSources.qll), [C#](https://github.com/github/codeql/blob/fddbce0b7bb8dc7e97971a0af7b01902954d1d49/csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll), [JS](https://github.com/github/codeql/blob/78334af3540dfd776ba4fdb561692161c515ca66/javascript/ql/src/semmle/javascript/security/dataflow/RemoteFlowSources.qll), and [Go](https://github.com/github/codeql-go/blob/24b3133e0cce6a36e5727fdb393efa2cc4379de4/ql/src/semmle/go/security/FlowSources.qll). There are some minor differences: - Java/C++ defines the class in `FlowSources.qll` - C# uses `csharp/ql/src/semmle/code/csharp/security/dataflow/flowsources/Remote.qll`, and provide `StoredFlowSource` and `LocalFlowSource` in separate classes. - JS uses `RemoteFlowSources.qll`. - JS defines additional predicate `RemoteFlowSource.isUserControlledObject` - Go uses the class name `UntrustedFlowSource`, but still defined in `ql/src/semmle/go/security/FlowSources.qll` - Go uses the `::Range` pattern to allow both extensibility and refinement The big difference is how a RemoteFlowSource is specified: - Java and C# have all subclasses of `RemoteFlowSource` defined in the same file - Go and JS defines subclasses for frameworks in the actual framework `.qll` file, and all frameworks are transitively imported by `import go` or `import javascript` (so subclasses are always in scope). - C++ uses class `RemoteFlowFunction` to do all the heavy lifting (and its subclasses are transitively imported). \### What we will do Use file `RemoteFlowSource.qll`, define subclasses in framework library classes. _Why? Personally I really like it, Go/JS is already doing it, and Tom expressed a preference for doing the same for C# (although that is not what they are doing today)._ Jonas gave this advice: > Whether you split the definitions between multiple files or keep them all in one file, the property you want is that all definitions are included when the abstract class is included. Otherwise you can get unexpected results via transitive includes. We will make imports of all frameworks in the same file that defines `RemoteFlowSource`, as it seems to be the least intrusive change. If that turns out to be a problem, we can also move them to `python.qll` (the other way is not so easy). \## TaintFlow sinks [JS](https://github.com/github/codeql/blob/473787a4268a5e90b4a653ee8bcdb3681f7867ff/javascript/ql/src/semmle/javascript/Concepts.qll) and [Go](https://github.com/github/codeql-go/blob/ecff1e6a164b18da20daa9611933df4aace20915/ql/src/semmle/go/Concepts.qll) defines abstract base classes for interesting sinks in `Concepts.qll` (and all uses the `::Range` pattern in Go). I really like this idea, since it allows multiple queries to reuse the same sink definitions, and it makes it _easy_ to discover what default sinks are available. Personally I'm not 100% on board with the naming, but I don't have any good reason to change the naming convention. \## Framework modeling Following the model from Go ([example](https://github.com/github/codeql-go/blob/main/ql/src/semmle/go/frameworks/Gin.qll)), I propose that we make every definition in a framework modeling `private`. This allows some greater flexibility in changing our modeling, since we don't need to think about keeping deprecated versions around for a whole year. It _does_ have the downside that someone writing a query can't reuse the classes/predicates for a framework, but it didn't seem to be too big of a concern. If we need to provide access, we can always make the definitions non-private (the other way is not so easy). \## Customizations Also introduced `Customizations.qll` like in JS/Java/Go (to replace `site.qll`)
1 parent b9a6183 commit 0cc018f

File tree

8 files changed

+134
-0
lines changed

8 files changed

+134
-0
lines changed
Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
/**
2+
* Contains customizations to the standard library.
3+
*
4+
* This module is imported by `python.qll`, so any customizations defined here automatically
5+
* apply to all queries.
6+
*
7+
* Typical examples of customizations include adding new subclasses of abstract classes such as
8+
* the `RemoteFlowSource::Range` and `AdditionalTaintStep` classes associated with the security
9+
* queries to model frameworks that are not covered by the standard library.
10+
*/
11+
12+
import python
13+
/* General import that is useful */
14+
// import experimental.dataflow.DataFlow
15+
//
16+
/* for extending `TaintTracking::AdditionalTaintStep` */
17+
// import experimental.dataflow.TaintTracking
18+
//
19+
/* for extending `RemoteFlowSource::Range` */
20+
// import experimental.dataflow.RemoteFlowSources
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
private import python
2+
private import experimental.dataflow.DataFlow
3+
// Need to import since frameworks can extend `RemoteFlowSource::Range`
4+
private import experimental.semmle.python.Frameworks
5+
6+
/**
7+
* A data flow source of remote user input.
8+
*
9+
* Extend this class to refine existing API models. If you want to model new APIs,
10+
* extend `RemoteFlowSource::Range` instead.
11+
*/
12+
class RemoteFlowSource extends DataFlow::Node {
13+
RemoteFlowSource::Range self;
14+
15+
RemoteFlowSource() { this = self }
16+
17+
/** Gets a string that describes the type of this remote flow source. */
18+
string getSourceType() { result = self.getSourceType() }
19+
}
20+
21+
/** Provides a class for modeling new sources of remote user input. */
22+
module RemoteFlowSource {
23+
/**
24+
* A data flow source of remote user input.
25+
*
26+
* Extend this class to model new APIs. If you want to refine existing API models,
27+
* extend `RemoteFlowSource` instead.
28+
*/
29+
abstract class Range extends DataFlow::Node {
30+
/** Gets a string that describes the type of this remote flow source. */
31+
abstract string getSourceType();
32+
}
33+
}

python/ql/src/experimental/dataflow/internal/TaintTrackingPublic.qll

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@
66
private import python
77
private import TaintTrackingPrivate
88
private import experimental.dataflow.DataFlow
9+
// Need to import since frameworks can extend `AdditionalTaintStep`
10+
private import experimental.semmle.python.Frameworks
911

1012
// Local taint flow and helpers
1113
/**
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
/**
2+
* Provides abstract classes representing generic concepts such as file system
3+
* access or system command execution, for which individual framework libraries
4+
* provide concrete subclasses.
5+
*/
6+
7+
import python
8+
private import experimental.dataflow.DataFlow
9+
private import experimental.semmle.python.Frameworks
10+
11+
/**
12+
* A data-flow node that executes an operating system command,
13+
* for instance by spawning a new process.
14+
*
15+
* Extend this class to refine existing API models. If you want to model new APIs,
16+
* extend `SystemCommandExecution::Range` instead.
17+
*/
18+
class SystemCommandExecution extends DataFlow::Node {
19+
SystemCommandExecution::Range self;
20+
21+
SystemCommandExecution() { this = self }
22+
23+
/** Gets the argument that specifies the command to be executed. */
24+
DataFlow::Node getCommand() { result = self.getCommand() }
25+
}
26+
27+
/** Provides a class for modeling new system-command execution APIs. */
28+
module SystemCommandExecution {
29+
/**
30+
* A data-flow node that executes an operating system command,
31+
* for instance by spawning a new process.
32+
*
33+
* Extend this class to model new APIs. If you want to refine existing API models,
34+
* extend `SystemCommandExecution` instead.
35+
*/
36+
abstract class Range extends DataFlow::Node {
37+
/** Gets the argument that specifies the command to be executed. */
38+
abstract DataFlow::Node getCommand();
39+
}
40+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
/**
2+
* Helper file that imports all framework modeling.
3+
*/
4+
private import experimental.semmle.python.frameworks.Flask
5+
private import experimental.semmle.python.frameworks.Django
6+
private import experimental.semmle.python.frameworks.Stdlib
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
/**
2+
* Provides classes modeling security-relevant aspects of the `django` package.
3+
*/
4+
5+
private import python
6+
private import experimental.dataflow.DataFlow
7+
private import experimental.dataflow.RemoteFlowSources
8+
private import experimental.semmle.python.Concepts
9+
10+
private module Django {
11+
12+
}
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
/**
2+
* Provides classes modeling security-relevant aspects of the `flask` package.
3+
*/
4+
5+
private import python
6+
private import experimental.dataflow.DataFlow
7+
private import experimental.dataflow.RemoteFlowSources
8+
private import experimental.semmle.python.Concepts
9+
10+
private module Flask {
11+
12+
}
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
/**
2+
* Provides classes modeling security-relevant aspects of the standard libraries.
3+
* Note: some modeling is done internally in the dataflow/taint tracking implementation.
4+
*/
5+
6+
private import python
7+
private import experimental.dataflow.DataFlow
8+
private import experimental.dataflow.RemoteFlowSources
9+
private import experimental.semmle.python.Concepts

0 commit comments

Comments
 (0)