
Commit 1dae3a5

Atos1337, Denis Fokin, Victor Samoilov
authored
ML path selector (#99)
* Buildable * Fix style * Fix some issues * Add jvmArgs for framework tests * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Try to fix CI * Revert non-path-selector changes * Revert non-path-selector changes * Try to fix CI * Try to fix CI * Revert "Try to fix CI" This reverts commit 7a98a7c. * Revert "Try to fix CI" This reverts commit ff5c242. * Make parent optional * Add docs * Add versions in properties * Fix issues * Revert engine style * Fix spelling * Add predictor * Fix issues * Add comments * Handle exceptional situations * Split methods and add comment * Add parsing tests * Fix issues * Fix issues * Change predictor in ContestEstimator Co-authored-by: Denis Fokin <Denis.Fokin@huawei.com> Co-authored-by: Victor Samoilov <samoilov.victor@huawei.com>
1 parent 39add16 commit 1dae3a5

File tree

51 files changed

+2071
-73
lines changed

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Changes in Execution State
2+
3+
```mermaid
4+
classDiagram
5+
class StateAnalyticsProperties{
6+
+int depth
7+
+int visitedAfterLastFork
8+
+int visitedBeforeLastFork
9+
+int stmtsSinceLastCovered
10+
+ExecutionState? parent
11+
+long executingTime
12+
+double reward
13+
+List~Double~ features
14+
-boolean isFork
15+
-boolean isVisitedNew
16+
-int successorDepth
17+
-int successorVisitedAfterLastFork
18+
-int successorVisitedBeforeLastFork
19+
-int successorStmtSinceLastCovered
20+
21+
+updateIsVisitedNew()
22+
+updateIsFork()
23+
}
24+
ExecutionState o-- StateAnalyticsProperties
25+
```
26+
27+
`StateAnalyticsProperties` maintains properties of `ExecutionState` that are not needed for symbolic execution but are needed for `JLearch`.
28+
29+
* `depth: Int` - number of forks on the state's path, excluding the current state if it is itself a fork. Here, a fork is a state with more than one successor, not counting implicit `NPE` branches.
30+
* `visitedAfterLastFork: Int` - number of `stmt`s that were visited for the first time by `states` on this state's path after the last fork.
31+
* `visitedBeforeLastFork: Int` - number of `stmt`s that were visited for the first time by `states` on this state's path before the last fork.
32+
* `stmtsSinceLastCovered: Int` - number of `states` on this state's path after the last state that visited any `stmt` for the first time.
33+
* `parent: ExecutionState?` - parent of the current `state`. If `UtSettings.featureProcess == false`, it is always null, because the field is not needed in that case. If it is not null, the `state` cannot be deleted until all of its successors have been deleted, which may cause memory issues.
34+
* `executingTime: Long` - amount of time spent traversing this state.
35+
* `reward: Double?` - calculated reward of this state
36+
* `features: List<Double>` - list of extracted features for this state
37+
38+
Fields with the `successor` prefix are used to construct the properties of a successor state.
39+
40+
* `updateIsFork()` - sets `isFork` to true. This method is called when traversing a `stmt` produces more than one explicit state. Currently this can happen while traversing `IfStmt`, `SwitchStmt`, `AssignStmt` or `InvokeStmt`.
41+
* `updateIsVisitedNew()` - sets `isVisitedNew` to true, resets `stmtsSinceLastCovered` to zero and increments `visitedAfterLastFork` by 1. This method is called in `UtBotSymbolicEngine` after a new state `s` is polled and `s.stmt` has not been visited yet.
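
The bookkeeping described above can be illustrated with a minimal sketch. The class name `AnalyticsSketch`, the `successor()` method and the exact rules used to derive the successor values are assumptions made for illustration; only the field names and the behaviour of the two `update` methods come from the description.

```kotlin
// Hypothetical, simplified model of StateAnalyticsProperties.
// Field names follow the diagram; the successor rules are an assumption.
class AnalyticsSketch(
    val depth: Int = 0,
    visitedAfterLastFork: Int = 0,
    val visitedBeforeLastFork: Int = 0,
    stmtsSinceLastCovered: Int = 0
) {
    var visitedAfterLastFork = visitedAfterLastFork
        private set
    var stmtsSinceLastCovered = stmtsSinceLastCovered
        private set
    private var isFork = false
    private var isVisitedNew = false

    // Called when traversing the statement produced more than one explicit state.
    fun updateIsFork() {
        isFork = true
    }

    // Called when the state's statement is visited for the first time.
    fun updateIsVisitedNew() {
        isVisitedNew = true
        stmtsSinceLastCovered = 0
        visitedAfterLastFork++
    }

    // Assumed derivation of the properties handed to a successor state.
    fun successor(): AnalyticsSketch = AnalyticsSketch(
        depth = if (isFork) depth + 1 else depth,
        visitedAfterLastFork = if (isFork) 0 else visitedAfterLastFork,
        visitedBeforeLastFork =
            if (isFork) visitedBeforeLastFork + visitedAfterLastFork else visitedBeforeLastFork,
        stmtsSinceLastCovered = if (isVisitedNew) 0 else stmtsSinceLastCovered + 1
    )
}
```

In this sketch a fork moves the statements counted "after the last fork" into the "before the last fork" bucket of the successor, which is one plausible reading of the field descriptions.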

docs/jlearch/features.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# Collecting features
2+
3+
We currently collect 13 features, which are described in the original [paper](https://files.sri.inf.ethz.ch/website/papers/ccs21-learch.pdf), except for the constraint representation. The feature list can be extended.
4+
5+
* `stack` - size of state’s current call stack.
6+
* `successor` - number of successors of state’s current basic block.
7+
* `testCase` - number of test cases generated so far
8+
* `coverageByBranch` - number of instructions that were covered for the first time on our last branch
9+
* `coverageByPath` - number of instructions that were covered for the first time on our path
10+
* `depth` - number of forks already performed along state’s path.
11+
* `cpicnt` - number of instructions visited in state's current function.
12+
* `icnt` - number of times for which state’s current instruction has
13+
been visited
14+
* `covNew` - number of instructions executed by state since the last
15+
time a new instruction is covered
16+
* `subpath` - number of times for which state’s subpaths have been
17+
visited. The length of the subpaths can be 1, 2, 4, or 8 respectively
Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
# JLearch architecture
2+
3+
# Global Class Diagram
4+
5+
```mermaid
6+
classDiagram
7+
class FeatureProcessor{
8+
dumpFeatures()
9+
}
10+
<<interface>> FeatureProcessor
11+
class TraverseGraphStatistics{
12+
onVisit(ExecutionState)
13+
onTraversed(ExecutionState)
14+
}
15+
class InterproceduralUnitGraph
16+
class FeatureExtractorFactory
17+
<<interface>> FeatureExtractorFactory
18+
class FeatureProcessorFactory
19+
<<interface>> FeatureProcessorFactory
20+
class EngineAnalyticsContext
21+
class UtBotSymbolicEngine
22+
class NNRewardGuidedSelectorFactory
23+
<<interface>> NNRewardGuidedSelectorFactory
24+
class FeatureExtractor{
25+
extractFeatures(ExecutionState)
26+
}
27+
<<interface>> FeatureExtractor
28+
29+
UtBotSymbolicEngine ..> EngineAnalyticsContext
30+
EngineAnalyticsContext o-- FeatureProcessorFactory
31+
EngineAnalyticsContext o-- FeatureExtractorFactory
32+
EngineAnalyticsContext o-- NNRewardGuidedSelectorFactory
33+
34+
FeatureProcessor --|> TraverseGraphStatistics
35+
InterproceduralUnitGraph o-- TraverseGraphStatistics
36+
UtBotSymbolicEngine *-- FeatureProcessor
37+
UtBotSymbolicEngine *-- InterproceduralUnitGraph
38+
39+
class Predictors
40+
class NNStateRewardPredictor
41+
class NNRewardGuidedSelector
42+
43+
44+
class GreedySearch
45+
46+
class BasePathSelector
47+
48+
GreedySearch --|> BasePathSelector
49+
NNRewardGuidedSelector --|> GreedySearch
50+
51+
UtBotSymbolicEngine *-- BasePathSelector
52+
53+
Predictors o-- NNStateRewardPredictor
54+
NNRewardGuidedSelector ..> Predictors
55+
NNRewardGuidedSelector *-- FeatureExtractor
56+
57+
NNStateRewardPredictorSmile --|> NNStateRewardPredictor
58+
NNStateRewardPredictorTorch --|> NNStateRewardPredictor
59+
60+
NNStateRewardGuidedSelectorWithRecalculationWeight --|> NNRewardGuidedSelector
61+
NNStateRewardGuidedSelectorWithoutRecalculationWeight --|> NNRewardGuidedSelector
62+
```
63+
64+
This diagram omits some details, which are described below.
65+
66+
# FeatureProcessor
67+
68+
It is an interface in the `framework` module that allows it to use an implementation from the `analytics` module.
69+
70+
* `dumpFeatures(state: ExecutionState)` - dumps features and rewards to disk in some format. Called at the end of traversal in `UtBotSymbolicEngine`.
71+
72+
## Implementation class diagram
73+
74+
```mermaid
75+
classDiagram
76+
class FeatureProcessorWithStatesRepetition{
77+
-Map~Int, FeatureList~ dumpedStates
78+
-Set~Stmt~ visitedStmts
79+
-List~TestCase~ testCases
80+
-int generatedTestCases
81+
dumpFeatures()
82+
}
83+
84+
class FeatureExtractor{
85+
extractFeatures(ExecutionState)
86+
}
87+
88+
class TraverseGraphStatistics{
89+
onVisit(ExecutionState)
90+
onTraversed(ExecutionState)
91+
}
92+
93+
class RewardEstimator{
94+
calculateRewards(List~TestCase~)
95+
}
96+
97+
class TestCase{
98+
+List~State~ states
99+
+int newCoverage
100+
+int testIndex
101+
}
102+
103+
FeatureProcessorWithStatesRepetition --|> TraverseGraphStatistics
104+
FeatureProcessorWithStatesRepetition o-- FeatureExtractor
105+
FeatureProcessorWithStatesRepetition o-- RewardEstimator
106+
FeatureProcessorWithStatesRepetition ..> EngineAnalyticsContext
107+
108+
```
109+
110+
`State = Pair<Int, Long>`
111+
112+
`FeatureList = List<Double>`
113+
114+
## RewardEstimator
115+
116+
Responsible for calculating rewards.
117+
118+
* `calculateRewards(List<TestCase>): Map<Int, Double>` - calculates `coverage` and `time` for each state. `Coverage` is the sum of `newCoverage` over every `TestCase` that contains the state. `Time` is the sum of `state.executingTime` over all states that have this state on their path. It then calculates `reward(coverage, time)`.
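
As a rough illustration of the aggregation described above, here is a minimal sketch. It assumes that `State` is a pair of a state id and its `executingTime`, and that the `states` of a `TestCase` are ordered from the root to the final state. The `reward` function is purely hypothetical (coverage per unit of time), since the description only says `reward(coverage, time)`.

```kotlin
// Assumed meaning of State = Pair<Int, Long>: (state id, executingTime).
typealias State = Pair<Int, Long>

data class TestCase(val states: List<State>, val newCoverage: Int, val testIndex: Int)

// Hypothetical reward function; the real formula is not given in this document.
fun reward(coverage: Double, time: Double): Double =
    if (time == 0.0) 0.0 else coverage / time

fun calculateRewards(testCases: List<TestCase>): Map<Int, Double> {
    val coverage = mutableMapOf<Int, Int>()
    val executingTime = mutableMapOf<Int, Long>()
    // state id -> ids of all states that have it on their path (itself included)
    val below = mutableMapOf<Int, MutableSet<Int>>()

    for (testCase in testCases) {
        // Assumption: states are ordered from the root to the final state.
        testCase.states.forEachIndexed { i, (id, time) ->
            coverage[id] = (coverage[id] ?: 0) + testCase.newCoverage
            executingTime[id] = time
            val ids = below.getOrPut(id) { mutableSetOf() }
            for (j in i until testCase.states.size) ids += testCase.states[j].first
        }
    }

    return below.mapValues { (id, ids) ->
        // Each state is counted once, even if it is shared by several test cases.
        val time = ids.sumOf { executingTime.getValue(it) }
        reward(coverage.getValue(id).toDouble(), time.toDouble())
    }
}
```

Collecting ids in a set keeps the time of a shared descendant from being counted once per test case.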
119+
120+
## FeatureProcessorWithStatesRepetition
121+
122+
* `onVisit(state: ExecutionState)` - extracts features for `state`.
123+
* `onTraversed(state: ExecutionState)` - creates a `TestCase`: we walk from `state` to `state.parent` until the root is reached, add the features of each `state` on the path to `dumpedStates`, calculate the coverage of the `TestCase`, increment `generatedTestCases` by 1 and add the new `TestCase` to `testCases`.
124+
* `dumpFeatures()` - calls `RewardEstimator.calculateRewards()` and writes a `csv` file for each `TestCase`, with one row in the format `newCov,features,reward` for each `state` in it. `newCov` is a flag that indicates whether this `TestCase` covers something new. With this approach, each `state` is written as many times as there are `TestCase`s that contain it.
125+
To create a `FeatureExtractor`, it uses the `FeatureExtractorFactory` from `EngineAnalyticsContext`.
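
The row format written by `dumpFeatures()` can be sketched as follows. `DumpedTestCase` and `csvRows` are hypothetical names, and encoding the `newCov` flag as `1`/`0` is an assumption; the description only fixes the column order `newCov,features,reward`.

```kotlin
// Hypothetical stand-in for a TestCase: the ids of the states on its path
// and the amount of new coverage it produced.
data class DumpedTestCase(val stateIds: List<Int>, val newCoverage: Int)

// Builds one csv row per state of the test case: newCov,features,reward.
fun csvRows(
    testCase: DumpedTestCase,
    dumpedStates: Map<Int, List<Double>>, // state id -> extracted features
    rewards: Map<Int, Double>             // state id -> calculated reward
): List<String> {
    // Assumption: the flag is written as 1 when the test case covered something new.
    val newCov = if (testCase.newCoverage > 0) 1 else 0
    return testCase.stateIds.map { id ->
        val features = dumpedStates.getValue(id).joinToString(",")
        "$newCov,$features,${rewards.getValue(id)}"
    }
}
```

Because rows are produced per test case, a state shared by several test cases appears once in each of their files.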
126+
127+
# FeatureExtractor
128+
129+
It is an interface in the `framework` module that allows it to use an implementation from the `analytics` module.
130+
* `extractFeatures(state: ExecutionState)` - creates the feature list for `state` and stores it in `state.features`. Currently we extract all features described in the [paper](https://files.sri.inf.ethz.ch/website/papers/ccs21-learch.pdf). In the future, the feature list can be extended with other features, for example, NeuroSMT.
131+
132+
# NNStateRewardPredictor
133+
134+
Interface for reward predictors. It currently has two implementations in the `analytics` module:
135+
136+
* `NNStateRewardPredictorSmile`: uses our own format to store a feedforward neural network and uses the `Smile` library for matrix multiplication.
137+
* `NNStateRewardPredictorTorch`: assumes that the model is any kind of model in `pt` format. It uses the `Deep Java Library` to run such models.
138+
139+
It should be created at the beginning of work and stored in the `Predictors` class so that it can be used by `NNRewardGuidedSelector` from the `framework` module.
140+
141+
142+
# NNStateRewardGuidedSelector
143+
144+
It uses an `EngineAnalyticsContext` to create `FeatureExtractor`.
145+
We override `ExecutionState.weight` as `NNStateRewardPredictor.predict(this.features)`.
146+
We have two different implementations:
147+
* `NNStateRewardGuidedSelectorWithRecalculation`: we recalculate the reward every time, so in `ExecutionState.weight` we extract features and call `predict`.
148+
* `NNStateRewardGuidedSelectorWithoutRecalculation`: we extract features in `offerImpl`, calculate the `reward` and store it in `ExecutionState.reward`, without recalculating it every time.
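
The difference between the two variants can be sketched as below. All types here (`RewardPredictor`, `SketchState`, `WithRecalculation`, `WithoutRecalculation`) are hypothetical stand-ins that only mirror the behaviour described above; they are not the classes from the `framework` module.

```kotlin
// Hypothetical stand-in for NNStateRewardPredictor.
fun interface RewardPredictor {
    fun predict(features: List<Double>): Double
}

// Hypothetical stand-in for the analytics part of ExecutionState.
class SketchState(var features: List<Double> = emptyList(), var reward: Double? = null)

// Variant 1: features are extracted and the reward is predicted on every weight lookup.
class WithRecalculation(
    private val extract: (SketchState) -> List<Double>,
    private val predictor: RewardPredictor
) {
    fun weight(state: SketchState): Double {
        state.features = extract(state)
        return predictor.predict(state.features)
    }
}

// Variant 2: the reward is predicted once, when the state is offered, and then reused.
class WithoutRecalculation(
    private val extract: (SketchState) -> List<Double>,
    private val predictor: RewardPredictor
) {
    fun offer(state: SketchState) {
        state.features = extract(state)
        state.reward = predictor.predict(state.features)
    }

    fun weight(state: SketchState): Double =
        state.reward ?: error("state was not offered to this selector")
}
```

The trade-off in this sketch is that the first variant always sees up-to-date features at the price of one prediction per lookup, while the second pays for a single prediction per state and may work with a stale reward.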
149+
150+
# EngineAnalyticsContext
151+
152+
It is an object that should be filled with factories at the beginning of work so that objects from the `framework` module can use objects from the `analytics` module.
Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
# GreedySearch
2+
3+
```mermaid
4+
classDiagram
5+
class GreedySearch{
6+
-Set~ExecutionState~ states
7+
+ExecutionState.weight
8+
}
9+
GreedySearch --|> BasePathSelector
10+
```
11+
Base methods such as `offer` or `remove` are implemented simply, by delegating to `states`.
12+
13+
In `peekImpl` we find the set of `states` with the maximum `weight` and pick a random one among them. To use this class when implementing a `pathSelector`, you only need to override `ExecutionState.weight`.
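
The selection rule can be sketched with a small generic class. `GreedySketch` is a hypothetical stand-in, and its `weight` function plays the role of the overridden `ExecutionState.weight`.

```kotlin
import kotlin.random.Random

// Hypothetical generic stand-in for GreedySearch.
class GreedySketch<T>(
    private val weight: (T) -> Double,
    private val random: Random = Random.Default
) {
    private val states = mutableSetOf<T>()

    // Base operations simply delegate to the underlying set.
    fun offer(state: T): Boolean = states.add(state)
    fun remove(state: T): Boolean = states.remove(state)

    // Picks a random state among those with the maximum weight, or null if empty.
    fun peek(): T? {
        val best = states.maxOfOrNull { weight(it) } ?: return null
        return states.filter { weight(it) == best }.random(random)
    }
}
```

A selector built on this sketch differs from another one only in the `weight` function it passes in.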
14+
15+
# SubpathStatistics
16+
17+
```mermaid
18+
classDiagram
19+
class SubpathStatistics{
20+
+int index
21+
-Map~Subpath, Int~ subpathCount
22+
subpathCount(ExecutionState)
23+
}
24+
class TraverseGraphStatistics{
25+
onVisit(ExecutionState)
26+
}
27+
28+
SubpathStatistics --|> TraverseGraphStatistics
29+
TraverseGraphStatistics o-- InterProceduralUnitGraph
30+
```
31+
`Subpath` = `List<Edge>`
32+
33+
This class maintains the frequency of each subpath of length `2^index`, represented as a `List<Edge>`, in a given instance of `InterproceduralUnitGraph`.
34+
35+
* `onVisit(state: ExecutionState)` - we calculate the subpath of this state and increment its frequency by `1`
36+
* `subpathCount(state: ExecutionState)` - we calculate the subpath of this state and return its frequency
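
A minimal sketch of this counting is shown below. It assumes that the subpath of a state is the last `2^index` edges of its path and represents a state only by that list of edges; `SubpathStatisticsSketch` and the `Pair`-based `Edge` are hypothetical stand-ins.

```kotlin
// Hypothetical stand-ins: an edge is a pair of statement ids.
typealias Edge = Pair<Int, Int>
typealias Subpath = List<Edge>

class SubpathStatisticsSketch(index: Int) {
    private val length = 1 shl index // 2^index
    private val counts = mutableMapOf<Subpath, Int>()

    // Assumption: the subpath of a state is the last `length` edges of its path.
    private fun subpathOf(path: List<Edge>): Subpath = path.takeLast(length)

    // Increments the frequency of the state's subpath by 1.
    fun onVisit(path: List<Edge>) {
        val subpath = subpathOf(path)
        counts[subpath] = (counts[subpath] ?: 0) + 1
    }

    // Returns how many times the state's subpath has been seen so far.
    fun subpathCount(path: List<Edge>): Int = counts[subpathOf(path)] ?: 0
}
```

Lists compare structurally in Kotlin, so two different paths that end with the same edges share one counter.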
37+
38+
# SubpathGuidedSelector
39+
40+
```mermaid
41+
classDiagram
42+
SubpathGuidedSelector o-- SubpathStatistics
43+
SubpathGuidedSelector --|> GreedySearch
44+
```
45+
46+
Inspired by [paper](http://pxzhang.cn/paper/concolic_testing/oopsla13-pgse.pdf).
47+
48+
We override `ExecutionState.weight` as `-SubpathStatistics.subpathCount(this)`, so we pick the `state` whose `subpath` is the least traveled.
49+
50+
# StatementStatistics
51+
52+
```mermaid
53+
classDiagram
54+
class StatementStatistics{
55+
-Map~Stmt, Int~ statementsCount
56+
-Map~SootMethod, Int~ statementsInMethodCount
57+
+statementCount(ExecutionState)
58+
+statementsInMethodCount(ExecutionState)
59+
}
60+
61+
class TraverseGraphStatistics{
62+
onVisit(ExecutionState)
63+
}
64+
65+
StatementStatistics --|> TraverseGraphStatistics
66+
TraverseGraphStatistics o-- InterProceduralUnitGraph
67+
```
68+
69+
This class maintains the frequency of each `Stmt` and the number of `Stmt`s visited in each `SootMethod`, for a given instance of `InterproceduralUnitGraph`.
70+
71+
* `onVisit(state: ExecutionState)` - increments the frequency of the state's `stmt` by 1. If this `stmt` is visited for the first time, it also increments the number of `Stmt`s visited in the current state's `method` by 1.
72+
* `statementCount(state: ExecutionState)` - returns the frequency of the state's `stmt`
73+
* `statementsInMethodCount(state: ExecutionState)` - returns the number of `stmt`s that were visited in the current state's `method`.
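
The two counters can be sketched as follows. `StatementStatisticsSketch` is a hypothetical stand-in in which statements and methods are plain strings rather than `Stmt` and `SootMethod`, and the state is replaced by its statement and method.

```kotlin
// Hypothetical stand-in: statements and methods are identified by strings.
class StatementStatisticsSketch {
    private val stmtFrequency = mutableMapOf<String, Int>()
    private val visitedInMethod = mutableMapOf<String, Int>()

    fun onVisit(stmt: String, method: String) {
        val seen = stmtFrequency[stmt] ?: 0
        // A first visit also counts as a newly visited statement of the method.
        if (seen == 0) visitedInMethod[method] = (visitedInMethod[method] ?: 0) + 1
        stmtFrequency[stmt] = seen + 1
    }

    // How many times the statement has been visited.
    fun statementCount(stmt: String): Int = stmtFrequency[stmt] ?: 0

    // How many distinct statements have been visited in the method.
    fun statementsInMethodCount(method: String): Int = visitedInMethod[method] ?: 0
}
```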
74+
75+
# CPInstSelector
76+
77+
```mermaid
78+
classDiagram
79+
CPInstSelector o-- StatementStatistics
80+
CPInstSelector --|> NonUniformRandomSearch
81+
```
82+
83+
Override `ExecutionState.cost` as `StatementStatistics.statementsInMethodCount(this)`, so we are more likely to explore the least explored `method`.
84+
85+
# ForkDepthSelector
86+
87+
```mermaid
88+
classDiagram
89+
ForkDepthSelector --|> NonUniformRandomSearch
90+
```
91+
92+
Override `ExecutionState.cost` as `ExecutionState.depth`, so we are more likely to explore the least deep `state` in terms of the number of forks on its path.
93+

gradle.properties

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,4 +35,8 @@ eclipse_aether_version=1.1.0
3535
maven_wagon_version=3.5.1
3636
maven_plugin_api_version=3.8.5
3737
maven_plugin_tools_version=3.6.4
38+
javacpp_version=1.5.3
39+
jsoup_version=1.7.2
40+
djl_api_version=0.17.0
41+
pytorch_native_version=1.9.1
3842
# soot also depends on asm, so there could be two different versions

utbot-analytics/build.gradle

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ String classifier = osName + "-x86_64"
1212

1313
evaluationDependsOn(':utbot-framework')
1414
compileTestJava.dependsOn tasks.getByPath(':utbot-framework:testClasses')
15+
1516
dependencies {
1617
implementation(project(":utbot-framework"))
1718
implementation(project(':utbot-instrumentation'))
@@ -29,6 +30,7 @@ dependencies {
2930

3031
implementation group: 'org.bytedeco', name: 'arpack-ng', version: "3.7.0-1.5.4", classifier: "$classifier"
3132
implementation group: 'org.bytedeco', name: 'openblas', version: "0.3.10-1.5.4", classifier: "$classifier"
33+
implementation group: 'org.bytedeco', name: 'javacpp', version: javacpp_version, classifier: "$classifier"
3234

3335
implementation group: 'tech.tablesaw', name: 'tablesaw-core', version: '0.38.2'
3436
implementation group: 'tech.tablesaw', name: 'tablesaw-jsplot', version: '0.38.2'
@@ -37,13 +39,19 @@ dependencies {
3739

3840
implementation group: 'com.github.javaparser', name: 'javaparser-core', version: '3.22.1'
3941

42+
implementation group: 'org.jsoup', name: 'jsoup', version: jsoup_version
43+
44+
implementation "ai.djl:api:$djl_api_version"
45+
implementation "ai.djl.pytorch:pytorch-engine:$djl_api_version"
46+
implementation "ai.djl.pytorch:pytorch-native-auto:$pytorch_native_version"
47+
4048
testCompile project(':utbot-framework').sourceSets.test.output
4149
}
4250

4351
test {
4452

45-
useJUnitPlatform{
46-
excludeTags 'Summary'
53+
useJUnitPlatform {
54+
excludeTags 'Summary'
4755
}
4856

4957
}
@@ -54,4 +62,17 @@ processResources {
5462
into "models"
5563
}
5664
}
65+
}
66+
67+
jar {
68+
dependsOn classes
69+
manifest {
70+
attributes 'Main-Class': 'org.utbot.QualityAnalysisKt'
71+
}
72+
73+
from {
74+
configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
75+
}
76+
77+
duplicatesStrategy = DuplicatesStrategy.EXCLUDE
5778
}
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
package org.utbot.features
2+
3+
import org.utbot.analytics.FeatureExtractor
4+
import org.utbot.analytics.FeatureExtractorFactory
5+
import org.utbot.engine.InterProceduralUnitGraph
6+
7+
/**
8+
* Implementation of feature extractor factory
9+
*/
10+
class FeatureExtractorFactoryImpl : FeatureExtractorFactory {
11+
override operator fun invoke(graph: InterProceduralUnitGraph): FeatureExtractor = FeatureExtractorImpl(graph)
12+
}
