Commit 7544bc8

Merge pull request #3974 from owen-mc/docs/query-classification-and-display
Docs: Query classification and display
2 parents e89e99d + 77312a2 commit 7544bc8

1 file changed

+101
-0
lines changed

@@ -0,0 +1,101 @@
# Query classification and display

## Attributable Queries

The results of some queries are unsuitable for attribution to individual
developers. Most of these queries trigger on a threshold value, for
example all metric violations and statistics-based queries. Their
results would all be attributed to whoever pushed the value over (or
under) the threshold.

Other queries only trigger when another query doesn't. For example, the
MaybeNull query only triggers if the AlwaysNull query doesn't. A small
change in the data flow could make an alert switch from AlwaysNull to
MaybeNull (or vice versa), so we attribute both a fix and an
introduction to the developer who changed the data flow. In this
particular example the odd attribution results are more a nuisance than
a real problem, because the overall alert count remains unchanged.

For the duplicate and similar code queries, however, the effects can be
much more severe. They come in versions for "duplicate file" and
"duplicate function", among many others, where "duplicate function" only
triggers if "duplicate file" doesn't. As a result, adding some code to a
duplicate file might produce a "fix" of one "duplicate file" alert and
an introduction of many "duplicate function" alerts, which would be
highly unfair to the developer. Currently, only the duplicate and
similar code queries exhibit this "exchanging one alert for many"
behaviour when their results are attributed, so we exclude all
duplicate-code-related alerts from attribution.

The following queries are excluded from attribution:

- Metric violations, i.e. the ones with metadata properties like
  `@(error|warning|recommendation)-(to|from)`
- Queries with tag `non-attributable`

This check is applied when the results of a single attribution are
loaded into the datastore. Any change to this behaviour therefore only
takes effect on newly attributed revisions; historical data remains
unchanged.

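The two exclusion rules listed above can be sketched as a small predicate. This is only an illustration, not the datastore's actual check: the function name, the `properties` dictionary and the `tags` list are hypothetical, and reading the quoted pattern as a match on metadata property names is an assumption.

```python
import re

# Pattern quoted from the list above. Treating it as a match on metadata
# property names (including the leading "@") is an assumption of this sketch.
THRESHOLD_PROPERTY = re.compile(r"@(error|warning|recommendation)-(to|from)")


def is_attributable(properties, tags):
    """Return False for queries that the two rules exclude from attribution.

    `properties` maps metadata property names (e.g. "@precision") to values
    and `tags` is an iterable of tag strings. Both shapes are hypothetical.
    """
    if any(THRESHOLD_PROPERTY.fullmatch(name) for name in properties):
        return False  # metric violation: it carries a threshold property
    if "non-attributable" in tags:
        return False  # explicitly tagged as non-attributable
    return True
```

With these definitions, a query whose metadata contains `@warning-to` is rejected, as is any query tagged `non-attributable`; everything else is kept.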
## Query severity and precision

We currently classify queries on two axes, with some additional tags.
The two axes are severity and precision, defined by the query-metadata
properties `@problem.severity` and `@precision`, respectively.

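As a rough illustration of how the two properties might be read, the sketch below assumes they appear as `@name value` lines in a comment at the top of a query file. The header text, the regular expression and the function names are all hypothetical and written only for this example.

```python
import re

# Matches lines such as " * @precision high" inside a header comment.
METADATA_LINE = re.compile(r"^\s*\*?\s*@([\w.-]+)\s+(.*\S)\s*$")

# Hypothetical header, written only for this example.
EXAMPLE_HEADER = """
/**
 * @name Example query
 * @problem.severity warning
 * @precision high
 */
"""


def parse_metadata(header):
    """Collect `@name value` pairs from a header comment into a dict."""
    properties = {}
    for line in header.splitlines():
        match = METADATA_LINE.match(line)
        if match:
            properties["@" + match.group(1)] = match.group(2)
    return properties


def classify(header):
    """Return the (severity, precision) pair; a missing axis is None."""
    properties = parse_metadata(header)
    return properties.get("@problem.severity"), properties.get("@precision")
```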
For severity, we have the following categories:

- Error
- Warning
- Recommendation

These severity categories may change in the future.

For precision, we have the following categories:

- very-high
- high
- medium
- low

As [usual](https://en.wikipedia.org/wiki/Precision_and_recall),
precision is defined as the proportion of query results that are true
positives, i.e., precision = number of true positives / (number of true
positives + number of false positives). There is no hard-and-fast rule
for which precision ranges correspond to which categories.

We expect these precision categories to remain unchanged for the
foreseeable future.

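The precision formula above translates directly into a few lines of code. The function name and the choice to raise an error when a query reports no results are assumptions of this sketch.

```python
def precision(true_positives, false_positives):
    """Fraction of reported results that are true positives."""
    reported = true_positives + false_positives
    if reported == 0:
        # The ratio is undefined if the query reports nothing at all.
        raise ValueError("precision is undefined when there are no results")
    return true_positives / reported
```

For instance, a query reporting 45 true positives and 5 false positives has a precision of 45 / 50 = 0.9. Which category that value falls into is a judgment call, since no fixed ranges are defined.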
### A note on precision

Intuitively, precision measures how well a query performs at finding the
results it is supposed to find, i.e., how well it implements its
(informal, unwritten) rule. How precise a query is therefore depends
very much on what we consider that rule to be. We generally try to
sharpen our rules to focus on results that a developer might actually be
interested in.

## Which queries to run and display on LGTM

Queries with the following combinations of severity and precision are
run:

Severity / Precision | very-high | high    | medium  | low
---------------------|-----------|---------|---------|----
Error                | **Yes**   | **Yes** | **Yes** | No
Warning              | **Yes**   | **Yes** | **Yes** | No
Recommendation       | **Yes**   | **Yes** | No      | No

Queries with the following combinations of severity and precision have
their results displayed by default:

Severity / Precision | very-high | high    | medium | low
---------------------|-----------|---------|--------|----
Error                | **Yes**   | **Yes** | No     | No
Warning              | **Yes**   | **Yes** | No     | No
Recommendation       | **Yes**   | No      | No     | No

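Because every row of both tables switches from "Yes" to "No" at a single precision level, the tables can be encoded as one minimum precision per severity. The sketch below transcribes them that way; the constant and function names are hypothetical, and it does not cover custom query packs or per-project configuration.

```python
# Precision categories in ascending order.
PRECISIONS = ["low", "medium", "high", "very-high"]

# Lowest precision at which a query of each severity is run
# (transcribed from the first table).
MIN_PRECISION_TO_RUN = {
    "error": "medium",
    "warning": "medium",
    "recommendation": "high",
}

# Lowest precision at which results are displayed by default
# (transcribed from the second table).
MIN_PRECISION_TO_DISPLAY = {
    "error": "high",
    "warning": "high",
    "recommendation": "very-high",
}


def _meets(threshold, precision):
    """True if `precision` is at least as high as `threshold`."""
    return PRECISIONS.index(precision) >= PRECISIONS.index(threshold)


def is_run(severity, precision):
    return _meets(MIN_PRECISION_TO_RUN[severity.lower()], precision)


def is_displayed_by_default(severity, precision):
    return _meets(MIN_PRECISION_TO_DISPLAY[severity.lower()], precision)
```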
Results for queries that are run but not displayed by default can be
made visible by editing the project configuration.

Queries from custom query packs (in-repo or site-wide) are always run
and displayed by default. They can be hidden by editing the project
configuration, and "disabled" by removing them from the query pack.
