| 1 | +# Query classification and display |
| 2 | + |
| 3 | +## Attributable Queries |
| 4 | + |
| 5 | +The results of some queries are unsuitable for attribution to individual |
| 6 | +developers. Most of them trigger on a threshold value, for example all |
| 7 | +metric violations and statistics-based queries. The results of such |
| 8 | +queries would all be attributed to whoever pushed the value over (or |
| 9 | +under) the threshold. |
| 10 | + |
| 11 | +Other queries only trigger when another one doesn't. For example, the |
| 12 | +MaybeNull query only triggers if the AlwaysNull query doesn't. A small |
| 13 | +change in the data flow could make an alert switch from AlwaysNull to |
| 14 | +MaybeNull (or vice versa), in which case we attribute both a fix and an |
| 15 | +introduction to the developer who changed the data flow. For this |
| 16 | +particular example the odd attribution results are more a nuisance than |
| 17 | +a real problem, because the overall alert count remains unchanged. |
| 18 | +However, for the duplicate and similar code queries the effects can be |
| 19 | +much more severe. They come in versions for "duplicate file" and |
| 20 | +"duplicate function", among many others, where "duplicate function" only |
| 21 | +triggers if "duplicate file" didn't. Adding some code to a duplicate |
| 22 | +file might therefore result in a "fix" of one "duplicate file" alert and |
| 23 | +an introduction of many "duplicate function" alerts, which would be |
| 24 | +highly unfair to that developer. Currently, only the duplicate and |
| 25 | +similar code queries exhibit this "exchanging one for many" behaviour, |
| 26 | +so we exclude all duplicate-code-related alerts from attribution. |
| 27 | + |
| 28 | +The following queries are excluded from attribution: |
| 29 | + |
| 30 | +- Metric violations, i.e. the ones with metadata properties like |
| 31 | + `@(error|warning|recommendation)-(to|from)` |
| 32 | +- Queries with tag `non-attributable` |
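The two exclusion rules above can be sketched as a single predicate. This is a minimal illustration, not the actual implementation: the function name `is_attributable`, the shape of the metadata mapping (keys without the leading `@`), and the whitespace-separated `tags` value are all assumptions made for the example.

```python
import re
from typing import Mapping

# Assumption: metadata keys are stored without the leading "@",
# e.g. "error-to" or "warning-from", mirroring the pattern in the list above.
THRESHOLD_PROPERTY = re.compile(r"(error|warning|recommendation)-(to|from)")

NON_ATTRIBUTABLE_TAG = "non-attributable"


def is_attributable(metadata: Mapping[str, str]) -> bool:
    """Return False for queries whose results are excluded from attribution."""
    # Metric violations carry threshold properties such as "error-to".
    if any(THRESHOLD_PROPERTY.fullmatch(key) for key in metadata):
        return False
    # Assumption: tags are a whitespace-separated string under the "tags" key.
    tags = metadata.get("tags", "").split()
    return NON_ATTRIBUTABLE_TAG not in tags
```

Under these assumptions, a query with `{"warning-to": "10"}` or with `{"tags": "maintainability non-attributable"}` would be excluded, while one tagged only `reliability` would be attributed as usual.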
| 33 | + |
| 34 | +This check is applied when the results of a single attribution are |
| 35 | +loaded into the datastore. Any change to this behaviour therefore only |
| 36 | +takes effect on newly attributed revisions; historical data remains |
| 37 | +unchanged. |
| 38 | + |
| 39 | +## Query severity and precision |
| 40 | + |
| 41 | +We currently classify queries on two axes, with some additional tags. |
| 42 | +Those axes are severity and precision, and are defined using the |
| 43 | +query-metadata properties `@problem.severity` and `@precision`. |
| 44 | + |
| 45 | +For severity, we have the following categories: |
| 46 | + |
| 47 | +- Error |
| 48 | +- Warning |
| 49 | +- Recommendation |
| 50 | + |
| 51 | +These categories may change in the future. |
| 52 | + |
| 53 | +For precision, we have the following categories: |
| 54 | + |
| 55 | +- very-high |
| 56 | +- high |
| 57 | +- medium |
| 58 | +- low |
| 59 | + |
| 60 | +As [usual](https://en.wikipedia.org/wiki/Precision_and_recall), |
| 61 | +precision is defined as the fraction of query results that are true |
| 62 | +positives, i.e., precision = number of true positives / (number of true |
| 63 | +positives + number of false positives). There is no hard-and-fast rule |
| 64 | +for which precision ranges correspond to which categories. |
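As a small worked example of the definition above, the following sketch computes precision from counts of true and false positives. The function name and the choice to raise an error for a query with no results are assumptions for illustration only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported results that are true positives."""
    total = true_positives + false_positives
    if total == 0:
        # Precision is undefined for a query that reports no results.
        raise ValueError("precision is undefined when there are no results")
    return true_positives / total
```

For instance, a query that reports 4 results, 3 of which are genuine, has `precision(3, 1) == 0.75`. Which category that value falls into is a judgement call, since no fixed ranges are defined.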
| 65 | + |
| 66 | +We expect these categories to remain unchanged for the foreseeable |
| 67 | +future. |
| 68 | + |
| 69 | +### A note on precision |
| 70 | + |
| 71 | +Intuitively, precision measures how well the query performs at finding the |
| 72 | +results it is supposed to find, i.e., how well it implements its |
| 73 | +(informal, unwritten) rule. So how precise a query is depends very much |
| 74 | +on what we consider that rule to be. We generally try to sharpen our |
| 75 | +rules to focus on results that a developer might actually be interested |
| 76 | +in. |
| 77 | + |
| 78 | +## Which queries to run and display on LGTM |
| 79 | + |
| 80 | +The following queries are run: |
| 81 | + |
| 82 | +Severity / precision | very-high | high | medium | low |
| 83 | +---------------------|-----------|---------|---------|---- |
| 84 | +Error | **Yes** | **Yes** | **Yes** | No |
| 85 | +Warning | **Yes** | **Yes** | **Yes** | No |
| 86 | +Recommendation | **Yes** | **Yes** | No | No |
| 87 | + |
| 88 | +The following queries have their results displayed by default: |
| 89 | + |
| 90 | +Severity / precision | very-high | high | medium | low |
| 91 | +---------------------|-----------|---------|---------|---- |
| 92 | +Error | **Yes** | **Yes** | No | No |
| 93 | +Warning | **Yes** | **Yes** | No | No |
| 94 | +Recommendation | **Yes** | No | No | No |
| 95 | + |
| 96 | +Results for queries that are run but not displayed by default can be |
| 97 | +made visible by editing the project configuration. |
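The two tables above can be encoded as simple lookups. This is only a sketch of the policy as documented here: the names `RUN`, `DISPLAYED`, `is_run` and `is_displayed_by_default` are hypothetical, and lower-casing the severity is an assumption about how the `@problem.severity` value is written.

```python
# For each severity, the precision levels at which a query is run.
RUN = {
    "error": {"very-high", "high", "medium"},
    "warning": {"very-high", "high", "medium"},
    "recommendation": {"very-high", "high"},
}

# For each severity, the precision levels whose results are shown by default.
DISPLAYED = {
    "error": {"very-high", "high"},
    "warning": {"very-high", "high"},
    "recommendation": {"very-high"},
}


def is_run(severity: str, precision: str) -> bool:
    """True if a query with this classification is run."""
    return precision in RUN.get(severity.lower(), set())


def is_displayed_by_default(severity: str, precision: str) -> bool:
    """True if results are shown without changing the project configuration."""
    return precision in DISPLAYED.get(severity.lower(), set())
```

A query classified as Warning with medium precision is run but hidden by default: `is_run("Warning", "medium")` is true while `is_displayed_by_default("Warning", "medium")` is false. Every displayed cell is also a run cell, so the second table is a subset of the first.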
| 98 | + |
| 99 | +Queries from custom query packs (in-repo or site-wide) are always run |
| 100 | +and displayed by default. They can be hidden by editing the project |
| 101 | +configuration, and "disabled" by removing them from the query pack. |