Commit 430ed1a

SARIF reporter
SARIF is a unified file format used to exchange information between static analysis tools (like pylint) and various types of formatters, meta-runners, broadcasters / alert systems, ...

This implementation is ad-hoc, and non-validating.

Spec v Github
-------------

Turns out Github both doesn't implement all of SARIF (which makes sense) and requires a bunch of properties which the spec considers optional.

The [official SARIF validator][] (linked to by both oasis and github) was used to validate the output of the reporter, ensuring that all the github requirements it flags are fulfilled, and fixing *some* of the validator's pet issues.

As of now the following issues are left unaddressed:

- azure requires `run.automationDetails`, looking at the spec I don't think it makes sense for the reporter to inject that, it's more up to the CI
- the validator wants a `run.versionControlProvenance`, same as above
- the validator wants rule names in PascalCase, lol
- the validator wants templated result messages, but without pylint providing the args as part of the `Message` that's a bit of a chore
- the validator wants `region` to include a snippet (the flagged content)
- the validator wants `physicalLocation` to have a `contextRegion` (most likely with a snippet)

On URIs
-------

The reporter makes use of URIs for artifacts (~files). Per ["guidance on the use of artifactLocation objects"][3.4.7], `uri` *should* capture the deterministic part of the artifact location and `uriBaseId` *should* capture the non-deterministic part. However as far as I can tell pylint has no requirement (and no clean way to require) consistent resolution roots: `path` is just relative to the cwd, and there is no requirement to have project-level files to use pylint. This makes the use of relative uris dodgy, but absolute uris are pretty much always broken for the purpose of *interchange* so they're not really any better.

As a side-note, Github [asserts][relative-uri-guidance]

> While this [nb: `originalUriBaseIds`] is not required by GitHub for
> the code scanning results to be displayed correctly, it is required
> to produce a valid SARIF output when using relative URI references.

However per [3.4.4][] this is incorrect, the `uriBaseId` can be resolved through end-user configuration, `originalUriBaseIds`, external information (e.g. envvars), or heuristics.

It would be nice to document the "relative root" via `originalUriBaseIds` (which may be omitted for that purpose per [3.14.14][]), but per the above claiming a consistent project root is dodgy. We *could* resolve known project files (e.g. pyproject.toml, tox.ini, etc...) in order to find a consistent root (project root, repo root, ...) and set / use that for relative URIs but that's a lot of additional complexity which I'm not sure is warranted at least for a first version.

Fixes #5493

[3.4.4]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540869
[3.4.7]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540872
[3.14.14]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10540936
[relative-uri-guidance]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#relative-uri-guidance-for-sarif-producers
[official SARIF validator]: https://sarifweb.azurewebsites.net/
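
The "resolve known project files" idea mentioned in the message could look roughly like the sketch below. It is not part of this commit: the marker list, the `PROJECTROOT` base id, and all three helper names are assumptions made for illustration, and the fallback branch simply mirrors what the reporter does today (the path as given, with forward slashes).

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker files; the commit message only names pyproject.toml and tox.ini.
PROJECT_MARKERS = ("pyproject.toml", "tox.ini")
# Hypothetical base id; producers are free to choose the name.
BASE_ID = "PROJECTROOT"


def find_project_root(
    start: Path, markers: tuple[str, ...] = PROJECT_MARKERS
) -> Path | None:
    """Walk up from `start` and return the first directory holding a marker file."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_file() for marker in markers):
            return candidate
    return None


def artifact_location(path: str, root: Path | None) -> dict[str, str]:
    """Build an artifactLocation, relative to `root` when the file lives under it."""
    resolved = Path(path).resolve()
    if root is not None and resolved.is_relative_to(root):
        return {"uri": resolved.relative_to(root).as_posix(), "uriBaseId": BASE_ID}
    # No usable root: fall back to the path as given, with forward slashes.
    return {"uri": path.replace("\\", "/")}


def original_uri_base_ids(root: Path | None) -> dict[str, dict[str, str]]:
    """Run-level mapping documenting the base id; empty when no root was found."""
    if root is None:
        return {}
    uri = root.as_uri()
    # A trailing slash makes relative references resolve *under* the directory
    # instead of replacing its last path segment.
    if not uri.endswith("/"):
        uri += "/"
    return {BASE_ID: {"uri": uri}}
```

Whether this is worth the extra complexity is exactly the open question raised above; the sketch only shows that the root lookup itself is small, not that a consistent root can always be found.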
1 parent 7588243 commit 430ed1a

6 files changed: +393 -1 lines changed

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Support for SARIF as an output format.
+
+Closes #5493
+Closes #10647
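
For a sense of what the new format enables downstream, here is a minimal consumer sketch that tallies results per SARIF level. The sample log is hand-written to resemble the reporter's output (the rule ids and messages are illustrative, not captured from a real run), and `count_levels` is a hypothetical helper, not part of pylint.

```python
from __future__ import annotations

import json
from collections import Counter

# Hand-written sample shaped like the reporter's output; values are illustrative only.
SAMPLE_LOG = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "pylint", "rules": []}},
      "results": [
        {"ruleId": "C0114", "level": "note",
         "message": {"text": "Missing module docstring"}},
        {"ruleId": "E0602", "level": "error",
         "message": {"text": "Undefined variable 'x'"}},
        {"ruleId": "W0611", "level": "warning",
         "message": {"text": "Unused import os"}}
      ]
    }
  ]
}
"""


def count_levels(sarif_text: str) -> Counter[str]:
    """Count results per level across all runs of a SARIF log."""
    log = json.loads(sarif_text)
    return Counter(
        # Assumption: a missing level is treated as "warning" in this sketch.
        result.get("level", "warning")
        for run in log["runs"]
        for result in run.get("results", [])
    )


if __name__ == "__main__":
    for level, count in sorted(count_levels(SAMPLE_LOG).items()):
        print(f"{level}: {count}")
```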

pylint/lint/base_options.py

Lines changed: 2 additions & 1 deletion
@@ -104,7 +104,8 @@ def _make_linter_options(linter: PyLinter) -> Options:
                 "group": "Reports",
                 "help": "Set the output format. Available formats are: 'text', "
                 "'parseable', 'colorized', 'json2' (improved json format), 'json' "
-                "(old json format), msvs (visual studio) and 'github' (GitHub actions). "
+                "(old json format), msvs (visual studio), 'github' (GitHub actions), "
+                "and 'sarif'. "
                 "You can also give a reporter class, e.g. mypackage.mymodule."
                 "MyReporterClass.",
                 "kwargs": {"linter": linter},

pylint/reporters/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 from pylint.reporters.json_reporter import JSON2Reporter, JSONReporter
 from pylint.reporters.multi_reporter import MultiReporter
 from pylint.reporters.reports_handler_mix_in import ReportsHandlerMixIn
+from pylint.reporters.sarif_reporter import SARIFReporter
 
 if TYPE_CHECKING:
     from pylint.lint.pylinter import PyLinter
@@ -31,4 +32,5 @@ def initialize(linter: PyLinter) -> None:
     "JSONReporter",
     "MultiReporter",
     "ReportsHandlerMixIn",
+    "SARIFReporter",
 ]

pylint/reporters/sarif_reporter.py

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
+# Licensed under the GPL: https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
+# For details: https://github.com/pylint-dev/pylint/blob/main/LICENSE
+# Copyright (c) https://github.com/pylint-dev/pylint/blob/main/CONTRIBUTORS.txt
+
+# pylint: disable=wrong-spelling-in-comment
+
+from __future__ import annotations
+
+import json
+from textwrap import shorten
+from typing import TYPE_CHECKING, Literal, TypedDict
+
+import pylint
+import pylint.message
+from pylint.constants import MSG_TYPES
+from pylint.reporters import BaseReporter
+
+if TYPE_CHECKING:
+    from pylint.lint import PyLinter
+    from pylint.reporters.ureports.nodes import Section
+
+
+def register(linter: PyLinter) -> None:
+    linter.register_reporter(SARIFReporter)
+
+
+class SARIFReporter(BaseReporter):
+    name = "sarif"
+    extension = "sarif"
+    linter: PyLinter
+
+    def display_reports(self, layout: Section) -> None:
+        """Don't do anything in this reporter."""
+
+    def _display(self, layout: Section) -> None:
+        """Do nothing."""
+
+    def display_messages(self, layout: Section | None) -> None:
+        """Launch layouts display."""
+        output: Log = {
+            "version": "2.1.0",
+            "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json",
+            "runs": [
+                {
+                    "tool": {
+                        "driver": {
+                            "name": "pylint",
+                            "fullName": f"pylint {pylint.__version__}",
+                            "version": pylint.__version__,
+                            # should be versioned but not all versions are kept so...
+                            "informationUri": "https://pylint.readthedocs.io/",
+                            "rules": [
+                                {
+                                    "id": m.msgid,
+                                    "name": m.symbol,
+                                    "deprecatedIds": [
+                                        msgid for msgid, _ in m.old_names
+                                    ],
+                                    "deprecatedNames": [
+                                        name for _, name in m.old_names
+                                    ],
+                                    # per 3.19.19 shortDescription should be a
+                                    # single sentence which can't be guaranteed,
+                                    # however github requires it...
+                                    "shortDescription": {
+                                        "text": m.description.split(".", 1)[0]
+                                    },
+                                    # github requires that this is less than 1024 characters
+                                    "fullDescription": {
+                                        "text": shorten(
+                                            m.description, 1024, placeholder="..."
+                                        )
+                                    },
+                                    "help": {"text": m.format_help()},
+                                    "helpUri": f"https://pylint.readthedocs.io/en/stable/user_guide/messages/{MSG_TYPES[m.msgid[0]]}/{m.symbol}.html",
+                                    # handle_message only gets the formatted message,
+                                    # so to use `messageStrings` we'd need to
+                                    # convert the templating and extract the args
+                                    # out of the msg
+                                }
+                                for checker in self.linter.get_checkers()
+                                for m in checker.messages
+                                if m.symbol in self.linter.stats.by_msg
+                            ],
+                        }
+                    },
+                    "results": [self.serialize(message) for message in self.messages],
+                }
+            ],
+        }
+        json.dump(output, self.out)
+
+    @staticmethod
+    def serialize(message: pylint.message.Message) -> Result:
+        region: Region = {
+            "startLine": message.line,
+            "startColumn": message.column + 1,
+            "endLine": message.end_line or message.line,
+            "endColumn": (message.end_column or message.column) + 1,
+        }
+
+        location: Location = {
+            "physicalLocation": {
+                "artifactLocation": {
+                    "uri": message.path.replace("\\", "/"),
+                },
+                "region": region,
+            },
+        }
+        if message.obj:
+            logical_location: LogicalLocation = {
+                "name": message.obj,
+                "fullyQualifiedName": f"{message.module}.{message.obj}",
+            }
+            location["logicalLocations"] = [logical_location]
+
+        return {
+            "ruleId": message.msg_id,
+            "message": {"text": message.msg},
+            "level": CATEGORY_MAP[message.category],
+            "locations": [location],
+            "partialFingerprints": {
+                # encoding the node path seems like it would be useful to dedup alerts?
+                "nodePath/v1": "",
+            },
+        }
+
+
+CATEGORY_MAP: dict[str, ResultLevel] = {
+    "convention": "note",
+    "refactor": "note",
+    "statement": "note",
+    "info": "note",
+    "warning": "warning",
+    "error": "error",
+    "fatal": "error",
+}
+
+
+class Run(TypedDict):
+    tool: Tool
+    # invocation parameters / environment for the tool
+    # invocation: list[Invocations]
+    results: list[Result]
+    # originalUriBaseIds: dict[str, ArtifactLocation]
+
+
+Log = TypedDict(
+    "Log",
+    {
+        "version": Literal["2.1.0"],
+        "$schema": Literal[
+            "https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/schemas/sarif-schema-2.1.0.json"
+        ],
+        "runs": list[Run],
+    },
+)
+
+
+class Tool(TypedDict):
+    driver: Driver
+
+
+class Driver(TypedDict):
+    name: Literal["pylint"]
+    # optional but azure wants it
+    fullName: str
+    version: str
+    informationUri: str  # not required but validator wants it
+    rules: list[ReportingDescriptor]
+
+
+class ReportingDescriptorOpt(TypedDict, total=False):
+    deprecatedIds: list[str]
+    deprecatedNames: list[str]
+    messageStrings: dict[str, MessageString]
+
+
+class ReportingDescriptor(ReportingDescriptorOpt):
+    id: str
+    # optional but validator really wants it (then complains that it's not pascal cased)
+    name: str
+    # not required per spec but required by github
+    shortDescription: MessageString
+    fullDescription: MessageString
+    help: MessageString
+    helpUri: str
+
+
+class MarkdownMessageString(TypedDict, total=False):
+    markdown: str
+
+
+class MessageString(MarkdownMessageString):
+    text: str
+
+
+ResultLevel = Literal["none", "note", "warning", "error"]
+
+
+class ResultOpt(TypedDict, total=False):
+    ruleId: str
+    ruleIndex: int
+
+    level: ResultLevel
+
+
+class Result(ResultOpt):
+    message: Message
+    # not required per spec but required by github
+    locations: list[Location]
+    partialFingerprints: dict[str, str]
+
+
+class Message(TypedDict, total=False):
+    # needs to have either text or id but it's a PITA to type
+
+    #: plain text message string (can have markdown links but no other formatting)
+    text: str
+    #: formatted GFM text
+    markdown: str
+    #: rule id
+    id: str
+    #: arguments for templated rule messages
+    arguments: list[str]
+
+
+class Location(TypedDict, total=False):
+    physicalLocation: PhysicalLocation  # actually required by github
+    logicalLocations: list[LogicalLocation]
+
+
+class PhysicalLocation(TypedDict):
+    artifactLocation: ArtifactLocation
+    # not required per spec, required by github
+    region: Region
+
+
+class ArtifactLocation(TypedDict, total=False):
+    uri: str
+    #: id of base URI for resolving relative `uri`
+    uriBaseId: str
+    description: Message
+
+
+class LogicalLocation(TypedDict, total=False):
+    name: str
+    fullyQualifiedName: str
+    #: schema is `str` with a bunch of *suggested* terms, of which this is a subset
+    kind: Literal[
+        "function", "member", "module", "parameter", "returnType", "type", "variable"
+    ]
+
+
+class Region(TypedDict):
+    # none required per spec, all required by github
+    startLine: int
+    startColumn: int
+    endLine: int
+    endColumn: int
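
To make the coordinate and description handling in the reporter easier to check in isolation, the sketch below restates two pieces of it as standalone functions. `to_region` and `describe` are hypothetical names that do not exist in the commit; their bodies copy the expressions used in `serialize` and in the `rules` comprehension. The `+ 1` reflects that pylint reports 0-based columns while SARIF columns are 1-based.

```python
from __future__ import annotations

from textwrap import shorten


def to_region(
    line: int, column: int, end_line: int | None, end_column: int | None
) -> dict[str, int]:
    """Mirror the region built in `serialize`, shifting columns to 1-based."""
    return {
        "startLine": line,
        "startColumn": column + 1,
        # A missing end position collapses onto the start position.
        "endLine": end_line or line,
        # Note: `or` also falls back when end_column is 0, not only when it is None.
        "endColumn": (end_column or column) + 1,
    }


def describe(description: str) -> tuple[str, str]:
    """Return (shortDescription, fullDescription) texts as the reporter derives them."""
    # Everything before the first period; not guaranteed to be a full sentence.
    short = description.split(".", 1)[0]
    # shorten() collapses whitespace and truncates on word boundaries to fit the width.
    full = shorten(description, 1024, placeholder="...")
    return short, full


if __name__ == "__main__":
    print(to_region(3, 4, None, None))
    print(describe("Used when a name is undefined. Emitted by the variables checker."))
```

One consequence worth noting: because `shorten` normalizes whitespace, a multi-line description comes out as a single line even when it is well under the 1024-character limit.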

tests/lint/unittest_expand_modules.py

Lines changed: 8 additions & 0 deletions
@@ -151,6 +151,14 @@ def test__is_in_ignore_list_re_match() -> None:
         "name": "reporters.unittest_reporting",
         "isignored": False,
     },
+    str(REPORTERS_PATH / "unittest_sarif_reporter.py"): {
+        "basename": "reporters",
+        "basepath": str(REPORTERS_PATH / "__init__.py"),
+        "isarg": False,
+        "path": str(REPORTERS_PATH / "unittest_sarif_reporter.py"),
+        "name": "reporters.unittest_sarif_reporter",
+        "isignored": False,
+    },
 }
 
 
