Add Guidance wrt Labelling to Naming and Rules Best Practices#2691
Add Guidance wrt Labelling to Naming and Rules Best Practices#2691conallob wants to merge 9 commits intoprometheus:mainfrom
Conversation
Signed-off-by: Conall O'Brien <conall@conall.net>
Signed-off-by: Conall O'Brien <conall@conall.net>
|
Obligatory post-it note reminder: https://photos.app.goo.gl/Bkfir4wRiLtNVG4W8 |
|
With my current patchy availability, there is little chance I get to this anytime soon. Maybe @juliusv has a qualified opinion here? |
|
Hey @conallob, congrats for the nice PR! Just one thing: maybe you could fix the
There is a small confusion I would love to see fixed, in a paragraph just below one of your edits. It is:
I don't understand the |
Fixed the typo.
I'm afraid that best practice is unrelated. It also makes sense as written, once you've written enough rules. It's weighing up the trade-off between tracking the chain of operations across a pipeline of rules vs the rule name growing unwieldy. Many of these best practices trace back to specific philosophies from Prometheus' predecessor. If you still think it needs a polish, please a separate doc bug. |
Co-authored-by: Ben Kochie <superq@gmail.com> Signed-off-by: Conall O'Brien <conall@conall.net>
Co-authored-by: Ben Kochie <superq@gmail.com> Signed-off-by: Conall O'Brien <conall@conall.net>
| Keeping the metric name unchanged makes it easy to know what a metric is and | ||
| easy to find in the codebase. | ||
|
|
||
| IMPORTANT: `job` label acts as a primary key. It is **strongly** recommended that you use it to scope your PromQL expressions to the system you are monitoring. |
There was a problem hiding this comment.
This is misleading. Prometheus doesn't have the concept of "primary key". Not even metric names are a "primary key".
There was a problem hiding this comment.
Fair, especially since folks used to SQL DBs will jump to the conclusion that it's a SQL DB, which it isn't.
Iterated on the language to avoid creating ambiguity
Iterate on the description of the job label, removing "primary key", given it's association with SQL Signed-off-by: Conall O'Brien <conall@conall.net>
|
PTAL |
|
Friendly ping? |
|
Friendly, you're not at SRECon EMEA this week, ping? |
| WARNING: When using `without`, be careful not to strip out the `job` label accidentally. | ||
|
|
||
| * `instance` | ||
| * The `instance` label will include the `ip:port` what was scraped, providing a crucial breadcrumb for debugging scrape time issues |
There was a problem hiding this comment.
| * The `instance` label will include the `ip:port` what was scraped, providing a crucial breadcrumb for debugging scrape time issues | |
| * The `instance` label by default will include the `ip:port` what was scraped. |
| ## Labels | ||
|
|
||
| * `job` | ||
| * The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. |
There was a problem hiding this comment.
| * The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. | |
| * The `job` is a default target label set by the scrape configs and is used to identify metrics scraped from the same target/exporter. |
|
|
||
| * `job` | ||
| * The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. | ||
| * If not specified in PromQL expressions, they will match unrelated metrics with the same name. This is especially true in a multi system or multi tenant installation |
There was a problem hiding this comment.
I'm not sure this is really a useful note here, as this applies to all label matching.
| * If not specified in PromQL expressions, they will match unrelated metrics with the same name. This is especially true in a multi system or multi tenant installation |
There was a problem hiding this comment.
It applies to all labels. But job and instance are two uniform labels found on every metric, including ubiquitous synthetic metrics such as up
There was a problem hiding this comment.
Yes, but it's not related to job, but related to "target labels" and discovery. That is a different thing and related to querying, not creating labels.
| WARNING: When using `without`, be careful not to strip out the `job` label accidentally. | ||
|
|
There was a problem hiding this comment.
This warning doesn't make a lot of sense to me. It has a high probability of being quoted as copy-pasta without being understood. Let's just drop it.
| WARNING: When using `without`, be careful not to strip out the `job` label accidentally. |
There was a problem hiding this comment.
I don't want to encourage copy-pasta, but this is an important point.
If using alerting expressions like up{job=bla} > 0 for 3m , you need to be careful not to accidentally strip the job label. If you do, your alert no longer works as intended.
I'll polish the wording here
There was a problem hiding this comment.
This doesn't seem related to naming practices, which is what this guide is about.
Co-authored-by: Ben Kochie <superq@gmail.com> Signed-off-by: Conall O'Brien <conall@conall.net>
|
For perspective, one of the motivations behind this PR is the anti-patterm of writing alert expressions intended for a single tenant system, which has evolved into a multi-tenant system. e.g Once you start adding additional jobs that match on the same labels (e.g Daemonsets, fleet-wide node_exporter, etc), teams start getting paged for systems they don't own or care about |
Add Guidance wrt Labelling to Naming and Rules Best Practices to docs/practices/naming.md and docs/practices/rules.md, specifically:
jobandinstancejoblabel, especially in multi-tenant systemsThis Fixes #2690