feat(metrics): add scheduler attempt counter #1931
pkg/epp/metrics/metrics.go
Outdated
// RecordSchedulingOutcome records metrics at the end of a scheduling attempt,
// including latency, attempt status.
func RecordSchedulingOutcome(duration time.Duration, err error) {
	RecordSchedulerE2ELatency(duration)
I don't think this metric should be nested in here. Please remove this call.
Removed.
pkg/epp/scheduling/scheduler.go
Outdated
  defer func() {
- 	metrics.RecordSchedulerE2ELatency(time.Since(scheduleStart))
+ 	metrics.RecordSchedulingOutcome(time.Since(scheduleStart), err)
Please replace this line and the deleted line above with:
- 	metrics.RecordSchedulingOutcome(time.Since(scheduleStart), err)
+ 	duration := time.Since(scheduleStart)
+ 	metrics.RecordSchedulerE2ELatency(duration)
+ 	metrics.RecordSchedulingOutcome(duration, err)
f54475a to 6983812
6983812 to 1f791b7
/lgtm
pkg/epp/scheduling/scheduler.go
Outdated
before := time.Now()
result, err := s.profileHandler.ProcessResults(ctx, cycleState, request, profileRunResults)
metrics.RecordPluginProcessingLatency(framework.ProcessProfilesResultsExtensionPoint, s.profileHandler.TypedName().Type, s.profileHandler.TypedName().Name, time.Since(before))
metrics.RecordSchedulerAttempt(err)
This does not capture all error paths correctly.
For example, L82 is another path where the scheduler attempt fails.
Thanks for catching this. I've made some changes.
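For readers following the thread, one common way to cover every error path is to record the attempt in a deferred function that reads a named error result. The sketch below is only an illustration of that pattern, not the code that landed in this PR: `schedule`, `recordSchedulerAttempt`, and the `attempts` map are hypothetical stand-ins that use only the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// attempts is a hypothetical in-memory stand-in for the real counter metric.
var attempts = map[string]int{}

// recordSchedulerAttempt plays the role of metrics.RecordSchedulerAttempt in
// this sketch: it counts one attempt as "success" or "failure" based on err.
func recordSchedulerAttempt(err error) {
	status := "success"
	if err != nil {
		status = "failure"
	}
	attempts[status]++
}

// schedule is a hypothetical scheduler entry point with two failure paths.
// Because err is a named result, the deferred closure sees the value that is
// actually returned, whichever return statement is taken.
func schedule(candidates int, failProcessing bool) (picked string, err error) {
	defer func() { recordSchedulerAttempt(err) }()

	if candidates == 0 {
		return "", errors.New("no candidates") // early error path
	}
	if failProcessing {
		return "", errors.New("processing results failed") // late error path
	}
	return "pod-0", nil
}

func main() {
	schedule(0, false) // fails early
	schedule(1, true)  // fails late
	schedule(1, false) // succeeds
	fmt.Println("success:", attempts["success"], "failure:", attempts["failure"])
}
```

With this shape, adding a new early return later does not require remembering to record the attempt at that site.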
Signed-off-by: CYJiang <googs1025@gmail.com>
1f791b7 to 6b1988a
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: googs1025, nirrozenbaum, shmuelk
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduce a SchedulerAttemptsTotal counter with "success"/"failure" status labels. This improves observability of the scheduler for monitoring, alerting, and debugging in production.
Which issue(s) this PR fixes:
Fixes None
Does this PR introduce a user-facing change?: