Verify and document queue-proxy pod statistics security#16366
Verify and document queue-proxy pod statistics security#16366knative-prow[bot] merged 2 commits intoknative:mainfrom
Conversation
This change verifies and documents that queue-proxy instances can only report statistics for their own pod and cannot spoof metrics for other pods. Changes: - Added TestProtobufStatsReporterPodNameImmutable to verify pod name immutability across multiple stat reports - Added TestProtobufStatsReporterMultiplePods to verify separate pod identities are maintained - Enhanced ProtobufStatsReporter documentation explaining the security property of immutable pod identity binding The podName field is set once at construction from the Kubernetes environment and cannot be modified, ensuring correct metric attribution for autoscaling decisions.
|
Hi @sonikaarora. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Hi, If I understand your tests right they basically only check that golang works correctly and doesn't mix properties of multiple instances of the same struct - IMO no need to test this 😀 . Considering that the issue kind is documentation and considering this quote from the issue I think that this PR is not the correct approach to solve it:
|
|
/hold |
d3b0f24 to
85fc1c7
Compare
|
Hi @linkvt, Thanks for the feedback! You're absolutely right - my initial tests were too basic. I've updated the PR to address your concern. The key change is a new integration test: This test verifies the complete autoscaler → queue-proxy flow:
This directly addresses the issue requirement: "when autoscaler calls the queue proxy, all the statistics from the queue-proxy are associated with the Revision and Pod". The original unit tests remain (they verify pod-level immutability) but are now complemented by this end-to-end test showing revision isolation via Kubernetes labels and PodAccessor filtering. Would appreciate another look when you have time. Thanks! |
|
/ok-to-test |
|
/assign @evankanderson for input I read the title of the issue and thought this was about the autoscaler verifying the source of metrics are coming from a specific pod/revision. But seeing the label about documentation and the body of the issue I'm unsure about the requirements. |
|
@dprotaso: GitHub didn't allow me to assign the following users: for, input. Note that only knative members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
|
||
| // TestProtobufStatsReporterMultiplePods verifies that different queue-proxy instances | ||
| // maintain separate pod identities and cannot report for each other. | ||
| func TestProtobufStatsReporterMultiplePods(t *testing.T) { |
There was a problem hiding this comment.
I think we can drop this test. It's not really doing anything more than TestProtobufStatsReporterPodNameImmutable
| // that statistics are correctly attributed to the appropriate revision. | ||
| // | ||
| // Related to: https://github.com/knative/serving/issues/11015 | ||
| func TestScrapeOnlyScrapesPodsForCorrectRevision(t *testing.T) { |
There was a problem hiding this comment.
The test scraper only really supports scraping one revision's pods - so we shouldn't seed stats for revision B
| // Create a scraper specifically for revision-a | ||
| // The fake client will return stats for all 4 pods, but the scraper | ||
| // should only scrape the 2 pods belonging to revision-a | ||
| allStats := append(statsRevA, statsRevB...) |
There was a problem hiding this comment.
Stats here shouldn't include statsRevB because the test scraper returns stats in the order of invocation
For example - if I were to call scrape 4 times then you'd get stats rev b being returned in the scraper. This would mean the mock is not behaving like the real scraper.
|
Yes, the original issue is to verify that when the autoscaler is reading metrics from a specific pod. In particular, https://github.com/knative/serving/blob/main/pkg/autoscaler/metrics/stats_scraper.go#L326 passes a We can't actually tell whether it's a queue proxy or something malicious; we do prevent a DoS by limiting the buffer size. We should also prevent it from replying for a different Knative service, but I don't think anyone spent the time to analyze the call stack to see if this was a reasonable concern. We seem to have a |
I've read the code. It appears that the autoscaler works as follows:
Based on this structure, the most damage an attacker with editor on all types in a namespace (the normal
In particular, it would not be possible to use access to the |
|
That's not to say that these might not be useful changes, but the "investigate" part of the original issue really was "investigate" before proposing fixes. |
|
FWIW - I'm good to merge the PR (with the requested changes) though I'd say it doesn't fix the target issue. |
85fc1c7 to
cc0a491
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #16366 +/- ##
==========================================
+ Coverage 80.17% 80.22% +0.05%
==========================================
Files 216 216
Lines 13440 13440
==========================================
+ Hits 10775 10782 +7
+ Misses 2300 2294 -6
+ Partials 365 364 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@dprotaso Thanks for the review! I've made both changes:
Updated commit is pushed. Please take another look when you get a chance. |
|
/retest |
|
/lgtm |
|
/hold cancel |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dprotaso, sonikaarora The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
This PR verifies and documents that queue-proxy instances can only report statistics for their own pod and cannot spoof metrics for other pods.
Changes
TestProtobufStatsReporterPodNameImmutable: Verifies that the pod name remains immutable across multiple stat reportsTestProtobufStatsReporterMultiplePods: Verifies that separate queue-proxy instances maintain independent pod identitiesProtobufStatsReporterexplaining immutable pod identity bindingVerification
The implementation was inspected and verified to ensure:
env.ServingPod)podNamefield is private and has no setter methodsTest Results
All tests pass including the new security verification tests:
Coverage: 97.1% of statements in pkg/queue
Backward Compatibility
✅ No breaking changes
✅ All existing tests pass
✅ No changes to public APIs
Related: #11015