PoC: Integrate conditional reads in Kubernetes

Right now, the conditional read functionality is integrated into the webhook itself, and not exposed to Kubernetes.

One concrete way a future version of Kubernetes could integrate conditional
reads is to allow authorizers to return conditions on read requests as well.
The syntax of the condition must be a
[generalized selector](https://github.com/kubernetes/kubernetes/issues/128154) (most
likely a subset of CEL) of a well-known condition type. Extraction of values
from an object does not need to change: expressiveness can be limited to
labels and the existing simple JSONPath-based extractors.

Consider the following fictional authorizer chain decisions:

- Authorizer 1:
  - `effect=Deny` condition: `metadata.labels.owner != "lucas"`
  - `effect=NoOpinion` condition: `metadata.labels.visible != "true"`
  - `effect=Allow` condition: `object.type == "k8s.io/basic-auth"`
  - `effect=Allow` condition: `metadata.labels.public == "true"`
- Authorizer 2:
  - `effect=Allow` condition: `metadata.labels.env == "dev"`

Note that the authorizer chain should be walked until a concrete decision is reached. These conditions turn into the following boolean predicate:

```cel
isAuthorized(object) = !(object.metadata.labels.owner != "lucas") AND (
  (
    !(object.metadata.labels.visible != "true") AND
    (
      (object.type == "k8s.io/basic-auth") OR
      (object.metadata.labels.public == "true")
    )
  ) OR
  (
    (object.metadata.labels.env == "dev")
  )
)
```

which could also be written in Disjunctive Normal Form (DNF) as follows:

```cel
isAuthorized(object) = (
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.visible == "true") AND
  (object.type == "k8s.io/basic-auth")
) OR
(
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.visible == "true") AND
  (object.metadata.labels.public == "true")
) OR
(
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.env == "dev")
)
```
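
As a sanity check, the chain walk can be simulated and compared against the DNF for every combination of the relevant field values. The Python sketch below is illustrative only: the dict layout, the names `walk_chain` and `is_authorized_dnf`, and the assumption that conditions within one authorizer are tried in the listed order (first match wins) are not part of any Kubernetes API.

```python
from itertools import product

def labels(obj):
    return obj["metadata"]["labels"]

# Assumption: each authorizer is an ordered list of (effect, condition) pairs,
# and the first condition that matches decides that authorizer's response.
CHAIN = [
    [   # Authorizer 1
        ("Deny", lambda o: labels(o).get("owner") != "lucas"),
        ("NoOpinion", lambda o: labels(o).get("visible") != "true"),
        ("Allow", lambda o: o["type"] == "k8s.io/basic-auth"),
        ("Allow", lambda o: labels(o).get("public") == "true"),
    ],
    [   # Authorizer 2
        ("Allow", lambda o: labels(o).get("env") == "dev"),
    ],
]

def walk_chain(obj):
    """Walk the chain until some authorizer returns Allow or Deny."""
    for authorizer in CHAIN:
        for effect, condition in authorizer:
            if condition(obj):
                if effect == "Allow":
                    return True
                if effect == "Deny":
                    return False
                break  # NoOpinion: move on to the next authorizer
    return False  # no authorizer allowed the request

def is_authorized_dnf(obj):
    """The DNF form of the predicate, written out by hand."""
    lab = labels(obj)
    owner = lab.get("owner") == "lucas"
    visible = lab.get("visible") == "true"
    return (
        (owner and visible and obj["type"] == "k8s.io/basic-auth")
        or (owner and visible and lab.get("public") == "true")
        or (owner and lab.get("env") == "dev")
    )

def all_objects():
    """Every combination of 'matching' and 'non-matching' field values."""
    for owner, visible, typ, public, env in product(
        ["lucas", "other"], ["true", "false"],
        ["k8s.io/basic-auth", "Opaque"], ["true", "false"], ["dev", "prod"],
    ):
        yield {"type": typ, "metadata": {"labels": {
            "owner": owner, "visible": visible, "public": public, "env": env}}}

# Objects on which the chain walk and the DNF disagree (expected: none).
mismatches = [o for o in all_objects() if walk_chain(o) != is_authorized_dnf(o)]
```

Each field only needs one matching and one non-matching value here, because every condition compares a field against a single constant.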

Note that authorizer 1's `effect=Deny` condition must evaluate to false for
an object to be matched. However, the `effect=NoOpinion` condition is scoped
only to authorizer 1: if an object has `metadata.labels.owner == "lucas"` and
`metadata.labels.env == "dev"`, it is authorized by authorizer 2, even though
`metadata.labels.visible == "false"` (which yields a `NoOpinion` response from
authorizer 1).

The API server must make sure that every object that is returned from storage is
authorized. The API server cannot know what objects are in storage (as one of
the authorization requirements is to be stateless with regard to the data
store), but it can prove something stronger: for every possible `object` that
could be constructed and that matches the given `objectSelected(object)`
selector, `isAuthorized(object)` is true.

This property can be checked with a SAT/SMT solver by rewriting it as a search for a counterexample:

```text
(forall object: objectSelected(object) => isAuthorized(object)) == TRUE
=== (forall object: (not objectSelected(object)) OR isAuthorized(object)) == TRUE
=== (exists object: objectSelected(object) AND (not isAuthorized(object))) == FALSE
```
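
The last line says the check reduces to searching for one object that is selected but not authorized. The Python sketch below uses brute-force enumeration as a stand-in for a real SAT/SMT solver. It assumes selectors only compare fields for equality against the constants listed in `MENTIONED`, so trying each mentioned value plus one fresh value per field is enough. The flat field names and the helper `find_counterexample` are hypothetical.

```python
from itertools import product

FRESH = "<other>"  # stands for any value (or absence) not mentioned in a selector

# Values each field is compared against in the example conditions.
MENTIONED = {
    "labels.owner": ["lucas"],
    "labels.visible": ["true"],
    "labels.public": ["true"],
    "labels.env": ["dev"],
    "type": ["k8s.io/basic-auth"],
}

def is_authorized(o):
    """The DNF predicate from the example, over a flat field dict."""
    owner = o["labels.owner"] == "lucas"
    visible = o["labels.visible"] == "true"
    return (
        (owner and visible and o["type"] == "k8s.io/basic-auth")
        or (owner and visible and o["labels.public"] == "true")
        or (owner and o["labels.env"] == "dev")
    )

def find_counterexample(object_selected, authorized=is_authorized):
    """Return an object with objectSelected AND NOT isAuthorized, or None."""
    fields = list(MENTIONED)
    domains = [MENTIONED[f] + [FRESH] for f in fields]
    for values in product(*domains):
        obj = dict(zip(fields, values))
        if object_selected(obj) and not authorized(obj):
            return obj
    return None  # no counterexample: the selector is fully authorized

# A selector that stays inside the authorized set, and one that does not.
narrow = lambda o: o["labels.owner"] == "lucas" and o["labels.env"] == "dev"
broad = lambda o: o["labels.env"] == "dev"

narrow_result = find_counterexample(narrow)  # None: request can be served
broad_result = find_counterexample(broad)    # an object owned by someone else
```

A real implementation would hand the same formula to a solver instead of enumerating, but the accept/reject decision is the same: serve the request only when no counterexample exists.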

A client who wants to ask "show me all instances of resource X that I can see"
can thus perform a SelfSAR and construct a selector `objectSelected` that is
equal to `isAuthorized` (and therefore correct by construction). It then sees
all objects that it is allowed to see, without having to know its permissions
up front or issue `n` different requests (e.g. one per namespace). This would
work for controllers/watches as well. Even more conveniently, the API server
could provide the client with a mode that "downgrades" an unconstrained request
(e.g. `GET /api/v1/pods`): the server itself adds the selector that the client
is authorized to see. This could answer the "impossible problem" posed in [David Eads and Joe Betz's recent KubeCon talk](https://kccncna2025.sched.com/event/27Nnf/sig-api-machinery-and-ai-what-comes-next-joe-betz-google-david-eads-red-hat).

The "impossible problem" referenced in the talk is a race condition when resuming a watch at a given resourceVersion: if permissions changed between the time the watch was first opened and the time it is resumed, the resumed watch might target a different set of objects (according to the new permissions) than the old watch observed (according to the old permissions).

With conditional reads, the controller can "lock" the permission snapshot it uses up front when constructing the watch. This means that the controller can choose to use the old permissions when resuming the watch, such that exactly the same objects are returned (at their respective resourceVersions) as before. All of this can be done without the authorizer or the kube-apiserver needing to store any state about the watch's permissions; the state is kept client-side in an opaque way.

However, eventually the controller might want to "upgrade" its authorization state intentionally. It can do so through a new `SelfSubjectAccessReview`, which yields updated permissions. Because the selector would be restricted to simple AND, OR, NOT, `==`, and `in` semantics, it is possible to compute the difference between two selectors (the old and the new permission selectors) to find out:

a) what objects were authorized before, but are not anymore
b) what objects are authorized now, but were not before

The controller can thus list the newly-authorized (b) objects separately, and enqueue these like normal, or run them through some special init procedure.
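
The difference between two permission snapshots can be sketched as two derived selectors, `old AND NOT new` for (a) and `new AND NOT old` for (b). The Python below is a minimal illustration: the snapshots `old_perm` and `new_perm`, the field names, and the enumeration-based `satisfiable` check (a stand-in for a solver, valid only for equality/`in` comparisons against the constants in `MENTIONED`) are all hypothetical.

```python
from itertools import product

FRESH = "<other>"  # any value not mentioned in either selector

def universe(mentioned):
    """Enumerate one representative object per combination of field values."""
    fields = list(mentioned)
    for values in product(*[mentioned[f] + [FRESH] for f in fields]):
        yield dict(zip(fields, values))

def diff_selectors(old, new):
    """Return (lost, gained) selectors for two permission snapshots."""
    lost = lambda o: old(o) and not new(o)    # (a) authorized before, not now
    gained = lambda o: new(o) and not old(o)  # (b) authorized now, not before
    return lost, gained

def satisfiable(selector, mentioned):
    """True if at least one possible object matches the selector."""
    return any(selector(o) for o in universe(mentioned))

# Hypothetical snapshots from two SelfSubjectAccessReview responses.
old_perm = lambda o: o["env"] == "dev"
new_perm = lambda o: o["env"] in ("dev", "staging") and o["owner"] == "lucas"
MENTIONED = {"env": ["dev", "staging"], "owner": ["lucas"]}

lost, gained = diff_selectors(old_perm, new_perm)
anything_lost = satisfiable(lost, MENTIONED)      # dev objects of other owners
anything_gained = satisfiable(gained, MENTIONED)  # lucas's staging objects
```

If `gained` is unsatisfiable the controller can skip the extra list call entirely, and if `lost` is unsatisfiable its cache needs no cleanup.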

For objects that were authorized before, but are not anymore (a), the controller has three options. The same choice already exists for controllers today (a controller can open a watch of e.g. all secrets in the cluster, and on reconnect get a 403):

1) keep enqueueing unauthorized objects (I think this is the current behavior); `client.Get` will hit the cached/last-seen objects in memory from before the previous watch termination, but (presumably) any update will also fail (assuming the controller lost both read and update access simultaneously), so no reconciles of such objects will succeed
2) execute some special access-denied cleanup function on the objects, which tells the controller that it lost access to the object, but that it should most likely NOT delete the target system being reconciled, as the object might still exist in the API server (403 != 404)
3) just stop enqueueing this object, silently
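
The three options can be sketched as a policy applied to the controller's cache once the "lost" selector is known. This Python is purely illustrative: `LostAccessPolicy`, `keys_to_enqueue`, the dict-based cache, and the `on_access_denied` hook are hypothetical names, not controller-runtime or client-go APIs.

```python
from enum import Enum

class LostAccessPolicy(Enum):
    KEEP_ENQUEUEING = 1         # option 1: behave as before
    RUN_ACCESS_DENIED_HOOK = 2  # option 2: notify, but do not delete anything
    DROP_SILENTLY = 3           # option 3: stop enqueueing without notice

def keys_to_enqueue(cache, lost, policy, on_access_denied=None):
    """Decide which cached objects are still handed to the reconciler.

    cache: dict mapping object key -> last-seen object
    lost:  predicate that is true for objects the controller may no longer read
    """
    keys = []
    for key, obj in cache.items():
        if not lost(obj):
            keys.append(key)  # still authorized: reconcile as usual
        elif policy is LostAccessPolicy.KEEP_ENQUEUEING:
            keys.append(key)  # reconciles will likely fail with 403
        elif policy is LostAccessPolicy.RUN_ACCESS_DENIED_HOOK:
            # 403 != 404: the object may still exist, so only notify.
            on_access_denied(key, obj)
        # DROP_SILENTLY: nothing to do
    return keys

# Example: the controller lost access to everything outside env == "dev".
cache = {
    "default/a": {"env": "dev"},
    "default/b": {"env": "prod"},
}
lost = lambda o: o["env"] != "dev"

denied = []
hook_keys = keys_to_enqueue(
    cache, lost, LostAccessPolicy.RUN_ACCESS_DENIED_HOOK,
    on_access_denied=lambda key, obj: denied.append(key),
)
keep_keys = keys_to_enqueue(cache, lost, LostAccessPolicy.KEEP_ENQUEUEING)
drop_keys = keys_to_enqueue(cache, lost, LostAccessPolicy.DROP_SILENTLY)
```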

I think this feature set could be a compelling way to solve the "show me all resources I can see" issue, for both "normal" clients and controllers, without having to build the expected permissions into the client through configuration that will eventually drift out of sync with the actual policies.

One could do a PoC of this by:

a) designing a CEL subset that could be used as a generalized selector
b) moving the "is this selector contained within the authorized set" SMT solving logic from this project to k8s core as a PoC. This would require either embedding an SMT solver like cvc5 or z3 through wasm into the kube-apiserver, or translating the selectors into generic SAT form and using a pure-Go SAT solver like https://github.com/crillab/gophersat
c) PoC-ing the controller-side implementation
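
For step b), the translation into generic SAT form is straightforward in one common case: when `objectSelected` is a conjunction of equality atoms and `isAuthorized` is in DNF, the query `objectSelected AND NOT isAuthorized` is already in CNF after applying De Morgan. The Python sketch below shows that encoding with DIMACS-style integer literals. The atom representation, `to_cnf`, and the brute-force `satisfiable` (standing in for a real solver; no gophersat API is used) are assumptions for illustration.

```python
from itertools import combinations, product

# isAuthorized in DNF: a list of conjunctions of (field, value) equality atoms.
IS_AUTHORIZED_DNF = [
    [("owner", "lucas"), ("visible", "true"), ("type", "k8s.io/basic-auth")],
    [("owner", "lucas"), ("visible", "true"), ("public", "true")],
    [("owner", "lucas"), ("env", "dev")],
]

def to_cnf(selected, authorized_dnf):
    """Encode `selected AND NOT authorized` as clauses of integer literals."""
    var = {}  # atom -> positive variable number

    def lit(atom):
        return var.setdefault(atom, len(var) + 1)

    # Each selected atom must hold: one unit clause per atom.
    clauses = [[lit(a)] for a in selected]
    # NOT (t1 OR t2 ...) = (NOT t1) AND (NOT t2) ...; NOT of a conjunction
    # is a single clause of negated literals.
    clauses += [[-lit(a) for a in term] for term in authorized_dnf]
    # A field cannot equal two different values at the same time.
    for a, b in combinations(list(var), 2):
        if a[0] == b[0] and a[1] != b[1]:
            clauses.append([-var[a], -var[b]])
    return clauses, len(var)

def satisfiable(clauses, num_vars):
    """Brute-force stand-in for a SAT solver (fine for a handful of atoms)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(x) - 1] == (x > 0) for x in c) for c in clauses):
            return True
    return False

def selector_is_contained(selected):
    """True if every object matching `selected` is authorized."""
    clauses, num_vars = to_cnf(selected, IS_AUTHORIZED_DNF)
    return not satisfiable(clauses, num_vars)

contained = selector_is_contained([("owner", "lucas"), ("env", "dev")])
not_contained = selector_is_contained([("env", "dev")])
```

Selectors with OR or NOT at the top level would need a general CNF conversion (for example a Tseitin transformation) before being passed to a solver; that part is not shown here.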

(However, note that Conditional Reads are NOT part of the KEP (https://github.com/kubernetes/enhancements/pull/5684) right now.
Another KEP is expected for that eventually (if people like the idea), but I
felt it was good to mention the sketch up front here so that reviewers have an
idea of how conditional authorization can become usable for both reads and writes,
eventually.)

PoC: Integrate conditional reads in Kubernetes #50