PoC: Integrate conditional reads in Kubernetes #50

@luxas

Right now, the conditional read functionality is integrated into the webhook itself, and not exposed to Kubernetes.

A concrete example of how a future version of Kubernetes could integrate conditional reads
would be to allow authorizers to return conditions on read requests as well. The syntax
of the condition must be a generalized selector (most likely a subset of CEL) of a
well-known condition type. Note that the extraction of values from an object does not
need to change: expressiveness can be limited to labels and the existing simple
JSONPath-based extractors.

Consider the following fictional authorizer chain decisions:

  • Authorizer 1:
    • effect=Deny condition: metadata.labels.owner != "lucas"
    • effect=NoOpinion condition: metadata.labels.visible != "true"
    • effect=Allow condition: object.type == "k8s.io/basic-auth"
    • effect=Allow condition: metadata.labels.public == "true"
  • Authorizer 2:
    • effect=Allow condition: metadata.labels.env == "dev"

Note that the authorizer chain should be walked until a concrete decision is reached. These conditions turn into the following boolean predicate:

isAuthorized(object) = !(object.metadata.labels.owner != "lucas") AND (
  (
    !(object.metadata.labels.visible != "true") AND
    (
      (object.type == "k8s.io/basic-auth") OR
      (object.metadata.labels.public == "true")
    )
  ) OR
  (
    (object.metadata.labels.env == "dev")
  )
)
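
The translation from the chain to this predicate is mechanical. Below is a minimal sketch in Python. It assumes each listed condition is treated as an opaque boolean atom (atom names such as owner_ne_lucas are hypothetical), that rules within one authorizer are tried in the listed order, and that falling off the end of the chain means deny. Each rule becomes an if-then-else: Allow yields "atom OR rest", Deny yields "NOT atom AND rest", and NoOpinion jumps to the next authorizer. The compiled formula is then compared against the hand-written predicate on all 32 atom assignments:

```python
from itertools import product

# Hypothetical atom names, e.g. "owner_ne_lucas" stands for
# metadata.labels.owner != "lucas".
ATOMS = ["owner_ne_lucas", "visible_ne_true", "type_basic_auth", "public_true", "env_dev"]

# Each authorizer is an ordered list of (effect, atom) pairs.
CHAIN = [
    [("Deny", "owner_ne_lucas"), ("NoOpinion", "visible_ne_true"),
     ("Allow", "type_basic_auth"), ("Allow", "public_true")],
    [("Allow", "env_dev")],
]


def compile_chain(chain):
    """Build a formula tree equivalent to walking the chain (default: deny)."""
    expr = False  # falling off the end of the chain: nobody allowed it
    for rules in reversed(chain):
        next_authorizer = expr  # where NoOpinion, or no matching rule, leads
        for effect, atom in reversed(rules):
            var = ("var", atom)
            if effect == "Allow":    # if atom: True, else: expr
                expr = ("or", var, expr)
            elif effect == "Deny":   # if atom: False, else: expr
                expr = ("and", ("not", var), expr)
            else:                    # NoOpinion: if atom: next authorizer, else: expr
                expr = ("or", ("and", var, next_authorizer),
                        ("and", ("not", var), expr))
    return expr


def evaluate(expr, assignment):
    """Evaluate a formula tree under a dict mapping atom names to booleans."""
    if isinstance(expr, bool):
        return expr
    op = expr[0]
    if op == "var":
        return assignment[expr[1]]
    if op == "not":
        return not evaluate(expr[1], assignment)
    if op == "and":
        return evaluate(expr[1], assignment) and evaluate(expr[2], assignment)
    return evaluate(expr[1], assignment) or evaluate(expr[2], assignment)


def is_authorized(a):
    """The hand-written predicate from the text, over the same atoms."""
    return (not a["owner_ne_lucas"]) and (
        ((not a["visible_ne_true"]) and (a["type_basic_auth"] or a["public_true"]))
        or a["env_dev"]
    )


def equivalent(expr):
    """Compare the compiled formula with is_authorized on all 2**5 assignments."""
    for values in product([False, True], repeat=len(ATOMS)):
        a = dict(zip(ATOMS, values))
        if evaluate(expr, a) != is_authorized(a):
            return False
    return True


print(equivalent(compile_chain(CHAIN)))  # True
```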

which could also be written in Disjunctive Normal Form (DNF) as follows:

isAuthorized(object) = (
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.visible == "true") AND
  (object.type == "k8s.io/basic-auth")
) OR
(
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.visible == "true") AND
  (object.metadata.labels.public == "true")
) OR
(
  (object.metadata.labels.owner == "lucas") AND
  (object.metadata.labels.env == "dev")
)
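
The DNF can also be derived mechanically by distributing AND over OR. This is a minimal sketch, assuming the negations have already been folded into the atoms (e.g. !(owner != "lucas") rewritten as owner == "lucas"). The tuple encoding and the name to_dnf are hypothetical:

```python
# Atoms are opaque strings with negations already folded in.
OWNER = 'object.metadata.labels.owner == "lucas"'
VISIBLE = 'object.metadata.labels.visible == "true"'
BASIC_AUTH = 'object.type == "k8s.io/basic-auth"'
PUBLIC = 'object.metadata.labels.public == "true"'
DEV = 'object.metadata.labels.env == "dev"'


def to_dnf(expr):
    """Return a list of conjunctions (tuples of atoms) whose OR equals expr.

    expr is an atom string or a tuple ("and" | "or", left, right); it must
    not contain NOT nodes.
    """
    if isinstance(expr, str):
        return [(expr,)]
    op, left, right = expr
    lhs, rhs = to_dnf(left), to_dnf(right)
    if op == "or":
        return lhs + rhs
    # AND distributes over OR: (a | b) & (c | d) = ac | ad | bc | bd
    return [x + y for x in lhs for y in rhs]


IS_AUTHORIZED = ("and", OWNER,
                 ("or", ("and", VISIBLE, ("or", BASIC_AUTH, PUBLIC)), DEV))

for conjunction in to_dnf(IS_AUTHORIZED):
    print(" AND ".join(conjunction))
```

Running it prints the three conjunctions above, in the same order. In general, distribution can blow up exponentially, which is one more reason to keep the selector language small.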

Note that authorizer 1's effect=Deny condition must evaluate to false for an object
to be matched. The effect=NoOpinion condition, however, is scoped only to authorizer 1:
if an object has metadata.labels.owner == "lucas" and metadata.labels.env == "dev",
it is authorized by authorizer 2 even when metadata.labels.visible == "false" (which
yields a NoOpinion response from authorizer 1).
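
To make the scoping of NoOpinion concrete, here is a minimal sketch of walking the chain for a single object. It assumes that rules within an authorizer are evaluated in the listed order with the first matching rule winning, that a matching NoOpinion hands the decision to the next authorizer, and that reaching the end of the chain means deny. The Rule type, walk_chain and the flattened-dict object model are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical object model: flattened field paths mapped to string values.
Obj = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    effect: str                       # "Allow", "Deny" or "NoOpinion"
    condition: Callable[[Obj], bool]


AUTHORIZER_1 = [
    Rule("Deny", lambda o: o.get("metadata.labels.owner") != "lucas"),
    Rule("NoOpinion", lambda o: o.get("metadata.labels.visible") != "true"),
    Rule("Allow", lambda o: o.get("type") == "k8s.io/basic-auth"),
    Rule("Allow", lambda o: o.get("metadata.labels.public") == "true"),
]
AUTHORIZER_2 = [Rule("Allow", lambda o: o.get("metadata.labels.env") == "dev")]
CHAIN = [AUTHORIZER_1, AUTHORIZER_2]


def walk_chain(chain: List[List[Rule]], obj: Obj) -> Tuple[str, Optional[int]]:
    """Return the decision and the index of the authorizer that made it."""
    for index, authorizer in enumerate(chain):
        for rule in authorizer:
            if not rule.condition(obj):
                continue
            if rule.effect == "NoOpinion":
                break  # this authorizer abstains; ask the next one
            return rule.effect, index  # concrete Allow or Deny: stop walking
        # no rule matched (or NoOpinion matched): fall through
    return "Deny", None  # no authorizer had an opinion: default deny


obj = {
    "metadata.labels.owner": "lucas",
    "metadata.labels.env": "dev",
    "metadata.labels.visible": "false",
}
print(walk_chain(CHAIN, obj))  # ('Allow', 1): authorized by authorizer 2
```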

The API server must make sure that every object returned from storage is
authorized. The API server cannot know which objects are in storage (one of the
authorization requirements is to be stateless with regard to the data store), but
it can prove something stronger: for every object that could possibly be
constructed and that matches the request's selector objectSelected(object),
isAuthorized(object) is true.

This condition can be checked with a SAT/SMT solver by reducing it to an unsatisfiability query:

(forall object: objectSelected(object) => isAuthorized(object)) == TRUE
=== (forall object: (not objectSelected(object)) OR isAuthorized(object)) == TRUE
=== (exists object: objectSelected(object) AND (not isAuthorized(object))) == FALSE
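
The query only asks for a single counterexample. For selectors built solely from ==, NOT, AND and OR over string constants (an assumption of this sketch), even a brute-force search is a complete decision procedure: each field only needs to range over the constants that appear in either predicate, plus one value equal to none of them. The names below (Eq, find_counterexample, ...) are hypothetical. The search is exponential in the number of fields, which is why a real implementation would hand the same query to a SAT/SMT solver:

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Eq:
    field: str
    value: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    args: Tuple["Expr", ...]


Expr = Union[Eq, Not, And, Or]


def evaluate(e: Expr, obj: Dict[str, str]) -> bool:
    if isinstance(e, Eq):
        return obj.get(e.field) == e.value
    if isinstance(e, Not):
        return not evaluate(e.arg, obj)
    if isinstance(e, And):
        return all(evaluate(a, obj) for a in e.args)
    return any(evaluate(a, obj) for a in e.args)


def constants(e: Expr, acc: Dict[str, set]) -> Dict[str, set]:
    """Collect, per field, every constant the expression compares it with."""
    if isinstance(e, Eq):
        acc.setdefault(e.field, set()).add(e.value)
    elif isinstance(e, Not):
        constants(e.arg, acc)
    else:
        for a in e.args:
            constants(a, acc)
    return acc


def find_counterexample(selected: Expr, authorized: Expr) -> Optional[Dict[str, str]]:
    """Search for an object with selected(obj) AND NOT authorized(obj).

    Each field ranges over the constants mentioned plus None, which stands
    for "absent or any other value", so the search is complete.
    """
    domains = constants(And((selected, authorized)), {})
    fields = sorted(domains)
    for values in product(*(sorted(domains[f]) + [None] for f in fields)):
        obj = {f: v for f, v in zip(fields, values) if v is not None}
        if evaluate(selected, obj) and not evaluate(authorized, obj):
            return obj
    return None  # objectSelected => isAuthorized holds for every object


OWNER = Eq("metadata.labels.owner", "lucas")
DEV = Eq("metadata.labels.env", "dev")
VISIBLE = Eq("metadata.labels.visible", "true")
IS_AUTHORIZED = Or((
    And((OWNER, VISIBLE, Eq("type", "k8s.io/basic-auth"))),
    And((OWNER, VISIBLE, Eq("metadata.labels.public", "true"))),
    And((OWNER, DEV)),
))

print(find_counterexample(And((OWNER, DEV)), IS_AUTHORIZED))  # None: contained
print(find_counterexample(DEV, IS_AUTHORIZED) is not None)      # True: owner unconstrained
```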

A client who wants to ask "show me all instances of resource X that I can see"
can thus perform a SelfSubjectAccessReview (SelfSAR), construct a selector
objectSelected that is equal to isAuthorized (and therefore correct by
construction), and see all the objects it is allowed to see, without having to
know its permissions up front or issue n different requests (e.g. one per
namespace). This would work for controllers and watches as well. Even more
conveniently, the API server could offer the client a mode that "downgrades" an
unconstrained request (e.g. GET /api/v1/pods) by adding the selector that the
client is authorized to see. This could answer the "impossible problem" posed in David Eads' and Joe Betz's recent KubeCon talk.

The "impossible problem" from that KubeCon talk is a race condition when resuming a watch from a given resourceVersion: if permissions changed between the time the watch was previously opened and the time it is resumed, the resumed watch might target a different set of objects (according to the new permissions) than the old watch observed (according to the old permissions).

With conditional reads, the controller can "lock" the permission snapshot it uses up front when constructing the watch. The controller can then choose to reuse the old permissions when resuming the watch, so that exactly the same objects are returned (at their respective resourceVersions) as before. All of this can be done without the authorizer or the kube-apiserver storing any state about the watch's permissions; that state is carried client-side, in an opaque way.

Eventually, however, the controller might want to intentionally "upgrade" its authorization state. It can do so through a new SelfSubjectAccessReview, which yields updated permissions. Because the selector would be restricted to simple AND, OR, NOT, == and "in" semantics, it is possible to compute the difference between two selectors (the old and the new permission selectors) to find out:

a) which objects were authorized before, but are not anymore
b) which objects are authorized now, but were not before
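
Both differences are themselves selectors: a) is "old AND NOT new", and b) is "new AND NOT old". Below is a minimal sketch with a hypothetical tuple encoding of the selector language. The objects are evaluated locally only for illustration; in the design above, the controller would send selector b) to the API server in a separate list request:

```python
from typing import Any, Dict

# Selectors as nested tuples (hypothetical encoding):
#   ("eq", field, value), ("in", field, frozenset_of_values),
#   ("not", sel), ("and", sel, ...), ("or", sel, ...)
Selector = Any


def evaluate(sel: Selector, obj: Dict[str, str]) -> bool:
    op = sel[0]
    if op == "eq":
        return obj.get(sel[1]) == sel[2]
    if op == "in":
        return obj.get(sel[1]) in sel[2]
    if op == "not":
        return not evaluate(sel[1], obj)
    if op == "and":
        return all(evaluate(s, obj) for s in sel[1:])
    if op == "or":
        return any(evaluate(s, obj) for s in sel[1:])
    raise ValueError(f"unknown operator {op!r}")


def diff(old: Selector, new: Selector):
    """Return selectors for (a) revoked = old AND NOT new, (b) granted = new AND NOT old."""
    return ("and", old, ("not", new)), ("and", new, ("not", old))


OLD = ("and", ("eq", "metadata.labels.owner", "lucas"),
       ("eq", "metadata.labels.env", "dev"))
NEW = ("and", ("eq", "metadata.labels.owner", "lucas"),
       ("in", "metadata.labels.env", frozenset({"staging", "prod"})))

revoked, granted = diff(OLD, NEW)
objects = {
    "a": {"metadata.labels.owner": "lucas", "metadata.labels.env": "dev"},
    "b": {"metadata.labels.owner": "lucas", "metadata.labels.env": "staging"},
    "c": {"metadata.labels.owner": "alice", "metadata.labels.env": "prod"},
}
print([k for k, o in objects.items() if evaluate(revoked, o)])  # ['a']
print([k for k, o in objects.items() if evaluate(granted, o)])  # ['b']
```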

The controller can thus list the newly authorized (b) objects separately, and enqueue them as usual or run them through some special initialization procedure.

For objects that were authorized before, but are not anymore (a), the controller has three options. This situation already exists with today's controllers (a controller can currently open a watch on, e.g., all secrets in the cluster, and get a 403 on reconnect):

  1. keep enqueueing unauthorized objects (I think this is the current behavior); client.Get will hit the cached, last-seen objects kept in memory from before the previous watch terminated, but (presumably) any update will fail too (assuming the controller lost read and update access simultaneously), so no reconciles of such objects will succeed
  2. execute a special access-denied cleanup function on the objects, which tells the controller that it lost access to the object, but that it should most likely NOT delete the target system being reconciled, as the object might indeed still exist in the API server (403 != 404)
  3. silently stop enqueueing this object for the controller
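
The choice between these options could be made a per-controller policy. A minimal sketch, where RevokedPolicy, handle_revoked and the access-denied hook are all hypothetical names rather than an existing controller-runtime API:

```python
from enum import Enum
from typing import Callable, List


class RevokedPolicy(Enum):
    KEEP_ENQUEUEING = 1        # option 1: reconciles will keep failing (403)
    ACCESS_DENIED_CLEANUP = 2  # option 2: notify, but do not treat as deleted
    STOP_SILENTLY = 3          # option 3: just forget the object


def handle_revoked(key: str, policy: RevokedPolicy, queue: List[str],
                   on_access_denied: Callable[[str], None]) -> None:
    """Handle an object that matches the revoked (a) selector."""
    if policy is RevokedPolicy.KEEP_ENQUEUEING:
        # client.Get would hit the stale cache; updates would presumably fail
        queue.append(key)
    elif policy is RevokedPolicy.ACCESS_DENIED_CLEANUP:
        # 403 != 404: the object may still exist, so the hook must not tear
        # down the external system the way a deletion would
        on_access_denied(key)
    # STOP_SILENTLY: neither enqueue nor notify


queue, denied = [], []
handle_revoked("default/app", RevokedPolicy.ACCESS_DENIED_CLEANUP, queue, denied.append)
print(queue, denied)  # [] ['default/app']
```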

I think this feature set could be quite compelling for solving the "show me all resources I can see" issue, for both "normal" clients and controllers, without having to build the expected permissions into the client through configuration that will eventually drift out of sync with the actual policies.

One could build a PoC of this by:
a) designing a CEL subset that could be used as a generalized selector
b) moving the "is this selector contained within the authorized set" SMT-solving logic from this project into k8s core as a PoC. This would require either embedding an SMT solver like cvc5 or z3 into the kube-apiserver through wasm, or translating the selectors into generic SAT form and using a pure Go SAT solver like https://github.com/crillab/gophersat
c) prototyping the controller-side implementation
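
For option b), here is one possible translation into generic SAT form, sketched in Python under stated assumptions. Every field == value equality becomes one boolean variable, nested AND/OR/NOT are flattened with the Tseitin transformation, and pairwise clauses encode that a field cannot equal two different constants at once ("in" can be expanded into an OR of equalities beforehand). The query is satisfiable exactly when some object matches objectSelected but not isAuthorized. The output is DIMACS CNF, the input format most SAT solvers accept. brute_force_sat is only a stand-in for a real solver, and all names are hypothetical:

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

# Expressions as nested tuples (hypothetical encoding):
#   ("eq", field, value), ("not", e), ("and", e, ...), ("or", e, ...)


class CNF:
    def __init__(self) -> None:
        self.clauses: List[List[int]] = []
        self.atoms: Dict[Tuple[str, str], int] = {}  # (field, value) -> variable
        self.nvars = 0

    def new_var(self) -> int:
        self.nvars += 1
        return self.nvars

    def atom(self, field: str, value: str) -> int:
        if (field, value) not in self.atoms:
            self.atoms[(field, value)] = self.new_var()
        return self.atoms[(field, value)]

    def encode(self, e) -> int:
        """Tseitin: return a literal equivalent to e under the added clauses."""
        op = e[0]
        if op == "eq":
            return self.atom(e[1], e[2])
        if op == "not":
            return -self.encode(e[1])
        kids = [self.encode(x) for x in e[1:]]
        v = self.new_var()
        if op == "and":  # v <-> (k1 AND k2 AND ...)
            self.clauses += [[-v, k] for k in kids]
            self.clauses.append([v] + [-k for k in kids])
        elif op == "or":  # v <-> (k1 OR k2 OR ...)
            self.clauses += [[v, -k] for k in kids]
            self.clauses.append([-v] + kids)
        else:
            raise ValueError(f"unknown operator {op!r}")
        return v

    def add_field_constraints(self) -> None:
        """A field holds one value, so two equalities on it cannot both hold."""
        by_field: Dict[str, List[int]] = {}
        for (field, _), var in self.atoms.items():
            by_field.setdefault(field, []).append(var)
        for variables in by_field.values():
            self.clauses += [[-a, -b] for a, b in combinations(variables, 2)]

    def dimacs(self) -> str:
        header = f"p cnf {self.nvars} {len(self.clauses)}"
        return "\n".join([header] + [" ".join(map(str, c)) + " 0" for c in self.clauses])


def counterexample_query(selected, authorized) -> CNF:
    """CNF that is satisfiable iff some object matches selected but not authorized."""
    cnf = CNF()
    root = cnf.encode(("and", selected, ("not", authorized)))
    cnf.clauses.append([root])
    cnf.add_field_constraints()  # after encoding, once all atoms are known
    return cnf


def brute_force_sat(cnf: CNF) -> bool:
    """Exhaustive stand-in for a real SAT solver; only usable on tiny inputs."""
    for bits in product([False, True], repeat=cnf.nvars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in cnf.clauses):
            return True
    return False


owner, dev = ("eq", "owner", "lucas"), ("eq", "env", "dev")
authorized = ("or", ("and", owner, dev), ("eq", "public", "true"))
print(brute_force_sat(counterexample_query(("and", owner, dev), authorized)))  # False
print(brute_force_sat(counterexample_query(dev, authorized)))                  # True
```

Without the per-field clauses, a selector env == "dev" would wrongly appear not to be contained in NOT (env == "prod"), because the solver could set both equalities to true at once.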

(However, note that conditional reads are NOT part of the KEP (kubernetes/enhancements#5684) right now;
another KEP is expected for that eventually (if people like the idea). I
felt it was good to mention the sketch up front here so that reviewers have an
idea of how conditional authorization can eventually become usable for both
reads and writes.)
