One-Time Pass Codes for Kyverno

In real life, rules often need exceptions, granted on a case-by-case basis. Policy is really no different. While prevention of objectively "bad" behavior should be commonplace and enforced as widely as possible, there are valid situations where a rule may need to be bent slightly. I've covered how some of these exceptions work in Kyverno in the past, but I also wanted to explore the possibility of creating some sort of "self-driving" exception system, even if just conceptual in nature. In this post, I'll share a fun little concept project I concocted that uses Kyverno to implement a one-time pass code system for granting these exceptions. It's probably not highly practical, but it does give you a sense of what's possible and just how powerful and flexible Kyverno can be, even for semi-crazy use cases like this one.

Chances are high you're using some sort of validation policies in your cluster if you're reading this article. And chances are also pretty high that at least one of those policies is in Enforce mode which, as you probably know, will prevent a "bad" resource from being created should it violate one or more rules in the policy. There are a few ways to provide exceptions in Kyverno. One is to define an exclude block in a rule and list the exempted resources there. Another is to define them centrally in another Kubernetes resource like a ConfigMap. Yet another is to use the formal PolicyException resource introduced in Kyverno 1.9. These are all really useful mechanisms that you should try to employ. But what if in some situations you just wanted to be mostly hands-off and provide somewhat looser control? What if you could let developers and other users know how to get around a policy while still going through some form of access system? I thought I'd play around with that idea a bit and see if I could build something like a one-time pass code system for Kyverno. It turns out that because of the amazing flexibility and power of Kyverno, not only can this be done but it really wasn't that difficult!

At the end of the day, the idea is this: return a unique one-time pass code (OTP) to a user whose resource is blocked by a validate rule, record the code and its use so it can be audited, and prevent any code from being used more than once.

By combining a couple of Kyverno policies that use both validation and mutation of existing resources, this is all possible. The full sequence of how I wanted this to work is shown below.

Sequence diagram showing end-to-end flow of events.

And here's how to put this together.

First, we'll need a Namespace, which I'm calling platform, to hold the ConfigMap used as the OTP journal. If, for some reason, you wanted to implement this in a "real" environment, you'd absolutely want to protect this ConfigMap with RBAC so users can't read it. It has a key called codes with just some starter codes to give you an idea of the formatting and sample contents.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7

Next, we need to create the validation rules. This policy has two rules.

  1. The invalid-otp rule is universal and not tied to any specific rule or other policy. It matches the creation of Deployments that set the otp label and denies the request unless the label's value is an unconsumed code in the ConfigMap. This will come into play later.
  2. The host-namespaces-otp rule is just an existing rule from the Pod Security Standards set of Kyverno policies which has been slightly modified to look up codes from the ConfigMap mentioned earlier. You'll see that the OTP code is actually generated in the message field of this rule. This is important because in the next phase, we'll harvest this message to feed new codes into the ConfigMap.

Also, notice how I've used spec.applyRules: One in this policy and ordered the rules such that invalid-otp is first. This is to prevent creation of yet another OTP if a user either specifies an invalid one or a code which has already been consumed. Although OTP codes will be generated automatically any time there is a Deployment which fails the host-namespaces-otp rule, we only want a code to be generated when they aren't trying to specify one in the first place.
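To make the rule ordering concrete, here is a minimal Python sketch of the decision flow as I understand it from the description above. It is not how Kyverno evaluates policies internally; the function name `evaluate`, its parameters, and the shortened message text are all hypothetical, and `fresh_code` simply stands in for whatever the policy would generate.

```python
def evaluate(labels, uses_host_namespaces, journal, fresh_code):
    """Return (allowed, message) for a Deployment CREATE request.

    Mimics the two ordered rules with applyRules set to One. `journal` is the
    list of entries parsed from the ConfigMap; `fresh_code` is a placeholder
    for the newly generated code (an assumption for illustration).
    """
    otp = labels.get("otp", "")

    # Rule 1 (invalid-otp): only matches when the otp label has a value, and
    # only applies when that value is not an unused entry in the journal.
    if otp and otp not in journal:
        return False, f"The code {otp} is invalid or has already been used."

    # Rule 2 (host-namespaces-otp): skipped when a valid code was supplied.
    # Otherwise the host namespace check runs and a new code is offered.
    if otp not in journal and uses_host_namespaces:
        return False, f'Host namespaces are disallowed; you may use code "{fresh_code}".'

    return True, "created"


journal = ["ua8v92pg", "9akvm2o7"]
print(evaluate({"app": "busybox"}, True, journal, "ee4co4k8"))   # no code supplied
print(evaluate({"otp": "bogus000"}, True, journal, "ee4co4k8"))  # unknown code
print(evaluate({"otp": "ua8v92pg"}, True, journal, "ee4co4k8"))  # valid code
```

Because the first rule returns before the second is ever reached, a request carrying a bad code gets only the "invalid" message and never causes another code to be handed out.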

Below is the full validation policy.

apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces-otp
spec:
  validationFailureAction: Enforce
  background: false
  # Stop at the first rule that applies so a bad code never triggers a new one.
  applyRules: One
  rules:
    # Rule 1: reject Deployments whose otp label is not an unused code in the journal.
    - name: invalid-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
            selector:
              matchLabels:
                otp: "?*"
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
      validate:
        message: The code {{ request.object.metadata.labels.otp }} is invalid or has already been used.
        deny: {}
    # Rule 2: enforce the host namespaces check unless a valid code was supplied,
    # and offer a freshly generated code in the failure message.
    - name: host-namespaces-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp || '' }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
      validate:
        message: >-
          Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
          spec.hostIPC, and spec.hostPID must be unset or set to `false`. To get around this,
          you may use a one-time pass code "{{ random('[0-9a-z]{8}') }}" assigned as the value of
          a label with key "otp". Use of this code will be recorded along with your username.
        pattern:
          spec:
            template:
              spec:
                =(hostPID): false
                =(hostIPC): false
                =(hostNetwork): false

The net effect is that if a user tries to create a "bad" Deployment which violates the host-namespaces-otp rule, Kyverno blocks it but returns a message containing the OTP code and how to use it. Notice also how I'm warning in the message that, if you use this code, it'll be recorded for audit purposes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/busybox was blocked due to the following policies

disallow-host-namespaces-otp:
  host-namespaces-otp: 'validation error: Sharing the host namespaces is disallowed.
    The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set
    to `false`. To get around this, you may use a one-time pass code "ee4co4k8" assigned
    as the value of a label with key "otp". Use of this code will be recorded along
    with your username. rule host-namespaces-otp failed at path /spec/template/spec/hostIPC/'
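The code in that message comes from the pattern `[0-9a-z]{8}` passed to `random()` in the policy. As a rough illustration of what that pattern yields, here is a small Python sketch that draws eight characters from the same alphabet. The helper name `new_code` and the use of Python's `secrets` module are my own choices for the sketch and say nothing about how Kyverno produces its value.

```python
import re
import secrets
import string

# Same alphabet as the pattern in the policy: digits plus lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase


def new_code(length=8):
    """Return a random string that matches [0-9a-z]{length} (hypothetical helper)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


code = new_code()
# The result always satisfies the pattern used in the policy message.
assert re.fullmatch(r"[0-9a-z]{8}", code)
print(code)
```

With 36 possible characters in each of 8 positions, there are 36^8 (about 2.8 trillion) possible codes, so guessing one that happens to be in the journal is unlikely.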

Next, we need to implement the ConfigMap management system so that OTP codes are added when they need to be and removed upon first use. This was the fun part. Let me explain how this works.

First, in the add-otp rule, we dynamically add OTP codes to the ConfigMap by parsing them out of the Event Kyverno generates whenever a resource is blocked. This Event, just a standard Kubernetes v1 Event, carries the message containing the OTP we saw earlier. Since Kyverno can match on these Events (you will need to update your resource filter to allow this), we can use that specific Event as the trigger for a mutate-existing rule on our ConfigMap.

Note: if you remove the Event resource filter, you will increase the processing load on Kyverno, which will in turn require more resources.

With this OTP code extracted from the message, we can append it to the ConfigMap.
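The extraction relies on the code being the first double-quoted string in the message: splitting on `"` and taking index 1 picks it out. Here is a hedged Python sketch of that logic. The function names are hypothetical, and I'm assuming the Event message carries the same text as the denial message shown earlier.

```python
def is_otp_violation(event):
    """Mirror the two preconditions: the Event reason and the marker phrase."""
    return (
        event["reason"] == "PolicyViolation"
        and "one-time pass code" in event["message"]
    )


def extract_otp(message):
    """Same idea as the JMESPath split on a double quote followed by index 1."""
    return message.split('"')[1]


def append_code(codes, otp):
    """Append a new list item to the multi-line codes string."""
    return f"{codes}\n- {otp}"


# Sample Event built from the denial message earlier in this post (assumed shape).
event = {
    "reason": "PolicyViolation",
    "message": (
        "validation error: Sharing the host namespaces is disallowed. "
        "To get around this, you may use a one-time pass code "
        '"ee4co4k8" assigned as the value of a label with key "otp".'
    ),
}

if is_otp_violation(event):
    print(append_code("- ua8v92pg\n- 9akvm2o7", extract_otp(event["message"])))
```

One consequence of this approach is that the message template must keep the generated code as its first quoted value. If another quoted string were placed ahead of it, index 1 would pick up the wrong text.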

Second, in the manage-otp rule, we're watching for the creation of Deployments that set the otp label and, if that value is valid, we're modifying its entry in the ConfigMap to record the timestamp and the username of the actor who consumed it. This serves a dual purpose: the journal records who used the code and when, and because that information has been appended to the entry, the bare code no longer matches and is invalidated. That's much better than simply deleting the code from the list.
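To see why appending the timestamp and username invalidates a code, here is a minimal Python sketch of the same string replacement. The helper names are hypothetical, and `entries` is a simplified stand-in for parsing the YAML list, written by hand so the example needs only the standard library.

```python
from datetime import datetime, timezone


def utc_timestamp():
    """Current UTC time in the same shape as the journal entries in this post."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def consume(codes, otp, username, now=None):
    """Rewrite the code's entry as code-timestamp-username (replaces every occurrence)."""
    stamp = now or utc_timestamp()
    return codes.replace(otp, f"{otp}-{stamp}-{username}")


def entries(codes):
    """Parse the flat "- item" lines into a list of strings."""
    return [
        line.strip()[2:]
        for line in codes.splitlines()
        if line.strip().startswith("- ")
    ]


codes = "- ua8v92pg\n- 9akvm2o7\n- uq1s17g8"
updated = consume(codes, "uq1s17g8", "jdoe", now="2023-06-21T15:10:18Z")

print(entries(updated))
# The bare code is no longer an exact list member, so a second use is rejected.
print("uq1s17g8" in entries(updated))
```

The membership check is an exact match against each list item, which is why a stamped entry such as `uq1s17g8-2023-06-21T15:10:18Z-jdoe` no longer counts as the code `uq1s17g8`.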

Below is the second policy with both rules.

apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: manage-otp-list
spec:
  rules:
  # Rule 1: harvest the code from the PolicyViolation Event and append it to the journal.
  - name: add-otp
    match:
      any:
      - resources:
          kinds:
            - v1/Event
          names:
            - "disallow-host-namespaces-otp.?*"
    preconditions:
      all:
      - key: "{{ request.object.reason }}"
        operator: Equals
        value: PolicyViolation
      - key: "{{ contains(request.object.message, 'one-time pass code') }}"
        operator: Equals
        value: true
    context:
    # The code is the first double-quoted string in the message.
    - name: otp
      variable:
        jmesPath: split(request.object.message,'"') | [1]
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
      patchStrategicMerge:
        data:
          codes: |-
            {{ @ }}
            - {{ otp }}
  # Rule 2: when a valid code is used, stamp its journal entry with the time and username.
  - name: manage-otp
    match:
      any:
      - resources:
          kinds:
            - Deployment
          operations:
            - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
          context:
          - name: used
            variable:
              jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}

Try it out with a Deployment that sets a previously issued code (1t1h360g in this example) as the value of its otp label.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
    otp: 1t1h360g
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

When a valid code is consumed, Kyverno will update the ConfigMap to transform this

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g

into this

apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g-2023-06-21T15:04:59Z-czoller

Alright, let's try it out end-to-end and see this whole thing work!

Create a "bad" Deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/busybox was blocked due to the following policies

disallow-host-namespaces-otp:
  host-namespaces-otp: 'validation error: Sharing the host namespaces is disallowed.
    The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set
    to `false`. To get around this, you may use a one-time pass code "uq1s17g8" assigned
    as the value of a label with key "otp". Use of this code will be recorded along
    with your username. rule host-namespaces-otp failed at path /spec/template/spec/hostIPC/'

Let's use the code uq1s17g8 just provided.

I'll take the same "bad" Deployment and add that as the value of a label called otp.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
  labels:
    app: busybox
    otp: uq1s17g8
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostIPC: true
      containers:
      - image: busybox:1.28
        name: busybox
        command: ["sleep", "9999"]

$ kubectl apply -f baddeploy.yaml
deployment.apps/busybox created

To confirm that nobody can use this same code a second time, we'll delete the Deployment we just created.

$ kubectl delete deploy busybox
deployment.apps "busybox" deleted

And try to create the same exact Deployment once again.

$ kubectl apply -f baddeploy.yaml
Error from server: error when creating "baddeploy.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/busybox was blocked due to the following policies

disallow-host-namespaces-otp:
  invalid-otp: The code uq1s17g8 is invalid or has already been used.

There you can see that the same code uq1s17g8 is now flagged as invalid since it was used once before.

As a privileged cluster admin, we can also check our otp ConfigMap and see who used a code and when.

$ kubectl -n platform get cm otp -o yaml
apiVersion: v1
data:
  codes: |-
    - ua8v92pg
    - 9akvm2o7
    - 1t1h360g-2023-06-21T15:04:59Z-czoller
    - uq1s17g8-2023-06-21T15:10:18Z-jdoe
kind: ConfigMap
metadata:
  annotations:
    policies.kyverno.io/last-applied-patches: |
      manage-otp.manage-otp-list.kyverno.io: replaced /data/codes
  creationTimestamp: "2023-06-20T13:01:27Z"
  name: otp
  namespace: platform
  resourceVersion: "5147565"
  uid: ed2cce4e-6cf4-4309-b2cc-a2c45493ef4e
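If you wanted to turn that journal into something easier to audit, the entries are regular enough to parse. Below is a small Python sketch that splits each entry into code, timestamp, and username. It assumes codes are always eight characters from `[0-9a-z]` (per the policy's pattern) and that timestamps look like the ones above; the `audit` function and the field names are hypothetical.

```python
import re

# code, optionally followed by -<UTC timestamp>-<username> once consumed.
ENTRY = re.compile(
    r"^(?P<code>[0-9a-z]{8})"
    r"(?:-(?P<used_at>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)-(?P<user>.+))?$"
)


def audit(codes):
    """Return one dict per journal entry; used_at and user are None if unused."""
    records = []
    for line in codes.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        match = ENTRY.match(line[2:])
        if match:
            records.append(match.groupdict())
    return records


codes = """\
- ua8v92pg
- 9akvm2o7
- 1t1h360g-2023-06-21T15:04:59Z-czoller
- uq1s17g8-2023-06-21T15:10:18Z-jdoe"""

for record in audit(codes):
    status = f"used by {record['user']} at {record['used_at']}" if record["user"] else "unused"
    print(record["code"], status)
```

Anchoring on the fixed-length code and the timestamp shape means the username can itself contain hyphens without confusing the parser.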

And there you have it, your very own OTP system for Kyverno which is self-managed and allows for auditing.

Even though this concept probably isn't very practical to use in the real world, I had fun just experimenting with the idea to see if it was possible. Who knows, maybe some of you out there can even use this!