One-Time Pass Codes for Kyverno...With Quotas!

If you've spent any time reading my blogs before, it's probably painfully apparent that one of the things I genuinely love doing is tinkering around and finding out how to do fun, but practical, things with various technologies. I did a post earlier in June about how you can use Kyverno as a one-time pass code system and got to thinking it'd sure be cool if there was a way, conducive to how many enterprises operate, to restrict the number of codes that could be used. You probably don't want developers and other cluster consumers treating the system like an all-you-can-eat buffet, using as many codes as they please; instead, you may want to impose some sort of quota on them. In this post, I'm leveling up that article and bringing in some concepts I wrote about in July to bring you Pass-Codes 2.0: one-time pass codes for Kyverno...but with monthly quotas!

If you haven't checked out that first article on using Kyverno as a one-time pass code system, I recommend you take a few minutes to do so now, as I won't be covering all of that again. The problem I realized when initially brainstorming that system, and one also mentioned by some readers, is that it gives users free rein to use as many codes as they want. So long as a code is valid, they can bypass a policy, sort of like a revolving door. I knew I could solve that, but realized that in order to do so I'd need to draw on at least one other capability, one I covered in July in my article on performing scheduled mutations. So, with the groundwork laid, we've got all the pieces to put some gates around the pass code system. I'll show you how to tie both of these together to create a quota system, allowing you to subject some of your users to it while keeping the freedom to let others still eat at the buffet if they like.

In pass-codes "1.0", a code is automatically generated any time a resource violates a specific Kyverno policy in Enforce mode. Any validate policy can be subject to this pass code system. When a code is consumed, which can only happen once, it is invalidated yet retained on the list. But aside from recording the user who consumed the code and the time at which they did so, there was nothing stopping them from doing it again and again. One-time pass codes can be useful in a number of ways, especially in "break glass" situations, but if you decide to employ them, you don't want the system abused for general use. If everyone gets unlimited pass codes, then the policy is little more than a Cheeto for a deadbolt. This is where quotas come in. With this new-and-improved version, you define which user(s) are subject to a quota and how many codes they are able to consume. A separate policy then resets those quotas on a time period of your liking, perhaps monthly.

Layering in a quota system is accomplished by defining the quotas you want, per user, in the same otp ConfigMap I had in the platform Namespace (you could, of course, use something else). The keys are usernames exactly as presented by whatever authentication mechanism you use, so they must match precisely. For example, let's say I had users chip and mark on whom I wanted to impose a quota. The number (quoted, because values in a ConfigMap must be strings) is the number of OTPs each is allowed to consume. Note that these have to be valid codes, not just any code, as that would defeat the system. Once a code is valid and successfully consumed, it is deducted from the user's quota as well as invalidated on the running list of generated OTPs under data.codes. Once that number reaches 0, as you guessed, they're done, even if they continue to supply valid codes.

This whole OTP ConfigMap could easily be broken out into two separate ones or managed as a custom resource of your own design. The route I took for these articles was to prove out the concept.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  chip: "5"
  mark: "2"
  codes: |-
    - 6o3ukw1q
    - 20bao9hy
    - k8j4j8zu
```
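To consume a code, a user assigns one of the unconsumed codes as the value of a label with key otp on their Deployment. Here's a minimal sketch of what that looks like; the Deployment name and image are placeholders of my own:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: break-glass-demo       # placeholder name
  labels:
    otp: "6o3ukw1q"            # an unconsumed code from the list above
spec:
  replicas: 1
  selector:
    matchLabels:
      app: break-glass-demo
  template:
    metadata:
      labels:
        app: break-glass-demo
    spec:
      hostNetwork: true        # the violation the code lets you bypass
      containers:
      - name: demo
        image: busybox         # placeholder image
```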

What I think is cool here is that we can use Kyverno again to manage this quota system on a schedule of your choosing, which makes it entirely self-driving. How is this done? Using scheduled mutations.

By using a separate "mutate existing" policy, we define what the starting numbers look like. So, perhaps like above, I want chip and mark to have 5 and 2 OTPs they can consume, respectively. Whenever this mutation policy fires, it'll effectively "reset" those numbers to whatever is defined in the policy. The last piece is the matter of timing. As I showed in my July article on scheduled mutations, we can define a schedule for this using a Kyverno cleanup policy. The cleanup policy is where you'll define the frequency of this quota reset. I'll show monthly here and in the policies you can use, but it can be whatever frequency you'd like.

Below are all the policies needed to bring this system together. These include modifications to existing rules as well as introduction of a couple new ones. Let me walk you through these as there are multiple rules involved.

I have three separate policies for organizational purposes. They could probably all be converged, but that would make things more difficult to manage.

  • (Validation) disallow-host-namespaces-otp: This policy performs all the validations of the rule of interest, the OTP checking, and the quota system. The rules here are specifically ordered and configured so that processing will stop once the first one applies.
    • invalid-otp: rejects a supplied OTP code that is not on the valid list.
    • exceeded-otp-quota: rejects a valid OTP code supplied by a user subject to a quota when that quota has been exhausted.
    • host-namespaces-otp: the primary policy subject to the OTP system. It will be applied if the Deployment violates the pattern AND there is no OTP code specified or it's not found/invalid.
  • (Mutation) manage-otp-list: This policy performs all the mutations on the OTP ConfigMap except the quota refresh. Since all mutations in Kubernetes come before any validations, it may appear that some portion of these rules don't make sense, but they're necessary to prevent unintended mutations.
    • add-otp: harvests the generated OTP from the violation Event (created as a result of host-namespaces-otp) and adds it to the ConfigMap storing all codes.
    • consume-otp-noquota: marks a code as consumed when the user requesting it does not have a quota assigned.
    • consume-otp-quota: marks a code as consumed when the user does have a quota assigned but only if the quota has greater than 0 remaining.
  • (Mutation) quota-management: This policy, triggered by the cleanup policy, resets the quota system to the desired numbers.
    • reset-monthly-quotas: observes the deletion caused by the cleanup policy and resets the quotas for the desired users to the new numbers.

One: Validate

```yaml
# ServiceAccounts are exempt from this. Rules are applied from top to bottom. Processing stops
# once the first one applies so order matters here.
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces-otp
spec:
  validationFailureAction: Enforce
  background: false
  applyRules: One
  rules:
    # Validate the OTP code, when provided, is not invalid.
    - name: invalid-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
            selector:
              matchLabels:
                otp: "?*"
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
      validate:
        message: The code {{ request.object.metadata.labels.otp }} is invalid or has already been used.
        deny: {}
    # Validate that a valid OTP code supplied by a user subject to a quota is not exceeded.
    - name: exceeded-otp-quota
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
            selector:
              matchLabels:
                otp: "?*"
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      - name: currentquota
        variable:
          jmesPath: otp.data.{{request.userInfo.username}}
          default: ''
      - name: keys
        variable:
          jmesPath: keys(otp.data)
          default: ''
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp }}"
          operator: AnyIn
          value: "{{ parse_yaml(otp.data.codes) }}"
        - key: "{{ request.userInfo.username || '' }}"
          operator: NotEquals
          value: "system:serviceaccount:?*"
        - key: "{{ request.userInfo.username || '' }}"
          operator: AnyIn
          value: "{{ keys }}"
      validate:
        message: >-
          The quota for "{{ request.userInfo.username }}" has been exhausted.
          Please contact a platform administrator to increase the quota or apply for a Policy Exception.
        deny:
          conditions:
            all:
            - key: "{{ to_number(currentquota) }}"
              operator: Equals
              value: 0
    # The primary policy subject to the OTP system. It will be applied if the Deployment violates
    # the pattern AND there is no OTP code specified or it's not found/invalid.
    - name: host-namespaces-otp
      match:
        any:
        - resources:
            kinds:
              - Deployment
            operations:
              - CREATE
      context:
      - name: otp
        configMap:
          name: otp
          namespace: platform
      preconditions:
        all:
        - key: "{{ request.object.metadata.labels.otp || '' }}"
          operator: AnyNotIn
          value: "{{ parse_yaml(otp.data.codes) }}"
        - key: "{{ request.userInfo.username || '' }}"
          operator: NotEquals
          value: "system:serviceaccount:?*"
      validate:
        message: >-
          Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
          spec.hostIPC, and spec.hostPID must be unset or set to `false`. To get around this,
          you may use a one-time pass code "{{ random('[0-9a-z]{8}') }}" assigned as the value of
          a label with key "otp". Use of this code will be recorded along with your username. You may
          also be subject to a quota.
        pattern:
          spec:
            template:
              spec:
                =(hostPID): false
                =(hostIPC): false
                =(hostNetwork): false
```
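Since the applyRules: One ordering is the crux of this policy, here's the decision flow sketched as a plain-Python stand-in of my own (not Kyverno's actual engine, and with the ServiceAccount exemptions omitted for brevity):

```python
# Rules evaluate top to bottom; processing stops at the first rule whose
# match and preconditions apply, which is why the ordering matters.

def check_request(otp_label, valid_codes, username, quotas, violates_pattern):
    """Return (allowed, reason) for a Deployment CREATE request."""
    # Rule 1: invalid-otp -- a supplied code not on the list is rejected.
    if otp_label and otp_label not in valid_codes:
        return (False, "code is invalid or already used")
    # Rule 2: exceeded-otp-quota -- applies when a quota'd user supplies a
    # valid code; denies only when the quota is exhausted. Because processing
    # stops here, a quota'd user with quota remaining is allowed through.
    if otp_label and username in quotas:
        if quotas[username] == 0:
            return (False, "quota exhausted")
        return (True, "valid code within quota")
    # Rule 3: host-namespaces-otp -- the primary check. A valid code from a
    # user without a quota fails its precondition, so the request is allowed.
    if violates_pattern and otp_label not in valid_codes:
        return (False, "sharing the host namespaces is disallowed")
    return (True, "allowed")
```

Note how a valid code short-circuits at rule 2 for quota'd users and skips rule 3 entirely for everyone else, while an invalid or missing code falls through to the primary check.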

Two: Mutate

```yaml
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: manage-otp-list
spec:
  rules:
  # Harvest the generated OTP from the violation Event and add it to the ConfigMap storing all codes.
  - name: add-otp
    match:
      any:
      - resources:
          kinds:
            - v1/Event # May require removing resource filter. May result in more processing by admission controller.
          names:
            - "disallow-host-namespaces-otp.?*"
    preconditions:
      all:
      - key: "{{ request.object.reason }}"
        operator: Equals
        value: PolicyViolation
      - key: "{{ contains(request.object.message, 'one-time pass code') }}"
        operator: Equals
        value: true
    context:
    - name: otp
      variable:
        jmesPath: split(request.object.message,'"') | [1]
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
      patchStrategicMerge:
        data:
          codes: |-
            {{ @ }}
            - {{ otp }}
  # Mark a code as consumed when the user requesting it does not have a quota assigned.
  - name: consume-otp-noquota
    match:
      any:
      - resources:
          kinds:
            - Deployment
          operations:
            - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    - name: keys
      variable:
        jmesPath: keys(otp.data)
        default: ''
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
      - key: "{{ request.userInfo.username || '' }}"
        operator: AnyNotIn
        value: "{{ keys }}"
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
          context:
          - name: used
            variable:
              jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}
  # Mark a code as consumed when the user does have a quota assigned but only if the quota has greater than 0 remaining.
  - name: consume-otp-quota
    match:
      any:
      - resources:
          kinds:
            - Deployment
          operations:
            - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
          context:
          - name: used
            variable:
              jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
          - name: currentquota
            variable:
              jmesPath: target.data.{{request.userInfo.username}}
          preconditions:
            all:
            - key: "{{ to_number(currentquota) }}"
              operator: GreaterThan
              value: 0
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}
          "{{request.userInfo.username}}": "{{ subtract((to_number(currentquota)),`1`) }}"
```
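The JMESPath in these mutate rules is dense, so here's what the harvesting and consumption steps amount to, sketched in plain Python (the helper names are mine, not Kyverno's; Kyverno evaluates the real expressions itself):

```python
# Plain-Python stand-ins for the JMESPath used by the mutate rules above.

def harvest_otp(event_message: str) -> str:
    """add-otp: split(request.object.message,'\"') | [1] -- the generated
    code is the first double-quoted token in the violation Event's message."""
    return event_message.split('"')[1]

def consume_otp(codes: list, otp: str, username: str, now: str) -> list:
    """consume-otp-*: replace_all() rewrites the matching entry to
    <code>-<timestamp>-<username>, recording who used it and when."""
    return [f"{c}-{now}-{username}" if c == otp else c for c in codes]

def decrement_quota(quota: str) -> str:
    """consume-otp-quota: subtract(to_number(currentquota), 1) -- quota
    values stay strings because ConfigMap data must be strings."""
    return str(int(quota) - 1)
```

The rewritten entry no longer exactly matches any freshly supplied code, which is what invalidates it while keeping the audit trail intact.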

Three: Quota

```yaml
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: quota-management
spec:
  rules:
  - name: reset-monthly-quotas
    match:
      any:
      - resources:
          kinds:
            - Pod
          names:
            - cleanmeup*
          namespaces:
            - platform
          operations:
            - DELETE
    mutate:
      targets:
        - apiVersion: v1
          kind: ConfigMap
          name: otp
          namespace: platform
      patchStrategicMerge:
        data:
          chip: "5"
          mark: "2"
```

You'll also need permissions so the cleanup controller can delete the Pods needed to kick off the reset system.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: cleanup-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
  name: kyverno:cleanup-pods
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - delete
```

And the cleanup policy, which sets the schedule for the quota system reset. The cron expression `0 0 1 * *` fires at midnight on the first day of every month.

```yaml
apiVersion: kyverno.io/v2alpha1
kind: CleanupPolicy
metadata:
  name: trigger-reset-monthly-quotas
  namespace: platform
spec:
  match:
    any:
    - resources:
        kinds:
          - Pod
        names:
          - cleanmeup*
        selector:
          matchLabels:
            purpose: deleteme
  schedule: "0 0 1 * *"
```
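One more piece to be aware of: the cleanup policy can only delete a Pod that actually exists, so a decoy Pod matching its name pattern and selector has to be present in the platform Namespace (and be recreated after each reset). A minimal sketch, with the Pod name and image as my own placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cleanmeup-quotas             # matches the cleanmeup* name pattern
  namespace: platform
  labels:
    purpose: deleteme                # matches the cleanup policy's selector
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9 # placeholder; any image will do
```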

This is probably easier viewed than read, so check out a demo recording of this whole flow on YouTube.

And that's basically a wrap. This blog post ties together a couple previous posts and upgrades them with some new features, which was a fun project to undertake. On a personal note, I also just realized this is my 20th article covering Kyverno! I've had a blast working with this technology and time surely has flown. I've got some personal news updates to share which will change things a bit, but I'll get to those soon.

Thanks for reading and drop me a line if you thought this was of interest.