One-Time Pass Codes for Kyverno...With Quotas!
If you've spent any time reading my blogs before, it's probably painfully apparent that one of the things I genuinely love doing is tinkering around and finding out how to do fun, but practical, things with various technologies. I wrote a post earlier in June about how you can use Kyverno as a one-time pass code system and got to thinking it'd sure be cool if there were a way, conducive to how many enterprises operate, to restrict the number of codes that could be used. Rather than running an all-you-can-eat buffet, you may not want developers and other cluster consumers using as many codes as they like, but instead impose some sort of quota system on them. In this post, I'm leveling up that article and bringing in some concepts I wrote about in July to bring you Pass-Codes 2.0, or how to use one-time pass codes for Kyverno...but with monthly quotas!
If you haven't checked out that first article on using Kyverno as a one-time pass code system, I recommend you take a few minutes and do so now, as I won't be covering all of that again. But the problem I realized when initially brainstorming that system, and one also mentioned by some readers, is that it gives users free rein to use as many codes as they want. So long as the code is valid, they can bypass a policy, sort of like a revolving door. I knew I could solve that but realized that, in order to do so, I'd need to draw on at least one other capability, one I covered in July in my article on performing scheduled mutations. So, with the groundwork laid, we've got all the pieces to put some gates around the pass code system. I'll show you how to tie both of these together to create a quota system that lets you subject some of your users to it while giving you the freedom to let others still eat at the buffet if they like.
In pass-codes "1.0", a code is automatically generated any time a resource violates a specific Kyverno policy in Enforce mode. Any validate policy can be subject to this pass code system. When a code is consumed, which can only be done once, it is invalidated yet retained on the list. Aside from recording the user who consumed the code and the time at which they did so, there was nothing stopping that user from doing it again and again. One-time pass codes can be useful in a number of ways, especially in "break glass" situations. But if you decide to employ them, you don't want the system abused for general use. If everyone gets unlimited pass codes, then the policy is little more than a Cheeto for a deadbolt. This is where quotas come in. With this new-and-improved version, you define which user(s) are subject to a quota and how many codes they are able to consume. A separate policy then resets that quota based on a time period of your liking, perhaps monthly.
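As a quick refresher on the redemption side of things, consuming a code is just a matter of assigning it as the value of a label with the key otp on the resource you're creating. Here's a minimal sketch of what that looks like; the Deployment name, image, and code value are placeholders I made up for illustration, and in practice the code comes from the violation message Kyverno hands back.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: needs-hostnet        # hypothetical name, just for illustration
  labels:
    otp: "6o3ukw1q"          # one-time pass code copied from the violation message
spec:
  replicas: 1
  selector:
    matchLabels:
      app: needs-hostnet
  template:
    metadata:
      labels:
        app: needs-hostnet
    spec:
      hostNetwork: true      # the field the policy would otherwise block
      containers:
      - name: app
        image: nginx         # placeholder image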
Layering in a quota system is accomplished by defining the quotas you want per user in the same otp ConfigMap that I had in the platform Namespace (you could, of course, use something else). These users are the names of the users as presented by whatever authentication mechanism you use, so they should be exact. For example, let's say I had users chip and mark on whom I wanted to impose a quota system. The number (quoted, because values in a ConfigMap must be strings) is the number of OTPs they're allowed to consume. Note that these have to be valid codes, not just any code, as that would defeat the system. But once a code is valid and successfully consumed, it will be deducted from their user quota as well as invalidated from the running list of generated OTPs under data.codes. Once that number reaches 0, as you guessed, they're done, even if they continue to supply valid codes.
This whole OTP ConfigMap could easily be broken out into two separate ones or managed as a custom resource of your own design. The route I took for these articles was to prove out the concept.
apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  chip: "5"
  mark: "2"
  codes: |-
    - 6o3ukw1q
    - 20bao9hy
    - k8j4j8zu
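To make the consumption behavior concrete, here's roughly what that same ConfigMap might look like after chip redeems the first code. This is an illustrative sketch and the timestamp is made up, but the idea is that the consumed entry gets the UTC time and username appended (so it no longer matches any future code) and the user's quota is decremented.
apiVersion: v1
kind: ConfigMap
metadata:
  name: otp
  namespace: platform
data:
  chip: "4"   # decremented from "5" after a successful consumption
  mark: "2"
  codes: |-
    - 6o3ukw1q-2023-08-01T14:02:29Z-chip   # consumed: code, time, and user recorded together
    - 20bao9hy
    - k8j4j8zu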
What I think is cool here is that we can use Kyverno again to manage this quota system on a schedule of your choosing, which makes it entirely self-driving. How is this done? Using scheduled mutations.
By using a separate "mutate existing" policy, we define what the starting numbers look like. So, perhaps like above, I want chip and mark to have 5 and 2 OTPs they can consume, respectively. Whenever this mutation policy fires, it'll effectively "reset" those numbers to whatever is defined in the policy. The last piece is the matter of timing. As I showed in my July article on scheduled mutations, we can define a schedule for this using a Kyverno cleanup policy. The cleanup policy is where you'll define the frequency of this quota reset. I'll show monthly here and in the policies you can use, but it can be whatever frequency you'd like.
Below are all the policies needed to bring this system together. These include modifications to existing rules as well as the introduction of a couple of new ones. Let me walk you through them, as there are multiple rules involved.
I have three separate policies for organizational purposes. They could probably all be converged, but that would make things more difficult to manage.
- (Validation) disallow-host-namespaces-otp: This policy performs all the validations of the rule of interest, the OTP checking, and the quota system. The rules here are specifically ordered and configured so that processing will stop once the first one applies.
  - invalid-otp: validates the OTP code, when provided, is not invalid.
  - exceeded-otp-quota: validates that a valid OTP code supplied by a user subject to a quota is not exceeded.
  - host-namespaces-otp: the primary policy subject to the OTP system. It will be applied if the Deployment violates the pattern AND there is no OTP code specified or it's not found/invalid.
- (Mutation) manage-otp-list: This policy performs all the mutations on the OTP ConfigMap except the quota refresh. Since all mutations in Kubernetes come before any validations, it may appear that some portion of these rules don't make sense but they're necessary to prevent unintended mutations.
  - add-otp: harvests the generated OTP from the violation Event (created as a result of host-namespaces-otp) and adds it to the ConfigMap storing all codes.
  - consume-otp-noquota: marks a code as consumed when the user requesting it does not have a quota assigned.
  - consume-otp-quota: marks a code as consumed when the user does have a quota assigned but only if the quota has greater than 0 remaining.
- (Mutation) quota-management: This policy, triggered by the cleanup policy, resets the quota system to the desired numbers.
  - reset-monthly-quotas: observes the deletion caused by the cleanup policy and resets the quotas for the desired users to the new numbers.
One: Validate
# ServiceAccounts are exempt from this. Rules are applied from top to bottom. Processing stops
# once the first one applies so order matters here.
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces-otp
spec:
  validationFailureAction: Enforce
  background: false
  applyRules: One
  rules:
  # Validate the OTP code, when provided, is not invalid.
  - name: invalid-otp
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyNotIn
        value: "{{ parse_yaml(otp.data.codes) }}"
    validate:
      message: The code {{ request.object.metadata.labels.otp }} is invalid or has already been used.
      deny: {}
  # Validate that a valid OTP code supplied by a user subject to a quota is not exceeded.
  - name: exceeded-otp-quota
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    - name: currentquota
      variable:
        jmesPath: otp.data.{{request.userInfo.username}}
        default: ''
    - name: keys
      variable:
        jmesPath: keys(otp.data)
        default: ''
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
      - key: "{{ request.userInfo.username || '' }}"
        operator: AnyIn
        value: "{{ keys }}"
    validate:
      message: >-
        The quota for "{{ request.userInfo.username }}" has been exhausted.
        Please contact a platform administrator to increase the quota or apply for a Policy Exception.
      deny:
        conditions:
          all:
          - key: "{{ to_number(currentquota) }}"
            operator: Equals
            value: 0
  # The primary policy subject to the OTP system. It will be applied if the Deployment violates
  # the pattern AND there is no OTP code specified or it's not found/invalid.
  - name: host-namespaces-otp
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - CREATE
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp || '' }}"
        operator: AnyNotIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
    validate:
      message: >-
        Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
        spec.hostIPC, and spec.hostPID must be unset or set to `false`. To get around this,
        you may use a one-time pass code "{{ random('[0-9a-z]{8}') }}" assigned as the value of
        a label with key "otp". Use of this code will be recorded along with your username. You may
        also be subject to a quota.
      pattern:
        spec:
          template:
            spec:
              =(hostPID): false
              =(hostIPC): false
              =(hostNetwork): false
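For completeness, here's the kind of request that kicks the whole flow off: the same sort of Deployment as the sketch near the top of the post, just without the otp label (names and image are again placeholders). Because it trips the pattern and carries no code, the request is denied, and the denial message contains a freshly minted eight-character code you can then attach as the otp label and retry.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: needs-hostnet        # placeholder name; note there is no otp label yet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: needs-hostnet
  template:
    metadata:
      labels:
        app: needs-hostnet
    spec:
      hostNetwork: true      # violates the pattern, so the request is blocked and a code is generated
      containers:
      - name: app
        image: nginx         # placeholder image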
Two: Mutate
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: manage-otp-list
spec:
  rules:
  # Harvest the generated OTP from the violation Event and add it to the ConfigMap storing all codes.
  - name: add-otp
    match:
      any:
      - resources:
          kinds:
          - v1/Event # May require removing resource filter. May result in more processing by admission controller.
          names:
          - "disallow-host-namespaces-otp.?*"
    preconditions:
      all:
      - key: "{{ request.object.reason }}"
        operator: Equals
        value: PolicyViolation
      - key: "{{ contains(request.object.message, 'one-time pass code') }}"
        operator: Equals
        value: true
    context:
    - name: otp
      variable:
        jmesPath: split(request.object.message,'"') | [1]
    mutate:
      targets:
      - apiVersion: v1
        kind: ConfigMap
        name: otp
        namespace: platform
      patchStrategicMerge:
        data:
          codes: |-
            {{ @ }}
            - {{ otp }}
  # Mark a code as consumed when the user requesting it does not have a quota assigned.
  - name: consume-otp-noquota
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    - name: keys
      variable:
        jmesPath: keys(otp.data)
        default: ''
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
      - key: "{{ request.userInfo.username || '' }}"
        operator: AnyNotIn
        value: "{{ keys }}"
    mutate:
      targets:
      - apiVersion: v1
        kind: ConfigMap
        name: otp
        namespace: platform
        context:
        - name: used
          variable:
            jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}
  # Mark a code as consumed when the user does have a quota assigned but only if the quota has greater than 0 remaining.
  - name: consume-otp-quota
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - CREATE
          selector:
            matchLabels:
              otp: "?*"
    context:
    - name: otp
      configMap:
        name: otp
        namespace: platform
    preconditions:
      all:
      - key: "{{ request.object.metadata.labels.otp }}"
        operator: AnyIn
        value: "{{ parse_yaml(otp.data.codes) }}"
      - key: "{{ request.userInfo.username || '' }}"
        operator: NotEquals
        value: "system:serviceaccount:?*"
    mutate:
      targets:
      - apiVersion: v1
        kind: ConfigMap
        name: otp
        namespace: platform
        context:
        - name: used
          variable:
            jmesPath: replace_all(target.data.codes,'{{request.object.metadata.labels.otp}}','{{request.object.metadata.labels.otp}}-{{time_now_utc()}}-{{request.userInfo.username}}')
        - name: currentquota
          variable:
            jmesPath: target.data.{{request.userInfo.username}}
        preconditions:
          all:
          - key: "{{ to_number(currentquota) }}"
            operator: GreaterThan
            value: 0
      patchStrategicMerge:
        data:
          codes: |-
            {{ used }}
          "{{request.userInfo.username}}": "{{ subtract((to_number(currentquota)),`1`) }}"
Three: Quota
apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: quota-management
spec:
  rules:
  - name: reset-monthly-quotas
    match:
      any:
      - resources:
          kinds:
          - Pod
          names:
          - cleanmeup*
          namespaces:
          - platform
          operations:
          - DELETE
    mutate:
      targets:
      - apiVersion: v1
        kind: ConfigMap
        name: otp
        namespace: platform
      patchStrategicMerge:
        data:
          chip: "5"
          mark: "2"
You'll also need permissions so the cleanup controller can delete the Pods needed to kick off the reset system.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: cleanup-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
  name: kyverno:cleanup-pods
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - delete
And here's the cleanup policy, which sets the schedule for the quota system reset.
apiVersion: kyverno.io/v2alpha1
kind: CleanupPolicy
metadata:
  name: trigger-reset-monthly-quotas
  namespace: platform
spec:
  match:
    any:
    - resources:
        kinds:
        - Pod
        names:
        - cleanmeup*
        selector:
          matchLabels:
            purpose: deleteme
  schedule: "0 0 1 * *"
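One piece not shown above (it's covered in the July article on scheduled mutations) is the sacrificial Pod the cleanup policy deletes on that schedule, which is what the quota-management policy observes. Based on the match criteria in both policies, it needs to live in the platform Namespace, have a name starting with cleanmeup, and carry the purpose: deleteme label. A minimal sketch is below; the name, image, and sleep duration are arbitrary placeholders, and in practice you'll want something to recreate the Pod after each deletion (a Deployment whose Pod names still match the cleanmeup* pattern would do) so the reset keeps firing month after month.
apiVersion: v1
kind: Pod
metadata:
  name: cleanmeup-otp-quotas      # must start with "cleanmeup" to match both policies
  namespace: platform
  labels:
    purpose: deleteme             # required by the CleanupPolicy's label selector
spec:
  containers:
  - name: sleep
    image: busybox                # placeholder image; anything long-running works
    command: ["sleep", "2592000"] # sleep ~30 days; the Pod just needs to exist until it's deleted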
This is probably easier viewed than read, so check out a demo recording of this whole flow on YouTube.
And that's basically a wrap. This blog post ties together a couple previous posts and upgrades them with some new features, which was a fun project to undertake. On a personal note, I also just realized this is my 20th article covering Kyverno! I've had a blast working with this technology and time surely has flown. I've got some personal news updates to share which will change things a bit, but I'll get to those soon.
Thanks for reading and drop me a line if you thought this was of interest.