Exploring Kyverno: Part 1, Validation
Welcome to the first part of my "Exploring Kyverno" blog series. If you haven't read the introduction, I encourage you to do so first. This series is a multi-part exploration of the open-source, Kubernetes-native policy engine called Kyverno. In this article, I'll be covering the first major capability offered by Kyverno: validations.
By far, the biggest use case for admission controllers such as Kyverno is the ability to check incoming requests and validate that they conform to approved standards. These can be well-known, established standards like many of the community's best practices, or they can be something totally custom that's important to you and the way you operate. Regardless of the motivation, Kyverno's first and most popular capability is validating requests.
Before getting into that, let's quickly get Kyverno installed. It's trivial to install and extremely quick, so follow the quick start guide and choose your method. I'll just opt for this one-liner, which will give you the bleeding-edge release (which may be a pre-release):
kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/main/definitions/release/install.yaml
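If you'd rather use Helm, the quick start guide covers that path too; at the time of writing it looks roughly like the following (a sketch only, so check the guide for the current chart name and values):

# add the official Kyverno chart repository and install into its own namespace
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace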
Once installed, Kyverno will register itself as mutating and validating webhooks. If you'd like to see what those look like, there's a very cool krew plug-in called view-webhook which will visualize them for you in a terminal. It's installed simply with kubectl krew install view-webhook.
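With the plug-in in place, running it (krew plug-ins are invoked as kubectl <plugin-name>; exact flags may vary by version) prints a table like the one below:

kubectl view-webhook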
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+
| KIND | NAME | WEBHOOK | SERVICE | RESOURCES&OPERATIONS | REMAINING DAY | ACTIVE NS |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+
| Mutating | kyverno-policy-mutating-webhook-cfg | nirmata.kyverno.policy-mutating-webhook | └─┬kyverno-svc | ├──clusterpolicies/* | 9 years | ✖ No Active Namespaces |
| | | | ├──NS : kyverno | └─┬policies/* | | |
| | | | ├──Path: /policymutate | ├──+CREATE | | |
| | | | └─┬IP : 100.70.92.19 (ClusterIP) | └──^UPDATE | | |
| | | | └──443/TCP | | | |
+ +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+ + +
| | kyverno-resource-mutating-webhook-cfg | nirmata.kyverno.resource.mutating-webhook | └─┬kyverno-svc | └─┬*/* | | |
| | | | ├──NS : kyverno | ├──+CREATE | | |
| | | | ├──Path: /mutate | └──^UPDATE | | |
| | | | └─┬IP : 100.70.92.19 (ClusterIP) | | | |
| | | | └──443/TCP | | | |
+ +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+ + +
| | kyverno-verify-mutating-webhook-cfg | nirmata.kyverno.verify-mutating-webhook | └─┬kyverno-svc | └─┬deployments/* | | |
| | | | ├──NS : kyverno | └──^UPDATE | | |
| | | | ├──Path: /verifymutate | | | |
| | | | └─┬IP : 100.70.92.19 (ClusterIP) | | | |
| | | | └──443/TCP | | | |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+ + +
| Validating | kyverno-policy-validating-webhook-cfg | nirmata.kyverno.policy-validating-webhook | └─┬kyverno-svc | ├──clusterpolicies/* | | |
| | | | ├──NS : kyverno | └─┬policies/* | | |
| | | | ├──Path: /policyvalidate | └──^UPDATE | | |
| | | | └─┬IP : 100.70.92.19 (ClusterIP) | | | |
| | | | └──443/TCP | | | |
+ +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+ + +
| | kyverno-resource-validating-webhook-cfg | nirmata.kyverno.resource.validating-webhook | └─┬kyverno-svc | └─┬*/* | | |
| | | | ├──NS : kyverno | └──-DELETE | | |
| | | | ├──Path: /validate | | | |
| | | | └─┬IP : 100.70.92.19 (ClusterIP) | | | |
| | | | └──443/TCP | | | |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+
Let's now look at some of Kyverno's validate functionality.
Kyverno's policies are very orderly and simply laid out. A policy can either be scoped to the entire cluster (a ClusterPolicy) or to a single Namespace (a Policy). Each policy contains one or more rules, and each rule contains a match statement, which is the filter that selects the resources to which the rule applies, and an action statement, which can be either validate, mutate, or generate. I'm covering only validate in this article; the other two I'll look at separately in forthcoming articles.
In the match statement, Kyverno can check basically any type of resource in Kubernetes, including subjects, roles, and even custom resources. For more fine-grained control, you can add an exclude statement that filters matches out further. You can combine various clauses in both match and exclude to get down to very fine levels if that's what you need. There's more, of course, but that's the gist.
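As a purely illustrative sketch of how those pieces fit together (the policy name, the team label, and the excluded Namespace here are my own choices, not anything Kyverno requires), a rule combining match and exclude might look like this:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-match-exclude        # hypothetical policy name
spec:
  validationFailureAction: audit     # report only while experimenting
  rules:
  - name: require-team-label
    match:
      resources:
        kinds:
        - Pod                        # evaluate all incoming Pods...
    exclude:
      resources:
        namespaces:
        - kube-system                # ...except those in kube-system
    validate:
      message: "A team label is required (illustrative rule only)."
      pattern:
        metadata:
          labels:
            team: "?*"               # any non-empty value is accepted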
Enough of that. Let's look at some concrete examples.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  rules:
  - name: check-for-labels
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "The label `app.kubernetes.io/name` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/name: "?*"
Above is a ClusterPolicy, which means it works across the entire cluster. It contains a single rule whose job is to check all incoming Pods to ensure they have a label called app.kubernetes.io/name, which is a fairly common label. So in our match statement, we're telling Kyverno we want to match on a resource, and that resource should be a Pod. Although we could give it more specific details of what to include or exclude, let's just let this be.
Our action here is validate, so that's also in the rule. The pattern is what we're looking to validate, which is the label. We don't care what the value of that label is, only that a value exists. Kyverno can use wildcards, so this statement is just saying "ensure there is some value." The message is what gets displayed if a request is invalid. Finally, there's a validationFailureAction field which tells Kyverno how to respond. It's set to enforce, meaning Kyverno will immediately block the request and the user will get the message specified. Kyverno can also give you a report of what is invalid while still allowing creation: by setting this field to audit instead, Kyverno will generate a PolicyReport object (another standard Kubernetes resource) which is essentially a log book of resources that passed or failed validation. This can be extremely handy when evaluating new policies in your environment to assess what's going to happen, rather than just throwing a switch (like a PodSecurityPolicy) and figuring out what broke.
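As a quick sketch of that audit workflow (the exact report names vary by Namespace and Kyverno version, so treat these commands as a starting point), you would set validationFailureAction: audit in the policy and then read the generated reports with kubectl:

# list Policy Reports in every Namespace, then inspect the ones you care about
kubectl get policyreport --all-namespaces
kubectl describe policyreport -n default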
So with this ClusterPolicy saved into a manifest, apply it with kubectl (a sketch of that follows). Now let's test it by creating a Deployment containing a single Pod which doesn't have this label specified.
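Assuming the manifest was saved as require-labels.yaml (the filename is just my placeholder), the apply step is simply:

kubectl apply -f require-labels.yaml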
kubectl create deployment nginx --image=nginx
We're just creating your standard nginx container wrapped in a Deployment. But since Deployments are just wrappers for ReplicaSets, which are wrappers for Pods, and Kyverno was told to match on Pods (regardless of the source controller), Kyverno is going to see this and block it.
Error from server: admission webhook "nirmata.kyverno.resource.validating-webhook" denied the request:

resource Deployment/default/nginx was blocked due to the following policies

require-labels:
  autogen-check-for-labels: 'Validation error: label `app.kubernetes.io/name` is required;
    Validation rule autogen-check-for-labels failed at path /spec/template/metadata/labels/app.kubernetes.io/name/'
The response above shows the name of the policy and the name of the rule which blocked the creation of the Deployment, useful when backtracking to figure out how that behavior came to be.
Now, create a Pod directly that does specify the label (could also be a Deployment).
$ kubectl run nginx --image nginx --labels app.kubernetes.io/name=nginx
pod/nginx created
Now Kyverno permits its creation, because it is deemed valid according to the policy.
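For the Deployment case mentioned above, the label has to land on the Pod template, since that's the path Kyverno's autogenerated rule checks (the Deployment name here is my own placeholder); a minimal sketch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-labeled                    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx    # satisfies the require-labels policy
    spec:
      containers:
      - name: nginx
        image: nginx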
As you can see, the policy structure and language in Kyverno are straightforward and logical: you just define your intent declaratively, the way you're used to doing with Kubernetes, and you don't need to program anything to do it.
"So what," you say, "why is that valuable?" Well, let's look at the alternative via a different example.
Another common best practice in Kubernetes land is to ensure that all Pods have resource requests and limits defined. Having these values defined avoids a whole host of problems that can crop up, including capacity issues and even whole-cluster crashes. Let's say you wanted to impose a restriction that all Pods in your Namespace called "production" must have limits defined, and that none can use more than 200m of CPU and 1Gi of memory. ResourceQuotas, while useful, aren't going to help you here because they work at an aggregate level, so normally you'd be on your own.
In Kyverno, no problem, super simple to define. It looks something like this.
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: require-pod-requests-and-limits
  namespace: production
spec:
  validationFailureAction: enforce
  rules:
  - name: validate-resource-limits
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Pod limits are required in this namespace with a max of 200m of CPU and 1Gi of memory."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "<=1Gi"
                cpu: "<=200m"
Although that Policy manifest is just over 20 lines, the real "work" is only about 2.
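To sanity-check it, you could throw a Pod at the production Namespace with a limit above the cap; something like this hypothetical spec should be rejected with the message above:

apiVersion: v1
kind: Pod
metadata:
  name: greedy-nginx          # hypothetical name
  namespace: production
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        cpu: 500m             # over the 200m ceiling, so the request is denied
        memory: 256Mi         # within the 1Gi ceiling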
How about doing this same thing in OPA/Gatekeeper?
1rego: |
2 package k8scontainerlimits
3
4 missing(obj, field) = true {
5 not obj[field]
6 }
7
8 missing(obj, field) = true {
9 obj[field] == ""
10 }
11
12 canonify_cpu(orig) = new {
13 is_number(orig)
14 new := orig * 1000
15 }
16
17 canonify_cpu(orig) = new {
18 not is_number(orig)
19 endswith(orig, "m")
20 new := to_number(replace(orig, "m", ""))
21 }
22
23 canonify_cpu(orig) = new {
24 not is_number(orig)
25 not endswith(orig, "m")
26 re_match("^[0-9]+$", orig)
27 new := to_number(orig) * 1000
28 }
29
30 # 10 ** 21
31 mem_multiple("E") = 1000000000000000000000 { true }
32
33 # 10 ** 18
34 mem_multiple("P") = 1000000000000000000 { true }
35
36 # 10 ** 15
37 mem_multiple("T") = 1000000000000000 { true }
38
39 # 10 ** 12
40 mem_multiple("G") = 1000000000000 { true }
41
42 # 10 ** 9
43 mem_multiple("M") = 1000000000 { true }
44
45 # 10 ** 6
46 mem_multiple("k") = 1000000 { true }
47
48 # 10 ** 3
49 mem_multiple("") = 1000 { true }
50
51 # Kubernetes accepts millibyte precision when it probably shouldn't.
52 # https://github.com/kubernetes/kubernetes/issues/28741
53 # 10 ** 0
54 mem_multiple("m") = 1 { true }
55
56 # 1000 * 2 ** 10
57 mem_multiple("Ki") = 1024000 { true }
58
59 # 1000 * 2 ** 20
60 mem_multiple("Mi") = 1048576000 { true }
61
62 # 1000 * 2 ** 30
63 mem_multiple("Gi") = 1073741824000 { true }
64
65 # 1000 * 2 ** 40
66 mem_multiple("Ti") = 1099511627776000 { true }
67
68 # 1000 * 2 ** 50
69 mem_multiple("Pi") = 1125899906842624000 { true }
70
71 # 1000 * 2 ** 60
72 mem_multiple("Ei") = 1152921504606846976000 { true }
73
74 get_suffix(mem) = suffix {
75 not is_string(mem)
76 suffix := ""
77 }
78
79 get_suffix(mem) = suffix {
80 is_string(mem)
81 count(mem) > 0
82 suffix := substring(mem, count(mem) - 1, -1)
83 mem_multiple(suffix)
84 }
85
86 get_suffix(mem) = suffix {
87 is_string(mem)
88 count(mem) > 1
89 suffix := substring(mem, count(mem) - 2, -1)
90 mem_multiple(suffix)
91 }
92
93 get_suffix(mem) = suffix {
94 is_string(mem)
95 count(mem) > 1
96 not mem_multiple(substring(mem, count(mem) - 1, -1))
97 not mem_multiple(substring(mem, count(mem) - 2, -1))
98 suffix := ""
99 }
100
101 get_suffix(mem) = suffix {
102 is_string(mem)
103 count(mem) == 1
104 not mem_multiple(substring(mem, count(mem) - 1, -1))
105 suffix := ""
106 }
107
108 get_suffix(mem) = suffix {
109 is_string(mem)
110 count(mem) == 0
111 suffix := ""
112 }
113
114 canonify_mem(orig) = new {
115 is_number(orig)
116 new := orig * 1000
117 }
118
119 canonify_mem(orig) = new {
120 not is_number(orig)
121 suffix := get_suffix(orig)
122 raw := replace(orig, suffix, "")
123 re_match("^[0-9]+$", raw)
124 new := to_number(raw) * mem_multiple(suffix)
125 }
126
127 violation[{"msg": msg}] {
128 general_violation[{"msg": msg, "field": "containers"}]
129 }
130
131 violation[{"msg": msg}] {
132 general_violation[{"msg": msg, "field": "initContainers"}]
133 }
134
135 general_violation[{"msg": msg, "field": field}] {
136 container := input.review.object.spec[field][_]
137 cpu_orig := container.resources.limits.cpu
138 not canonify_cpu(cpu_orig)
139 msg := sprintf("container <%v> cpu limit <%v> could not be parsed", [container.name, cpu_orig])
140 }
141
142 general_violation[{"msg": msg, "field": field}] {
143 container := input.review.object.spec[field][_]
144 mem_orig := container.resources.limits.memory
145 not canonify_mem(mem_orig)
146 msg := sprintf("container <%v> memory limit <%v> could not be parsed", [container.name, mem_orig])
147 }
148
149 general_violation[{"msg": msg, "field": field}] {
150 container := input.review.object.spec[field][_]
151 not container.resources
152 msg := sprintf("container <%v> has no resource limits", [container.name])
153 }
154
155 general_violation[{"msg": msg, "field": field}] {
156 container := input.review.object.spec[field][_]
157 not container.resources.limits
158 msg := sprintf("container <%v> has no resource limits", [container.name])
159 }
160
161 general_violation[{"msg": msg, "field": field}] {
162 container := input.review.object.spec[field][_]
163 missing(container.resources.limits, "cpu")
164 msg := sprintf("container <%v> has no cpu limit", [container.name])
165 }
166
167 general_violation[{"msg": msg, "field": field}] {
168 container := input.review.object.spec[field][_]
169 missing(container.resources.limits, "memory")
170 msg := sprintf("container <%v> has no memory limit", [container.name])
171 }
172
173 general_violation[{"msg": msg, "field": field}] {
174 container := input.review.object.spec[field][_]
175 cpu_orig := container.resources.limits.cpu
176 cpu := canonify_cpu(cpu_orig)
177 max_cpu_orig := input.parameters.cpu
178 max_cpu := canonify_cpu(max_cpu_orig)
179 cpu > max_cpu
180 msg := sprintf("container <%v> cpu limit <%v> is higher than the maximum allowed of <%v>", [container.name, cpu_orig, max_cpu_orig])
181 }
182
183 general_violation[{"msg": msg, "field": field}] {
184 container := input.review.object.spec[field][_]
185 mem_orig := container.resources.limits.memory
186 mem := canonify_mem(mem_orig)
187 max_mem_orig := input.parameters.memory
188 max_mem := canonify_mem(max_mem_orig)
189 mem > max_mem
190 msg := sprintf("container <%v> memory limit <%v> is higher than the maximum allowed of <%v>", [container.name, mem_orig, max_mem_orig])
191 }
That's almost 200 LINES of boilerplate code you are required to write (or own, as it were) in order to teach Gatekeeper, using the Rego language, what these units of measurement mean and how to convert between them, all for the sake of checking whether two simple fields are within an acceptable limit.
In actuality, a fully functional policy is even more than these ~200 lines: I've omitted the rest of the ConstraintTemplate and left the Constraint out entirely (see the original source for the complete example).
The point here is that because Kyverno is Kubernetes native, it owns all of that logic, allowing you to keep your rules simple, focus on delivering value quickly, and eliminate as much cognitive and technical debt as possible. In OPA/Gatekeeper, because it has no native awareness of Kubernetes, you have to program it yourself to do anything and everything you need. Which would you rather write?
There are many other features of validate rules, including things like variables and conditionals. But one capability that's particularly useful when doing validations is validating or blocking specific actions against resources. In this regard, Kyverno can actually augment and improve Kubernetes' native RBAC with abilities you couldn't otherwise get.
For example, say you had a Kubernetes cluster with RBAC defined at the Namespace level and several different Roles in play, including the built-in admin Role. You wanted to give those users access to the things they need in that Namespace while protecting others. Kubernetes RBAC only gives you the ability to define resource types and verbs, things like get on ConfigMaps, for example. But with Kyverno, you can get even more granular.
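For contrast, here's roughly what plain RBAC gives you for the NetworkPolicy scenario coming up (the Role name and Namespace are my own assumptions): you can grant verbs on a resource type, but you can't carve out objects by naming pattern.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: netpol-editor                  # hypothetical Role
  namespace: production                # hypothetical Namespace
rules:
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["get", "list", "create", "update", "delete"]
  # RBAC stops here: resourceNames can pin exact object names, but there's
  # no way to say "every NetworkPolicy except those ending in -default"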
What if you wanted to lock resources down on an à la carte basis but didn't want to screw with your Roles? With a policy like the one below, you can allow users to mess with NetworkPolicy objects, except those that follow a certain naming pattern.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-netpol-changes
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: deny-netpol-changes
    match:
      resources:
        kinds:
        - NetworkPolicy
        name: "*-default"
    exclude:
      clusterRoles:
      - cluster-admin
    validate:
      message: "Changing default network policies is not allowed."
      deny: {}
In the above, only NetworkPolicy resources with names ending in "-default" will be locked down across the cluster. This means your users who have permission to work with NetworkPolicy objects are free to do so, just not with any of the "-default" NetworkPolicy objects. So by using Kyverno as the validating admission controller, you can extend RBAC-style control all the way down to individual, named objects.
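As a quick illustration (the object names are hypothetical), a user with NetworkPolicy edit rights could still manage their own objects, but touching one of the protected defaults would be denied with the policy's message:

# blocked by deny-netpol-changes (the name matches "*-default")
kubectl delete networkpolicy deny-all-default -n production

# not matched by the policy, so ordinary RBAC alone decides
kubectl delete networkpolicy allow-frontend -n production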
Let's do one more that pulls together several pieces of functionality into one policy. What I've shown so far has dealt with validating either native Kubernetes resources or actions against them, but Kyverno can validate any custom resource you have in your environment (and it still doesn't require you to program anything).
In this next example, I've got a TKGI (once known as PKS or Enterprise PKS) Kubernetes cluster which has a concept called LogSinks. This is a custom resource which allows easy forwarding of logs to a destination on either a Namespace or cluster basis, without having to jump through the complex hoops of tools like Fluentd. As logs are really important, especially in stateless apps, you want to ensure that, although you delegate that ability to your Namespace admins, logs can only go to designated locations in the company and not to some rogue system that gets stood up. Using a combination of a Kyverno policy, variables from the AdmissionReview request payload, and data read in from a ConfigMap, that's entirely possible.
I'm going to allow my Namespace admins to forward their logs to one of three possible destinations in my organization. I'll define these in a ConfigMap like so.
apiVersion: v1
kind: ConfigMap
metadata:
  name: logsink-destinations
  namespace: qa-falco
data:
  allowed-destinations: "[\"splunk.zoller.com\", \"loginsight.zoller.com\", \"solarwinds.zoller.com\"]"
Next, I'll create a Kyverno Policy that watches for these LogSink resources in my qa-falco Namespace and checks their host field against the allowable list in my earlier ConfigMap.
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: logsink-policy
  namespace: qa-falco
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: validate-logsink-destination
    context:
    - name: logsink-allowed-destinations
      configMap:
        name: logsink-destinations
        namespace: qa-falco
    match:
      resources:
        kinds:
        - LogSink
    validate:
      message: "The destination {{ request.object.spec.host }} is not in the list of allowed destinations."
      deny:
        conditions:
        - key: "{{ request.object.spec.host }}"
          operator: NotIn
          value: "{{ \"logsink-allowed-destinations\".data.\"allowed-destinations\" }}"
By creating a context element in the Policy which refers to a ConfigMap, that data can subsequently be consumed as a variable in the validate.deny statement. And by tapping into the AdmissionReview request data that the kube-apiserver sends over, we can check the incoming host against our ConfigMap data.
Now, if I try to create a LogSink in that same Namespace that uses a host I didn't authorize, it's immediately blocked.
apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: falco-sink
  namespace: qa-falco
spec:
  type: syslog
  host: wavefront.zoller.com
  port: 9000
  enable_tls: true
$ k create -f logsink.yaml
Error from server: error when creating "logsink.yaml": admission webhook "nirmata.kyverno.resource.validating-webhook" denied the request:

resource LogSink/default/falco-sink was blocked due to the following policies

logsink-policy:
  validate-logsink-destination: 'The destination wavefront.zoller.com is not in the list of allowed destinations.'
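Conversely, pointing the same LogSink at one of the approved destinations from the ConfigMap sails right through; a sketch with only the host changed:

apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: falco-sink
  namespace: qa-falco
spec:
  type: syslog
  host: splunk.zoller.com      # present in allowed-destinations, so the deny condition doesn't fire
  port: 9000
  enable_tls: true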
As you can see, not only are validate rules fairly easy to understand, they wield an incredible amount of power in helping you establish best practices in your environment. And although I've shown how simple they are to write, a great deal of these best practices have already been written for you in the sample policy library. So you may not need to write anything from scratch at all or, at worst, just adapt an existing policy to meet your needs.
Hopefully this article has explained what a validate policy does, the types of capabilities these rules offer, and how you can pick them up today and start putting them to good use.
Be on the lookout for the next article covering the mutation capabilities of Kyverno.