Exploring Kyverno: Part 1, Validation

Welcome to the first part of my "Exploring Kyverno" blog series. If you haven't read the introduction, I encourage you to do so first. This series is a multi-part exploration of the open-source, Kubernetes-native policy engine called Kyverno. In this article, I'll be covering the first major capability offered by Kyverno: validations.

By far, the biggest use case for admission controllers such as Kyverno is the ability to check incoming requests and validate that they conform to approved standards. These can be well-known, established standards such as community best practices, or something totally custom that's important to you and the way you operate. Regardless of the motivation, Kyverno's first and most popular ability is to validate requests.

Before getting into that, let's quickly get Kyverno installed. It's trivial to install and also extremely quick, so follow the quick start guide and choose your method. I'll just opt for this one-liner, which will give you the bleeding-edge release (which may be a pre-release):

kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/main/definitions/release/install.yaml
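
Once the manifests are applied, it's worth a quick sanity check that the controller came up cleanly (the kyverno Namespace is what the default install creates; adjust if you installed elsewhere):

kubectl -n kyverno get pods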

Once installed, Kyverno will register itself as mutating and validating webhooks. If you'd like to see what those look like, there's a very cool krew plug-in released here which will visualize them for you in a terminal. It's installed simply with kubectl krew install view-webhook.
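
Assuming you already have krew on your machine, grabbing the plug-in and running it against the cluster looks roughly like this (the exact invocation may vary slightly between plug-in versions):

kubectl krew install view-webhook
kubectl view-webhook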

+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+
|    KIND    |                  NAME                   |                   WEBHOOK                   |               SERVICE               | RESOURCES&OPERATIONS | REMAINING DAY |       ACTIVE NS        |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+
| Mutating   | kyverno-policy-mutating-webhook-cfg     | nirmata.kyverno.policy-mutating-webhook     | └─┬kyverno-svc                      | ├──clusterpolicies/* | 9 years       | ✖ No Active Namespaces |
|            |                                         |                                             |   ├──NS  : kyverno                  | └─┬policies/*        |               |                        |
|            |                                         |                                             |   ├──Path: /policymutate            |   ├──+CREATE         |               |                        |
|            |                                         |                                             |   └─┬IP  : 100.70.92.19 (ClusterIP) |   └──^UPDATE         |               |                        |
|            |                                         |                                             |     └──443/TCP                      |                      |               |                        |
+            +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+               +                        +
|            | kyverno-resource-mutating-webhook-cfg   | nirmata.kyverno.resource.mutating-webhook   | └─┬kyverno-svc                      | └─┬*/*               |               |                        |
|            |                                         |                                             |   ├──NS  : kyverno                  |   ├──+CREATE         |               |                        |
|            |                                         |                                             |   ├──Path: /mutate                  |   └──^UPDATE         |               |                        |
|            |                                         |                                             |   └─┬IP  : 100.70.92.19 (ClusterIP) |                      |               |                        |
|            |                                         |                                             |     └──443/TCP                      |                      |               |                        |
+            +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+               +                        +
|            | kyverno-verify-mutating-webhook-cfg     | nirmata.kyverno.verify-mutating-webhook     | └─┬kyverno-svc                      | └─┬deployments/*     |               |                        |
|            |                                         |                                             |   ├──NS  : kyverno                  |   └──^UPDATE         |               |                        |
|            |                                         |                                             |   ├──Path: /verifymutate            |                      |               |                        |
|            |                                         |                                             |   └─┬IP  : 100.70.92.19 (ClusterIP) |                      |               |                        |
|            |                                         |                                             |     └──443/TCP                      |                      |               |                        |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+               +                        +
| Validating | kyverno-policy-validating-webhook-cfg   | nirmata.kyverno.policy-validating-webhook   | └─┬kyverno-svc                      | ├──clusterpolicies/* |               |                        |
|            |                                         |                                             |   ├──NS  : kyverno                  | └─┬policies/*        |               |                        |
|            |                                         |                                             |   ├──Path: /policyvalidate          |   └──^UPDATE         |               |                        |
|            |                                         |                                             |   └─┬IP  : 100.70.92.19 (ClusterIP) |                      |               |                        |
|            |                                         |                                             |     └──443/TCP                      |                      |               |                        |
+            +-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+               +                        +
|            | kyverno-resource-validating-webhook-cfg | nirmata.kyverno.resource.validating-webhook | └─┬kyverno-svc                      | └─┬*/*               |               |                        |
|            |                                         |                                             |   ├──NS  : kyverno                  |   └──-DELETE         |               |                        |
|            |                                         |                                             |   ├──Path: /validate                |                      |               |                        |
|            |                                         |                                             |   └─┬IP  : 100.70.92.19 (ClusterIP) |                      |               |                        |
|            |                                         |                                             |     └──443/TCP                      |                      |               |                        |
+------------+-----------------------------------------+---------------------------------------------+-------------------------------------+----------------------+---------------+------------------------+

Let's now look at some of Kyverno's validate functionality.

Kyverno's policies are orderly and simply laid out. A policy can either be scoped to the entire cluster (a ClusterPolicy) or to a single Namespace (a Policy). Each policy contains one or more rules, and each rule contains a match statement, which filters the resources to which the rule will apply, and an action statement, which can be either validate, mutate, or generate. I'm covering only validate in this article; the other two I'll look at separately in forthcoming articles.

In the match statement, Kyverno can match basically any type of resource in Kubernetes, including subjects, roles, and even custom resources. For more fine-grained control, you can add an exclude statement that filters matches out further. You can combine various clauses in both match and exclude to get down to very fine levels if that's what you need. There's more, of course, but that's the gist.
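
As a quick illustration, here's a hypothetical match/exclude fragment (not a complete policy; the Namespace and ServiceAccount names are made up) that selects Pods in a particular Namespace while carving out requests made by a trusted ServiceAccount:

match:
  resources:
    kinds:
    - Pod
    namespaces:
    - production
exclude:
  subjects:
  - kind: ServiceAccount
    name: build-bot
    namespace: ci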

Enough of that. Let's look at some concrete examples.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  rules:
  - name: check-for-labels
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "The label `app.kubernetes.io/name` is required."
      pattern:
        metadata:
          labels:
            app.kubernetes.io/name: "?*"

Above is a ClusterPolicy, which means it works across the entire cluster. It contains a single rule whose job is to check all incoming Pods to ensure they have a label called app.kubernetes.io/name, which is a fairly common label. So in our match statement, we're telling Kyverno we want to match on a resource and that resource should be a Pod. Although we could tell it more specific details of what to include or exclude, let's just let this be.

Our action here is to validate, so that's also in the rule. The pattern is what we're looking to validate, which is the label. We don't care what the value of that label is, only that a value exists. Kyverno can use wildcards, so this statement is just saying "ensure there is some value". And the message is what gets displayed if a request is invalid. Finally, there's a validationFailureAction field which tells Kyverno how to respond. It's set to enforce, meaning Kyverno will immediately block the request and the user will get the message specified. Kyverno can also give you a report of what things are invalid but still allow their creation: by setting this field to audit instead, Kyverno will generate a PolicyReport object (another standard Kubernetes resource) which is essentially a log book of resources which passed or failed validation. This can be extremely handy when evaluating new policies in your environment to assess what's going to happen, rather than just throwing a switch (like a PodSecurityPolicy) and figuring out what broke.
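
If you'd rather observe before you block, flip validationFailureAction to audit and let the policy run for a while; the resulting reports can then be listed with something like the following (a sketch; the exact report resource names and short names vary a bit between Kyverno versions):

kubectl get policyreport --all-namespaces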

So with this ClusterPolicy saved into a manifest, apply it with kubectl. Now let's test it by creating a Deployment containing a single Pod which doesn't have this label specified.

kubectl create deployment nginx --image=nginx

We're just creating your standard nginx container wrapped in a Deployment. But since Deployments are just wrappers for ReplicaSets, which are in turn wrappers for Pods, and Kyverno was told to match on Pods (regardless of the source controller), Kyverno is going to see this and block it.

Error from server: admission webhook "nirmata.kyverno.resource.validating-webhook" denied the request:

resource Deployment/default/nginx was blocked due to the following policies

require-labels:
  autogen-check-for-labels: 'Validation error: label `app.kubernetes.io/name` is required;
    Validation rule autogen-check-for-labels failed at path /spec/template/metadata/labels/app.kubernetes.io/name/'

The response above shows the name of the policy and the name of the rule which blocked the creation of the Deployment, useful when backtracking to figure out how that behavior came to be.

Now, create a Pod directly that does specify the label (could also be a Deployment).

$ kubectl run nginx --image nginx --labels app.kubernetes.io/name=nginx
pod/nginx created

Now Kyverno permits its creation, because it is deemed valid according to the policy.

As you can see, not only are the policy structure and language in Kyverno very straightforward and logical, but you just define your intent as you're used to doing with Kubernetes. And you don't need to program anything to do so.

"So what," you say, "why is that valuable?" Well, let's look at the alternative via a different example.

Another one of the common best practices in Kubernetes land is to ensure that all Pods have resource requests and limits defined. Having these values defined avoids a whole host of problems that can crop up, including capacity issues and even whole-cluster crashes. Let's say you wanted to impose a restriction that all Pods in your Namespace called "production" must have limits defined, and yet none can use more than 200m of CPU and 1Gi of memory. ResourceQuotas, while useful, aren't going to help you here because they work at an aggregate level, so normally you'd be on your own.

In Kyverno, no problem, super simple to define. It looks something like this.

apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: require-pod-requests-and-limits
  namespace: production
spec:
  validationFailureAction: enforce
  rules:
  - name: validate-resource-limits
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Pod limits are required in this namespace with a max of 200m of CPU and 1Gi of memory."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "<=1Gi"
                cpu: "<=200m"

Although that Policy manifest is just over 20 lines, the real "work" is only about 2.
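
For reference, a Pod that would satisfy this rule carries a limits block like the one below (a hypothetical example; the image and values are just for illustration, and both limits sit at or under the allowed maximums):

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: production
spec:
  containers:
  - name: busybox
    image: busybox
    resources:
      limits:
        cpu: "100m"
        memory: "128Mi"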

How about doing this same thing in OPA/Gatekeeper?

I'm not even kidding. Expand if you dare.

rego: |
        package k8scontainerlimits

        missing(obj, field) = true {
          not obj[field]
        }

        missing(obj, field) = true {
          obj[field] == ""
        }

        canonify_cpu(orig) = new {
          is_number(orig)
          new := orig * 1000
        }

        canonify_cpu(orig) = new {
          not is_number(orig)
          endswith(orig, "m")
          new := to_number(replace(orig, "m", ""))
        }

        canonify_cpu(orig) = new {
          not is_number(orig)
          not endswith(orig, "m")
          re_match("^[0-9]+$", orig)
          new := to_number(orig) * 1000
        }

        # 10 ** 21
        mem_multiple("E") = 1000000000000000000000 { true }

        # 10 ** 18
        mem_multiple("P") = 1000000000000000000 { true }

        # 10 ** 15
        mem_multiple("T") = 1000000000000000 { true }

        # 10 ** 12
        mem_multiple("G") = 1000000000000 { true }

        # 10 ** 9
        mem_multiple("M") = 1000000000 { true }

        # 10 ** 6
        mem_multiple("k") = 1000000 { true }

        # 10 ** 3
        mem_multiple("") = 1000 { true }

        # Kubernetes accepts millibyte precision when it probably shouldn't.
        # https://github.com/kubernetes/kubernetes/issues/28741
        # 10 ** 0
        mem_multiple("m") = 1 { true }

        # 1000 * 2 ** 10
        mem_multiple("Ki") = 1024000 { true }

        # 1000 * 2 ** 20
        mem_multiple("Mi") = 1048576000 { true }

        # 1000 * 2 ** 30
        mem_multiple("Gi") = 1073741824000 { true }

        # 1000 * 2 ** 40
        mem_multiple("Ti") = 1099511627776000 { true }

        # 1000 * 2 ** 50
        mem_multiple("Pi") = 1125899906842624000 { true }

        # 1000 * 2 ** 60
        mem_multiple("Ei") = 1152921504606846976000 { true }

        get_suffix(mem) = suffix {
          not is_string(mem)
          suffix := ""
        }

        get_suffix(mem) = suffix {
          is_string(mem)
          count(mem) > 0
          suffix := substring(mem, count(mem) - 1, -1)
          mem_multiple(suffix)
        }

        get_suffix(mem) = suffix {
          is_string(mem)
          count(mem) > 1
          suffix := substring(mem, count(mem) - 2, -1)
          mem_multiple(suffix)
        }

        get_suffix(mem) = suffix {
          is_string(mem)
          count(mem) > 1
          not mem_multiple(substring(mem, count(mem) - 1, -1))
          not mem_multiple(substring(mem, count(mem) - 2, -1))
          suffix := ""
        }

        get_suffix(mem) = suffix {
          is_string(mem)
          count(mem) == 1
          not mem_multiple(substring(mem, count(mem) - 1, -1))
          suffix := ""
        }

        get_suffix(mem) = suffix {
          is_string(mem)
          count(mem) == 0
          suffix := ""
        }

        canonify_mem(orig) = new {
          is_number(orig)
          new := orig * 1000
        }

        canonify_mem(orig) = new {
          not is_number(orig)
          suffix := get_suffix(orig)
          raw := replace(orig, suffix, "")
          re_match("^[0-9]+$", raw)
          new := to_number(raw) * mem_multiple(suffix)
        }

        violation[{"msg": msg}] {
          general_violation[{"msg": msg, "field": "containers"}]
        }

        violation[{"msg": msg}] {
          general_violation[{"msg": msg, "field": "initContainers"}]
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          cpu_orig := container.resources.limits.cpu
          not canonify_cpu(cpu_orig)
          msg := sprintf("container <%v> cpu limit <%v> could not be parsed", [container.name, cpu_orig])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          mem_orig := container.resources.limits.memory
          not canonify_mem(mem_orig)
          msg := sprintf("container <%v> memory limit <%v> could not be parsed", [container.name, mem_orig])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          not container.resources
          msg := sprintf("container <%v> has no resource limits", [container.name])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          not container.resources.limits
          msg := sprintf("container <%v> has no resource limits", [container.name])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          missing(container.resources.limits, "cpu")
          msg := sprintf("container <%v> has no cpu limit", [container.name])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          missing(container.resources.limits, "memory")
          msg := sprintf("container <%v> has no memory limit", [container.name])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          cpu_orig := container.resources.limits.cpu
          cpu := canonify_cpu(cpu_orig)
          max_cpu_orig := input.parameters.cpu
          max_cpu := canonify_cpu(max_cpu_orig)
          cpu > max_cpu
          msg := sprintf("container <%v> cpu limit <%v> is higher than the maximum allowed of <%v>", [container.name, cpu_orig, max_cpu_orig])
        }

        general_violation[{"msg": msg, "field": field}] {
          container := input.review.object.spec[field][_]
          mem_orig := container.resources.limits.memory
          mem := canonify_mem(mem_orig)
          max_mem_orig := input.parameters.memory
          max_mem := canonify_mem(max_mem_orig)
          mem > max_mem
          msg := sprintf("container <%v> memory limit <%v> is higher than the maximum allowed of <%v>", [container.name, mem_orig, max_mem_orig])
        }

That's almost 200 LINES of boilerplate code you are required to write (or own, as it were) in order to teach Gatekeeper, using the language of Rego, what these units of measurement mean and how to convert between them. All for the sake of checking if two simple fields are within an acceptable limit.

In actuality, a fully-functional policy is even more than these ~200 lines: I've omitted the rest of the ConstraintTemplate and left out the Constraint entirely. Source.
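
For completeness, the Constraint that parameterizes that template would look roughly like this, based on the input.parameters the Rego reads (a sketch; the kind name comes from the template and the match block is illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits
metadata:
  name: container-must-have-limits
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    cpu: "200m"
    memory: "1Gi"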

The point here is that because Kyverno is Kubernetes-native, it owns all of this logic, allowing you to keep your rules simple, focus on delivering value quickly, and eliminate as much cognitive and technical debt as possible. Because OPA/Gatekeeper has no native awareness of Kubernetes, you have to program it yourself to do anything and everything you need. Which would you rather write?

There are many other features of validate rules, including things like variables and conditionals. But one feature that's particularly useful is the ability to validate or block specific actions against resources. In this regard, Kyverno can actually augment and improve Kubernetes' native RBAC with abilities you couldn't otherwise get.

For example, say you have a Kubernetes cluster where RBAC is defined at the Namespace level with several different Roles, including the admin Role. You want to give those users access to the things they need in that Namespace while protecting others. Kubernetes only gives you the ability to define resource types and verbs, things like GET on ConfigMap, for example. But with Kyverno, you can get even more granular.
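
For contrast, a typical RBAC rule can only speak in those terms of resource types and verbs, roughly like this (illustrative only; the names are made up):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: netpol-editor
  namespace: production
rules:
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["get", "list", "create", "update", "delete"]

Notice there's no way in that rule to say "except these particular NetworkPolicy objects".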

What if you wanted to lock resources down on an à la carte basis but didn't want to screw with your Roles? With a policy like the one below, you can allow users to work with NetworkPolicy objects except those that follow a certain naming pattern.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-netpol-changes
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: deny-netpol-changes
    match:
      resources:
        kinds:
        - NetworkPolicy
        name: "*-default"
    exclude:
      clusterRoles:
      - cluster-admin
    validate:
      message: "Changing default network policies is not allowed."
      deny: {}

In the above, only NetworkPolicy resources with names ending in "-default" will be locked down across the cluster. This means your users who have permission to work with NetworkPolicy objects are free to do so, just not with any of the "-default" NetworkPolicy objects. So by using Kyverno as the validating admission controller, you can extend RBAC capabilities down to the level of individual, named resources.
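
To picture the effect, a change like the following would be rejected for anyone not bound to cluster-admin (the NetworkPolicy name here is hypothetical, just to illustrate a "-default" match):

kubectl -n production annotate networkpolicy allow-dns-default owner=dev-team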

Let's do one more that pulls together several pieces of functionality into one policy. What I've shown so far has dealt with validation of either native Kubernetes resources or actions, but Kyverno can validate any custom resource you have in your environment (and it still doesn't require you to program anything).

In this next example, I've got a TKGI (once known as PKS or Enterprise PKS) Kubernetes cluster which has a concept called LogSinks. This is a custom resource which allows easy forwarding of logs to a destination on either a Namespace or cluster basis, without having to jump through the complex hoops of things like fluentd. As logs are really important, especially for stateless apps, you want to ensure that, although you delegate that ability to your Namespace admins, logs can only go to designated locations in the company and not to some rogue system that gets stood up. Using a combination of a Kyverno policy, variables from the AdmissionRequest payload, and data read in from a ConfigMap, it's most certainly possible.

I'm going to allow my Namespace admins to forward their logs to one of three possible destinations in my organization. I'll define these in a ConfigMap like so.

apiVersion: v1
kind: ConfigMap
metadata:
  name: logsink-destinations
  namespace: qa-falco
data:
  allowed-destinations: "[\"splunk.zoller.com\", \"loginsight.zoller.com\", \"solarwinds.zoller.com\"]"

Next, I'll create a Kyverno Policy that watches for these LogSink resources in my qa-falco Namespace and checks their host field against the allowable list in my earlier ConfigMap.

apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: logsink-policy
  namespace: qa-falco
spec:
  validationFailureAction: enforce
  background: false
  rules:
  - name: validate-logsink-destination
    context:
      - name: logsink-allowed-destinations
        configMap:
          name: logsink-destinations
          namespace: qa-falco
    match:
      resources:
        kinds:
        - LogSink
    validate:
      message: "The destination {{ request.object.spec.host }} is not in the list of allowed destinations."
      deny:
        conditions:
        - key: "{{ request.object.spec.host }}"
          operator: NotIn
          value: "{{ \"logsink-allowed-destinations\".data.\"allowed-destinations\" }}"

By creating a context element in the Policy which refers to a ConfigMap, that data can subsequently be consumed as a variable in the validate.deny statement. And by tapping into the AdmissionReview request data that the kube-apiserver sent over, we can check it against our ConfigMap data.

Now, if I try to create a LogSink in that same Namespace that uses a host I didn't authorize, it's immediately blocked.

apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: falco-sink
  namespace: qa-falco
spec:
  type: syslog
  host: wavefront.zoller.com
  port: 9000
  enable_tls: true

$ k create -f logsink.yaml
Error from server: error when creating "logsink.yaml": admission webhook "nirmata.kyverno.resource.validating-webhook" denied the request:

resource LogSink/default/falco-sink was blocked due to the following policies

logsink-policy:
  validate-logsink-destination: 'The destination wavefront.zoller.com is not in the list of allowed destinations.'
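
Conversely, a LogSink pointing at one of the approved hosts from the ConfigMap would be admitted; the same manifest with only the host swapped goes through without complaint:

apiVersion: pksapi.io/v1beta1
kind: LogSink
metadata:
  name: falco-sink
  namespace: qa-falco
spec:
  type: syslog
  host: splunk.zoller.com
  port: 9000
  enable_tls: true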

As you can see, not only are validate rules fairly easy to understand, but they wield an incredible amount of power in helping you set best practices in your environment. Although I've shown how simple they are to write, the fact is that a great many of these best practices have already been written in the sample policy library. So you may not need to write anything from scratch at all or, at worst, just adapt an existing policy to meet your needs.

Hopefully this article has explained what a validate policy does and the types of capabilities it offers, and given you some insight into how you can pick them up today and start putting them to good use.

Be on the lookout for the next article covering the mutation capabilities of Kyverno.


Articles in the Exploring Kyverno series

Introduction

Part 1, Validation

Part 2, Mutation

Part 3, Generation