Taking Out The Trash: Automatic Cleanup of Bad Resources with Kyverno

Are you tired of your developers not fixing their apps even when they know they're violating policies? Tired of harassing them via email or Slack? In this post, I'll show you how you can use Kyverno to find and automatically remove those "bad" resources, allowing you to take out your cluster's trash.

One of the lesser-used, yet insanely useful, capabilities in Kyverno is its cleanup policies, which let you apply a declarative, fine-grained, policy-based approach to removing resources from a Kubernetes cluster on a recurring schedule. By contrast, one of the heavily-used capabilities is Kyverno's policy reports, which give you a decoupled, in-cluster window into how your developers' and users' resources are faring against your policies, validation policies in particular.

The typical Kyverno adoption journey goes something like this: Once upon a time, there was a Kubernetes cluster with a bunch of resources in it. One day, a battle-scarred platform engineer was tasked with implementing policy in the cluster to improve security and prevent "bad stuff" from happening. Not wanting to disrupt operations, he adds Kyverno policies to the cluster in Audit mode, which produce policy reports. Those reports record the success or failure status of the existing resources in that cluster when compared to the policies which were added. "Good" resources are said to succeed; "bad" resources are said to fail. The engineer now has a naughty list of the things that need to be fixed, so he sets off into the corporate wilderness to find the owners of these resources and inform them of their wicked ways, delivering an ultimatum from on high that they should be fixed within a certain number of days. Except those days come and go and the resources still aren't fixed. He once again pesters them to remediate, yet the cacophony of silence reigns.

If this sounds like you, or maybe you fear this could become you, I have some good news. Kyverno can find all the naughty resources in the cluster and automatically remove them for you after your grace period has expired.

Before going further, I should add a strong word of caution here. Deleting resources is a serious business and before you go and try anything below you should do the responsible thing and establish agreements with the necessary stakeholders. You should absolutely test this in a non-production environment, scoping down as tightly as possible so as not to produce adverse effects.

The flow here is pretty simple:

  1. Scan policy reports for failed results.
  2. Annotate the resource with the time the first failure was observed.
  3. Periodically run cleanup, deleting only the resources which have failed for whatever grace period you want.

Doing this, while not super complex, does involve a few parts to ensure the cleanup process only removes the resources it should.

For example, in my strategy covered herein, I'm not looking for a specific policy to have failed; I want any failures. This can, of course, be configured. I also want to be careful so that in the instance where developers remediate their issues and policy reports have all successes, the cleanup should not occur. To build this out, I'm demonstrating the concept with Deployments and only in a specific Namespace. Also note that I built this assuming Kyverno 1.11 and its per-resource reports, not the earlier style of reporting. This could still work for earlier versions, but modifications would have to be made.

To start, we need some permissions for both the background controller (which will handle mutations of existing resources) and the cleanup controller (which will do the actual deletions).

The permissions below use labels to aggregate to the proper "parent" ClusterRoles.

# Lets the background controller update Deployments (to add and remove annotations).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: background-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
  name: kyverno:update-deployments
rules:
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - update
---
# Lets the cleanup controller find and delete Deployments.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: cleanup-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
  name: kyverno:clean-violating-resources
rules:
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - watch
  - list
  - delete

Now, let's look at the meat of the system. We need a couple of "mutate existing" policies here to add and remove annotations. Although it would have been nice if we could simply rely on policy reports alone, that's not possible for a couple of reasons. First, the policy report's creation date is unreliable because you cannot assume it was created due to a failure; it could have been a success or skip. Second, the rule result timestamp (given in Unix time) gets refreshed at every background scan. To begin the calculation properly, you need to record the first time a failure was observed and then, conversely, remove that record once ALL the failures are gone.
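To make those two problems concrete, here is a rough sketch of what a per-resource policy report for a failing Deployment might look like. The Deployment name, UID, timestamps, and rule name are made up for illustration, and only the fields relevant to this discussion are shown.

```yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  # Per-resource reports are named after the UID of the resource they describe.
  name: 00000000-1111-2222-3333-444444444444
  namespace: engineering
  # Tells you when the report was created, not when a failure began.
  creationTimestamp: "2023-11-20T09:00:00Z"
scope:
  apiVersion: apps/v1
  kind: Deployment
  name: busybox            # illustrative Deployment name
  namespace: engineering
  uid: 00000000-1111-2222-3333-444444444444
summary:
  error: 0
  fail: 1                  # the count the mutate rules key on
  pass: 2
  skip: 0
  warn: 0
results:
- policy: require-run-as-nonroot
  rule: run-as-non-root    # illustrative rule name
  result: fail
  source: kyverno
  timestamp:
    seconds: 1700474400    # Unix time, refreshed on every background scan
    nanos: 0
```

Neither creationTimestamp nor the result timestamp answers "when did this resource first start failing?", which is why the annotations are needed.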

With that said, here's the policy below with both rules.

apiVersion: kyverno.io/v2beta1
kind: ClusterPolicy
metadata:
  name: manage-cleanup-annotations
spec:
  background: true
  mutateExistingOnPolicyUpdate: true
  rules:
    # Rule 1: mark Deployments whose policy report shows at least one failure.
    - name: failed-annotation
      match:
        any:
        - resources:
            kinds:
            - Deployment
            namespaces:
            - engineering
      mutate:
        targets:
        - apiVersion: apps/v1
          kind: Deployment
          name: "{{ request.object.metadata.name }}"
          namespace: "{{ request.object.metadata.namespace }}"
          context:
          # Fetch the per-resource report (named after the UID) and read its fail count.
          - name: failedpolicyresults
            apiCall:
              urlPath: "/apis/wgpolicyk8s.io/v1alpha2/namespaces/{{ request.namespace }}/policyreports?fieldSelector=metadata.name={{ request.object.metadata.uid }}"
              jmesPath: items[0].summary.fail || `0`
          preconditions:
            all:
              - key: "{{ failedpolicyresults }}"
                operator: GreaterThan
                value: 0
        patchStrategicMerge:
          metadata:
            annotations:
              corp.io/hasfailedresults: "true"
              # +() adds the annotation only if it is absent, preserving the first-seen time.
              +(corp.io/lastfailedtime): "{{ time_now_utc() }}"
    # Rule 2: remove both annotations once the report shows zero failures.
    - name: success-annotation
      match:
        any:
        - resources:
            kinds:
            - Deployment
            namespaces:
            - engineering
      mutate:
        targets:
        - apiVersion: apps/v1
          kind: Deployment
          name: "{{ request.object.metadata.name }}"
          namespace: "{{ request.object.metadata.namespace }}"
          context:
          - name: failedpolicyresults
            apiCall:
              urlPath: "/apis/wgpolicyk8s.io/v1alpha2/namespaces/{{ request.namespace }}/policyreports?fieldSelector=metadata.name={{ request.object.metadata.uid }}"
              jmesPath: items[0].summary.fail || `0`
          preconditions:
            all:
              - key: "{{ failedpolicyresults }}"
                operator: Equals
                value: 0
              - key: '{{ request.object.metadata.annotations."corp.io/hasfailedresults" || "" }}'
                operator: Equals
                value: "true"
              - key: '{{ request.object.metadata.annotations."corp.io/lastfailedtime" || "" }}'
                operator: Equals
                value: "?*"
        patchesJson6902: |-
          - path: /metadata/annotations/corp.io~1hasfailedresults
            op: remove
          - path: /metadata/annotations/corp.io~1lastfailedtime
            op: remove

The policy operates on Deployments in the engineering Namespace. The first rule fires when one or more failures are detected. It adds the annotation corp.io/hasfailedresults with a value of true, and corp.io/lastfailedtime with the current time. The +() anchor adds the latter only if it is not already present, so the value does not get overwritten on subsequent background scans; overwriting it would wipe our record of when a failure was first observed.

Targeting Deployments rather than Pods is deliberate. You wouldn't want to delete just Pods because, more than likely, they're owned by some higher-level controller. Deleting them just causes those controllers to spit the same bad Pods out again, resulting in an eternal game of whack-a-mole.

The second rule looks for successes (no failures) and, if found, will remove those two annotations if they exist.
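As a rough illustration, a failing Deployment might carry metadata like the following after the first rule has run. The Deployment name and timestamp are invented, and the rest of the manifest is omitted; time_now_utc() produces an RFC 3339 timestamp in UTC.

```yaml
# Fragment of a Deployment after the failed-annotation rule has applied.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox                                     # illustrative name
  namespace: engineering
  annotations:
    corp.io/hasfailedresults: "true"
    corp.io/lastfailedtime: "2023-11-20T10:00:00Z"  # first observed failure; not refreshed later
```

Once the report shows zero failures, the second rule removes both annotations, so a Deployment that later regresses starts a fresh grace period.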

Both rules run in Kyverno's background mode, which is key, because a policy report lags behind the creation or update of a Deployment. By default, background scanning occurs every hour, so the next time the scan occurs, if there is even one failure, the annotations will be applied. They will not be removed until no failures remain.

Although, as I mentioned, this looks for any failures, it could be modified if you have a specific policy you're interested in. The following JMESPath query could be used to consider only failures in the "require-run-as-nonroot" policy: items[0].results[?policy=='require-run-as-nonroot'].result | [0] || 'empty'
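Here is a sketch of how that query might slot into the first rule's target. The variable name nonrootresult is my own placeholder, and this is a fragment rather than a complete policy. Note that the query takes only the first result for that policy, so a policy with several rules may need a more careful expression.

```yaml
# Fragment: replaces the context and preconditions under the target in the failed-annotation rule.
context:
- name: nonrootresult   # hypothetical variable name
  apiCall:
    urlPath: "/apis/wgpolicyk8s.io/v1alpha2/namespaces/{{ request.namespace }}/policyreports?fieldSelector=metadata.name={{ request.object.metadata.uid }}"
    # Yields the first result recorded for this policy, or 'empty' if there is none.
    jmesPath: "items[0].results[?policy=='require-run-as-nonroot'].result | [0] || 'empty'"
preconditions:
  all:
  - key: "{{ nonrootresult }}"
    operator: Equals
    value: fail
```

The success-annotation rule would presumably need the mirror-image change, for example checking that the value is NotEquals fail instead of comparing a count to zero.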

Now that annotations are created and removed reliably, it's time for the cleanup.

The ClusterCleanupPolicy below again targets Deployments in the engineering Namespace and deletes the ones which have failed results, but only if more than, in this case, one hour has passed since the failure was first observed. The cleanup runs on the tenth minute of every hour, so the effective grace period is at least one hour and possibly up to two. Almost certainly you'll want to increase this grace period if you were to adopt this process "for real", but in testing you probably don't want to wait, for example, a week or two to see the effects.

apiVersion: kyverno.io/v2beta1
kind: ClusterCleanupPolicy
metadata:
  name: clean-violating-resources
spec:
  match:
    any:
    - resources:
        kinds:
          - Deployment
        namespaces:
          - engineering
  conditions:
    all:
    # Only consider Deployments currently marked as failing.
    - key: '{{ target.metadata.annotations."corp.io/hasfailedresults" || "" }}'
      operator: Equals
      value: "true"
    # Only delete once the grace period since the first failure has elapsed.
    - key: "{{ time_since('','{{ target.metadata.annotations.\"corp.io/lastfailedtime\" }}','') || '' }}"
      operator: GreaterThan
      value: 1h
  schedule: "10 * * * *"
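For something closer to real-world use, a more conservative variant might look like the sketch below. It stretches the grace period to seven days (written as 168h, since the duration syntax tops out at hours), runs once each weekday morning, and only touches Deployments carrying an opt-in label. The label corp.io/cleanup-optin is hypothetical, and the numbers are placeholders to agree on with your stakeholders.

```yaml
apiVersion: kyverno.io/v2beta1
kind: ClusterCleanupPolicy
metadata:
  name: clean-violating-resources
spec:
  match:
    any:
    - resources:
        kinds:
          - Deployment
        namespaces:
          - engineering
        selector:
          matchLabels:
            corp.io/cleanup-optin: "true"   # hypothetical opt-in label
  conditions:
    all:
    - key: '{{ target.metadata.annotations."corp.io/hasfailedresults" || "" }}'
      operator: Equals
      value: "true"
    - key: "{{ time_since('','{{ target.metadata.annotations.\"corp.io/lastfailedtime\" }}','') || '' }}"
      operator: GreaterThan
      value: 168h                           # seven-day grace period
  schedule: "0 6 * * 1-5"                   # 06:00, Monday through Friday
```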

Lastly, because these two annotations are considered sensitive, it'd be best to add in a validate rule to ensure that only Kyverno's ServiceAccounts have permission to manipulate them. This policy below does just that.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-annotation
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: protect-hasfailedresults
    match:
      any:
      - resources:
          kinds:
          - Deployment
          operations:
          - UPDATE
          namespaces:
          - engineering
    # Kyverno's own ServiceAccounts are allowed to manage the annotations.
    exclude:
      any:
      - subjects:
        - kind: ServiceAccount
          name: kyverno-?*
          namespace: kyverno
    validate:
      message: "Removing or modifying this annotation is prohibited."
      deny:
        conditions:
          # Deny the update if either annotation differs between the old and new object.
          any:
          - key: "{{request.object.metadata.annotations.\"corp.io/hasfailedresults\" || 'empty' }}"
            operator: NotEquals
            value: "{{request.oldObject.metadata.annotations.\"corp.io/hasfailedresults\" || 'empty' }}"
          - key: "{{request.object.metadata.annotations.\"corp.io/lastfailedtime\" || 'empty' }}"
            operator: NotEquals
            value: "{{request.oldObject.metadata.annotations.\"corp.io/lastfailedtime\" || 'empty' }}"

With this combination of policies, you now have a neat automated cleanup system for resources which are failing policies. There are plenty of other things you could do here, but I hope you find this useful.