Preserving Authorship in a GitOps World with Kyverno

It seems just about everyone is doing GitOps in Kubernetes these days. With so many mature tools available, it's hard to avoid. But when a single tool is responsible for actually creating the resources stored in Git, it becomes difficult or impossible to answer the question, "Who is the author of this thing?" In this post, I'll show a nifty method for getting more mileage out of Kyverno by using its CLI to answer this question in an automated fashion.

One of the inevitabilities of Kubernetes, or indeed any IT system, is that once it proves successful, more people within an organization begin to adopt it. More people means more hands in the pie, as it were, and being able to identify those people is often quite important. Organizationally, people are grouped into teams, and it's a well-known, commonly accepted practice to require things like team names, often as labels or annotations, when creating Kubernetes resources. This requirement falls under the governance category of policy, and Kyverno handles it easily today with validate rules. Team names are often necessary, but they still don't capture the individual. Which person on this team was responsible? When multiple people author resources into a single Git repo, things get tricky.

Assigning owners automatically is also possible today with a Kyverno mutate policy. Where users are allowed to create resources directly against a cluster, this can be a viable approach. But that isn't how things happen in the GitOps world. The tool of choice (Argo CD, Flux, etc.) is the sole creator of these resources, so such a policy would record every resource as created by the same ServiceAccount, as shown below in a snippet of a Pod deployed to a cluster by Argo CD.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kyverno.io/created-by: '{"username":"system:serviceaccount:argocd:argocd-application-controller"}'
  creationTimestamp: "2023-02-12T15:51:27Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: signed-development
  name: nginx-platform
  namespace: platform-a
  resourceVersion: "1750857"
  uid: ad2b0ce9-1b6e-47ed-a9d2-946c7856c179
spec:
  <snip>

The resulting workflow is illustrated below.

Individual authorship is lost when using a GitOps tool to deploy to a Kubernetes cluster.

Even though the ServiceAccount, in this case argocd-application-controller, was the principal responsible for creating the resource, it wasn't the author. What's needed is an automated way, inside the pipeline, to add authorship information identifying the humans who wrote these resources.
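To make the gap concrete: reading back the kyverno.io/created-by annotation from the Pod above only ever yields the GitOps controller's identity. The Python sketch below parses that annotation value and splits the ServiceAccount username, which follows Kubernetes' system:serviceaccount:<namespace>:<name> convention. The helper names creator_of and service_account are hypothetical, used only for illustration.

```python
from __future__ import annotations

import json

# Kubernetes names ServiceAccount users system:serviceaccount:<namespace>:<name>
SA_PREFIX = "system:serviceaccount:"


def creator_of(manifest: dict) -> str | None:
    """Return the username stored in the kyverno.io/created-by annotation, if present."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    raw = annotations.get("kyverno.io/created-by")
    if raw is None:
        return None
    # the annotation value is itself a JSON document such as {"username": "..."}
    return json.loads(raw).get("username")


def service_account(username: str) -> tuple[str, str] | None:
    """Split a ServiceAccount username into (namespace, name); None for anything else."""
    if not username.startswith(SA_PREFIX):
        return None
    namespace, _, name = username[len(SA_PREFIX):].partition(":")
    return namespace, name


# metadata copied from the Argo CD-created Pod shown above
pod = {
    "metadata": {
        "annotations": {
            "kyverno.io/created-by":
                '{"username":"system:serviceaccount:argocd:argocd-application-controller"}',
        },
        "name": "nginx-platform",
        "namespace": "platform-a",
    }
}

user = creator_of(pod)
print(user)
print(service_account(user))
```

Whichever resource you inspect, the answer is the same controller in the argocd namespace, never the person who wrote the manifest.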

The Kyverno CLI has a wide range of functionality. One of its capabilities is applying a policy to a Kubernetes resource manifest and viewing the result. This works not just for validate rules but for mutate rules as well, where the CLI can also show the final, mutated resource. That output can easily be written to a file with the -o flag.

See the output of kyverno apply -h for all available options.
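If you drive the CLI from a script rather than inline shell, the command line used later in this post can be assembled programmatically. This is a minimal Python sketch that uses only flags appearing in this post: -r for the resource, --set for comma-separated variables, and -o for the output file. The function name kyverno_apply_args and the default binary path ./kyverno are assumptions for illustration.

```python
from __future__ import annotations


def kyverno_apply_args(policy: str, resource: str, output: str,
                       variables: dict[str, str] | None = None,
                       binary: str = "./kyverno") -> list[str]:
    """Assemble a `kyverno apply` command line using only flags shown in this post."""
    args = [binary, "apply", policy, "-r", resource]
    if variables:
        # --set takes comma-separated key=value pairs; this sketch rejects values
        # containing commas to keep that list unambiguous
        for key, value in variables.items():
            if "," in value:
                raise ValueError(f"value for {key} contains a comma: {value!r}")
        args += ["--set", ",".join(f"{k}={v}" for k, v in variables.items())]
    # -o writes the mutated resource to the given file
    args += ["-o", output]
    return args
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where the Kyverno CLI is installed; building the list separately keeps the quoting explicit and easy to test.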

Kyverno also has a rich variable system, and the CLI can set the values of those variables at runtime. Writing a mutate rule that uses a variable to add a label or annotation to any Kubernetes resource is simple. In this example, I've chosen a label named corp.org/author, whose value will be set dynamically in the next step.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-labels
spec:
  background: false
  rules:
  - name: add-author
    match:
      any:
      - resources:
          kinds:
          - "*"
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            corp.org/author: "{{request.githubprauthor}}"

If you're already familiar with Kyverno and its CLI, the "trick" you might have noticed here is using a variable that begins with request. in the policy. Although variables with this prefix normally come from the Kubernetes API server via its AdmissionReview, I'm piggybacking on the prefix to define my own. This would never work in a live cluster, but it lets the CLI accept the variable rather than flag it as unrecognized.
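As a rough mental model of what the CLI does with this rule, here is a simplified Python sketch. It is not Kyverno's engine: real Kyverno variables are JMESPath expressions, and patchStrategicMerge handles far more than labels. The sketch substitutes {{request.<name>}} placeholders with values supplied at run time and merges the result into metadata.labels, keeping any labels that were already there, as a strategic merge of a map does. The names render and add_labels are hypothetical.

```python
from __future__ import annotations

import copy
import re

# Simplified placeholder syntax: {{request.<name>}}. Real Kyverno variables are
# JMESPath expressions and support far more (e.g. filters such as to_string(@)).
PLACEHOLDER = re.compile(r"\{\{\s*(request\.[A-Za-z0-9_]+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Substitute {{request.<name>}} placeholders with run-time values."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # in this sketch an unset variable is an error, not an empty label
            raise KeyError(f"no value supplied for {name}")
        return values[name]
    return PLACEHOLDER.sub(lookup, template)


def add_labels(manifest: dict, labels: dict[str, str], values: dict[str, str]) -> dict:
    """Return a copy of manifest with the rendered labels merged into metadata.labels.

    Existing labels are kept; only the listed keys are added or overwritten.
    """
    result = copy.deepcopy(manifest)
    current = result.setdefault("metadata", {}).setdefault("labels", {})
    for key, template in labels.items():
        current[key] = render(template, values)
    return result
```

Copying the manifest before mutating it mirrors how the CLI writes a new, mutated file instead of editing the input in place.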

Virtually every CI tool has some kind of predefined variable that captures the ID of the user who opened a pull/merge request. In GitHub Actions, which is what I'll use here, it's found in the event payload at github.event.pull_request.user.login. We can then pair a mutate rule like the one above with this information to record the ID of the user who opened the request. Once pieced together, we can do something like the example GitHub Actions workflow shown below. Let's break it down.
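The same values are also available outside the ${{ }} expression syntax: GitHub Actions writes the full event payload as JSON to the file named by the GITHUB_EVENT_PATH environment variable. If you would rather gather the variables in a script step, a minimal Python sketch might look like this. pr_identities is a hypothetical helper, and the keys it returns simply mirror the request.* variable names used in this post.

```python
from __future__ import annotations

import json
import os


def pr_identities(event_path: str | None = None) -> dict[str, str]:
    """Read PR author, number, and merger from a pull_request event payload.

    GitHub Actions stores the triggering event's payload as JSON in the file
    named by the GITHUB_EVENT_PATH environment variable.
    """
    path = event_path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as fh:
        event = json.load(fh)
    pr = event["pull_request"]
    merged_by = pr.get("merged_by") or {}   # null until the PR has been merged
    return {
        "request.githubprauthor": pr["user"]["login"],
        "request.githubpr": str(event["number"]),   # label values must be strings
        "request.githubmerger": merged_by.get("login", ""),
    }
```

Note that pull_request.user.login is the account that opened the PR, not whoever happened to push the last commit.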

  1. A user writes their manifests into the /incoming directory.
  2. When a PR is submitted, all your normal workflows fire. This is where you could validate those resources and fail or return messages allowing the user to correct them as needed.
  3. When the PR is merged, the workflow uses the Kyverno CLI to mutate each of the manifests in /incoming, setting the PR author as the value of the corp.org/author label.
  4. The mutated manifests are sent to the /outgoing directory and the /incoming directory is scrubbed clean.
  5. Your GitOps tool of choice is configured to look at the /outgoing directory and deploy whatever is inside.
  6. Once a new manifest has been committed, your tool then deploys those changes to your cluster.
name: Merge workflow

# only trigger on pull request closed events
on:
  pull_request:
    types: [ closed ]
env:
  VERSION: v1.9.0

jobs:
  merge_job:
    # this job will only run if the PR has been merged
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    permissions:
      contents: write
      actions: read
      id-token: write
    steps:
    - name: Checkout
      uses: actions/checkout@v3
      with:
        fetch-depth: 0
    - name: Write author
      env:
        # pass the PR author through an env var instead of splicing it into the script
        PR_AUTHOR: ${{ github.event.pull_request.user.login }}
      run: |
        # download and unpack the Kyverno CLI
        curl -sLO https://github.com/kyverno/kyverno/releases/download/${{ env.VERSION }}/kyverno-cli_${{ env.VERSION }}_linux_x86_64.tar.gz
        tar -xf kyverno-cli_${{ env.VERSION }}_linux_x86_64.tar.gz
        ./kyverno version
        mkdir -p outgoing
        # mutate every YAML manifest in incoming/ with the author.yaml policy
        for f in incoming/*.yaml
        do
          [ -e "$f" ] || continue  # skip the literal pattern when nothing matches
          name=$(basename "$f")
          echo "Adding authorship to $f"
          ./kyverno apply author.yaml -r "$f" --set request.githubprauthor="$PR_AUTHOR" -o outgoing/temp.yaml
          # drop blank lines from the CLI output, then write the final manifest
          sed '/^[[:space:]]*$/d' outgoing/temp.yaml > "outgoing/$name"
          rm "$f" outgoing/temp.yaml
        done
    - name: Push manifests
      uses: EndBug/add-and-commit@v9
      with:
        author_name: GitHub Actions
        commit: --signoff
        default_author: github_actions
        message: 'Manifests committed.'
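The file shuffling in the Write author step is awkward to exercise inside CI. For local dry runs, a rough Python equivalent might look like the following, assuming the same incoming/ and outgoing/ layout. The Kyverno call is replaced by a caller-supplied mutate function (hypothetical; in real use it might run the CLI and read back the file written by -o), and strip_blank_lines approximates the sed expression. Both function names are illustrative only.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def strip_blank_lines(text: str) -> str:
    """Roughly sed '/^[[:space:]]*$/d': drop lines that are empty or whitespace-only."""
    return "".join(line for line in text.splitlines(keepends=True) if line.strip())


def process_incoming(incoming: Path, outgoing: Path,
                     mutate: Callable[[Path], str]) -> list[str]:
    """Mutate each incoming/*.yaml into outgoing/ and remove the original.

    `mutate` stands in for `kyverno apply author.yaml -r <file> --set ... -o <file>`
    and must return the mutated manifest as YAML text.
    """
    outgoing.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(incoming.glob("*.yaml")):   # only .yaml, like the shell loop
        mutated = mutate(src)
        (outgoing / src.name).write_text(strip_blank_lines(mutated))
        src.unlink()   # the workflow scrubs incoming/ once a file is handled
        processed.append(src.name)
    return processed
```

Keeping the CLI call behind mutate lets the file handling be checked without a cluster or the Kyverno binary; inside the workflow itself, the shell loop remains the simpler choice.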

After this flow completes and your GitOps tool deploys the resources, you can easily see who the original author of each resource was. Below, the value of corp.org/author shows that my GitHub account ("chipzoller") authored this resource, regardless of which GitOps tool actually deployed it.

$ kubectl get deploy product-bravo -o yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"crest","corp.org/author":"chipzoller"},"name":"product-bravo","namespace":"default"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"prodb"}},"template":{"metadata":{"labels":{"app":"prodb"}},"spec":{"containers":[{"args":["sleep","1d"],"image":"busybox:1.28","name":"busybox"}]}}}}
  creationTimestamp: "2023-02-12T18:49:11Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: crest
    corp.org/author: chipzoller
  name: product-bravo
  namespace: default
  resourceVersion: "1769507"
  uid: 7779e7f9-0a95-477b-9098-3758c5330e80
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prodb
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prodb
    spec:
      containers:
      - args:
        - sleep
        - 1d
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        name: busybox
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-02-12T18:49:13Z"
    lastUpdateTime: "2023-02-12T18:49:13Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-02-12T18:49:11Z"
    lastUpdateTime: "2023-02-12T18:49:13Z"
    message: ReplicaSet "product-bravo-84fc4998bd" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
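For a single lookup, a label selector is enough, for example kubectl get deploy -l corp.org/author=chipzoller. For broader reporting, a small script can group resources by the label. The sketch below assumes its input is the items list from kubectl get <kind> -o json, already parsed into Python dictionaries; by_author is a hypothetical helper, and resources without the label are grouped under <unknown>.

```python
from __future__ import annotations

from collections import defaultdict

AUTHOR_LABEL = "corp.org/author"


def by_author(items: list[dict]) -> dict[str, list[str]]:
    """Group resources by their corp.org/author label.

    Each entry is rendered as namespace/Kind/name (namespace omitted for
    cluster-scoped resources); unlabeled resources land under "<unknown>".
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        meta = item.get("metadata", {})
        author = meta.get("labels", {}).get(AUTHOR_LABEL, "<unknown>")
        parts = (meta.get("namespace"), item.get("kind"), meta.get("name"))
        groups[author].append("/".join(p for p in parts if p))
    return dict(groups)
```

Surfacing the unlabeled bucket explicitly makes it easy to spot resources that reached the cluster without going through the authoring workflow.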

You can use this technique to enrich these manifests with more information about the people involved in authoring or approving them. For example, you might also want to record the pull request a manifest originated from, or the account that merged it. Add these as additional labels in the policy, then pass the matching GitHub context values to the Kyverno CLI as variables.

    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            corp.org/author: "{{request.githubprauthor}}"
            corp.org/pr: "{{request.githubpr | to_string(@) }}"
            corp.org/merger: "{{request.githubmerger}}"

The corresponding values are then passed to the CLI in the workflow:

--set request.githubprauthor=${{github.event.pull_request.user.login}},\
request.githubpr=${{github.event.number}},\
request.githubmerger=${{github.event.pull_request.merged_by.login}}

A resource processed this way ends up with all three labels:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    corp.org/author: realshuting
    corp.org/merger: chipzoller
    corp.org/pr: "13"
  name: org-ns-bar
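Because these values become label values, they must satisfy Kubernetes' label value syntax: at most 63 characters, and either empty or starting and ending with an alphanumeric character with only -, _, . and alphanumerics in between. Label values must also be strings, which is why the PR number, an integer in the event payload, goes through to_string(@) and appears quoted as "13". Regular GitHub logins fit these rules, but bot accounts such as dependabot[bot] do not, so a resource labeled with such a login would likely be rejected when the GitOps tool applies it. A quick pre-commit check in Python might look like this; is_valid_label_value is a hypothetical helper.

```python
import re

# Kubernetes label value syntax: empty, or alphanumeric at both ends with
# only '-', '_', '.' and alphanumerics in between; 63 characters at most.
LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63


def is_valid_label_value(value: object) -> bool:
    """Return True if value can be written as-is as a Kubernetes label value."""
    if not isinstance(value, str):   # labels are string-to-string maps
        return False
    return len(value) <= MAX_LABEL_VALUE_LENGTH and LABEL_VALUE.fullmatch(value) is not None


for candidate in ["chipzoller", "13", 13, "dependabot[bot]"]:
    print(repr(candidate), is_valid_label_value(candidate))
```

Running a check like this before committing to /outgoing surfaces a bad value in CI rather than as a failed sync in the cluster.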

And that's basically it. It's not a complex bit of automation, but it can help you trace any resource deployed into your Kubernetes environments back to its source. This method works with any GitOps tool and with practically any CI tool.

If you liked this post, I'm always glad to hear feedback, so feel free to drop me a note on Twitter or come find me on Slack.