Signing and Automating Policy Exceptions

Policy Exceptions are a new feature introduced in Kyverno 1.9 which allows decoupled, self-service, and granular exclusion of resources from one or more Kyverno policies. Because they effectively allow bypassing a policy, great care should be taken when employing them. In this post, I'll show how you can use another feature of Kyverno, manifest signature verification, in an automated approach to sign off on and then verify these Policy Exceptions in your environment to ensure they are vetted and tamper-proof.

I have written a bit about, and even demoed, Policy Exceptions before when I showed how you can use them with an expiration date system, so for a primer on what they are and how they work, I encourage you to read that article. A very brief synopsis goes like this: Kyverno Policy Exceptions are a new Custom Resource which, once created, permits a matching resource to bypass an installed Kyverno policy (either Namespace-scoped or cluster-scoped). Since Kyverno will obey a Policy Exception as soon as it exists in the cluster (assuming the feature has been enabled, which it isn't by default), you should develop some sort of system governing how Policy Exceptions get created. An obvious starting point is Kubernetes RBAC itself. Since Policy Exceptions are just another Custom Resource, it is straightforward to ensure that only the necessary Roles and/or ClusterRoles are allowed to create them. A second step might be to write (or use a pre-written) Kyverno policy which validates that Policy Exceptions have a certain "shape". And a third, which is what I want to show here, possibly combined with the previous two, is to sign a Policy Exception to "lock" its contents in place and have Kyverno validate that signature in the cluster before the exception is persisted.
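To make the first two layers a little more concrete, here is a minimal sketch of what they might look like. Everything named here is an assumption for illustration only: the polex-editor Role, the polex-admins group, the platform Namespace, and the choice to restrict exceptions to a single policy called require-runasnonroot. Your own RBAC subjects and "shape" rules will almost certainly differ.

```yaml
# Layer 1: RBAC. Only members of a (hypothetical) polex-admins group may
# manage PolicyException resources, and only in the platform Namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: polex-editor        # hypothetical name
  namespace: platform       # hypothetical Namespace
rules:
- apiGroups: ["kyverno.io"]
  resources: ["policyexceptions"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: polex-editor
  namespace: platform
subjects:
- kind: Group
  name: polex-admins        # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: polex-editor
  apiGroup: rbac.authorization.k8s.io
---
# Layer 2: a "shape" policy. This example denies any PolicyException that
# names a policy other than require-runasnonroot.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: polex-shape         # hypothetical name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: only-runasnonroot-exceptions
    match:
      any:
      - resources:
          kinds:
          - PolicyException
    validate:
      message: "Policy Exceptions may only target the require-runasnonroot policy."
      deny:
        conditions:
          any:
          - key: "{{ request.object.spec.exceptions[].policyName }}"
            operator: AnyNotIn
            value:
            - require-runasnonroot
```

Neither of these layers says anything about who approved the contents of an exception, which is the gap that signing is meant to close.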

Kyverno has an ability called manifest validation which builds on a Sigstore subproject to verify signatures on YAML manifest files. If you're familiar with Cosign and how Sigstore does this for OCI images, it's extremely similar, only the k8s-manifest-sigstore subproject does this for YAML files. Since this ability is governed by a standard validate rule, and validate rules, like anything in Kyverno, can apply to any type of Kubernetes resource (even other Kyverno resources!), we can use this manifest validation capability to verify Policy Exceptions themselves.
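For readers who haven't seen a manifest validation rule before, here is a rough sketch of the general form, using a static public key rather than keyless signing. The policy name, the choice of Deployment as the matched kind, and the key contents (left as a placeholder) are all assumptions for illustration.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-signed-manifests   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: verify-yaml-signature
    match:
      any:
      - resources:
          kinds:
          - Deployment              # any kind works, this is just an example
    validate:
      manifests:
        attestors:
        - entries:
          - keys:
              # placeholder: paste the public half of your signing key pair
              publicKeys: |-
                -----BEGIN PUBLIC KEY-----
                ...
                -----END PUBLIC KEY-----
```

Swapping the matched kind for PolicyException is all it takes to point the same mechanism at exceptions.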

This ability to sign and then verify signatures on Policy Exception documents is great, but it really needs some system of automation to be feature complete. After all, you probably don't want to allow your end users to sign their own Policy Exceptions and have Kyverno just accept them blindly. Fortunately, we have a great solution with Git and the various CI processes offered through Git-as-a-Service vendors and GitOps deployment tools. So next I want to show a concept for a fully automated, end-to-end signing, verification, and deployment flow which uses GitHub Actions, Argo CD, and, of course, Kyverno.

An architecture of the Policy Exception signing, deployment, and verification flow. Although Argo CD here is shown separating the two, it is actually running inside the Kubernetes cluster.

The diagram above attempts to capture one possible flow here, but let's walk through it.

  1. A user initially commits an unsigned Policy Exception to a directory called unsigned on a branch called "unsigned" in some GitHub repository. Once they open a pull request against the "main" branch, the process begins. This is also the phase where the additional Kyverno policy mentioned above, the one validating the Policy Exception's shape, would come into play: a workflow can use the Kyverno CLI to check the Policy Exception against that policy.
  2. Once that pull request has been approved and merged, a GitHub Actions workflow kicks off (very similar to the one I showed in a previous article) which uses the k8s-manifest-sigstore CLI to sign the Policy Exception YAML manifest using keyless signing.
  3. The Policy Exception is signed and committed to a directory called signed by the Actions workflow.
  4. Argo CD has an Application definition which is configured to deploy only from the signed directory on this repository's main branch. Once it sees a new file show up, it tries to deploy it.
  5. Kyverno has a matching validate policy which requires that all Policy Exceptions be signed, and the signature must state that the signing happened from this specific repository and the main branch only. Additionally, the signature is only valid for a Policy Exception whose contents are identical to what entered the signing flow. Should a malicious actor manage to alter the Policy Exception after it was saved to main, Argo CD would still try to deploy it, but Kyverno would block it.
  6. Once the signed, validated Policy Exception is persisted into the Kubernetes cluster, the exception takes effect immediately and a user is able to submit a resource which will be exempted from the policy/policies.
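The pre-merge check mentioned in step 1 could be sketched as a second workflow along these lines. This is only an outline under a few assumptions that are not part of the original setup: the shape policy is stored at the hypothetical path policies/polex-shape.yaml, and an earlier step (omitted here) has already put the kyverno CLI on the runner's PATH.

```yaml
name: Policy Exception Validation
# run on pull requests targeting main that touch the unsigned directory
on:
  pull_request:
    branches: [ main ]
    paths: [ 'unsigned/**' ]
jobs:
  validate_job:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
    - name: Checkout
      uses: actions/checkout@v3
    # Assumption: the kyverno CLI is installed by a step placed here.
    - name: Validate shape
      run: |
        set -euo pipefail
        shopt -s nullglob   # skip the loop entirely if nothing matches
        for f in unsigned/*.yaml
        do
          echo "Validating $f"
          # policies/polex-shape.yaml is a hypothetical path
          kyverno apply policies/polex-shape.yaml --resource "$f"
        done
```

The intent is that a Policy Exception violating the shape policy causes the step, and therefore the PR check, to fail before anything is ever signed.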

Here's an example GitHub Actions workflow which takes care of the first part. Assume this is named sign.yaml.

    name: Policy Exception Signing
    # only trigger on pull request closed events
    on:
      pull_request:
        types: [ closed ]
    jobs:
      merge_job:
        # this job will only run if the PR has been merged
        if: github.event.pull_request.merged == true
        runs-on: ubuntu-latest
        permissions:
          contents: write
          actions: read
          id-token: write # Needed for OIDC and keyless signing
        steps:
        - run: |
            # quoted so the shell does not treat "#<number> ..." as a comment
            echo "PR #${{ github.event.number }} has been merged, beginning signing flow"
        - name: Checkout
          uses: actions/checkout@v3
          with:
            fetch-depth: 0
        - name: Sign
          run: |
            curl -sLO <kubectl-sigstore-linux-amd64 release URL>
            chmod +x kubectl-sigstore-linux-amd64
            sudo mv kubectl-sigstore-linux-amd64 kubectl-sigstore
            ./kubectl-sigstore version
            ls -lah
            # make sure the output directory exists before signing into it
            mkdir -p signed
            for f in $(ls ./unsigned)
            do
            if [[ "$f" = *\.yaml ]]
            then
              echo "Signing unsigned/$f"
              COSIGN_EXPERIMENTAL=1 ./kubectl-sigstore sign --tarball=no -f "unsigned/$f" -o "signed/$f"
              # remove the unsigned copy once its signed counterpart is written
              rm "unsigned/$f"
            fi
            done
        - name: Push signed manifests
          uses: EndBug/add-and-commit@v9
          with:
            author_name: GitHub Actions
            commit: --signoff
            default_author: github_actions
            message: 'Signed manifest committed.'

As you can see, this workflow only fires when a pull request (PR) has been closed, and the job only runs if that PR was merged. It then fetches the k8s-manifest-sigstore CLI and signs the manifests in the unsigned directory, writing the results to the signed one. The final step then commits those changes back to the main branch.

A signed Policy Exception coming out of this process may look something like the following:

    apiVersion: kyverno.io/v2alpha1
    kind: PolicyException
    metadata:
      annotations:
        cosign.sigstore.dev/bundle: H4sIAAAAAAA<snip>hwq2jNfkA//8BvIaSkAsAAA==
        cosign.sigstore.dev/certificate: H4sI<snip>zpfkEAAA=
        cosign.sigstore.dev/message: H4sIAAAA<snip>0BAAA=
        cosign.sigstore.dev/signature: MEUC<snip>Eu7ZgzvE=
      name: davepolex
      namespace: platform
    spec:
      exceptions:
      - policyName: require-runasnonroot
        ruleNames:
        - runasnonroot
      match:
        any:
        - resources:
            kinds:
            - Pod
            names:
            - mongo*

From here, an Argo CD Application definition, which looks like the below, is configured to watch this repository in the signed path and then deploy to the platform Namespace.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: policyexceptions
      namespace: argocd
    spec:
      project: default
      source:
        repoURL:
        targetRevision: HEAD
        path: signed
      destination:
        server: https://kubernetes.default.svc
        namespace: platform
      syncPolicy:
        automated:
          prune: false
          selfHeal: false
        syncOptions:
        - ServerSideApply=true

Finally, a Kyverno validate policy already exists which specifically watches for the PolicyException resource and verifies that the signature comes from the expected origin.

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: validate-polex
    spec:
      validationFailureAction: Enforce
      background: false
      rules:
        - name: github-polex
          match:
            any:
            - resources:
                kinds:
                  - PolicyException
          validate:
            manifests:
              attestors:
              - entries:
                - keyless:
                    subject:
                    issuer:
Since the ClusterPolicy, which operates across the entire cluster, is in Enforce mode, any Policy Exception that fails verification will not be persisted into the cluster. The subject here ensures that the Policy Exception was signed by the Actions workflow named sign.yaml running on the main branch. The issuer ensures that GitHub OIDC was used to issue the token for the signature process.
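To illustrate what those two fields typically contain for a keyless signature produced in GitHub Actions, here is a sketch of just the attestor portion. The repository example-org/polex-repo is made up for this example; substitute your own organization and repository. The subject follows the pattern GitHub uses for the workflow identity (repository URL, workflow file path, then the Git ref), and the issuer is GitHub's OIDC token endpoint.

```yaml
# Fragment of the validate rule only, not a complete policy.
validate:
  manifests:
    attestors:
    - entries:
      - keyless:
          # hypothetical repository; workflow file and branch as described above
          subject: https://github.com/example-org/polex-repo/.github/workflows/sign.yaml@refs/heads/main
          # GitHub Actions OIDC issuer
          issuer: https://token.actions.githubusercontent.com
```

A signature created by a different workflow file, a different branch, or a fork of the repository would carry a different subject and so should fail verification.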

Some additional important notes here:

  • Following this process requires Kyverno 1.9.1+ to pick up changes made in the k8s-manifest-sigstore CLI to support keyless signing in GitHub Actions (thanks, Hiro Kitahara!).
  • If you used the optional --exceptionNamespace flag when installing Kyverno, Policy Exceptions will only be considered if they are created in that Namespace. If the value you passed is the same Namespace in which Kyverno itself is deployed, verification will not work unless you opted to include that Namespace in webhooks (the default is to exclude Kyverno's Namespace). You may want to choose a separate Namespace where these Policy Exceptions can be created; in this demo I used a Namespace called platform.
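As a rough sketch, the relevant container flags might be supplied through Helm values like the following. This assumes your version of the Kyverno chart exposes a top-level extraArgs list that is passed to the Kyverno container, so check the values reference for the chart version you actually run. The platform Namespace matches the one used in this demo.

```yaml
# values.yaml fragment (assumes the chart supports a top-level extraArgs list)
extraArgs:
# Policy Exceptions are disabled unless explicitly enabled
- --enablePolicyException=true
# only honor Policy Exceptions created in this Namespace
- --exceptionNamespace=platform
```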

And, really, that's just about it. Hopefully you can see how something like YAML manifest signature verification can give you an extra layer of protection in helping to build a robust system for Kyverno's Policy Exceptions feature. Because Kyverno has so many abilities that can function like "tools", you get to pick and choose which tools to use to best suit your organization.

Thanks for reading!