Attesting Image Scans With Kyverno
(Last Updated August 2022)
The subject of vulnerabilities in container images is serious business. As an image author yourself, one of the things you should be doing is ensuring you know what those vulnerabilities are and that you aren't relying on what a scan told you three months ago to make decisions about running the image today. Bring Kubernetes into the mix, and you're probably running lots of images at scale, which makes this even more difficult. In this article, I want to show something which can help with both ends of that process: producing vulnerability scans and using them to make decisions on whether to run an image under Kubernetes. Using a combination of GitHub Actions, Trivy, Cosign, and Kyverno, you will have an automated, end-to-end system which allows you not only to produce vulnerability information but also to make those critical decisions in an ongoing, automated fashion.
The first step to gaining visibility into your image vulnerabilities is to produce such a list. There are a few tools which can assist here; the most popular seem to be Grype and Trivy. The second step is to plumb this scan into your CI pipeline. Scanning at build time is good (and essential), but because new vulnerabilities are found all the time, a one-time scan is no good if you're running that same image version weeks or months down the line. You need a way for that image to get re-scanned periodically. The third step is to make that list of discovered vulnerabilities available somewhere else so you can make decisions based upon it. Lucky for you, I have all three things covered (if you're using GitHub, that is).
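Before wiring anything into CI, you can get a feel for that first step by running Trivy locally against a published image. Here's a minimal sketch, assuming Trivy v0.31 or later and jq are installed; the image reference (the demo image used later in this post) and the jq path into the cosign-vuln predicate are illustrative:

# Scan a published image and write the results as a Cosign vulnerability predicate.
# --ignore-unfixed skips findings with no available fix, matching the workflow shown below.
trivy image \
  --ignore-unfixed \
  --format cosign-vuln \
  --output scan.json \
  ghcr.io/chipzoller/zulu:latest

# The predicate's metadata block carries the scan timestamps we'll lean on later.
jq '.metadata' scan.json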
What follows is a GitHub Action workflow I have been tinkering with for some time now, and I think I finally have it in a good enough position where it checks all of these boxes and works quite well. The release of Trivy 0.31 makes things even simpler and more streamlined, so if you read this earlier, note that I have since updated the article with new information.
name: vulnerability-scan
on:
  workflow_dispatch: {}
  schedule:
    - cron: '23 1 * * *' # Every day at 01:23
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  scan:
    runs-on: ubuntu-20.04
    permissions:
      contents: read
    outputs:
      scan-digest: ${{ steps.calculate-scan-hash.outputs.scan_digest }}
    steps:
      - name: Scan for vulnerabilities
        uses: aquasecurity/trivy-action@1db49f532692e649dc5dc43c7c0444dac4790137 # v0.7.0 (Trivy v0.31.2)
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          format: cosign-vuln
          ignore-unfixed: true
          output: scan.json

      - name: Calculate scan file hash
        id: calculate-scan-hash
        run: |
          SCAN_DIGEST=$(sha256sum scan.json | awk '{print $1}')
          echo "::set-output name=scan_digest::$SCAN_DIGEST"
          echo "Hash of scan.json is: $SCAN_DIGEST"

      - name: Upload vulnerability scan report
        uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # v3.1.0
        with:
          name: scan.json
          path: scan.json
          if-no-files-found: error

  attest:
    runs-on: ubuntu-20.04
    permissions:
      contents: write
      actions: read
      packages: write
      id-token: write
    env:
      SCAN_DIGEST: "${{ needs.scan.outputs.scan-digest }}"
    needs: scan
    steps:
      - name: Download scan
        uses: actions/download-artifact@fb598a63ae348fa914e94cd0ff38f362e927b741 # v3.0.0
        with:
          name: scan.json

      - name: Verify scan
        run: |
          set -euo pipefail
          echo "Hash of scan.json should be: $SCAN_DIGEST"
          COMPUTED_HASH=$(sha256sum scan.json | awk '{print $1}')
          echo "The current computed hash for scan.json is: $COMPUTED_HASH"
          echo "If the two above hashes don't match, scan.json has been tampered with."
          echo "$SCAN_DIGEST scan.json" | sha256sum --strict --check --status || exit -2

      - name: Install Cosign
        uses: sigstore/cosign-installer@09a077b27eb1310dcfb21981bee195b30ce09de0 # v2.5.0
        with:
          cosign-release: v1.10.0

      - name: Log in to GHCR
        uses: docker/login-action@49ed152c8eca782a232dede0303416e8f356c37b # v2.0.0
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Attest Scan
        run: cosign attest --replace --predicate scan.json --type vuln ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
        env:
          COSIGN_EXPERIMENTAL: "true"
This action has two jobs: scan and attest.
The scan job will:
- Use Trivy to scan your image (right now it's just set to the latest tag, which you can configure).
- Hash the scan file as a tamper-detection mechanism (you can reproduce this check locally; see the sketch just after this list).
- Upload the scan as an artifact on the workflow.
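The hash step is nothing exotic, by the way; you can reproduce the same tamper check on your own machine. A small sketch, assuming a scan.json sits in your working directory:

# Record the digest of the scan the same way the workflow does.
SCAN_DIGEST=$(sha256sum scan.json | awk '{print $1}')
echo "Recorded digest: $SCAN_DIGEST"

# Later (or on another machine), confirm the file still matches the recorded digest.
# sha256sum --check exits non-zero if the contents have changed.
echo "${SCAN_DIGEST}  scan.json" | sha256sum --check -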
The attest job will immediately follow and will:
- Download the scan report from the previous job.
- Verify the hash to ensure it hasn't been altered.
- Install Cosign which will be used to attest the scan.
- Log in to GitHub Container Registry as this is where we will push the signed attestation.
- Use Cosign's keyless signing ability to attest the scan, replacing an existing attestation if one exists. Note that the predicate type used here is vuln. Trivy 0.31 will output the appropriate predicate format which supports this. The vuln attestation type is shorthand for cosign.sigstore.dev/attestation/vuln/v1, which is what will actually be set.
The last step here is important and is why I decided it was time to write this article. Prior to Cosign 1.10, the replacement wasn't working properly. Now that it is, you can schedule this workflow (as I have done already) and know that each time the image has its scan checked, it's always the latest scan being checked. I'm coming to that latter point next.
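If you want to pull that attestation back and eyeball it outside of any cluster, Cosign can do that, too. Here's a rough sketch, assuming jq is installed and using the same keyless/experimental mode and vuln predicate type as the attest step above; the jq paths assume the vuln predicate layout Trivy emits:

# Verify and fetch the vulnerability attestation for the image (keyless mode).
# The verified payload is a base64-encoded in-toto statement inside a DSSE envelope.
COSIGN_EXPERIMENTAL=1 cosign verify-attestation --type vuln \
  ghcr.io/chipzoller/zulu:latest \
  | jq -r '.payload' | base64 -d \
  | jq '{predicateType, scanFinishedOn: .predicate.metadata.scanFinishedOn}'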
Now you have scheduled scans of your images taking place, and you're attesting them in an automated and secure way. The final step in the process is to verify these scans and make decisions based upon them before allowing your image to actually run.
For this last piece, we can use Kyverno's image verification abilities to do all sorts of useful things. In this example, the most basic things I want to check before allowing this image to run in my Kubernetes environment are: 1) is it signed the way I expect, and 2) is the vulnerability scan current? The first check is designed to establish not only that the image is signed but that, in this case, the signer was my specific GitHub Action. The second check is designed to ensure we're always looking at fresh data and that an attacker hasn't prevented access to it in order to hide a potentially harmful vulnerability they were somehow able to inject.
Below is an example of that Kyverno policy which performs these checks.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: check-vulnerabilities
spec:
  validationFailureAction: enforce
  webhookTimeoutSeconds: 10
  failurePolicy: Fail
  rules:
    - name: not-older-than-one-week
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "ghcr.io/chipzoller/zulu:*"
          attestors:
            - entries:
                - keyless:
                    subject: "https://github.com/chipzoller/zulu/.github/workflows/*"
                    issuer: "https://token.actions.githubusercontent.com"
          attestations:
            - predicateType: cosign.sigstore.dev/attestation/vuln/v1
              conditions:
                - all:
                    - key: "{{ time_since('','{{metadata.scanFinishedOn}}','') }}"
                      operator: LessThanOrEquals
                      value: "168h"
Even without Kyverno knowledge, I'm willing to bet you can figure out most of what this policy is designed to do. That's because Kyverno does not require a programming language; it uses common YAML paradigms and idioms with which you're already familiar, making it simple to read and write. If you aren't familiar with Kyverno, I recommend my Exploring Kyverno blog series here.
Let's walk through this policy and explain its functions.
- spec.validationFailureAction tells Kyverno what action to take if the validation defined in the rule fails. In this case, enforce means "block this thing from running."
- spec.webhookTimeoutSeconds tells Kyverno the maximum length of time it should wait before timing this check out. It's at 10 seconds, which is plenty long.
- spec.failurePolicy tells the Kubernetes API server what should happen if it doesn't get a response from Kyverno. Fail here means to deny the request. Another option is Ignore.
- spec.rules[0].match shows we're matching on Pods, no matter where they come from.
- spec.rules[0].verifyImages[0].imageReferences tells Kyverno on which container images it should perform this validation check. I'm naming my test image (which you can also use), ghcr.io/chipzoller/zulu, and I'm checking against all of its tags. You can list multiple images here, name specific tags, and other things.
- spec.rules[0].verifyImages[0].attestors[0] tells Kyverno that, for the specified image reference, it needs to make sure the image was signed in keyless mode, that the subject is a workflow from my specific GitHub repository (note the wildcard at the end as I'm not naming a specific workflow file), and that the issuer of that signature is GitHub Actions.
- spec.rules[0].verifyImages[0].attestations[0] tells Kyverno to search for a predicate specifically named cosign.sigstore.dev/attestation/vuln/v1 and to deny the Pod if the scan is any older than seven (7) days (168 hours).
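Getting the policy into a cluster is an ordinary kubectl apply away. A quick sketch, assuming the policy above is saved as check-vulnerabilities.yaml and Kyverno 1.7 or later is already installed:

# Install the policy and confirm Kyverno created it as a ClusterPolicy resource.
kubectl apply -f check-vulnerabilities.yaml
kubectl get clusterpolicy check-vulnerabilities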
Let's now put all these pieces together and test this policy with a real image which was scanned, signed, and attested with this process.
Create the Kyverno policy in your cluster and try to submit a Pod which names a container image meeting these criteria. I'll be using the same demo image named in the policy, and hopefully, if you, the reader, try this out at some point in the future, I haven't broken anything--but it's very possible!
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: zulu
      image: ghcr.io/chipzoller/zulu:latest
$ k apply -f pod.yaml
pod/mypod created
I'm using a release candidate of Kyverno 1.7.2 here, and the Pod is successfully created. Let's test a failure to ensure the policy is truly working. It just so happens that, as of this writing, the last attested scan took place about twelve hours ago based upon my GitHub Action run history, so let's crank down on that time field in our Kyverno policy so it's even lower than that, which should produce a failure. How about changing that value field from 168h to something like 10h? Replace the Kyverno policy (a quick way to patch it in place is sketched below) and let's try again.
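If you'd rather not edit and re-apply the full policy file, one way to tighten that window in place is a JSON patch against the live ClusterPolicy. This is only a sketch and assumes the policy is named and structured exactly as shown above:

# Drop the freshness window from 168h to 10h directly on the live policy.
kubectl patch clusterpolicy check-vulnerabilities --type json -p '[
  {
    "op": "replace",
    "path": "/spec/rules/0/verifyImages/0/attestations/0/conditions/0/all/0/value",
    "value": "10h"
  }
]'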
$ k apply -f pod.yaml
Error from server: error when creating "pod.yaml": admission webhook "mutate.kyverno.svc-fail" denied the request:

resource Pod/default/mypod was blocked due to the following policies

check-vulnerabilities:
  not-older-than-one-week: 'failed to verify signature for ghcr.io/chipzoller/zulu:latest:
  .attestors[0].entries[0].keyless: attestation checks failed for ghcr.io/chipzoller/zulu:latest
  and predicate cosign.sigstore.dev/attestation/vuln/v1'
As you can see, Kyverno was able to catch this, see that the attested scan wasn't within the time window we wanted, and prevent that Pod from running.
This is a really good start to ensuring you are continually attesting to the latest vulnerabilities in your images. The next thing you'll probably want to do is build some sort of allow/deny list of the CVEs that might be reported so you can state which do and don't matter in your case. We'll have to save that subject for another blog, however. Hopefully this was helpful, and please ping me with any feedback or criticism you may have on this post.
Special thanks to Jim Bugwadia, the bringer of these abilities to Kyverno and also the original author of the GitHub Action workflow I presented here. Thanks, Jim!