Using Custom Registries with Tanzu Kubernetes Grid
I've had several requests from people who want to use Tanzu Kubernetes Grid (TKG) with their own registries and have run into problems doing so. This could be in a lab environment or even in production, where the registry's TLS certificate has been replaced with one signed by an internal, enterprise certificate authority. If you've ever seen the message
Failed to pull image "<registry_name>/<image_name>:": rpc error: code = Unknown desc = failed to pull and unpack image "<registry_name>/<image_name>:": failed to resolve reference "<registry_name>/<image_name>:": failed to do request: Head https://<registry_name>/<image_name>:: x509: certificate signed by unknown authority
then you know what I mean. The problem is, most of the solutions out there basically tell you, "screw security, just turn it off!" in order to get it to work, and I always take issue with that. While it does demonstrate the functionality in question, I think it teaches bad habits despite coming with warnings like "don't do this in production." But what if you are in production, you're using an existing registry, and you do have custom certificates (whether self-signed or otherwise)? You need to be able to pull images, but you also want to do it securely. That's what this post is about: integrating TKG with your own registries and doing it in a SECURE manner.
I'm going to assume a deployment to vSphere here, using Photon OS, and on TKG v1.2 which was released a couple of weeks ago. Make sure to have your certificate authority's root certificate available (the one used to sign the TLS certificate your registry presents). Although I'm in a lab, I try to model things like real production environments so I can pass off practical info to you. I've got a Harbor registry established (see my article here if you'd like help installing it standalone) which is using a custom TLS certificate signed by my internal CA. Make sure your CA cert is in PEM format.
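If you're not sure whether your CA cert is already PEM-encoded, a quick check with openssl will tell you; the file names below are just placeholders for your own.

# Prints the certificate details if (and only if) the file is valid PEM
openssl x509 -in ca-root.pem -text -noout

# If the certificate is DER-encoded instead, convert it to PEM
openssl x509 -inform der -in ca-root.der -out ca-root.pem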
We need to modify a file which contains the base Cluster API manifest used when deploying to vSphere. Open the file at ~/.tkg/providers/infrastructure-vsphere/v0.7.1/ytt/base-template.yaml and jump down to the KubeadmControlPlane resource. If you read my article on Behind the Scenes with Cluster API Provider vSphere, you'll be familiar with this resource type. The KubeadmControlPlane is a resource which describes how Cluster API will provision the control plane nodes using the kubeadm bootstrap provider. There is a similar resource called KubeadmConfigTemplate which does the same but for the worker nodes. We will have to modify both of these resources in the base-template.yaml file if we want all of our nodes to trust our internal CA.
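If you'd rather jump straight to the right spots in that file, a quick grep will show where both resources are defined.

# Print the line numbers where the two Kubeadm* resources begin
grep -n "kind: Kubeadm" ~/.tkg/providers/infrastructure-vsphere/v0.7.1/ytt/base-template.yaml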
Scroll down to where you see the preKubeadmCommands section. It should have this by default.

preKubeadmCommands:
- hostname "{{ ds.meta_data.hostname }}"
- echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
- echo "127.0.0.1 localhost" >>/etc/hosts
- echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
- echo "{{ ds.meta_data.hostname }}" >/etc/hostname
This section of the file contains a list of commands that will be run on the deployed machine prior to running kubeadm. As you can see, the default entries just perform operations on the hostname and /etc/hosts. But this is also the point at which we want to inject our CA certificate so that when the CRI (containerd in this case) comes online, the trust is already in place. Add the following lines immediately under the last line of what already exists, and make sure you get the indentation just right.

- |
  cat <<EOF > /etc/ssl/certs/myca.pem
  -----BEGIN CERTIFICATE-----
  MIIDWzCCAkOgAwIBAgIQeXc+Qv+ngYZDNaSsPUI7kDANBgkqhkiG9w0BAQUFADBA
  <snip>
  mSddDjR+db3N3XEpThE4AyFnJYErRSnZdQSROmKQGgvXx/qk+TEvzpQa/6oqebE=
  -----END CERTIFICATE-----
  EOF
- openssl x509 -in /etc/ssl/certs/myca.pem -text >> /etc/pki/tls/certs/ca-bundle.crt
- c_rehash
The full section should then look similar to this.
preKubeadmCommands:
- hostname "{{ ds.meta_data.hostname }}"
- echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
- echo "127.0.0.1 localhost" >>/etc/hosts
- echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
- echo "{{ ds.meta_data.hostname }}" >/etc/hostname
- |
  cat <<EOF > /etc/ssl/certs/myca.pem
  -----BEGIN CERTIFICATE-----
  MIIDWzCCAkOgAwIBAgIQeXc+Qv+ngYZDNaSsPUI7kDANBgkqhkiG9w0BAQUFADBA
  <snip>
  mSddDjR+db3N3XEpThE4AyFnJYErRSnZdQSROmKQGgvXx/qk+TEvzpQa/6oqebE=
  -----END CERTIFICATE-----
  EOF
- openssl x509 -in /etc/ssl/certs/myca.pem -text >> /etc/pki/tls/certs/ca-bundle.crt
- c_rehash
What we're doing here is writing our CA root or signing certificate into a file called myca.pem and then adding it to the system-wide list of trusted CA certificates. We are not telling containerd or anything else to treat this as an untrusted registry. Our registry will be fully trusted, just like any other resource.
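If you want to double-check that trust once a node has booted, you can SSH in (Cluster API Provider vSphere sets up the capv user on these machines) and verify directly against the registry; substitute your own registry hostname for mine.

# Count the certificates now present in the system-wide bundle
grep -c "BEGIN CERTIFICATE" /etc/pki/tls/certs/ca-bundle.crt

# Validate the registry's certificate chain against that bundle;
# a "Verify return code: 0 (ok)" line means the CA is trusted
openssl s_client -connect harbor2.zoller.com:443 -CAfile /etc/pki/tls/certs/ca-bundle.crt </dev/null | grep "Verify return code"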
Now repeat the process for the KubeadmConfigTemplate resource near the bottom of the file. Once again, ensure your indentation is correct. Finally, save the file.
Once complete, let's deploy a test cluster to ensure it works.
tkg create cluster cz08 --plan=dev --vsphere-controlplane-endpoint-ip=192.168.1.223
If the tkg command errors out before printing the line "Creating workload cluster", then you've foobar'd your base-template.yaml file, probably by running afoul of YAML's infernal indentation rules. Check the file and straighten it out as needed.
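One quick sanity check before kicking off another deployment, assuming you have Python and PyYAML handy, is to confirm the file still parses as YAML at all. The ytt annotations are just comments, so a plain parse works as a rough first pass (it won't catch every possible mistake, but it does catch the indentation ones).

# Parse every document in the file; prints "YAML OK" only if parsing succeeds
python3 -c "import sys, yaml; list(yaml.safe_load_all(open(sys.argv[1]))); print('YAML OK')" ~/.tkg/providers/infrastructure-vsphere/v0.7.1/ytt/base-template.yaml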
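Assuming the create completes, pulling down the kubeconfig for the new cluster is the usual pair of commands (the context name convention may differ slightly in your environment).

tkg get credentials cz08
kubectl config use-context cz08-admin@cz08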
Once we've gotten our credentials and are in the cluster, let's test and ensure we can pull an image from our internal registry.
$ k run util --image harbor2.zoller.com/library/chipzoller/util:latest -- "echo hello"
pod/util created
We'll check the logs and see if we have our message.
$ k logs util
hello
And after describing the pod once it has completed, we can see the message Successfully pulled image "harbor2.zoller.com/library/chipzoller/util:latest" in 242.971197ms, so we know it did indeed work.
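If you want to check that event yourself, it appears under the Events section at the bottom of the describe output for the same pod, after which the test pod can be cleaned up.

# The image pull event shows up under Events at the end of the output
k describe pod util

# Remove the test pod when finished
k delete pod util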
So, there you go, it worked as intended. We are now able to securely address any container image registry we wish, whether deployed as a TKG shared service or not, and whether signed by a public CA or not.
I hope this helps you to feel more confident (and secure) in being able to operate TKG in your vSphere environment.
EDIT (11/2/20): It was pointed out by a couple of folks that a better way to accomplish this goal is not to modify the base-template.yaml file directly but rather to use ytt, the templating tool TKG uses by default, to modify an overlay file. Thank you to those who suggested this improvement. The steps below show how to use the overlay method.
Inside of your providers folder (for vSphere this is ~/.tkg/providers/infrastructure-vsphere/ytt/) there is a file called vsphere-overlay.yaml which exists but is almost empty. Rather than modifying base-template.yaml directly, we will add our preKubeadmCommands to this overlay file, which, when rendered by the tkg CLI tool, will add these commands to the finalized manifest used to create the clusters. This capability is afforded by the ytt tool, which was formerly part of k14s and is now part of Carvel. I haven't yet really dug into ytt, so a more thorough article will have to wait until that time.
Open the vsphere-overlay.yaml file and insert the following contents.
#! Please add any overlays specific to vSphere provider under this file.
#@ load("@ytt:overlay", "overlay")

#! Add and trust your custom CA certificate on all Control Plane nodes.
#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  kubeadmConfigSpec:
    preKubeadmCommands:
    #@overlay/append
    - |
      cat <<EOF > /etc/ssl/certs/myca.pem
      -----BEGIN CERTIFICATE-----
      MIIDWzCCAkOgAwIBAgIQeXc+Qv+ngYZDNaSsPUI7kDANBgkqhkiG9w0BAQUFADBA
      <snip>
      mSddDjR+db3N3XEpThE4AyFnJYErRSnZdQSROmKQGgvXx/qk+TEvzpQa/6oqebE=
      -----END CERTIFICATE-----
      EOF
    #@overlay/append
    - openssl x509 -in /etc/ssl/certs/myca.pem -text >> /etc/pki/tls/certs/ca-bundle.crt
    #@overlay/append
    - c_rehash


#! Add and trust your custom CA certificate on all worker nodes.
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"})
---
spec:
  template:
    spec:
      preKubeadmCommands:
      #@overlay/append
      - |
        cat <<EOF > /etc/ssl/certs/myca.pem
        -----BEGIN CERTIFICATE-----
        MIIDWzCCAkOgAwIBAgIQeXc+Qv+ngYZDNaSsPUI7kDANBgkqhkiG9w0BAQUFADBA
        <snip>
        mSddDjR+db3N3XEpThE4AyFnJYErRSnZdQSROmKQGgvXx/qk+TEvzpQa/6oqebE=
        -----END CERTIFICATE-----
        EOF
      #@overlay/append
      - openssl x509 -in /etc/ssl/certs/myca.pem -text >> /etc/pki/tls/certs/ca-bundle.crt
      #@overlay/append
      - c_rehash
Once again, ensure that your spacing and indentation are correct. The #@overlay/append statements, which appear before each additional command we're adding, must be aligned with the dash marker of the list item. This tells ytt to append the next command in the array to the existing commands.
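If you'd like to see the append behavior in isolation before touching the TKG templates, here is a tiny, self-contained sketch; the file names and contents are made up purely for illustration, and all it needs is the ytt binary on your path.

# Write a minimal base document containing an array with one existing command
cat <<'EOF' > /tmp/base.yaml
kind: Demo
spec:
  commands:
  - echo one
EOF

# Write an overlay that matches that document and appends a second command
cat <<'EOF' > /tmp/demo-overlay.yaml
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Demo"})
---
spec:
  commands:
  #@overlay/append
  - echo two
EOF

# Render both files: the output shows "echo two" appended after "echo one"
ytt -f /tmp/base.yaml -f /tmp/demo-overlay.yaml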
Provision a TKG cluster and ensure there are no errors thrown. If everything in the overlay file is correct, your workload clusters should be built just as they were previously, but now with a cleaner separation between the default commands and your additions.