Kubernetes Cluster Add-On Bootstrapping: Part 2, TKG

In the first part, I illustrated a simple and flexible way to automate the add-on deployment process to TKGI clusters using a container that clones a Git repo and applies the manifests it contains. In this part, I'm going to show how you can use that same method with Tanzu Kubernetes Grid (TKG) clusters.

Just to level-set, since there may be confusion about the differences between TKGI and TKG: Tanzu Kubernetes Grid Integrated Edition (TKGI) is a rename of what was once Enterprise PKS (AKA VMware PKS, AKA just "PKS" if you will). This is an offering jointly developed by Pivotal and VMware which uses BOSH as the lifecycle manager and, most commonly, NSX-T for its CNI. Tanzu Kubernetes Grid (TKG), although sharing most of the same letters in the acronym, is a completely separate product based on Cluster API rather than BOSH, and is presently in its nascent stages. The TKG offering I'm covering here is the so-called "multi-cloud" or "standalone" offering, which does not ship as part of vSphere with Kubernetes (in either the Standard edition, which requires VCF, or the Basic edition, which does not). The steps I outline here should also work with the TKG "service" flavor that ships with vSphere with Kubernetes, but only if the tkg CLI tool is used in addition.

In the TKGI scenario in part one, you'll recall we leveraged Ops Manager's add-ons functionality, found within a plan, to store our bootstrap scaffolding manifests. TKG has no Ops Manager, but it does have "plans" of its own. Specifically, these plans define the Cluster API resources and other files needed to build a K8s cluster. In order to insert our add-ons bootstrapper, we will need to create and modify some of these files.

If, after creating a TKG management cluster, you check your home directory, you'll find a file tree structure like the one below.

$ tree ~/.tkg/
/home/chip/.tkg/
├── bom
│   ├── bom-1.1.0+vmware.1.yaml
│   ├── bom-1.1.2+vmware.1.yaml
│   ├── bom-1.1.3+vmware.1.yaml
│   ├── bom-1.17.11+vmware.1.yaml
│   ├── bom-1.17.6+vmware.1.yaml
│   ├── bom-1.17.9+vmware.1.yaml
│   ├── bom-1.18.8+vmware.1.yaml
│   ├── bom-1.2.0+vmware.1.yaml
│   └── bom-tkg-1.0.0.yaml
├── config.yaml
├── features.json
└── providers
    ├── bootstrap-kubeadm
    │   └── v0.3.10
    │       └── bootstrap-components.yaml
    ├── cluster-api
    │   └── v0.3.10
    │       └── core-components.yaml
    ├── config.yaml
    ├── config_default.yaml
    ├── control-plane-kubeadm
    │   └── v0.3.10
    │       └── control-plane-components.yaml
    ├── infrastructure-aws
    │   ├── v0.5.5
    │   │   ├── cluster-template-definition-dev.yaml
    │   │   ├── cluster-template-definition-prod.yaml
    │   │   ├── infrastructure-components.yaml
    │   │   └── ytt
    │   │       ├── base-template.yaml
    │   │       └── overlay.yaml
    │   └── ytt
    │       └── aws-overlay.yaml
    ├── infrastructure-azure
    │   ├── v0.4.8
    │   │   ├── cluster-template-definition-dev.yaml
    │   │   ├── cluster-template-definition-prod.yaml
    │   │   ├── infrastructure-components.yaml
    │   │   └── ytt
    │   │       ├── base-template.yaml
    │   │       └── overlay.yaml
    │   └── ytt
    │       └── azure-overlay.yaml
    ├── infrastructure-docker
    │   ├── v0.3.10
    │   │   ├── cluster-template-definition-dev.yaml
    │   │   ├── cluster-template-definition-prod.yaml
    │   │   ├── infrastructure-components.yaml
    │   │   ├── metadata.yaml
    │   │   └── ytt
    │   │       ├── base-template.yaml
    │   │       └── overlay.yaml
    │   └── ytt
    │       └── docker-overlay.yaml
    ├── infrastructure-tkg-service-vsphere
    │   ├── v1.0.0
    │   │   ├── cluster-template-definition-dev.yaml
    │   │   ├── cluster-template-definition-prod.yaml
    │   │   └── ytt
    │   │       ├── base-template.yaml
    │   │       └── overlay.yaml
    │   └── ytt
    │       └── tkg-service-vsphere-overlay.yaml
    ├── infrastructure-vsphere
    │   ├── v0.7.1
    │   │   ├── cluster-template-definition-dev.yaml
    │   │   ├── cluster-template-definition-prod.yaml
    │   │   ├── infrastructure-components.yaml
    │   │   └── ytt
    │   │       ├── add_csi.yaml
    │   │       ├── base-template.yaml
    │   │       ├── csi-vsphere.lib.txt
    │   │       ├── csi.lib.yaml
    │   │       └── overlay.yaml
    │   └── ytt
    │       └── vsphere-overlay.yaml
    ├── providers.md5sum
    └── ytt
        ├── 01_plans
        │   ├── dev.yaml
        │   ├── oidc.yaml
        │   └── prod.yaml
        ├── 02_addons
        │   ├── cni
        │   │   ├── add_cni.yaml
        │   │   ├── antrea
        │   │   │   ├── antrea.lib.yaml
        │   │   │   └── antrea_overlay.lib.yaml
        │   │   └── calico
        │   │       ├── calico.lib.yaml
        │   │       └── calico_overlay.lib.yaml
        │   └── metadata
        │       └── add_cluster_metadata.yaml
        ├── 03_customizations
        │   ├── add_sc.yaml
        │   ├── filter.yaml
        │   ├── registry_skip_tls_verify.yaml
        │   └── remove_mhc.yaml
        └── lib
            ├── helpers.star
            └── validate.star

We need to do a couple of things if we want to run arbitrary manifests after the cluster is created, similar to TKGI's ability to run an add-ons job through BOSH. First, depending on the infrastructure provider you're using (I'm using plain vSphere here), we need to create a folder called 04_user_customizations at ~/.tkg/providers/ytt/. Inside this directory, we'll also create a file called cluster_addons.yaml, which will contain not only the bootstrapping manifests but also directives telling the ytt tool how and when to insert these YAML documents into our cluster. Create the directory and file now, as shown in the sketch below.
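
A minimal sketch of that scaffolding, assuming the default ~/.tkg location shown in the tree above:

$ mkdir -p ~/.tkg/providers/ytt/04_user_customizations
$ touch ~/.tkg/providers/ytt/04_user_customizations/cluster_addons.yaml

With the scaffolding in place, paste the following content into cluster_addons.yaml.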

#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
#@ load("@ytt:yaml", "yaml")

#@ def cluster_addons():
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-build
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-build
subjects:
- kind: ServiceAccount
  name: cluster-build
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-build
  namespace: default
type: Opaque
data:
  gitrepo: <base64_encoded_git_repo_URL_goes_here>
---
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-build
  namespace: default
spec:
  template:
    spec:
      initContainers:
      - name: cluster-build-apply
        image: chipzoller/k8s-addon-bootstrap:latest
        command: ["/bin/sh"]
        args: ["-c", "git clone $(GIT_REPO) manifests/ && kubectl apply -f manifests/ -R"]
        env:
        - name: GIT_REPO
          valueFrom:
            secretKeyRef:
              name: cluster-build
              key: gitrepo
      containers:
      - name: cluster-build-cleanup
        image: chipzoller/k8s-addon-bootstrap:latest
        command: ["kubectl", "delete", "secrets/cluster-build"]
      restartPolicy: Never
      serviceAccountName: cluster-build
#@ end

#@ if data.values.TKG_CLUSTER_ROLE == "workload" and data.values.CLUSTER_PLAN == "dev":

---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: #@ "{}-cluster-addons".format(data.values.CLUSTER_NAME)
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
spec:
  strategy: "ApplyOnce"
  clusterSelector:
    matchLabels:
      tkg.tanzu.vmware.com/cluster-name: #@ data.values.CLUSTER_NAME
  resources:
  - name: #@ "{}-cluster-addons".format(data.values.CLUSTER_NAME)
    kind: ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: #@ "{}-cluster-addons".format(data.values.CLUSTER_NAME)
data:
  value: #@ yaml.encode(cluster_addons())

#@ end

Without going into too much detail, here's what's essentially happening. When creating a cluster with the tkg CLI, you specify a plan. By default, TKG ships with plans named dev and prod. Each plan is a template that gathers multiple files from different places, each with its own directives, and results in a fully-defined cluster. Using the ytt tool and its powerful templating and overlaying abilities, pieces from multiple files are inserted at the appropriate places and at the correct times. What we're doing here is making use of a primitive in the Cluster API project known as a ClusterResourceSet. This resource type defines additional, user-defined add-ons that should be deployed once a cluster is built, and it is almost identical in purpose to BOSH add-ons in TKGI.

The top of the manifest creates a definition called cluster_addons which is referenced later, much like a variable. The bottom portion defines the Kubernetes manifests for a ClusterResourceSet and its data source, in this case a ConfigMap whose data is the cluster_addons definition created above. The if statement sandwiched between them tells ytt, and thereby the tkg CLI, when to apply these manifests: when the K8s cluster is a workload cluster (i.e., not a management cluster) and when a given plan is being used. In this case, the only plan which will receive these bootstrapper add-ons is the dev plan.
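
If you'd like to sanity-check the overlay before building a cluster, you can render it with ytt by hand. This is just a standalone sketch (the tkg CLI supplies these data values itself when it runs); the values file name and the values below are illustrative. Save something like this as values.yaml:

#@data/values
---
TKG_CLUSTER_ROLE: workload
CLUSTER_PLAN: dev
CLUSTER_NAME: pickles02

Then render the overlay against it:

$ ytt -f values.yaml -f ~/.tkg/providers/ytt/04_user_customizations/cluster_addons.yaml

If everything parses, ytt prints the rendered ClusterResourceSet and ConfigMap; if not, it reports the error.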

Save the manifest, ensuring the YAML indentation is correct, and don't forget to replace the gitrepo key in the Secret resource with the base64-encoded URL of your own Git repo. I've pre-populated the images in this manifest with the image I built and pushed to Docker Hub, but for production use you should strongly consider building your own from upstream. If/when you do build and push to your own registry, one of my previous articles on how to securely trust an internal registry should help tremendously, and it complements this workflow nicely. In fact, I'm using both here in this article.
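
Producing that base64 value is a one-liner (the repo URL here is just a placeholder; note the -n so no trailing newline gets encoded):

$ echo -n 'https://git.example.com/your-org/cluster-addons.git' | base64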

Once you've got everything situated, create a new cluster from your dev plan.

$ tkg create cluster pickles02 --plan=dev --vsphere-controlplane-endpoint-ip=192.168.1.223
Logs of the command execution can also be found at: /tmp/tkg-20201107T091123523747106.log
Validating configuration...
Creating workload cluster 'pickles02'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...

Workload cluster 'pickles02' created

Notice the "Waiting for addons installation..." line in the output above. If the new add-ons file was picked up, your add-ons will be installed at this final step (along with default add-ons such as the CNI). If there was a syntax error somewhere in your file, ytt will fail to parse it and you'll see an error message shortly after you issue the tkg create cluster command.
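
You can also verify that the resources were generated by querying the management cluster context; the names follow the "{CLUSTER_NAME}-cluster-addons" pattern from our overlay (and, in my setup, land in the default namespace):

$ kubectl get clusterresourceset pickles02-cluster-addons
$ kubectl get configmap pickles02-cluster-addons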

Check your cluster and see if your add-on bootstrapper ran.

$ k get po
NAME                    READY   STATUS       RESTARTS   AGE
cluster-build-z9kzz     0/1     Completed    0          18m

Here we see the Pod for the cluster-build Job has completed. You should now be able to see the results of all the manifests you defined in your Git repo.
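
If something didn't apply the way you expected, the init container's logs show exactly what was cloned and applied (the container name is the one from the Job defined earlier):

$ kubectl logs job/cluster-build -c cluster-build-apply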

One last thing: why don't we create a new plan so we can test these updates safely without impacting the default dev or prod plans? The process is fairly simple. In your infrastructure-vsphere folder, go into the v0.7.1 subfolder (or whatever the provider version happens to be when you read this) and copy the cluster-template-definition-dev.yaml file to a new one called cluster-template-definition-test.yaml, as shown below. The tkg CLI will parse anything after the cluster-template-definition- string as the name of a plan, so in this case our plan is simply called test. Once the new file exists, it doesn't need to be modified.
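
For example:

$ cd ~/.tkg/providers/infrastructure-vsphere/v0.7.1/
$ cp cluster-template-definition-dev.yaml cluster-template-definition-test.yaml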

Next, in the cluster_addons.yaml file we created earlier, find the if statement line and change dev to test, like so.

#@ if data.values.TKG_CLUSTER_ROLE == "workload" and data.values.CLUSTER_PLAN == "test":

Alternatively, if you wanted to enable these cluster bootstrap add-ons for multiple plans, you can list several of them the following way.

#@ if data.values.TKG_CLUSTER_ROLE == "workload" and data.values.CLUSTER_PLAN in ["test", "dev"]:

Depending on how you modified this line, your bootstrap add-ons will be deployed either only when rolling out a cluster from the test plan, or for both the test and dev plans.
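
Creating a cluster from the new plan then works just like before (the cluster name and endpoint IP here are placeholders):

$ tkg create cluster pickles03 --plan=test --vsphere-controlplane-endpoint-ip=192.168.1.224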

And I think that's it. I hope this has been a useful article and two-part series showing one method you can use across different offerings in the Tanzu portfolio to easily install your custom add-ons to Kubernetes clusters. Using this as a base, you can extend the capabilities in many different ways, simply by checking new manifests into your Git repo. As always, feel free to hit me up on Twitter or LinkedIn with your feedback.