Topic 3: Kubernetes Affinity (Part 1: Taints and Tolerations)


By default, the Kubernetes (K8s) scheduler places pods on nodes using simple rules based mainly on resource availability. What if you want to specify your own rules for where pods go? That’s where Kubernetes affinity and anti-affinity come in. They are advanced K8s scheduling techniques that help you create flexible scheduling policies.

In general, affinity enables the Kubernetes scheduler to place a pod either on a particular group of nodes or relative to the
placement of other pods. To control pod placement on a group of nodes, you use node affinity rules. In contrast,
pod affinity and pod anti-affinity rules control pod placement relative to other pods.

Let’s look at the different placement techniques in Kubernetes:

Technique                        Summary
Taints and Tolerations           Let a node control which pods may run on it and which pods are repelled.
NodeSelector                     Assigns a pod to specific nodes using labels.
Node Affinity                    Similar to NodeSelector but more flexible, with “Required” and “Preferred” rules.
Pod Affinity and Anti-Affinity   Co-locates pods or keeps them apart based on affinity and anti-affinity rules.

Taints and Tolerations

With taints, nodes have control over pod placement. Taints allow nodes to define which pods can be placed on them and which pods are repelled from them.

For example, suppose you have a node with special hardware and want the scheduler to place only pods that require that hardware on it.
You can taint the node and add matching tolerations to those pods to meet this requirement.

The pods that require the special hardware must define a toleration for the taints on those nodes. When you taint a node, it repels all pods except those that have a toleration for that taint. A node can have one or more taints associated with it.

A taint can have one of three effects:

  • NoSchedule: The Kubernetes scheduler will not place new pods on the tainted node unless they have a toleration for the taint. Pods already running on the node are not affected.

  • PreferNoSchedule: The Kubernetes scheduler will try to avoid placing pods that don’t have a toleration for the taint on the node, but this is not guaranteed.

  • NoExecute: In addition to blocking new pods, Kubernetes will evict pods already running on the node if they don’t have a toleration for the taint.

For example, if you need to dedicate a group of worker nodes to a set of users, you can add a taint to those nodes with a command such as:

kubectl taint nodes nodename dedicated=groupName:NoSchedule
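Pods that belong to that group then need a matching toleration. Keep in mind that a toleration only permits scheduling onto the tainted nodes; it does not force it. To also keep those pods on the dedicated nodes, one common approach is to label the nodes and add a nodeSelector. The following is a minimal sketch, assuming the nodes also carry a hypothetical label dedicated=groupName; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: group-workload          # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
  # Allows (but does not force) scheduling onto nodes tainted
  # with dedicated=groupName:NoSchedule
  tolerations:
  - key: dedicated
    operator: Equal
    value: groupName
    effect: NoSchedule
  # Assumes the nodes were also labeled, for example with:
  #   kubectl label nodes nodename dedicated=groupName
  nodeSelector:
    dedicated: groupName
```

Without the nodeSelector, the pod could still land on any untainted node in the cluster.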

For nodes with specialized hardware:

kubectl taint nodes nodename special=true:NoSchedule 
or 
kubectl taint nodes nodename special=true:PreferNoSchedule
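A pod that needs the specialized hardware would carry a toleration for the special key. The fragment below is a sketch of the tolerations part of a pod spec. It uses the Exists operator, which matches the key regardless of its value, and omits the effect so that it tolerates the taint under either of the two effects shown in the commands:

```yaml
# Fragment of a pod spec (spec.tolerations); other fields omitted
tolerations:
- key: special
  operator: Exists   # matches any value of the "special" key, so no value field is given
  # effect is omitted on purpose: an empty effect matches every effect
  # for this key (NoSchedule or PreferNoSchedule in the commands here)
```

Using operator: Equal with value: "true" would work as well; Exists is simply the less strict variant.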

How to use Taints and Tolerations

Let’s assume that we need to deploy the front-end application pods so that they are placed on the front-end nodes. We also must ensure that new pods are not scheduled onto the master nodes, because those nodes run control plane components such as etcd.

List the nodes and their taints (the column spec is quoted so the shell does not try to expand the [*] patterns):

kubectl get nodes -o='custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect'
  NodeName                                        TaintKey                                                            TaintValue   TaintEffect
  cluster01-master-1  node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
  cluster01-master-2  node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
  cluster01-master-3  node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd   true,true    NoSchedule,NoExecute
  cluster01-worker-1   <none>                                                              <none>       <none>

Let’s taint the worker-1 node:

kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule
  node/cluster01-worker-1 tainted
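After this command, the taint is stored on the node object under spec.taints, which is the field the custom-columns query reads. As an abridged illustration (not captured output), the relevant part of kubectl get node cluster01-worker-1 -o yaml would be expected to look roughly like this, with all other fields omitted:

```yaml
# Abridged, illustrative fragment of the Node object
apiVersion: v1
kind: Node
metadata:
  name: cluster01-worker-1
spec:
  taints:
  - effect: NoSchedule
    key: app
    value: frontend
```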

Now suppose you want to deploy a pod to cluster01-worker-1. Notice the tolerations section at the end of the pod spec below: we have added a toleration for the taint so that the pod can be scheduled on the worker node.

kubectl edit deployment nginx -n frontend
  deployment.apps/nginx edited

kubectl get deployment nginx -n frontend -o yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "3"
    creationTimestamp: "2024-11-14T09:39:37Z"
    generation: 3
    labels:
      run: nginx
    name: nginx
    namespace: frontend
    resourceVersion: "13368509"
    selfLink: /apis/apps/v1/namespaces/frontend/deployments/nginx
    uid: f56f026f-3a92-4bbc-c185-3110426bba335
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 6
    selector:
      matchLabels:
        run: nginx
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          run: nginx
      spec:
        containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: app
          operator: Equal
          value: frontend

Checking the pod’s status and events should show that the pod was scheduled on the node:

kubectl get events -n frontend

How do we untaint a node?

We can use the same kubectl taint command with a hyphen appended at the end to remove the taint (untaint the node):

kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule-

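So what happens to pods that are already running on the node? Removing a taint does not evict anything; running pods stay where they are. How a newly added taint affects them depends on its effect: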

  • NoExecute effect

    If a pod has no toleration for the taint, it will be evicted immediately. If a pod has a toleration for the taint but doesn't specify tolerationSeconds, it will stay bound to the node forever. If a pod has a toleration for the taint and does specify tolerationSeconds, it will stay bound for that amount of time and is then evicted.

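  • Other effects (NoSchedule and PreferNoSchedule)

    These effects only influence where new pods are scheduled; pods already running on the node are not evicted. If a pod has a toleration that matches the taint on the node, the pod can be scheduled on the node.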
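The tolerationSeconds case can be sketched as a pod spec fragment. It assumes a hypothetical taint app=frontend:NoExecute on the node (the walkthrough above used NoSchedule instead), and the value of 3600 seconds is arbitrary:

```yaml
# Fragment of a pod spec (spec.tolerations); other fields omitted
tolerations:
- key: app
  operator: Equal
  value: frontend
  effect: NoExecute
  # Only meaningful with NoExecute: the pod stays bound to the node for
  # up to 3600 seconds after the taint is added, then it is evicted
  tolerationSeconds: 3600
```

Leaving out tolerationSeconds in this toleration would let the pod stay on the node indefinitely.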