Chaos Mesh

Chaos engineering is a discipline that studies how system failures can occur and provides methodologies to help avoid them. By deliberately injecting faults and understanding the root causes of the failures that follow, chaos engineers can develop plans to prevent or mitigate them.

In this article I will look at Chaos Mesh, an open source, cloud-native chaos engineering platform. It offers various types of fault simulation and can orchestrate complex fault scenarios. I am highlighting Chaos Mesh as one tool among many (Litmus is another). I have implemented it at some companies I have worked with to test the resilience of their Kubernetes clusters.

So let's try out Chaos Mesh in your cluster.

  1. Let's get an app. If you already have your own, you can skip this step.

     git clone https://github.com/dockersamples/example-voting-app.git
    

    Deploy the application to your Kubernetes cluster

     kubectl create -f example-voting-app/k8s-specifications/
    
  2. Install Chaos Mesh

     helm repo add chaos-mesh https://charts.chaos-mesh.org
     helm repo update
     helm install chaos-mesh chaos-mesh/chaos-mesh
    
  3. Try a pod-kill experiment. Save the manifest below as pod-kill.yaml and apply it. One of the vote pods will be killed, and its Deployment will create a new pod to replace it.

     apiVersion: chaos-mesh.org/v1alpha1
     kind: PodChaos
     metadata:
       name: pod-kill-example
       namespace: default
     spec:
       action: pod-kill
       mode: one
       selector:
         namespaces:
           - default
         labelSelectors:
           app: vote
    
     kubectl apply -f pod-kill.yaml
    
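  4. There are other experiments you can try, such as pod-failure, which makes a pod unavailable for the given duration. Note that the selector in this example targets pods labeled app.kubernetes.io/component: tikv, so change it to match your own workload (for example app: vote) before applying.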

     apiVersion: chaos-mesh.org/v1alpha1
     kind: PodChaos
     metadata:
       name: pod-failure-example
     spec:
       action: pod-failure
       mode: one
       duration: "30s"
       selector:
         labelSelectors:
           "app.kubernetes.io/component": "tikv"
    
  5. You can find more example manifests here: https://github.com/chaos-mesh/chaos-mesh/tree/master/examples
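
The pod-kill walkthrough above can be sketched end to end as one script. This is a minimal sketch, not a definitive procedure. It assumes the Helm release is named chaos-mesh as in step 2, that the chart labels its pods with app.kubernetes.io/instance=chaos-mesh, and that a 10-second wait (an arbitrary value) is enough for the replacement pod to appear. The cluster steps are skipped when kubectl cannot reach a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the PodChaos manifest from step 3 to the file that `kubectl apply` expects.
cat > pod-kill.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: default
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: vote
EOF

# Only talk to the cluster if kubectl is installed and a cluster is reachable.
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  # Check that the Chaos Mesh components are running
  # (the label is an assumption about how the Helm chart tags its pods).
  kubectl get pods --all-namespaces -l app.kubernetes.io/instance=chaos-mesh

  # Record the vote pods before the experiment.
  kubectl get pods -n default -l app=vote

  # Inject the fault: one pod matching app=vote is killed.
  kubectl apply -f pod-kill.yaml

  # Give the Deployment time to schedule a replacement (arbitrary wait).
  sleep 10

  # A replacement pod should show a new name and a low AGE.
  kubectl get pods -n default -l app=vote

  # Inspect the experiment's status and events, then clean up.
  kubectl describe podchaos pod-kill-example -n default
  kubectl delete -f pod-kill.yaml
else
  echo "No reachable cluster; wrote pod-kill.yaml only."
fi
```

Comparing the two pod listings is the quickest way to confirm the experiment ran: the killed pod disappears and a new one takes its place.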

Conclusion

If you plan to roll out chaos testing in your organization, always have a set of plans and scripts that you want to test. Larger organizations usually have an internal disaster recovery team that performs these tests, with DevOps assisting, on a yearly basis to check the resilience of their infrastructure.