Chaos Mesh
Chaos engineering is a discipline that studies how failures occur in distributed systems, typically by injecting faults under controlled conditions, and provides methodologies to help avoid them. By understanding the root causes of failures, chaos engineers can develop plans to prevent or mitigate them.
In this article I will look at Chaos Mesh, an open source, cloud-native chaos engineering platform. It offers many types of fault simulation and can orchestrate complex fault scenarios. Chaos Mesh is only one such tool; there are others, such as Litmus. I have implemented it at some companies I have worked with to test the resilience of their Kubernetes clusters.
So let's try out Chaos Mesh in your cluster.
Let's get a sample app first. If you have your own, you can skip this step.
git clone https://github.com/dockersamples/example-voting-app.git
Deploy the application to your Kubernetes cluster
kubectl create -f example-voting-app/k8s-specifications/
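Before injecting any faults, it helps to confirm the app is actually running. The sketch below assumes the manifests create a Deployment named vote with pods labeled app: vote in your current namespace; the function name wait_for_vote is only illustrative, so adjust the names if your copy of the app differs.

```shell
# Wait for the vote Deployment to finish rolling out, then list its pods.
# Assumption: the Deployment is named "vote" and its pods carry app=vote.
wait_for_vote() {
  kubectl rollout status deployment/vote --timeout=120s
  kubectl get pods -l app=vote
}
```

Call wait_for_vote after the create command and check that at least one pod shows a Running status.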
Install Chaos Mesh
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update
helm install chaos-mesh chaos-mesh/chaos-mesh
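A quick way to check that the install worked is to look for the Chaos Mesh pods and the PodChaos custom resource definition. This is a minimal sketch: it assumes the release is named chaos-mesh as in the command above, and that the chart labels its pods with app.kubernetes.io/instance set to the release name. The function name check_chaos_mesh is hypothetical.

```shell
# List the Chaos Mesh pods in any namespace and confirm the PodChaos CRD exists.
# Assumption: pods are labeled app.kubernetes.io/instance=<release name>.
check_chaos_mesh() {
  kubectl get pods --all-namespaces -l app.kubernetes.io/instance=chaos-mesh
  kubectl get crd podchaos.chaos-mesh.org
}
```

Run check_chaos_mesh and wait until the controller pods report Running before applying any experiment.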
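Let's test one of the pod-kill experiments. Save the manifest below as pod-kill.yaml. When you apply it, Chaos Mesh kills one pod labeled app: vote in the default namespace, and a new pod is created to replace it.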
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: default
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: vote
kubectl apply -f pod-kill.yaml
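To see the experiment take effect, you can watch the vote pods from a second terminal and remove the experiment afterwards. The helper names below are hypothetical; the resource name, namespace, and label come from the manifest above.

```shell
# Stream pod changes so the kill and the replacement pod are visible.
watch_vote_pods() {
  kubectl get pods -n default -l app=vote --watch
}

# Show the experiment object, including the events Chaos Mesh recorded for it.
show_pod_kill_status() {
  kubectl describe podchaos pod-kill-example -n default
}

# Delete the experiment once you are done.
cleanup_pod_kill() {
  kubectl delete -f pod-kill.yaml
}
```

Start watch_vote_pods before applying the manifest; you should see one pod terminate and a new one start shortly after.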
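There are other experiments you can try, such as this pod-failure example, which makes one matching pod unavailable for 30 seconds. Note that its label selector targets a TiKV component, so change it to match your own workload.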
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
spec:
  action: pod-failure
  mode: one
  duration: "30s"
  selector:
    labelSelectors:
      "app.kubernetes.io/component": "tikv"
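If you are following along with the voting app, the same experiment can be pointed at the vote pods instead. The sketch below writes an adapted manifest to a file; the file name pod-failure-vote.yaml and the resource name pod-failure-vote are made up for illustration, and the selector assumes the app: vote label in the default namespace used earlier.

```shell
# Write a pod-failure experiment that targets the vote pods.
# Assumption: the target pods are labeled app=vote in the default namespace.
cat > pod-failure-vote.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-vote
  namespace: default
spec:
  action: pod-failure
  mode: one
  duration: "30s"
  selector:
    namespaces:
      - default
    labelSelectors:
      app: vote
EOF
```

Apply it with kubectl apply -f pod-failure-vote.yaml, and delete it with kubectl delete -f pod-failure-vote.yaml when you are finished.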
You can find more example experiments here: https://github.com/chaos-mesh/chaos-mesh/tree/master/examples
Conclusion
If you plan to roll out chaos testing in your organization, always prepare a set of plans and experiments you want to run. Larger organizations usually have an internal disaster recovery team that performs these tests, with DevOps assisting, on a yearly basis to test the resilience of their infrastructure.