From Dev to Prod: A Comprehensive Guide to Kubernetes Best Practices

Kubernetes has revolutionized the way we deploy and manage applications, offering unparalleled flexibility and scalability. However, running Kubernetes in production environments requires adhering to best practices to ensure reliability, security, and performance. This detailed guide delves into these best practices, offering insights and examples to help you optimize your Kubernetes clusters.

Application Development

Health Checks

Ensuring that your application containers are healthy is crucial for maintaining a robust system. Kubernetes offers Readiness and Liveness probes to keep your applications in check.

Readiness Probes

Readiness probes determine if a container is ready to start accepting traffic. Implementing these ensures that only healthy pods receive traffic.

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Liveness Probes

Liveness probes detect and remedy unresponsive containers by restarting them.

livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

Fault Tolerance

Redundancy is key to fault tolerance. Ensure you run more than one replica of each deployment:

replicas: 3 

Additionally, set Pod Disruption Budgets (PDB) to maintain a minimum number of available pods during disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp

Resource Utilization

Setting appropriate resource limits can prevent resource starvation and ensure fair distribution among containers.

Resource Requests and Limits

Define resource requests and limits explicitly:

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"

Labeling Resources

Labeling resources with technical, business, and security metadata helps you manage and audit them efficiently:

metadata:
  labels:
    environment: production
    team: backend
    compliance: PCI-DSS

Scaling

Implement Horizontal Pod Autoscaler (HPA) for apps with variable workloads:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

Be cautious with the Vertical Pod Autoscaler (VPA): it is maintained outside the core Kubernetes project, must be installed separately, and should not manage the same CPU/memory metrics as an HPA on the same workload.
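
If you do experiment with it, a minimal sketch looks like the following. This assumes the VPA custom resource definitions are installed in the cluster; the target name myapp matches the examples above, and updateMode is set to "Off" so the VPA only produces recommendations without restarting pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommendation-only; no automatic pod evictions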

Logging Setup

Effective logging is essential for monitoring and troubleshooting issues in production environments. Here’s how you can set up logging best practices:

Retention and Archival Strategy for Logs

Determine a log retention policy that meets your auditing requirements while balancing storage costs. Logs should be archived periodically based on this policy.
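
One way to implement periodic archival is a Kubernetes CronJob that ships aged logs to object storage on a schedule. The sketch below is illustrative only: the log-archiver image and the LOG_BUCKET variable are hypothetical placeholders for whatever archival tooling and destination you actually use:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-archiver
spec:
  schedule: "0 2 * * *"              # run nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: archiver
              image: example.com/log-archiver:latest   # hypothetical image
              env:
                - name: LOG_BUCKET                     # hypothetical setting
                  value: "s3://my-log-archive"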

Collecting Logs from Nodes, Control Plane, and Auditing

Ensure logs are collected from all critical components including nodes, control planes, and auditing systems.
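
Control-plane audit logs, for example, are driven by an audit Policy object passed to the kube-apiserver via the --audit-policy-file flag. A minimal sketch that records full request/response bodies for Secrets while keeping everything else at metadata level:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Record request and response bodies for access to Secrets
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
  # Record only metadata for all other requests
  - level: Metadata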

Daemon on Each Node vs Sidecars for Log Collection

Prefer using a daemon on each node to collect logs instead of sidecars as it reduces overhead on individual pods.
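
For contrast, the sidecar approach attaches a log-forwarding container to every pod, so its cost scales with pod count. The sketch below is illustrative: the myapp image and log path are assumptions, and busybox stands in for a real log shipper:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-with-sidecar
spec:
  volumes:
    - name: app-logs
      emptyDir: {}
  containers:
    - name: myapp
      image: myapp:latest              # illustrative application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/myapp
    - name: log-tailer                 # sidecar duplicated in every pod
      image: busybox:1.36
      args: ["sh", "-c", "tail -n+1 -F /var/log/myapp/app.log"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/myapp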

Log Aggregation Tool

Provision a dedicated log aggregation stack like ELK (Elasticsearch, Logstash, Kibana) or EFK (Elasticsearch, Fluentd, Kibana) to centralize logs from all sources for easier analysis.

By adhering to these logging best practices, you can maintain comprehensive visibility into your Kubernetes cluster’s operations while managing storage efficiency and ensuring compliance with audit requirements.

Example Configuration Using Fluentd DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-daemonset
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.11-debian-1
          resources:
            limits:
              memory: "200Mi"
              cpu: "200m"
            requests:
              memory: "200Mi"
              cpu: "100m"