Kubernetes Troubleshooting

Debug pods, services, deployments, and networking issues in Kubernetes.

Instructions

Identify the affected resource (pod, service, deployment)
Get current state with kubectl get and kubectl describe
Check logs if the resource is a pod or runs pods (add --previous after a restart)
Diagnose from the reported status and recent events
Provide specific remediation steps
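
The steps above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a definitive tool: the function name `triage` and its argument order are hypothetical, and `--tail=50` is an arbitrary cutoff. It relies only on the kubectl subcommands listed in this document.

```shell
# Hypothetical helper: gather state for one resource in the order listed above.
# Usage: triage <kind> <name> [namespace]
triage() {
  local kind="$1" name="$2" ns="${3:-default}"

  # Current state first, then events and conditions.
  kubectl get "$kind" "$name" -n "$ns" -o wide
  kubectl describe "$kind" "$name" -n "$ns"

  # Logs are only fetched for pods here; --tail=50 is an arbitrary cutoff.
  if [ "$kind" = "pod" ]; then
    kubectl logs "$name" -n "$ns" --tail=50
  fi
}
```

For example, `triage pod web-0 prod` (illustrative names) prints the pod's state, its describe output, and its most recent log lines, which leaves diagnosis and remediation to the reader.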

Diagnostic commands

# Pod debugging
kubectl get pods -o wide
kubectl describe pod <pod>
kubectl logs <pod> [--previous] [-c <container>]
kubectl get events --sort-by=.lastTimestamp

# Service/networking
kubectl get svc,endpoints
kubectl describe svc <service>
kubectl get ingress

# Resource issues
kubectl top pods
kubectl describe node <node> | grep -A5 "Allocated resources"

# Debug pod (ephemeral container)
kubectl debug -it <pod> --image=busybox --target=<container>
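
A namespace-wide event listing can be noisy. The sketch below narrows events to a single pod with a field selector. The function name `pod_events` is hypothetical, and the default namespace fallback is an assumption for illustration.

```shell
# Hypothetical helper: show only the events that involve one pod, oldest first.
# Usage: pod_events <pod> [namespace]
pod_events() {
  local pod="$1" ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Filtering on both kind and name avoids picking up events for another resource type that happens to share the pod's name.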

Common issues

| Status                     | Cause                | Solution                               |
| -------------------------- | -------------------- | -------------------------------------- |
| Pending                    | No resources         | Check node capacity, resource requests |
| Pending                    | No matching node     | Check nodeSelector, taints/tolerations |
| ImagePullBackOff           | Bad image/auth       | Verify image name, imagePullSecrets    |
| CrashLoopBackOff           | App crashing         | Check logs, entrypoint, health probes  |
| CreateContainerConfigError | Bad configmap/secret | Verify referenced configs exist        |
| Evicted                    | Node pressure        | Check node conditions, resource limits |
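
The table above can be read as a simple lookup from status to first check. The following is a rough sketch of that lookup; the function name `next_check` and the fallback message for unlisted statuses are hypothetical, and the two Pending rows are merged because the status alone does not distinguish them.

```shell
# Hypothetical helper: print the first thing to check for a pod status,
# following the table above. Unlisted statuses fall back to describe.
next_check() {
  case "$1" in
    Pending)
      echo "Check node capacity, resource requests, nodeSelector, taints/tolerations" ;;
    ImagePullBackOff)
      echo "Verify image name and imagePullSecrets" ;;
    CrashLoopBackOff)
      echo "Check logs (--previous), entrypoint, health probes" ;;
    CreateContainerConfigError)
      echo "Verify referenced ConfigMaps and Secrets exist" ;;
    Evicted)
      echo "Check node conditions and resource limits" ;;
    *)
      echo "Run kubectl describe and read the events" ;;
  esac
}
```

The status string is the one shown in the STATUS column of `kubectl get pods`.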

Service not reachable checklist

Pod running? kubectl get pods -l app=<app>
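Pod ready? Check the READY column and readiness probe results in kubectl describe pod <pod>
Endpoints exist? kubectl get endpoints <svc>
Service selector matches pod labels? Compare kubectl describe svc <svc> with kubectl get pods --show-labels
Port/targetPort correct? targetPort must be the port the container listens on
NetworkPolicy blocking traffic? kubectl get networkpolicy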
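
The checklist above can be walked in one pass, roughly as sketched here. The function name `check_service` is hypothetical, and treating an empty address list as a failure is an assumption of this sketch rather than a rule from the checklist.

```shell
# Hypothetical helper: walk the checklist above for one Service.
# Usage: check_service <service> [namespace]
check_service() {
  local svc="$1" ns="${2:-default}"
  local ips

  # Selector and port mapping as defined on the Service.
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}{"\n"}'
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'

  # Pod labels, to compare against the selector by eye.
  kubectl get pods -n "$ns" --show-labels

  # An empty result means no ready pod currently backs the Service.
  ips="$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "No ready endpoints for $svc" >&2
    return 1
  fi
  echo "Endpoints: $ips"

  # Policies that could restrict traffic in this namespace.
  kubectl get networkpolicy -n "$ns"
}
```

Stopping at the empty-endpoints case keeps the focus on the usual culprits: a selector mismatch or pods that are not ready.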

Rules

MUST check events with kubectl describe before diagnosing
MUST check logs for CrashLoopBackOff
Never delete pods/resources without user approval
Never apply changes without showing the diff first
Always specify namespace if not default: -n <namespace>
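
The diff-before-apply and approval rules above could be enforced with a wrapper like the one below. It is a minimal sketch: the function name `safe_apply` and the y/N prompt are hypothetical. It relies on `kubectl diff` exiting 0 when nothing differs, 1 when differences exist, and a higher code on error.

```shell
# Hypothetical helper: show the diff, then apply only after explicit approval.
# Usage: safe_apply <manifest-file> [namespace]
safe_apply() {
  local manifest="$1" ns="${2:-default}"
  local rc=0 answer=""

  # kubectl diff exits 0 for no changes, 1 for changes, >1 on error.
  kubectl diff -n "$ns" -f "$manifest" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "No changes to apply."
    return 0
  elif [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed (exit $rc)" >&2
    return "$rc"
  fi

  # Differences were printed above; ask before changing the cluster.
  printf 'Apply to namespace %s? [y/N] ' "$ns"
  read -r answer || true
  if [ "$answer" = "y" ]; then
    kubectl apply -n "$ns" -f "$manifest"
  else
    echo "Aborted; nothing applied."
    return 1
  fi
}
```

Anything other than a literal `y` aborts, so an accidental Enter never changes the cluster.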