Kubernetes Troubleshoot
Debug pods, services, deployments, and networking issues in Kubernetes.
Instructions
- Identify the affected resource (pod, service, deployment)
- Get current state with kubectl get and kubectl describe
- Check logs if applicable
- Diagnose based on status/events
- Provide specific remediation steps
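The steps above can be sketched as a small helper that only prints the read-only commands for a first look, so nothing touches the cluster until someone chooses to run them. The function name triage_commands and its argument order are assumptions made for this illustration, not part of any existing tool.

```shell
# Print (do not run) the read-only commands for a first look at one resource.
# Usage: triage_commands <kind> <name> [namespace]
# triage_commands is a hypothetical helper name used only in this sketch.
triage_commands() {
  kind=$1 name=$2 ns=${3:-default}

  # Current state and recent events for the resource itself.
  echo "kubectl get $kind $name -n $ns -o wide"
  echo "kubectl describe $kind $name -n $ns"

  # Logs only apply to resources that run containers.
  case $kind in
    pod|pods)
      echo "kubectl logs $name -n $ns --all-containers"
      ;;
    deployment|deploy)
      echo "kubectl logs deployment/$name -n $ns"
      ;;
  esac

  # Namespace-wide events, oldest first.
  echo "kubectl get events -n $ns --sort-by=.lastTimestamp"
}
```

For example, triage_commands pod web-0 prod lists four commands to review and run by hand; for a service, the logs line is skipped.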
Diagnostic commands
# Pod debugging
kubectl get pods -o wide
kubectl describe pod <pod>
kubectl logs <pod> [--previous] [-c container]
kubectl get events --sort-by=.lastTimestamp
# Service/networking
kubectl get svc,endpoints
kubectl describe svc <service>
kubectl get ingress
# Resource issues
kubectl top pods
kubectl describe node <node> | grep -A5 "Allocated resources"
# Debug pod (ephemeral container)
kubectl debug -it <pod> --image=busybox --target=<container>
Common issues
| Status                     | Cause                | Solution                               |
| -------------------------- | -------------------- | -------------------------------------- |
| Pending                    | No resources         | Check node capacity, resource requests |
| Pending                    | No matching node     | Check nodeSelector, taints/tolerations |
| ImagePullBackOff           | Bad image/auth       | Verify image name, imagePullSecrets    |
| CrashLoopBackOff           | App crashing         | Check logs, entrypoint, health probes  |
| CreateContainerConfigError | Bad configmap/secret | Verify referenced configs exist        |
| Evicted                    | Node pressure        | Check node conditions, resource limits |
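As a rough illustration, the table can be turned into a lookup that maps a pod status to its first checks. The function name next_check is hypothetical, and the messages only restate the table rows; statuses not listed there fall through to a generic hint.

```shell
# Print the first checks for a pod status, following the Common issues table.
# Usage: next_check <status>
# next_check is a hypothetical helper name used only in this sketch.
next_check() {
  case $1 in
    Pending)
      echo "Check node capacity and resource requests, then nodeSelector and taints/tolerations" ;;
    ImagePullBackOff)
      echo "Verify image name and imagePullSecrets" ;;
    CrashLoopBackOff)
      echo "Check logs (including --previous), entrypoint, and health probes" ;;
    CreateContainerConfigError)
      echo "Verify the referenced ConfigMaps and Secrets exist" ;;
    Evicted)
      echo "Check node conditions and resource limits" ;;
    *)
      # Status is not covered by the table.
      echo "Not in the table; start with kubectl describe and events"
      return 1 ;;
  esac
}
```

The non-zero return for unknown statuses lets a calling script tell a table hit from a miss.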
Service not reachable checklist
- Pod running? kubectl get pods -l app=<app>
- Pod ready? Check readiness probe
- Endpoints exist? kubectl get endpoints <svc>
- Service selector matches pod labels?
- Port/targetPort correct?
- NetworkPolicy blocking traffic?
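The checklist can be walked in order with a short read-only helper. This is a sketch under assumptions: the name check_service is made up for the example, and it assumes the pods carry an app=<app> label as in the first checklist item.

```shell
# Walk the service checklist in order. Every command is read-only.
# Usage: check_service <service> <app-label> [namespace]
# check_service is a hypothetical helper name used only in this sketch.
check_service() {
  svc=$1 app=$2 ns=${3:-default}

  echo "== Pods with app=$app (STATUS should be Running, READY should be n/n)"
  kubectl get pods -l "app=$app" -n "$ns" -o wide

  echo "== Endpoints for $svc (empty usually means no ready pod matches the selector)"
  kubectl get endpoints "$svc" -n "$ns"

  echo "== Service selector, Port and TargetPort"
  kubectl describe svc "$svc" -n "$ns"

  echo "== Pod labels to compare against the selector"
  kubectl get pods -l "app=$app" -n "$ns" --show-labels

  echo "== NetworkPolicies in namespace $ns"
  kubectl get networkpolicy -n "$ns"
}
```

The output still needs a human read: the helper gathers what each checklist question asks about but does not decide whether the selector or ports are correct.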
Rules
- MUST check events with kubectl describe before diagnosing
- MUST check logs for CrashLoopBackOff
- Never delete pods/resources without user approval
- Never apply changes without showing the diff first
- Always specify namespace if not default: -n <namespace>
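One way to follow the diff and approval rules is to wrap kubectl diff and kubectl apply behind a confirmation prompt. The sketch below assumes the change is a manifest file; the name review_and_apply and the prompt wording are invented for the example.

```shell
# Show the diff for a manifest, then apply only after explicit confirmation.
# Usage: review_and_apply <manifest.yaml> [namespace]
# review_and_apply is a hypothetical helper name used only in this sketch.
review_and_apply() {
  manifest=$1 ns=${2:-default}

  # kubectl diff exits 0 when nothing would change, 1 when there are
  # differences, and greater than 1 on error.
  rc=0
  kubectl diff -f "$manifest" -n "$ns" || rc=$?

  if [ "$rc" -eq 0 ]; then
    echo "No changes to apply."
    return 0
  elif [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed (exit $rc); not applying." >&2
    return "$rc"
  fi

  # Differences exist: ask before changing anything.
  printf 'Apply these changes to namespace %s? [y/N] ' "$ns"
  read -r answer
  case $answer in
    y|Y) kubectl apply -f "$manifest" -n "$ns" ;;
    *)   echo "Aborted; nothing was applied." ;;
  esac
}
```

Anything other than y or Y aborts, so the default is to leave the cluster untouched. The same prompt pattern fits delete operations.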