Kubernetes Expert
Full-stack Kubernetes guidance: kubectl, Helm, ArgoCD GitOps, networking, secrets, debugging. Includes patterns from this repo's k8s/ homelab setup.
Quick Reference
kubectl Essentials
# Pod debugging
kubectl get pods -A # all namespaces
kubectl describe pod <name> -n <ns> # events, status
kubectl logs <pod> -n <ns> --previous # crashed container logs
kubectl logs <pod> -c <container> -f # follow specific container
kubectl exec -it <pod> -n <ns> -- /bin/sh # shell into pod
# Resource inspection
kubectl get events -n <ns> --sort-by='.lastTimestamp'
kubectl top pods -n <ns> # resource usage
kubectl get all -n <ns> # everything in namespace
# Quick edits
kubectl edit deployment <name> -n <ns>
kubectl rollout restart deployment <name> -n <ns>
kubectl rollout status deployment <name> -n <ns>
kubectl rollout undo deployment <name> -n <ns>
# Ephemeral containers (debug without restart)
kubectl debug <pod> -it --image=busybox --target=<container> # attach to running pod
kubectl debug <pod> -it --copy-to=debug-pod --image=nicolaka/netshoot # copy pod for debug
kubectl debug node/<node> -it --image=busybox # debug node issues
Helm Commands
# Chart management
helm repo add <name> <url>
helm repo update
helm search repo <chart>
# Install/upgrade
helm install <release> <chart> -n <ns> --create-namespace -f values.yaml
helm upgrade <release> <chart> -n <ns> -f values.yaml
helm upgrade --install <release> <chart> -n <ns> # install or upgrade
# Debugging
helm list -A # all releases
helm status <release> -n <ns>
helm history <release> -n <ns>
helm get values <release> -n <ns> # current values
helm get manifest <release> -n <ns> # rendered templates
helm template <chart> -f values.yaml # dry-run render
# Rollback
helm rollback <release> <revision> -n <ns>
ArgoCD Operations
# CLI setup
argocd login <server> --grpc-web
# App management
argocd app list
argocd app get <app>
argocd app sync <app>
argocd app sync <app> --prune # remove orphaned resources
argocd app diff <app> # preview changes
# Troubleshooting
argocd app logs <app>
argocd app history <app>
argocd app rollback <app> <id>
Homelab Patterns (from k8s/)
This repo uses app-of-apps pattern with ArgoCD:
k8s/
├── argocd/
│ ├── app-of-apps.yaml # root app pointing to apps/
│ ├── apps/ # ArgoCD Application manifests
│ │ ├── cert-manager.yaml
│ │ ├── external-secrets.yaml
│ │ └── ...
│ ├── cluster-issuer/ # cert-manager ClusterIssuers
│ ├── external-secrets/ # ExternalSecret definitions
│ └── <app-name>/ # per-app configs (values.yaml, etc)
└── terraform/
├── main.tf # base infra
├── argocd/ # ArgoCD bootstrap
└── metallb-config/ # load balancer
Adding New App to Homelab
- Create ArgoCD Application in
k8s/argocd/apps/<app>.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/peterstorm/.dotfiles.git
targetRevision: HEAD
path: k8s/argocd/my-app
destination:
server: https://kubernetes.default.svc
namespace: my-app
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- Create app directory with manifests:
k8s/argocd/my-app/ - Push to git - ArgoCD auto-syncs
Debugging Workflows
Pod Not Starting
# 1. Check pod status
kubectl get pod <pod> -n <ns> -o wide
# 2. Check events
kubectl describe pod <pod> -n <ns> | grep -A 20 Events
# 3. Common issues:
# - ImagePullBackOff: wrong image/tag, missing imagePullSecrets
# - Pending: insufficient resources, node selector mismatch
# - CrashLoopBackOff: check logs
# - CreateContainerConfigError: missing configmap/secret
Service Not Reachable
# 1. Verify endpoints exist
kubectl get endpoints <svc> -n <ns>
# 2. Check service selector matches pod labels
kubectl get svc <svc> -n <ns> -o yaml | grep -A5 selector
kubectl get pods -n <ns> --show-labels
# 3. Test from within cluster
kubectl run debug --rm -it --image=busybox -- wget -qO- http://<svc>.<ns>.svc.cluster.local
Ingress Issues
# 1. Check ingress status
kubectl get ingress -n <ns>
kubectl describe ingress <name> -n <ns>
# 2. Verify TLS secret exists
kubectl get secret <tls-secret> -n <ns>
# 3. Check cert-manager certificate
kubectl get certificate -n <ns>
kubectl describe certificate <name> -n <ns>
# 4. Check ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
Secrets Management
External Secrets (homelab pattern)
ClusterSecretStore connects to secret backend:
# k8s/argocd/external-secrets/cluster-secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: vault-backend
spec:
provider:
vault:
server: "https://vault.example.com"
path: "secret"
auth:
kubernetes:
mountPath: "kubernetes"
role: "external-secrets"
ExternalSecret pulls specific secrets:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: my-secret
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: vault-backend
target:
name: my-secret
data:
- secretKey: password
remoteRef:
key: apps/myapp
property: password
Security best practices for ESO:
- Use namespaced SecretStore instead of ClusterSecretStore when possible
- Apply RBAC to limit which namespaces can access which secrets
- Add NetworkPolicy to restrict ESO controller egress
- Set
refreshIntervalappropriately (not too frequent)
Cert-Manager (homelab pattern)
Staging-first workflow (avoid rate limits):
# 1. Deploy with staging issuer first (untrusted cert, no rate limits)
cert-manager.io/cluster-issuer: letsencrypt-staging
# 2. Verify certificate issued successfully
kubectl get certificate -n <ns>
kubectl describe certificate <name> -n <ns>
# 3. Switch to production
cert-manager.io/cluster-issuer: letsencrypt-prod
# 4. Delete old secret to trigger re-issue
kubectl delete secret <tls-secret> -n <ns>
ClusterIssuers (staging + prod):
# k8s/argocd/cluster-issuer/letsencrypt-staging.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-staging
spec:
acme:
server: https://acme-staging-v02.api.letsencrypt.org/directory
email: your@email.com
privateKeySecretRef:
name: letsencrypt-staging
solvers:
- dns01:
cloudflare:
apiTokenSecretRef:
name: cloudflare-api-token
key: api-token
---
# k8s/argocd/cluster-issuer/letsencrypt-prod.yaml (same but prod server)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
# ... same config
Ingress with auto-TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod # or staging for testing
spec:
tls:
- hosts:
- app.example.com
secretName: app-tls
Additional Resources
Reference Files
For detailed patterns and troubleshooting:
references/troubleshooting.md- Extended debugging workflowsreferences/helm-patterns.md- Helm chart best practicesreferences/homelab-components.md- MetalLB, external-dns, cloudflared, k9s
Homelab Context
Examine actual configs in this repo:
k8s/argocd/app-of-apps.yaml- Root ArgoCD appk8s/argocd/apps/- All managed applicationsk8s/terraform/- Infrastructure as code
微信扫一扫