# What it does
The /k8s-debug skill provides a systematic debugging workflow for Kubernetes clusters. It diagnoses pod failures, CrashLoopBackOff errors, networking issues, resource exhaustion, and configuration problems by following a structured investigation path.
# How to use

```bash
/k8s-debug
/k8s-debug pod my-app-7d4b8c6f5-x2k9p
/k8s-debug service my-app --namespace production
```

# Workflow
1. Triage -- Checks cluster health, node status, and identifies failing resources
2. Inspect -- Pulls events, logs, and describe output for the affected resource
3. Diagnose -- Classifies the failure (OOM, image pull, config, networking, scheduling)
4. Resolve -- Applies the fix or provides the exact commands to run
5. Prevent -- Recommends resource limits, probes, or policies to avoid recurrence
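Under the hood, the first three stages map roughly onto standard kubectl commands. The sketch below shows the manual equivalents, assuming a reachable cluster with metrics-server installed; the pod name is the one from the usage example above.

```bash
# Triage: cluster and node health, plus any pods not in Running phase
kubectl get nodes
kubectl get pods --all-namespaces --field-selector=status.phase!=Running

# Inspect: events, describe output, and logs from the previous (crashed) container
kubectl describe pod my-app-7d4b8c6f5-x2k9p
kubectl logs my-app-7d4b8c6f5-x2k9p --previous
kubectl get events --sort-by=.lastTimestamp

# Diagnose: current resource usage (requires metrics-server)
kubectl top pod my-app-7d4b8c6f5-x2k9p
```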
# Common issues it resolves
- CrashLoopBackOff -- Application startup failures, missing config, port conflicts
- ImagePullBackOff -- Registry auth, image tag typos, private repo access
- OOMKilled -- Memory limits too low, memory leaks, JVM heap misconfiguration
- Pending pods -- Insufficient resources, node affinity rules, PVC binding failures
- Service unreachable -- Selector mismatches, missing endpoints, network policy blocks
- Failed scheduling -- Taint/toleration mismatches, resource requests exceeding capacity
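To illustrate the "Service unreachable" case: a Service with no endpoints almost always means its selector matches no pod labels. The manifest fragment below is a hypothetical example (the names and ports are assumptions, not part of the skill); the key point is that `spec.selector` on the Service must match the pod template labels exactly.

```yaml
# Hypothetical Service: 'app: my-app' must match the Deployment's
# .spec.template.metadata.labels, or the Service will have no endpoints.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

A quick check is `kubectl get endpoints my-app -n production`; an empty ENDPOINTS column points to a selector mismatch.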
# Example

```bash
> /k8s-debug pod api-server-6b8f9c4d7-m3k2j

# Inspecting pod...
Status: CrashLoopBackOff (restarted 7 times)
Last exit code: 137 (OOMKilled)

# Resource analysis:
Memory limit: 256Mi
Memory at crash: 254Mi (99% of limit)
Typical usage: 180-240Mi (spikes during request bursts)

# Root cause:
Memory limit too tight for traffic spikes.
Container killed by OOM when concurrent requests exceed ~50.

# Fix applied:
Updated memory limit: 256Mi -> 512Mi
Added memory request: 256Mi (for scheduling accuracy)
Added readiness probe to shed load during startup

# Verification:
Pod running stable for 5 minutes
Memory usage: 195Mi / 512Mi (38%)
No OOM events
```
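In manifest form, a fix like the one above would touch the container spec roughly as follows. This is a sketch, not the skill's actual output; the probe path, port, and timing values are assumptions that should match your application.

```yaml
# Hypothetical container spec fragment reflecting the fix:
# request for scheduling accuracy, higher limit for burst headroom,
# readiness probe so the pod is removed from Service endpoints
# until it can serve traffic.
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
readinessProbe:
  httpGet:
    path: /healthz   # assumed health endpoint
    port: 8080       # assumed container port
  initialDelaySeconds: 5
  periodSeconds: 10
```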