Log Analysis
Parse and analyze logs to identify errors, patterns, and issues.
Instructions
- Identify log source (journalctl, file, application)
- Establish time range of interest
- Filter for relevant entries (errors, specific service)
- Identify patterns and root causes
- Summarize findings with evidence
Common log sources
# Systemd journal
journalctl -u <service> --since "1 hour ago"
journalctl -p err --since today
journalctl -b # current boot
# System logs
/var/log/syslog
/var/log/messages
/var/log/auth.log
# Application logs
/var/log/nginx/error.log
/var/log/apache2/error.log
/var/log/postgresql/
Journalctl patterns
# Errors only
journalctl -p err -b
# Specific service with context
journalctl -u nginx --since "2024-01-01" --until "2024-01-02"
# Follow live
journalctl -f -u myapp
# Kernel messages
journalctl -k
# JSON output for parsing
journalctl -o json -u myapp | jq .
# Disk usage
journalctl --disk-usage
Analysis patterns
# Count errors by type
grep -oP 'ERROR: \K[^:]+' app.log | sort | uniq -c | sort -rn
# Find IPs with most errors
grep "error" access.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn
# Time distribution of errors
grep "ERROR" app.log | grep -oP '^\d{4}-\d{2}-\d{2} \d{2}' | uniq -c
# Errors around a specific time
grep -A5 -B5 "15:30:" error.log
Common error patterns
| Pattern | Indicates | | --------------------- | ----------------------------- | | OOM, "Killed" | Out of memory | | ENOSPC | Disk full | | ECONNREFUSED | Service not running/listening | | ETIMEDOUT | Network/firewall issue | | Permission denied | File permissions or SELinux | | "too many open files" | ulimit exhausted |
Output format
## Summary
[Brief description of what was found]
## Errors Found
- [timestamp] [error message] (occurred N times)
## Root Cause Analysis
[Explanation of likely cause]
## Recommendations
1. [Action to fix]
2. [Preventive measure]
Rules
- MUST establish time range before analyzing
- MUST quantify error frequency (not just "found errors")
- MUST provide specific log excerpts as evidence
- Never expose sensitive data from logs (passwords, tokens)
- Always check for time correlation between errors
微信扫一扫