Troubleshooting Guide
Common issues and their solutions.
Quick Diagnostics
Run the built-in diagnostics:
# Full pipeline test
./scripts/test-pipeline.sh
# Check service status
make status
# Quick health check
make health
# Diagnose common issues
make doctor
# View all logs
make logs
Installation Issues
Docker Desktop Not Supported
Error: Various permission or networking issues
Solution: Install Docker CE directly, not Docker Desktop:
# Uninstall Docker Desktop first, then:
# Ubuntu/Debian
curl -fsSL https://get.docker.com | sh
# Add your user to docker group
sudo usermod -aG docker $USER
# Log out and back in
Kernel Too Old for eBPF
Error: Falco driver failed to load or modern_ebpf not supported
Solution: Upgrade your kernel to 5.8+:
# Check current version
uname -r
# Ubuntu
sudo apt-get update && sudo apt-get upgrade linux-generic
sudo reboot
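
The version check above can be scripted so it gives a yes/no answer instead of a string to read by eye. This is a minimal sketch: kernel_at_least is a hypothetical helper name, not part of SIB, and it only compares the major and minor numbers reported by uname -r.

```shell
# kernel_at_least RELEASE MAJOR MINOR
# Succeeds when RELEASE (e.g. "5.15.0-91-generic") is >= MAJOR.MINOR.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Compare the running kernel against the 5.8 minimum named in this guide.
if kernel_at_least "$(uname -r)" 5 8; then
    echo "kernel OK for modern eBPF"
else
    echo "kernel older than 5.8: upgrade needed"
fi
```

Patch-level and distribution suffixes are ignored on purpose, since only the 5.8 threshold matters here.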
Permission Denied
Error: Got permission denied while trying to connect to the Docker daemon
Solution: Add your user to the docker group:
sudo usermod -aG docker $USER
# Log out and back in completely
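
If the error persists after running usermod, the current login session may still carry the old group list. The sketch below checks the groups of the running session, which is what the Docker socket permission check actually sees. in_group is a hypothetical helper name used only for illustration.

```shell
# in_group GROUP "SPACE SEPARATED GROUP LIST"
# Succeeds when GROUP appears as a whole word in the list.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# `id -nG` without a user name reports the groups of the current session,
# which only change after a fresh login.
if in_group docker "$(id -nG)"; then
    echo "current session has the docker group"
else
    echo "docker group missing from this session: log out and back in"
fi
```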
Falco Issues
Falco Won’t Start
# Check logs
docker logs sib-falco
# Verify privileged mode works
docker run --rm --privileged alpine echo "OK"
# Check kernel version
uname -r # Need 5.8+
Falco High CPU Usage
Cause: Too many rules enabled, or a very active system
Solutions:
- Disable unused rules in detection/config/rules/
- Increase rule buffer size in detection/config/falco.yaml
- Use output rate limiting
No Falco Events
# Generate a test event
make test-alert
# Check if Falco sees it
docker logs sib-falco --tail 20
# Verify syscall source is working
docker exec sib-falco falco --list
Log Pipeline Issues
Events Not Reaching Grafana
Step-by-step diagnosis:
- Check Falco is detecting:
docker logs sib-falco --tail 10
- Check Sidekick is receiving:
docker logs sib-sidekick --tail 10
- Check storage backend is storing:
VictoriaLogs (default):
curl -s "http://localhost:9428/select/logsql/query?query=source:syscall&limit=5" | jq
Loki (Grafana stack):
# -G with --data-urlencode keeps curl from treating the braces as URL globbing
curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={source="syscall"}' | jq '.data.result | length'
- Check Grafana datasource:
- Go to Grafana → Settings → Data sources
- Click on VictoriaLogs or Loki → Click “Test” button
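
The step-by-step diagnosis above can be wrapped in a small script that stops at the first stage that fails. This is a sketch under assumptions: check_stage and check_pipeline are hypothetical names, the container names and the VictoriaLogs health URL are the ones used in this guide, and the checks only confirm that each container exists and that the health endpoint answers, not that events are actually flowing.

```shell
# check_stage NAME COMMAND...
# Runs COMMAND quietly, prints "ok NAME" or "FAIL NAME", returns non-zero on failure.
check_stage() {
    name=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok   $name"
    else
        echo "FAIL $name"
        return 1
    fi
}

# Walk the default (VictoriaLogs) pipeline in order; && stops at the first failure.
check_pipeline() {
    check_stage "falco container"    docker logs sib-falco --tail 1 &&
    check_stage "sidekick container" docker logs sib-sidekick --tail 1 &&
    check_stage "victorialogs"       curl -sf http://localhost:9428/health
}
```

Run check_pipeline on the SIB server. The first FAIL line shows which section of this guide to read next.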
VictoriaLogs Query Returns Empty
# Check VictoriaLogs is healthy
curl http://localhost:9428/health
# Check what fields exist
curl "http://localhost:9428/select/logsql/field_names"
# Try a simple query
curl "http://localhost:9428/select/logsql/query?query=*&limit=10"
Loki Query Returns Empty
# Check Loki is healthy
curl http://localhost:3100/ready
# Check what labels exist
curl http://localhost:3100/loki/api/v1/labels
# Try a simple query
curl -G http://localhost:3100/loki/api/v1/query --data-urlencode 'query={job=~".+"}'
Sidekick Not Forwarding
Check configuration in alerting/config/config.yaml:
VictoriaLogs (default):
victorialogs:
hostport: http://victorialogs:9428
Loki (Grafana stack):
loki:
hostport: http://loki:3100
Restart after changes:
docker compose -f alerting/compose.yaml restart
Dashboard Issues
Dashboards Missing
# Check dashboard provisioning
docker logs sib-grafana | grep -i dashboard
# Re-provision dashboards
docker compose -f grafana/compose.yaml restart
Dashboards Show “No Data”
- Check time range: Ensure it covers when events occurred
- Check datasource: Verify VictoriaLogs/Loki datasource is working
- Generate events: Run make demo to create test data
- Check query: Open panel → Edit → check for errors
Grafana Won’t Start
# Check logs
docker logs sib-grafana
# Common issues:
# - Port 3000 already in use
# - Permission issues on data directory
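# Try with fresh data (this deletes existing Grafana data; stop the
# Grafana container first, since a volume in use cannot be removed)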
docker volume rm sib_grafana_data
make install-grafana
Fleet/Remote Collector Issues
SSH Connection Fails
# Test SSH manually
ssh -v -i ~/.ssh/id_rsa user@remote-host
# Check key permissions
chmod 600 ~/.ssh/id_rsa
chmod 700 ~/.ssh
# Install the public key on the remote host if it is not yet authorized
ssh-copy-id -i ~/.ssh/id_rsa user@remote-host
Collectors Not Sending Data
On the remote host:
# VM stack (default) — check Vector and vmagent
docker logs sib-vector --tail 50
docker logs sib-vmagent --tail 50
# Grafana stack — check Alloy
docker logs sib-alloy --tail 50
# Verify network connectivity to SIB server
# VictoriaLogs (default):
curl -s http://SIB_SERVER:9428/health
curl -s http://SIB_SERVER:8428/-/healthy
# Loki (Grafana stack):
curl -s http://SIB_SERVER:3100/ready
curl -s http://SIB_SERVER:9090/-/ready
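
The connectivity checks above can be looped so that each endpoint is reported separately. A minimal sketch, assuming the ports and paths listed in this guide. The stack labels vm and grafana and the function names health_urls and probe_stack are hypothetical.

```shell
# health_urls STACK HOST: print one health endpoint per line.
health_urls() {
    case $1 in
        vm)      printf 'http://%s:9428/health\nhttp://%s:8428/-/healthy\n' "$2" "$2" ;;
        grafana) printf 'http://%s:3100/ready\nhttp://%s:9090/-/ready\n' "$2" "$2" ;;
        *)       echo "unknown stack: $1" >&2; return 1 ;;
    esac
}

# probe_stack STACK HOST: request each endpoint with a 5 second timeout.
probe_stack() {
    health_urls "$1" "$2" | while read -r url; do
        if curl -sf --max-time 5 "$url" >/dev/null; then
            echo "reachable:   $url"
        else
            echo "unreachable: $url"
        fi
    done
}
```

On the remote host, run probe_stack vm SIB_SERVER with SIB_SERVER replaced by the server's address. An unreachable line usually points at a firewall rule or a stopped service.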
Fleet Host Not Appearing
- Check the collector is running on remote host
- Verify firewall allows traffic to SIB server ports
- VictoriaMetrics stack: 9428, 8428, 2801
- Grafana stack: 3100, 9090, 2801
- Check the host label on metrics in VictoriaMetrics:
curl 'http://localhost:8428/api/v1/label/host/values'
- Check the hostname field on logs in VictoriaLogs:
curl 'http://localhost:9428/select/logsql/query?query=hostname:*|stats+by(hostname)+count()&limit=10'
AI Analysis Issues
Analysis Returns Empty
# Check Ollama is running
curl http://localhost:11434/api/tags
# Check analysis service logs
make logs-analysis
# Test the API directly
curl "http://localhost:5000/health"
Ollama Connection Refused
Ensure Ollama is accessible from Docker:
# Check analysis/config.yaml
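# Use host.docker.internal for Ollama on host machine
# (on Docker Engine for Linux this name resolves only when the container
# maps it, e.g. extra_hosts: "host.docker.internal:host-gateway")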
llm:
base_url: http://host.docker.internal:11434
Slow Analysis
- Use smaller models (llama3.1:8b instead of larger)
- Enable caching in config
- Consider cloud LLM providers (OpenAI/Anthropic)
Performance Issues
High Memory Usage
# Check container memory
docker stats
# VictoriaLogs - already uses 10x less RAM than Loki
# For Grafana stack - reduce Loki retention
# Edit storage/config/loki-config.yml
retention_period: 168h # Reduce from default
# VictoriaMetrics - already uses 10x less RAM than Prometheus
# For Grafana stack - limit Prometheus retention
# Retention is a Prometheus command-line flag, not a prometheus.yml setting;
# add it to the Prometheus container's command arguments:
--storage.tsdb.retention.time=7d
Disk Space Full
# Check disk usage
df -h
du -sh /var/lib/docker/*
# Remove stopped containers, unused networks, and all unused images
# (volumes are kept unless --volumes is added)
docker system prune -a
# Reduce log retention (see above)
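
To spot a nearly full filesystem without reading the whole df table, the output can be filtered by usage percentage. A minimal sketch: full_filesystems is a hypothetical helper, the 90 percent threshold is an arbitrary example, and mount points containing spaces are not handled.

```shell
# full_filesystems THRESHOLD
# Reads `df -P` output on stdin and prints "MOUNT USE%" for every
# filesystem whose usage is at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5; sub(/%/, "", use)          # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# -P selects the POSIX output format, one filesystem per line.
df -P | full_filesystems 90
```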
Slow Dashboard Loading
- Reduce time range for queries
- Add more specific label filters
- Increase Grafana memory limit in compose.yaml
Container Issues
Container Keeps Restarting
# Check logs for the specific container
docker logs <container-name>
# Check exit code
docker inspect <container-name> --format='{{.State.ExitCode}}'
# Common causes:
# - Port already in use
# - Volume permission issues
# - Out of memory
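
The exit code narrows down which of the common causes applies. The sketch below maps the codes that Docker and the shell conventionally use to a short hint. explain_exit_code is a hypothetical helper, and the hints are starting points rather than a definitive diagnosis.

```shell
# explain_exit_code CODE: print a hint for a container exit code.
explain_exit_code() {
    case $1 in
        0)   echo "exited cleanly; check the restart policy" ;;
        125) echo "docker run itself failed (bad flag or daemon error)" ;;
        126) echo "container command could not be invoked (not executable)" ;;
        127) echo "container command not found" ;;
        137) echo "killed by SIGKILL (128+9), often the OOM killer" ;;
        139) echo "segmentation fault (128+11)" ;;
        143) echo "terminated by SIGTERM (128+15)" ;;
        *)   echo "application error; read the container logs" ;;
    esac
}
```

For example, explain_exit_code "$(docker inspect <container-name> --format='{{.State.ExitCode}}')" prints the hint for a given container. The same inspect command with {{.State.OOMKilled}} reports whether the kernel's out-of-memory killer ended the container.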
Port Already in Use
# Find what's using the port
sudo lsof -i :3000
sudo netstat -tulpn | grep 3000
# Change SIB ports in .env file
GRAFANA_PORT=3001
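
Before settling on a replacement port, it helps to confirm that the new one is free. This sketch uses ss (from iproute2) instead of the lsof and netstat commands above, which is a substitution and not something SIB requires. port_listening is a hypothetical helper that reads ss -ltn output on stdin.

```shell
# port_listening PORT
# Reads `ss -ltn` output on stdin; succeeds if any socket listens on PORT.
port_listening() {
    awk -v port="$1" '
        $4 ~ (":" port "$") { found = 1 }   # column 4 is Local Address:Port
        END { exit found ? 0 : 1 }
    '
}
```

For example, ss -ltn | port_listening 3001 && echo "port 3001 is taken" checks the alternative port from the .env example.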
Cannot Pull Images
# Check Docker Hub access
docker pull alpine
# If behind proxy, configure Docker
# /etc/docker/daemon.json
{
"proxies": {
"http-proxy": "http://proxy:8080",
"https-proxy": "http://proxy:8080"
}
}
Getting Help
Collect Diagnostic Information
# Create a diagnostic bundle
{
echo "=== System Info ==="
uname -a
docker --version
docker compose version
echo "=== Container Status ==="
docker ps -a | grep sib
echo "=== Recent Logs ==="
docker logs sib-falco --tail 20 2>&1
docker logs sib-sidekick --tail 20 2>&1
docker logs sib-loki --tail 20 2>&1
echo "=== Disk Usage ==="
df -h
echo "=== Memory ==="
free -h
} > sib-diagnostics.txt
Where to Get Help
- GitHub Issues: github.com/matijazezelj/sib/issues
- Reddit: u/matijaz
When reporting issues, please include:
- SIB version (git commit hash)
- Linux distribution and version
- Kernel version (uname -r)
- Docker version
- Error messages from logs
- Steps to reproduce