
Troubleshooting Guide

Common issues and their solutions.



Quick Diagnostics

Run the built-in diagnostics:

# Full pipeline test
./scripts/test-pipeline.sh

# Check service status
make status

# Quick health check
make health

# Diagnose common issues
make doctor

# View all logs
make logs

Installation Issues

Docker Desktop Not Supported

Symptom: Assorted permission or networking errors when running under Docker Desktop

Solution: Install Docker CE directly, not Docker Desktop:

# Uninstall Docker Desktop first, then:
# Ubuntu/Debian
curl -fsSL https://get.docker.com | sh

# Add your user to docker group
sudo usermod -aG docker $USER
# Log out and back in

Kernel Too Old for eBPF

Error: Falco driver failed to load or modern_ebpf not supported

Solution: Upgrade your kernel to 5.8+:

# Check current version
uname -r

# Ubuntu (install pulls in the new kernel image; a plain upgrade can hold it back)
sudo apt-get update && sudo apt-get install linux-generic
sudo reboot
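
The 5.8 threshold can also be checked in a script rather than by eye. The following is a minimal sketch, assuming GNU `sort` (for `-V`) is available; the function name `kernel_at_least` is hypothetical and not part of SIB.

```shell
# Succeeds when the kernel release ($2, default: uname -r) is at least version $1.
kernel_at_least() {
  required="$1"
  current="${2:-$(uname -r)}"
  # Strip the distro suffix, e.g. 5.15.0-91-generic -> 5.15.0
  current="${current%%-*}"
  # Version-sort both values; the required version must sort first (or be equal).
  lowest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)"
  [ "$lowest" = "$required" ]
}

if kernel_at_least 5.8; then
  echo "Kernel $(uname -r) is new enough for modern_ebpf"
else
  echo "Kernel $(uname -r) is older than 5.8; upgrade before starting Falco"
fi
```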

Permission Denied

Error: Got permission denied while trying to connect to the Docker daemon

Solution: Add your user to the docker group:

sudo usermod -aG docker $USER
# Log out and back in completely

Falco Issues

Falco Won’t Start

# Check logs
docker logs sib-falco

# Verify privileged mode works
docker run --rm --privileged alpine echo "OK"

# Check kernel version
uname -r  # Need 5.8+

Falco High CPU Usage

Cause: Too many rules or very active system

Solutions:

  1. Disable unused rules in detection/config/rules/
  2. Increase rule buffer size in detection/config/falco.yaml
  3. Use output rate limiting

No Falco Events

# Generate a test event
make test-alert

# Check if Falco sees it
docker logs sib-falco --tail 20

# List the fields Falco exposes; syscall fields in the output confirm the syscall source is loaded
docker exec sib-falco falco --list

Log Pipeline Issues

Events Not Reaching Grafana

Step-by-step diagnosis:

  1. Check Falco is detecting:
    docker logs sib-falco --tail 10
    
  2. Check Sidekick is receiving:
    docker logs sib-sidekick --tail 10
    
  3. Check storage backend is storing:

    VictoriaLogs (default):

    curl -s "http://localhost:9428/select/logsql/query?query=source:syscall&limit=5" | jq
    

    Loki (Grafana stack):

    curl -sG "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={source="syscall"}' | jq '.data.result | length'
    
  4. Check Grafana datasource:
    • Go to Grafana → Settings → Data sources
    • Click on VictoriaLogs or Loki → Click “Test” button
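
The first three steps above can be chained so that the first failing stage is obvious. This is a rough sketch for the default VictoriaLogs stack, not a replacement for `make doctor`; the helper names `check_stage` and `run_pipeline_checks` are hypothetical, and the checks only confirm that each container exists and that the storage health endpoint answers.

```shell
# Run one named check quietly; print PASS or FAIL and return the command's status.
check_stage() {
  name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    return 1
  fi
}

# Walk the pipeline in order and stop at the first failing stage.
run_pipeline_checks() {
  check_stage "Falco container exists" docker inspect sib-falco &&
  check_stage "Sidekick container exists" docker inspect sib-sidekick &&
  check_stage "VictoriaLogs answers /health" curl -fsS http://localhost:9428/health
}
```

Calling `run_pipeline_checks` prints one line per stage. A `FAIL` line marks where to start reading logs with the commands from the matching step.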

VictoriaLogs Query Returns Empty

# Check VictoriaLogs is healthy
curl http://localhost:9428/health

# Check what fields exist
curl "http://localhost:9428/select/logsql/field_names"

# Try a simple query
curl "http://localhost:9428/select/logsql/query?query=*&limit=10"

Loki Query Returns Empty

# Check Loki is healthy
curl http://localhost:3100/ready

# Check what labels exist
curl http://localhost:3100/loki/api/v1/labels

# Try a simple query
curl -G http://localhost:3100/loki/api/v1/query --data-urlencode 'query={job=~".+"}'

Sidekick Not Forwarding

Check configuration in alerting/config/config.yaml:

VictoriaLogs (default):

victorialogs:
  hostport: http://victorialogs:9428

Loki (Grafana stack):

loki:
  hostport: http://loki:3100

Restart after changes:

docker compose -f alerting/compose.yaml restart
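
To see at a glance which outputs have a `hostport` set, the config file can be scanned with `awk`. This is a sketch that assumes the layout shown above (a top-level section name followed by an indented `hostport:` line); the function name `list_sidekick_outputs` is hypothetical.

```shell
# Print "section -> hostport" for every top-level section with a non-empty hostport.
list_sidekick_outputs() {
  awk '
    # Remember the most recent top-level key, e.g. "victorialogs:" or "loki:".
    /^[A-Za-z_]+:/ { section = $1; sub(":", "", section) }
    # An indented hostport line with a value belongs to that section.
    /^[ \t]+hostport:[ \t]*[^ \t]/ { print section " -> " $2 }
  ' "$1"
}
```

Running `list_sidekick_outputs alerting/config/config.yaml` should list the backend you expect. If the expected section is missing from the output, its `hostport` is probably unset.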

Dashboard Issues

Dashboards Missing

# Check dashboard provisioning
docker logs sib-grafana | grep -i dashboard

# Re-provision dashboards
docker compose -f grafana/compose.yaml restart

Dashboards Show “No Data”

  1. Check time range: Ensure it covers when events occurred
  2. Check datasource: Verify VictoriaLogs/Loki datasource is working
  3. Generate events: Run make demo to create test data
  4. Check query: Open panel → Edit → check for errors

Grafana Won’t Start

# Check logs
docker logs sib-grafana

# Common issues:
# - Port 3000 already in use
# - Permission issues on data directory

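# Try with fresh data (WARNING: deletes the dashboards, users, and settings stored in the volume;
# remove the Grafana container first or Docker reports the volume as in use)
docker volume rm sib_grafana_data
make install-grafana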

Fleet/Remote Collector Issues

SSH Connection Fails

# Test SSH manually
ssh -v -i ~/.ssh/id_rsa user@remote-host

# Check key permissions
chmod 600 ~/.ssh/id_rsa
chmod 700 ~/.ssh

# Verify remote host accepts the key
ssh-copy-id -i ~/.ssh/id_rsa user@remote-host
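
OpenSSH refuses a private key that group or other users can read, which is why the `chmod 600` step matters. The check below is a small sketch that reads the permission string from `ls`; the function name `key_is_private` is hypothetical.

```shell
# Succeeds when the file grants no access to group or others (e.g. mode 600 or 400).
key_is_private() {
  # Characters 5-10 of the ls permission string are the group and other bits.
  perms="$(ls -ld "$1" | cut -c5-10)"
  [ "$perms" = "------" ]
}

for key in "$HOME/.ssh/id_rsa"; do
  [ -f "$key" ] || continue
  if key_is_private "$key"; then
    echo "$key: permissions OK"
  else
    echo "$key: too open, run chmod 600 $key"
  fi
done
```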

Collectors Not Sending Data

On the remote host:

# VM stack (default) — check Vector and vmagent
docker logs sib-vector --tail 50
docker logs sib-vmagent --tail 50

# Grafana stack — check Alloy
docker logs sib-alloy --tail 50

# Verify network connectivity to SIB server
# VictoriaLogs (default):
curl -s http://SIB_SERVER:9428/health
curl -s http://SIB_SERVER:8428/-/healthy

# Loki (Grafana stack):
curl -s http://SIB_SERVER:3100/ready
curl -s http://SIB_SERVER:9090/-/ready

Fleet Host Not Appearing

  1. Check the collector is running on remote host
  2. Verify firewall allows traffic to SIB server ports
    • VictoriaMetrics stack: 9428, 8428, 2801
    • Grafana stack: 3100, 9090, 2801
  3. Check metrics host label in VictoriaMetrics:
    curl 'http://localhost:8428/api/v1/label/host/values'
    
  4. Check logs hostname field in VictoriaLogs:
    curl -sG 'http://localhost:9428/select/logsql/query' --data-urlencode 'query=hostname:* | stats by (hostname) count()' -d 'limit=10'
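
The firewall check in step 2 can be run from the remote host with a short loop over the ports listed above. This is a sketch under the assumption that every listed port speaks HTTP, so `curl` can be used as the probe; the function names are hypothetical and `SIB_SERVER` stands in for your server address.

```shell
# Ports listed in this guide for each stack.
ports_for_stack() {
  case "$1" in
    vm)      echo "9428 8428 2801" ;;
    grafana) echo "3100 9090 2801" ;;
    *)       echo "unknown stack: $1" >&2; return 1 ;;
  esac
}

# Treats any curl result other than a DNS, connect, or timeout failure as reachable.
port_reachable() {
  rc=0
  curl -s -o /dev/null --connect-timeout 3 "http://$1:$2/" || rc=$?
  # curl exit codes: 6 = could not resolve host, 7 = failed to connect, 28 = timed out
  [ "$rc" -ne 6 ] && [ "$rc" -ne 7 ] && [ "$rc" -ne 28 ]
}

# Usage: check_stack_ports SIB_SERVER vm
check_stack_ports() {
  host="$1"
  for port in $(ports_for_stack "$2"); do
    if port_reachable "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port blocked or closed"
    fi
  done
}
```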
    

AI Analysis Issues

Analysis Returns Empty

# Check Ollama is running
curl http://localhost:11434/api/tags

# Check analysis service logs
make logs-analysis

# Test the API directly
curl "http://localhost:5000/health"

Ollama Connection Refused

Ensure Ollama is accessible from Docker:

# Check analysis/config.yaml
# Use host.docker.internal for Ollama on host machine
llm:
  base_url: http://host.docker.internal:11434
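
On Linux with Docker CE, `host.docker.internal` does not resolve inside containers unless it is mapped to the host gateway, so it is worth probing Ollama from a container before changing the config. The sketch below uses a throwaway `curlimages/curl` container; the function name is hypothetical, and whether the SIB compose files already add this mapping is not something this guide states.

```shell
# Probe Ollama from a throwaway container, mapping host.docker.internal to the host gateway.
# Requires Docker 20.10+ for the special "host-gateway" value.
probe_ollama_from_container() {
  docker run --rm \
    --add-host=host.docker.internal:host-gateway \
    curlimages/curl:latest \
    -fsS --max-time 5 http://host.docker.internal:11434/api/tags
}
```

If `probe_ollama_from_container` fails while `curl http://localhost:11434/api/tags` works on the host, Ollama is probably listening only on 127.0.0.1. Starting it with `OLLAMA_HOST=0.0.0.0` makes it listen on all interfaces.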

Slow Analysis


Performance Issues

High Memory Usage

# Check container memory
docker stats

# VictoriaLogs - already uses 10x less RAM than Loki

# For Grafana stack - reduce Loki retention
# Edit storage/config/loki-config.yml
retention_period: 168h  # Reduce from default

# VictoriaMetrics - already uses 10x less RAM than Prometheus

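# For Grafana stack - limit Prometheus retention
# Retention is a Prometheus command-line flag, not a prometheus.yml key.
# Add it to the Prometheus container's command in its compose file:
--storage.tsdb.retention.time=7d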

Disk Space Full

# Check disk usage
df -h
du -sh /var/lib/docker/*

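# Clean up unused data (removes stopped containers, unused networks, and ALL images not used by a container; volumes are kept)
docker system prune -a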

# Reduce log retention (see above)
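
A simple threshold check can flag a filling disk before services start failing. This is a minimal sketch; the 85% threshold and the function name `disk_usage_percent` are arbitrary choices for illustration.

```shell
# Print the used-space percentage (without the % sign) for the filesystem holding $1.
disk_usage_percent() {
  # df -P prints one POSIX-format line per filesystem; column 5 is the capacity.
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

threshold=85
target=/var/lib/docker
[ -d "$target" ] || target=/

used="$(disk_usage_percent "$target")"
if [ "$used" -ge "$threshold" ]; then
  echo "$target is ${used}% full; reduce retention or prune unused images"
else
  echo "$target is ${used}% full"
fi
```

`docker system df` then shows how much of that space is taken by images, containers, volumes, and build cache.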

Slow Dashboard Loading

  1. Reduce time range for queries
  2. Add more specific label filters
  3. Increase Grafana memory limit in compose.yaml

Container Issues

Container Keeps Restarting

# Check logs for the specific container
docker logs <container-name>

# Check exit code
docker inspect <container-name> --format='{{.State.ExitCode}}'

# Common causes:
# - Port already in use
# - Volume permission issues
# - Out of memory
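
The exit code usually narrows down which of the causes above applies. The lookup below is a sketch covering common conventions (codes above 128 are 128 plus the signal number); the function name `explain_exit_code` is hypothetical, and application-specific codes still need the container logs.

```shell
# Map a container exit code to its most likely meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check the restart policy" ;;
    1)   echo "application error; read the container logs" ;;
    125) echo "docker failed to start the container" ;;
    126) echo "container command could not be invoked" ;;
    127) echo "container command not found" ;;
    137) echo "killed with SIGKILL; often the out-of-memory killer" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated with SIGTERM" ;;
    *)   echo "unrecognized; read the container logs" ;;
  esac
}
```

For example, `explain_exit_code "$(docker inspect sib-falco --format '{{.State.ExitCode}}')"`. To confirm an out-of-memory kill, `docker inspect` also exposes `{{.State.OOMKilled}}`.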

Port Already in Use

# Find what's using the port
sudo lsof -i :3000
sudo netstat -tulpn | grep 3000

# Change SIB ports in .env file
GRAFANA_PORT=3001

Cannot Pull Images

# Check Docker Hub access
docker pull alpine

# If behind proxy, configure Docker
# /etc/docker/daemon.json
{
  "proxies": {
    "http-proxy": "http://proxy:8080",
    "https-proxy": "http://proxy:8080"
  }
}

Getting Help

Collect Diagnostic Information

# Create a diagnostic bundle
{
  echo "=== System Info ==="
  uname -a
  docker --version
  docker compose version
  
  echo "=== Container Status ==="
  docker ps -a | grep sib
  
  echo "=== Recent Logs ==="
  docker logs sib-falco --tail 20 2>&1
  docker logs sib-sidekick --tail 20 2>&1
  docker logs sib-loki --tail 20 2>&1  # Grafana stack only; on the default stack, use the VictoriaLogs container
  
  echo "=== Disk Usage ==="
  df -h
  
  echo "=== Memory ==="
  free -h
} > sib-diagnostics.txt

Where to Get Help

When reporting issues, please include:

  1. SIB version (git commit hash)
  2. Linux distribution and version
  3. Kernel version (uname -r)
  4. Docker version
  5. Error messages from logs
  6. Steps to reproduce
