Prevent Linux OOM Freezes with earlyoom + Lightweight Memory Forensics

When a Linux box slowly bleeds memory over hours, the kernel OOM killer kicks in too late — by then the machine is already frozen. Two complementary fixes: earlyoom kills a hog before the system locks up, and a periodic memory snapshot gives you post-mortem data next time it happens.

earlyoom

Install and go — on Debian it's one command and the service auto-enables:

sudo apt-get install -y earlyoom
systemctl status earlyoom.service

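Defaults (Debian package): earlyoom acts when available RAM and free swap are both below 10%. At that point it sends SIGTERM to the process with the highest oom_score, and it escalates to SIGKILL if both values fall below 5%. Configurable via EARLYOOM_ARGS in /etc/default/earlyoom or by overriding flags in the systemd unit.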
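As an illustration of that configuration file, here is a minimal sketch of /etc/default/earlyoom. The threshold values and the process names in the regex are assumptions chosen for the example, not recommendations; the flags themselves (-m, -s, -r, --avoid) are documented earlyoom options.

```shell
# /etc/default/earlyoom -- read by the packaged systemd unit as an environment file.
# Illustrative values only (assumptions for this example):
#   -m 8      act when available RAM falls below 8%
#   -s 20     ...and free swap falls below 20%
#   -r 3600   log a memory report once an hour
#   --avoid   regex of process names earlyoom should try not to kill
EARLYOOM_ARGS="-m 8 -s 20 -r 3600 --avoid '(^|/)(sshd|systemd)$'"
```

After editing the file, restart the service with sudo systemctl restart earlyoom so the new flags take effect.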

Memory snapshot every 5 minutes

When the machine eventually does die, you want to know what was growing. A systemd timer + shell script + logrotate gives you that with near-zero overhead.

The capture script (/usr/local/bin/memsnap-capture):

#!/bin/sh
set -eu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LOG_DIR=/var/log/memsnap
LOG_FILE="$LOG_DIR/memsnap.log"
umask 022
mkdir -p "$LOG_DIR"
touch "$LOG_FILE"

echo "=== $(date -Is) host=$(hostname) ===" >> "$LOG_FILE"
echo "-- uptime --" >> "$LOG_FILE"
uptime >> "$LOG_FILE" 2>&1 || true

echo "-- meminfo --" >> "$LOG_FILE"
awk '/^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapCached|SwapTotal|SwapFree|Active\(anon\)|Inactive\(anon\)|AnonPages|Mapped|Shmem|Slab):/ { print }' /proc/meminfo >> "$LOG_FILE"

echo "-- top rss --" >> "$LOG_FILE"
ps -eo pid,ppid,user,comm,%mem,rss,vsz --sort=-rss | head -n 25 >> "$LOG_FILE"

echo "-- top swap --" >> "$LOG_FILE"
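# Portable awk: ENDFILE is gawk-only and never fires under mawk (Debian's default awk).
# Name: precedes VmSwap: in /proc/PID/status, so the row can be printed on the VmSwap line.
# The /proc/ prefix is stripped explicitly; a greedy /^.*\// would leave only "status".
awk '/^Name:/{name=$2} /^VmSwap:/{if($2>0){pid=FILENAME;sub(/^\/proc\//,"",pid);sub(/\/status$/,"",pid);printf "%s\t%s\t%s\n",pid,name,$2}}' /proc/[0-9]*/status 2>/dev/null | sort -k3,3nr | head -n 20 >> "$LOG_FILE" || true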
printf "\n" >> "$LOG_FILE"

The service unit (/etc/systemd/system/memsnap-capture.service):

[Unit]
Description=Capture lightweight memory snapshot

[Service]
Type=oneshot
ExecStart=/usr/bin/flock -n /run/memsnap-capture.lock /usr/local/bin/memsnap-capture

The timer (/etc/systemd/system/memsnap-capture.timer):

[Unit]
Description=Memory snapshot every 5 minutes

[Timer]
OnBootSec=3min
OnUnitActiveSec=5min
AccuracySec=1min
# Persistent= only affects OnCalendar= timers, so it is omitted for this monotonic timer.

[Install]
WantedBy=timers.target

Logrotate (/etc/logrotate.d/memsnap):

/var/log/memsnap/memsnap.log {
    daily
    rotate 14
    compress
    delaycompress
    copytruncate
    maxsize 20M
    missingok
    notifempty
}

Enable:

sudo systemctl daemon-reload
sudo systemctl enable --now memsnap-capture.timer
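Once the timer has fired at least once, a quick sanity check is to count the snapshot headers in the log. The helper below is a minimal sketch; the function name memsnap_summary is made up for this example, and it assumes the "=== <timestamp> host=... ===" header format written by the capture script.

```shell
#!/bin/sh
# memsnap_summary LOGFILE
# Count snapshot headers and report the timestamp of the most recent one.
# (Hypothetical helper; relies on the "=== <ISO time> host=... ===" header line.)
memsnap_summary() {
    awk '/^=== / { n++; last = $2 }
         END { printf "%d snapshots, last at %s\n", n, (n ? last : "never") }' "$1"
}

# Example call on a live system:
#   memsnap_summary /var/log/memsnap/memsnap.log
```

systemctl list-timers memsnap-capture.timer shows when the next run is due, and sudo systemctl start memsnap-capture.service forces a snapshot immediately.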

Each snapshot is roughly 2 KB. At 288 snapshots/day (one every 5 minutes) that's about 576 KB/day uncompressed, so 14 days of rotated, compressed logs stay well under 10 MB total. Negligible SSD wear.
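After a freeze, the useful question is which process grew. The sketch below sums the RSS of one command per snapshot so the trend is visible at a glance. The function name memsnap_trend is hypothetical, and it assumes the log layout produced by the capture script: a "=== <timestamp> ..." header, a "-- top rss --" section, and ps columns in the order pid,ppid,user,comm,%mem,rss,vsz.

```shell
#!/bin/sh
# memsnap_trend LOGFILE COMM
# Print "<timestamp><TAB><summed RSS in KiB>" for COMM, one line per snapshot
# in which COMM appears in the "top rss" section. (Hypothetical helper.)
memsnap_trend() {
    awk -v want="$2" '
        function emit() {
            if (ts != "" && sum > 0) printf "%s\t%d\n", ts, sum
            sum = 0
        }
        /^=== / { emit(); ts = $2; in_rss = 0; next }        # snapshot header
        /^-- /  { in_rss = ($0 == "-- top rss --"); next }   # section marker
        in_rss && $1 ~ /^[0-9]+$/ && NF >= 7 {
            # comm may contain spaces, so rebuild it from field 4 up to the
            # last three columns (%mem, rss, vsz) and read rss from the right.
            comm = $4
            for (i = 5; i <= NF - 3; i++) comm = comm " " $i
            if (comm == want) sum += $(NF - 1)
        }
        END { emit() }
    ' "$1"
}

# Example call on a live system:
#   memsnap_trend /var/log/memsnap/memsnap.log firefox
```

Because the snapshot only records the top 25 processes by RSS, a command that drops out of that list simply produces no line for that timestamp. Note also that ps truncates comm to 15 characters, so the name passed in has to match the truncated form.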
