高效運維:10個功能強悍的Shell監(jiān)控腳本
在當今復雜多變的運維環(huán)境中,Shell腳本因其靈活性和高效性,成為運維工程師不可或缺的工具之一。本文將詳細介紹10個功能強悍的Shell監(jiān)控腳本,旨在幫助運維人員更好地應對各種挑戰(zhàn),確保系統(tǒng)穩(wěn)定運行。每個腳本均附有詳細代碼及應用場景,助力運維工作更加智能化、自動化。
1. CPU使用率監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/cpu_usage.log"
THRESHOLD=80
get_cpu_usage() {
mpstat 1 1 | awk '/Average/ {print 100 - $NF"%"}' | sed 's/%//g'
}
current_usage=$(get_cpu_usage)
echo"$(date '+%Y-%m-%d %H:%M:%S') CPU Usage: $current_usage" >> $LOG_FILE
if [ $current_usage -ge $THRESHOLD ]; then
echo"Warning: CPU usage is above $THRESHOLD%!" | mail -s "CPU Usage Alert" admin@example.com
fi
應用場景:
該腳本每秒鐘監(jiān)控一次CPU使用率,并將結果記錄到日志文件中。當CPU使用率超過設定的閾值(如80%)時,發(fā)送郵件警報給管理員。適用于需要實時監(jiān)控CPU性能的場景,如高并發(fā)Web服務器或數(shù)據庫服務器。
2. 內存使用情況監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/memory_usage.log"
THRESHOLD=80
get_memory_usage() {
free | awk '/^Mem:/{print $3/$2 * 100.0}' | sed 's/%//g'
}
current_usage=$(get_memory_usage)
echo"$(date '+%Y-%m-%d %H:%M:%S') Memory Usage: $current_usage" >> $LOG_FILE
if [ $current_usage -ge $THRESHOLD ]; then
echo"Warning: Memory usage is above $THRESHOLD%!" | mail -s "Memory Usage Alert" admin@example.com
fi
應用場景:
類似于CPU監(jiān)控腳本,該腳本監(jiān)控內存使用情況,并在內存使用率超過閾值時發(fā)送警報。適用于內存資源緊張的環(huán)境,如大數(shù)據處理或內存密集型應用服務器。
3. 磁盤空間監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/disk_usage.log"
THRESHOLD=90
check_disk_usage() {
df -h | awk '$NF=="/"{print $5}' | sed 's/%//g'
}
current_usage=$(check_disk_usage)
echo"$(date '+%Y-%m-%d %H:%M:%S') Disk Usage: $current_usage" >> $LOG_FILE
if [ $current_usage -ge $THRESHOLD ]; then
echo"Warning: Disk usage is above $THRESHOLD%!" | mail -s "Disk Usage Alert" admin@example.com
fi
應用場景:
監(jiān)控根分區(qū)的磁盤使用情況,當磁盤使用率超過閾值時發(fā)送警報。適用于需要確保磁盤空間充足以避免數(shù)據丟失或系統(tǒng)崩潰的場景。
4. 網絡接口流量監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/network_traffic.log"
THRESHOLD_IN=100000 # 100Mbps in bytes/sec
THRESHOLD_OUT=100000 # 100Mbps in bytes/sec
get_network_traffic() {
iface="eth0"
rx=$(cat /sys/class/net/$iface/statistics/rx_bytes)
tx=$(cat /sys/class/net/$iface/statistics/tx_bytes)
sleep 1
rx_new=$(cat /sys/class/net/$iface/statistics/rx_bytes)
tx_new=$(cat /sys/class/net/$iface/statistics/tx_bytes)
rx_rate=$(( (rx_new - rx) / 1024 / 1024 ))
tx_rate=$(( (tx_new - tx) / 1024 / 1024 ))
echo"$rx_rate $tx_rate"
}
current_rx_rate $(echo $(get_network_traffic) | awk '{print $1}')
current_tx_rate $(echo $(get_network_traffic) | awk '{print $2}')
echo"$(date '+%Y-%m-%d %H:%M:%S') Network Traffic: RX $current_rx_rate MB/s, TX $current_tx_rate MB/s" >> $LOG_FILE
if [ $current_rx_rate -ge $THRESHOLD_IN ] || [ $current_tx_rate -ge $THRESHOLD_OUT ]; then
echo"Warning: Network traffic exceeds thresholds!" | mail -s "Network Traffic Alert" admin@example.com
fi
應用場景:
監(jiān)控特定網絡接口(如eth0)的入站和出站流量,并在流量超過設定閾值時發(fā)送警報。適用于需要確保網絡帶寬充足以避免性能瓶頸的場景,如Web服務器或視頻流媒體服務器。
5. 進程監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/process_monitor.log"
PROCESS_NAME="nginx"
check_process() {
pgrep -x $PROCESS_NAME > /dev/null
if [ $? -eq 0 ]; then
echo"Process $PROCESS_NAME is running."
return 0
else
echo"Process $PROCESS_NAME is NOT running."
return 1
fi
}
status=$(check_process)
echo"$(date '+%Y-%m-%d %H:%M:%S') $PROCESS_NAME Status: $status" >> $LOG_FILE
if [ $status -eq 1 ]; then
echo"Warning: $PROCESS_NAME is not running!" | mail -s "$PROCESS_NAME Down Alert" admin@example.com
# Optionally, you can add a command to start the process here
# systemctl start nginx
fi
應用場景:
監(jiān)控特定進程(如nginx)的運行狀態(tài),并在進程異常終止時發(fā)送警報。適用于需要確保關鍵服務持續(xù)運行的場景,如Web服務器或數(shù)據庫服務。
6. 日志輪轉監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_DIR="/var/log"
DAYS_THRESHOLD=7
rotate_logs() {
find $LOG_DIR -type f -name "*.log" -mtime +$DAYS_THRESHOLD -exec gzip {} \; -exec rm {} \;
}
rotate_logs
echo "$(date '+%Y-%m-%d %H:%M:%S') Logs rotated for files older than $DAYS_THRESHOLD days." >> /var/log/log_rotation.log
應用場景:
定期輪轉日志文件,壓縮并刪除超過指定天數(shù)(如7天)的舊日志文件。適用于需要管理大量日志文件以節(jié)省磁盤空間的場景,如大型Web應用或系統(tǒng)日志服務器。
7. 系統(tǒng)負載監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/system_load.log"
THRESHOLD=5.0
get_system_load() {
uptime | awk -F'load average:''{ print $2 }' | awk '{ print $1,$2,$3 }'
}
current_load=$(get_system_load)
echo"$(date '+%Y-%m-%d %H:%M:%S') System Load: $current_load" >> $LOG_FILE
load_avg=$(echo$current_load | awk '{print $1}')
if [ $(echo"$load_avg > $THRESHOLD" | bc -l) -eq 1 ]; then
echo"Warning: System load is above $THRESHOLD!" | mail -s "System Load Alert" admin@example.com
fi
應用場景:
監(jiān)控系統(tǒng)負載,并在負載超過設定閾值時發(fā)送警報。適用于需要確保系統(tǒng)性能穩(wěn)定的場景,如高并發(fā)應用服務器或數(shù)據庫服務器。
8. 服務狀態(tài)監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/service_status.log"
SERVICES=("nginx""mysql""redis")
check_services() {
for service in"${SERVICES[@]}"; do
systemctl is-active --quiet $service
if [ $? -ne 0 ]; then
echo"$(date '+%Y-%m-%d %H:%M:%S') Service $service is NOT active." >> $LOG_FILE
echo"Warning: Service $service is down!" | mail -s "$service Down Alert" admin@example.com
else
echo"$(date '+%Y-%m-%d %H:%M:%S') Service $service is active." >> $LOG_FILE
fi
done
}
check_services
應用場景:
監(jiān)控一組關鍵服務(如nginx、mysql、redis)的狀態(tài),并在服務異常時發(fā)送警報。適用于需要確保多個服務協(xié)同運行的環(huán)境,如微服務架構或復雜應用部署。
9. I/O性能監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/io_performance.log"
THRESHOLD_READ=5000 # IOPS threshold for read operations
THRESHOLD_WRITE=5000 # IOPS threshold for write operations
get_io_performance() {
iostat -dx 1 1 | awk '/^Device/ { if ($1 != "Device") print $1, $12, $13 }' | sed 's/ //g'
}
current_io=$(get_io_performance)
device=$(echo$current_io | awk '{print $1}')
read_iops=$(echo$current_io | awk '{print $2}')
write_iops=$(echo$current_io | awk '{print $3}')
echo"$(date '+%Y-%m-%d %H:%M:%S') $device IO Performance: Read $read_iops IOPS, Write $write_iops IOPS" >> $LOG_FILE
if [ $read_iops -ge $THRESHOLD_READ ] || [ $write_iops -ge $THRESHOLD_WRITE ]; then
echo"Warning: I/O performance exceeds thresholds for $device!" | mail -s "I/O Performance Alert" admin@example.com
fi
應用場景:
監(jiān)控特定磁盤設備的I/O性能,包括讀和寫操作的IOPS(每秒輸入輸出操作數(shù)),并在性能超過設定閾值時發(fā)送警報。適用于需要確保存儲系統(tǒng)性能的場景,如數(shù)據庫服務器或大數(shù)據處理平臺。
10. 系統(tǒng)安全監(jiān)控腳本
代碼示例:
#!/bin/bash
LOG_FILE="/var/log/security_monitor.log"
UNAUTHORIZED_USERS=("user1""user2")
check_unauthorized_users() {
for user in"${UNAUTHORIZED_USERS[@]}"; do
ifid"$user" &>/dev/null; then
echo"$(date '+%Y-%m-%d %H:%M:%S') Unauthorized user $user found on the system." >> $LOG_FILE
echo"Warning: Unauthorized user $user detected!" | mail -s "Security Alert" admin@example.com
fi
done
}
check_recent_logins() {
lastb | awk '{print $1, $3}' | sort | uniq -c | sort -nr | head -n 10
}
check_unauthorized_users
recent_logins=$(check_recent_logins)
echo"$(date '+%Y-%m-%d %H:%M:%S') Recent Failed Logins:\n$recent_logins" >> $LOG_FILE
# Optionally, you can set up an alert based on the number of failed logins or specific patterns
應用場景:
監(jiān)控系統(tǒng)中是否存在未經授權的用戶賬戶,并檢查最近的失敗登錄嘗試。適用于需要加強系統(tǒng)安全性的場景,如敏感數(shù)據處理環(huán)境或高安全性要求的服務器。
本文介紹了10個功能強悍的Shell監(jiān)控腳本,涵蓋了CPU、內存、磁盤、網絡、進程、日志、系統(tǒng)負載、服務狀態(tài)、I/O性能以及系統(tǒng)安全等多個方面的監(jiān)控需求。這些腳本不僅能夠幫助運維人員實時監(jiān)控系統(tǒng)的運行狀態(tài),還能在異常情況發(fā)生時及時發(fā)送警報,從而有效保障系統(tǒng)的穩(wěn)定性和安全性。希望這些腳本能夠成為你運維工具箱中的得力助手,助力你的運維工作更加高效、智能。