Smokes your problems, coughs fresh air.

Tag: mdadm

Preventing degraded array on every boot

I had a server that booted with a degraded array every time, because there was a USB drive attached to it, that messed up the auto detection. I solved it by putting this in mdadm.conf:

# This is to try to solve the problem that the array always boots as degraded when I boot the server with a USB disk attached.
# http://serverfault.com/questions/722360/debian-server-has-degraded-mdam-array-on-every-boot/
DEVICE /dev/disk/by-id/ata-*

Then run:

update-initramfs -u

I still don’t know what went wrong, though. It can plainly see what drivse should be in the array.

Sending SMS notifications of md device failure

I just wrote a script to send sms from a unix machine and I thought it would be a good idea to add an sms notification to mdadm. Therefore I wrote this script, called handle-md-event.sh:

#!/bin/bash
 
# Add 
# PROGRAM /usr/local/sbin/handle-md-event.sh
# To mdadm.conf
 
event="$1"
device="$2"
related="$3"
# Don't use the FQDN, because on machines with misconfigured DNS, it can take a long time to retrieve it and result in an error
hostname=`hostname`
 
mailto="root"
  [ -z "$related" ];
  related="none specified"
  [  "$event"|grep -E -i "^rebuild[0-9]{2}$"` ];
  event="$event% done"
  percentage_notice="true"
 
message="mdadm on $hostname reports an event with device: $device: $event. Related devices: $related."
 
# Don't sms on Rebuild20, Rebuild40, Rebuild60 events.
# And check if /proc/mdstat actually contains an [U_] pattern, so that you only get SMSes on failures and not just random events.
[ "$percentage_notice" != "true" ] && [ -n "`grep '\[[^]]*_[^]]*\]' /proc/mdstat`" ];
  send-sms.sh -m "$message"
 
message="$message \n\nBecause there is/was a bug in the kernel, the normal routine checkarray function also reports Rebuildxxxxxxx, as opposed to check or something. Therefore, This message is probably just causded by the periodic check of the array, but to be sure, here is /proc/mdstat for you to check whether there is a drive failure: \n\n`cat /proc/mdstat`" -e "$message"|mail -s "Mdadm on $hostname reports event $event on device $device" $mailto

In /etc/mdadm.conf you need to add the following line:

PROGRAM /usr/local/sbin/handle-md-event.sh

If you already have a handler defined, you could write a wrapper script that does both.

© 2024 BigSmoke

Theme by Anders NorenUp ↑