
Tag: RAID

Preventing degraded array on every boot

I had a server that booted with a degraded array every time, because a USB drive attached to it messed up the auto-detection. I solved it by putting this in mdadm.conf:

# This is to try to solve the problem that the array always boots as degraded when I boot the server with a USB disk attached.
# http://serverfault.com/questions/722360/debian-server-has-degraded-mdam-array-on-every-boot/
DEVICE /dev/disk/by-id/ata-*

Then run:

update-initramfs -u

I still don’t know what went wrong, though. mdadm can plainly see which drives should be in the array.
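
For reference, this is roughly how I check after a reboot that the array came up with all its members (a minimal sketch, assuming the array is /dev/md0):

cat /proc/mdstat               # should show [UU], not [U_]
mdadm --detail /dev/md0        # "State : clean" and no failed or removed devices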

Broken disk went undetected, but did corrupt data

I had a disk in one of my servers that was starting to give ATA errors in the syslog. Contrary to what you might think, ATA errors are fairly common, so I didn’t immediately sound the alarm. However, this disk turned out to be corrupting data. While I was upgrading Debian 6 to 7, the file system became read-only. Rebooting gave me a recovery shell, and e2fsck asked me millions of questions.

In the end, I had to recreate the FS and restore from backup.

For the record, this was the error in question (although this error can also be harmless):

[2013-09-01 01:32:19]  ata1: lost interrupt (Status 0x51)
[2013-09-01 01:32:19]  ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[2013-09-01 01:32:19]  ata1.01: failed command: READ DMA EXT
[2013-09-01 01:32:19]  ata1.01: cmd 25/00:00:3f:43:9c/00:04:05:00:00/f0 tag 0 dma 524288 in
[2013-09-01 01:32:19]           res 40/00:00:11:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
[2013-09-01 01:32:19]  ata1.01: status: { DRDY }
[2013-09-01 01:32:19]  ata1: soft resetting link
[2013-09-01 01:32:20]  ata1.00: configured for UDMA/133
[2013-09-01 01:32:20]  ata1.01: configured for UDMA/33
[2013-09-01 01:32:20]  ata1: EH complete

The port is ata1.01; in other words, sdb: the first controller, second disk (a nice mix-up of zero-based and one-based counters; at first I thought it was sda).

The disk was a Western Digital WDC WD5000ABYS-01TNA0.
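
In hindsight, letting SMART have a look at the disk would have been a good idea. A minimal check, assuming smartmontools is installed and the suspect disk is /dev/sdb:

smartctl -H -A /dev/sdb     # overall health plus attributes; watch Reallocated_Sector_Ct and Current_Pending_Sector
smartctl -t long /dev/sdb   # start a long self-test; read the result later with: smartctl -l selftest /dev/sdb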

Adding a disk to a RAID5 array on a 3Ware controller with tw_cli

I wanted to know if I could extend the size of a RAID5 array on the 3Ware 9650SE, so I tried something.

I first had this:

# tw_cli /c0 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       256K    5587.9    RiW    ON
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   1.82 TB   SATA  0   -            ST32000542AS
p1    OK             u0   1.82 TB   SATA  1   -            ST32000542AS
p2    OK             u0   1.82 TB   SATA  2   -            ST32000542AS
p3    OK             u0   1.82 TB   SATA  3   -            ST32000542AS
p4    OK             -    1.82 TB   SATA  4   -            ST32000542AS
 
Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       0      xx-xxx-xxxx

A four-disk RAID5 and one extra disk.

Then I did this:

# tw_cli /c0/u0 migrate type=raid5 disk=4
Sending migration message to /c0/u0 ... Done.

Then I had this:

# tw_cli /c0/u0 show
 
Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       Migrator  MIGRATING      -       0%      -     -       -
 
su0      RAID-5    OK             -       -       -     256K    5587.9
su0-0    DISK      OK             -       -       p0    -       1862.63
su0-1    DISK      OK             -       -       p1    -       1862.63
su0-2    DISK      OK             -       -       p2    -       1862.63
su0-3    DISK      OK             -       -       p3    -       1862.63
su0/v0   Volume    -              -       -       -     -       50
su0/v1   Volume    -              -       -       -     -       5537.9
 
du0      RAID-5    OK             -       -       -     256K    7450.54
du0-0    DISK      OK             -       -       p0    -       1862.63
du0-1    DISK      OK             -       -       p1    -       1862.63
du0-2    DISK      OK             -       -       p2    -       1862.63
du0-3    DISK      OK             -       -       p3    -       1862.63
du0-4    DISK      OK             -       -       p4    -       1862.63
du0/v0   Volume    -              -       -       -     -       N/A
du0/v1   Volume    -              -       -       -     -       N/A

su0 and du0 are probably source and destination, giving me a new and bigger u0 at the end. But this is going to take a week to migrate, so I won’t know for a while… (edit: I contacted 3Ware support and they said the change in size is only seen after a driver reload, which means a reboot in most cases).
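
For completeness: once the migration is done and the driver has been reloaded, only the unit itself is bigger; whatever sits on top of it still has to be grown. If the unit is used as an LVM physical volume, that would look roughly like this (a sketch; /dev/sda, vg0 and the data LV are made-up names):

pvresize /dev/sda                    # let LVM see the larger unit
lvextend -l +100%FREE /dev/vg0/data  # grow a logical volume into the new space
resize2fs /dev/vg0/data              # grow the ext3/ext4 filesystem on it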

Aligning partitions with RAID and LVM on drives with 4 kB sectors

Hard disks are being released that abandon the long-established standard of 512-byte sectors. I just got two Western Digital WD15EARS drives, which use 4 kB sectors. Western Digital refers to this as “Advanced Format”. This poses some serious problems. I will describe them, and what I did to ‘solve’ them.

This article is as much for myself as for other people correcting me. So, if you see faults, let me know 🙂

First, you may want to read this IBM article and this LWN article. Reading these articles is important to understand the issue. To summarize a bit: hard disks can’t really have 4 kB sectors, because then the BIOS, bootloaders, operating systems, partitioning software and who knows what else would just go berserk. So, they still report a sector size of 512 bytes. This emulation comes with certain performance issues (explained in the links), which can only be avoided by aligning the filesystems on disk, which the IBM article explains well. In short, I configured fdisk to use 224 heads and 56 sectors, which aligns the partitions properly (see the sketch below). Do remember, however, to start your first partition at cylinder 2, otherwise there isn’t enough space for the bootloader.
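
In practice, that comes down to something like this (a sketch, assuming the new disk is /dev/sdb). With 224 heads and 56 sectors per track, a cylinder is 224 × 56 = 12544 sectors, which is divisible by 8, so every cylinder boundary is 4 kB aligned:

fdisk -H 224 -S 56 /dev/sdb    # fake geometry; create the first partition starting at cylinder 2
fdisk -lu /dev/sdb             # list in sectors; every partition's start sector should be divisible by 8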

Now, what happens if you don’t put a file system on your partition, but make a software RAID1+LVM out of it? It gets a lot more complicated to align the filesystem blocks this way, because of all the layers. I did the following:

First I created the aligned partitions with fdisk on two disks. Then I made a RAID1 array out of them. Luckily, Linux MD has the ability to store the RAID superblock at the end of the partition. It can also store it at the beginning, but you shouldn’t do that! The RAID superblock is 256 bytes + 2 bytes per drive long, which means it occupies one logical sector, which in turn means that the start of the MD data would end up in the middle of a 4 kB sector on disk. You can use metadata format 1.0 or 0.9 to put the superblock at the end of the partition (see the mdadm man page).
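
Creating such an array looks something like this (device names are just an example):

mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/sdb1 /dev/sdc1
mdadm --examine /dev/sdb1      # shows the superblock version and where it lives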

Then it’s time to create the logical volume. When preparing the RAID1 partition for use in the volume group, give pvcreate the --dataalignment 4096 option. Then, with “pvs -o +pe_start”, you should be able to see where the first PE (physical extent) starts. I accidentally created mine with alignment 8, but the first extent is at 136.00 kB (139264 bytes), which is divisible by 4096.
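
In commands, that is roughly (assuming the array from above is /dev/md0):

pvcreate --dataalignment 4k /dev/md0
pvs -o +pe_start /dev/md0      # the first PE offset should be divisible by 4 kB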

Then, I created the volume group with an extent size of 4 MB, which is also divisible by 4096. Logical volumes are created as multiples of the extent size, so you can now create them at will and they will be aligned.
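
Something like this, with made-up names and sizes:

vgcreate -s 4M vg0 /dev/md0    # 4 MB extents, a multiple of 4 kB
lvcreate -n data -l 2560 vg0   # sized in extents, so always a whole number of 4 MB chunks (here 10 GB)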

When creating a RAID array that deals with striping, be sure to make the stripe size a multiple of 4 kB. I guess this also applies to logical volumes with striping.
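
With mdadm, for example, the chunk size is given in kB, so any of the usual values qualifies (a hypothetical three-disk RAID5):

mdadm --create /dev/md1 --level=5 --raid-devices=3 --chunk=256 /dev/sdb2 /dev/sdc2 /dev/sdd2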

What I still would like to know is whether the file system journal is also made up of 4 kB blocks. The RAID array’s write-intent bitmap (if you have one) is also still unclear to me: where is it stored, and does it write in multiples of 4 kB?

Installing Arch Linux on RAID+LVM

I just installed Arch Linux on a RAID1+LVM, which involved some work. There already is a nice article about it, but I wanted to summarize for myself.

Arch has no GUI or menu for you to do this. So, when the installer has started, just go to another VT and create the RAID+LVM.

In my first attempt, I created one RAID partition with LVM on it, from which I intended to boot using Grub2, since it understands LVM and RAID. However, the Grub installer kept saying there was no mapping for /dev/mapper/lvmroot or something, so I decided to make two partitions: one for /boot and one for the rest, which was meant as the physical volume for the LVM. The advantage of Linux software RAID (when you store the superblock at the end of the partition, at least) is that Grub can access it like a normal disk; it doesn’t need to know about the RAID.

After the array was made using something like:

mdadm --create /dev/md0 -l 1 -n 2 -x 0 -e 1.0 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 -l 1 -n 2 -x 0 -e 1.0 /dev/sda2 /dev/sdb2

It was time to create the LVM. So, I ran something like this (I don’t remember the exact syntax, so this is an abstraction):

pvcreate /dev/md1
 
# In my research for my initial lvm problem, I found people who had 
# problems with dashes in the volume group name, so I don't 
# use those anymore...
vgcreate lvmonraid /dev/md1
 
lvcreate -n root -l [wanted size/extentsize] lvmonraid
lvcreate -n home -l [wanted size/extentsize] lvmonraid
lvcreate -n swap -l [wanted size/extentsize] lvmonraid

Extent size is normally 4 MB, so a 40 MB partition would get “-l 10”. You can also supply sizes directly (with -L instead of -l), but that is not only imprecise (it gets rounded to the extent size), I also noticed bugs in how metric and binary units were interpreted: it seemed like the command-line options don’t differentiate between G and g, for instance, and treat both as GiB, whereas the --units option does.

When this is done, you can go back to the Arch installer and assign each block device its purpose.

When the installation is almost done, it asks to modify some configuration files. This is important, otherwise the initramfs won’t load the LVM. You need to:

  • Make an mdadm.conf in /etc (of the live CD) with “mdadm --examine --scan > /etc/mdadm.conf”.
  • Add the raid1 and dm_mod modules to the MODULES list in /etc/mkinitcpio.conf.
  • Add the mdadm and lvm2 hooks to the HOOKS list in /etc/mkinitcpio.conf, before ‘filesystems’.
  • Edit your /etc/rc.conf and set the USELVM parameter to “yes” (see the sketch below).
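
The relevant bits of those files ended up looking more or less like this (a sketch; the exact default MODULES and HOOKS contents differ per Arch release, the point is only where the additions go):

# /etc/mdadm.conf, generated from the running arrays:
mdadm --examine --scan > /etc/mdadm.conf

# /etc/mkinitcpio.conf (only the lines that matter here):
MODULES="raid1 dm_mod"
HOOKS="base udev autodetect pata scsi sata mdadm lvm2 filesystems"

# /etc/rc.conf:
USELVM="yes"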

Then it will create the ramdisk. The next thing is installing Grub. The config file it generates is fine, except that you need to add (hd0,0) after the empty “root” directive, twice. Installing Grub fails because of the RAID, so you have to do that by hand:

# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)

That should be it, basically. I actually did a lot more because of my new Western Digital WD15EARS disk with 4 kB sectors, but I’ll write about that soon.

Sending SMS notifications of md device failure

I just wrote a script to send SMS messages from a Unix machine, and I thought it would be a good idea to add an SMS notification to mdadm. Therefore I wrote this script, called handle-md-event.sh:

#!/bin/bash

# Add
# PROGRAM /usr/local/sbin/handle-md-event.sh
# to mdadm.conf

event="$1"
device="$2"
related="$3"
# Don't use the FQDN, because on machines with misconfigured DNS, it can take a long time to retrieve it and result in an error
hostname=`hostname`

mailto="root"

if [ -z "$related" ]; then
  related="none specified"
fi

# Rebuild events come in as Rebuild20, Rebuild40, etc.
if echo "$event" | grep -q -E -i '^rebuild[0-9]{2}$'; then
  event="$event% done"
  percentage_notice="true"
fi

message="mdadm on $hostname reports an event with device: $device: $event. Related devices: $related."

# Don't SMS on Rebuild20, Rebuild40, Rebuild60 events.
# And check if /proc/mdstat actually contains an [U_] pattern, so that you only get SMSes on failures and not just random events.
if [ "$percentage_notice" != "true" ] && grep -q '\[[^]]*_[^]]*\]' /proc/mdstat; then
  send-sms.sh -m "$message"
fi

message="$message \n\nBecause there is/was a bug in the kernel, the normal routine checkarray function also reports Rebuildxxxxxxx, as opposed to check or something. Therefore, this message is probably just caused by the periodic check of the array, but to be sure, here is /proc/mdstat for you to check whether there is a drive failure: \n\n`cat /proc/mdstat`"
echo -e "$message" | mail -s "Mdadm on $hostname reports event $event on device $device" $mailto

In /etc/mdadm.conf you need to add the following line:

PROGRAM /usr/local/sbin/handle-md-event.sh

If you already have a handler defined, you could write a wrapper script that does both.
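
You can test the whole chain without breaking a disk by having mdadm generate a TestMessage event for every array:

mdadm --monitor --scan --test --oneshot

With the script above, that should result in a mail, but no SMS, because /proc/mdstat won’t show any failed drives.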

Installing grub on RAID1

When you have your kernel image and grub boot stages on a Linux software RAID1 partition, installing grub needs a bit of trickery.

Grub determines on what hardware device those files are located by looking at /etc/mtab. There it will find /dev/md/0 for / or /boot (or wherever the files are located). That device has no corresponding BIOS device, because it is managed by the Linux kernel. Therefore, when you try to install Grub, it says: “/dev/md/0 does not have any corresponding BIOS drive.” But because the underlying partitions in a RAID1 contain the same data as the virtual device, we can trick Grub.

All we have to do is edit /etc/mtab and replace /dev/md/0 with /dev/sda1 (or whatever). You can then run grub-install hd0. Of course, make a backup of mtab first, so you can revert it after installing Grub.
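
In commands, the whole dance is something like this (a sketch; /dev/sda1 and (hd0) are whatever your first mirror half maps to):

cp /etc/mtab /etc/mtab.bak
sed -i 's#/dev/md/0#/dev/sda1#' /etc/mtab
grub-install '(hd0)'
cp /etc/mtab.bak /etc/mtab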

I find it kind of weird that grub doesn’t determine that the device in question is a RAID1 and that it can simply use the underlying device; the internet is filled with people having the same problem. If anybody knows a more elegant solution than this, I’m all ears.

Based on the comments, this is what you need to install it from the grub prompt:

# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
