Smokes your problems, coughs fresh air.

Author: halfgaar (Page 9 of 26)

Halfgaar is Wiebe. Wiebe is a contributing author on this weblog. He also has a lot of stuff (such as long, in-depth articles) on his personal website.

Wiebe's day job is as a senior software developer and system administrator at YTEC.

In his free time, he built the free, open-source FlashMQ software. Together with Jeroen and Rowan, he is now building a managed MQTT hosting business around his open masterpiece.

Creating a drbd for an existing Xen domain

I needed some VMs to be available on a backup node, which I accomplished with the distributed remote block device, or DRBD. My host machine is Debian 6.

This post replaced an older one I made.

First install drbd:

aptitude -P install drbd8-utils

Then make some config files. First adjust /etc/drbd.d/global.conf (I only had to uncomment the notify rules):

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}
 
common {
        protocol C;
 
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
 
        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb;
 
                # The timeout value when the last known state of the other side was available.
                wfc-timeout 0;
 
                # Timeout value when the last known state was disconnected.
                degr-wfc-timeout 180;
        }
 
        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs   
        }
 
        net {
                # snd‐buf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }
 
        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}

Then I made a resource for my existing logical volume:

resource r0
{
  meta-disk internal;
  device /dev/drbd1;
 
  startup
  {
    # The timeout value when the last known state of the other side was available.
    wfc-timeout 0;
 
    # Timeout value when the last known state was disconnected.
    degr-wfc-timeout 180;
  }
 
  syncer
  {
    # This is recommended only for low-bandwidth lines, to only send those
    # blocks which really have changed.
    #csums-alg md5;
 
    # Set to about half your net speed
    rate 8M;
 
    # It seems that this option moved to the 'net' section in drbd 8.4.
    verify-alg md5;
  }
 
  net
  {
    # The manpage says this is recommended only in pre-production (because of its performance), to determine
    # if your LAN card has a TCP checksum offloading bug. 
    #data-integrity-alg md5;
  }
 
  disk
  {
    # Detach causes the device to work over-the-network-only after the
    # underlying disk fails. Detach is not default for historical reasons, but is
    # recommended by the docs.
    # However, the Debian defaults in drbd.conf suggest the machine will reboot in that event...
    on-io-error detach;
 
    # LVM doesn't support barriers, so disabling it. It will revert to flush. Check wo: in /proc/drbd. If you don't disable it, you get IO errors.
    no-disk-barrier;
  }
 
  on top
  {
    disk /dev/universe/lvtest;
    address 192.168.2.6:7789;
  }
 
  on bottom
  {
    disk /dev/universe/lvtest;
    address 192.168.2.7:7790;
  }
}

Copy all config files to the slave machine (and write an rsync-script for it…).

I learned that Linux 3.1 now has write barriers enabled by default for ext3 (they already were for ext4). This causes bugs and IO errors with xen-blkfront, so that needs to be disabled:

# grep barrier /etc/fstab
/dev/xvda2 / ext3 barrier=0 0 1

I’ll see about finding out if there are bug reports and file them if necessary.

The drbd data is going to be written on the actual LV, so on the primary node, we need to make space (you can also grow the LV):

e2fsck -f /dev/universe/lvtest
resize2fs /dev/universe/lvtest 500M # or however big that's a tad smaller than the actual LV.
drbdadm create-md r0
drbdadm up r0

On the secondary node, make the device as well:

drbdadm create-md r0
drbdadm up r0

Then we can start syncing and re-grow it. On the primary:

drbdadm -- --overwrite-data-of-peer primary r0 # the -- is necessary because of weird option handling by drbdadm.
resize2fs /dev/drbd1

The logical volume has been converted from ext3 to drbd:

# mount /dev/universe/lvtest /mnt/temp
mount: unknown filesystem type 'drbd'

Then, it is recommended you create /etc/modprobe.d/drbd.conf with:

options drbd disable_sendpage=1

I don’t know what it does, but it’s recommended by the DRBD devices docs when you put Xen domains on DRBD devices.

In Xen, you can configure the disk device of a VM like this (actually, I learned that this doesn’t work with pygrub):

disk = [ 'drbd:resource,xvda,w' ]

Drbd has installed the necessary scripts in /etc/xen/scripts to support this. Xen will now automatically promote a drbd device to primary when you start a VM.

Bewarned: because of that, don’t put the VM in the /etc/xen/auto dir on the fallback node, otherwise whichever machine is faster will start the VM, preventing the other machine from starting it (because you can’t have two primaries).

Then, I noticed that Debian arranges it’s boot process erroneously, starting xemdomains before drbd. I comment on an old bug.

You can fix it by adding xendomains to the following lines in /etc/init.d/drbd:

# X-Start-Before: heartbeat corosync xendomains
# X-Stop-After:   heartbeat corosync xendomains 

Mdadm (software RAID) schedules monthly checks of your array. You can do that for DRBD too). You do that on the primary node with a cronjob in /etc/cron.d/:

42 0 * * 0    root    /sbin/drbdadm verify all

One last thing: the docs state that when you perform a verify and it detects an out-of-sync device, all you have to do is disconnect and connect. That didn’t work for me. Instead, I ran the following on the secondary node (the one I had destroyed with dd) to initiate a resync:

drbdadm invalidate r0

Fixing mailscanner insecure dependancy

Mailscanner cut out on me, without errors in the log. It was only after turning on debug (which prevents backgrounding and it then only processes one batch) that it showed me.

I needed to change the first line of Mailscanner in /usr/sbin/Mailscanner:

#!/usr/bin/perl -I/usr/share/MailScanner/ -U 

Source.

Of course, this is far from optimal and it bugs me that Debian didn’t fix this yet, because it’s an old issue.

Setting max memory of a Xen Dom0

I’ve had some issues with Xen crashing when I wanted to create a DomU for which the Dom0 had to shrink (see bug report). Therefore, it’s better to force a memory limit on the dom0. That is done with a kernel param.

Add this to /etc/default/grub:

# Start dom0 with less RAM
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=512M"

And make sure you disabling ballooning of dom0 in /etc/xen/xend-config.sxp:

(enable-dom0-ballooning no)

Then run update-grub2 and reboot.

Fixing a Xen DomU’s grub config

When you use xen-create-image to bootstrap an ubuntu, it sets up a grub config file menu.lst. However, this boot config is not kept up-to-date with newer ubuntu’s because they use grub2 (which uses grub.cfg and not menu.lst). And Xen’s pygrub first looks at menu.lst, so if you have a stale file, it will always boot an old kernel.

I ‘Fixed’ it like this (a real fix I have yet to devise, but this works. Actually, I think it is a bug):

The grub hooks in Debian and Ubuntu don’t take the fact into account that the machine might be running as paravirtualized VM. Therefore, it can’t find /dev/xvda to install grub on, which it shouldn’t try. Bug reports exist about this, but it is not deemed important, it seems. The result is that the menu.lst that was created by xen-create-image is never updated and updates to the kernel are never booted.

Pygrub considers menu.lst over grub.cfg (which would give problems with upgrading to grub2). But you can also use it to your advantage. You can edit /etc/kernel-img.conf to look like this (do_symlinks = yes and no hooks):

do_symlinks = yes
relative_links = yes
do_bootloader = no
do_bootfloppy = no
do_initrd = yes
link_in_boot = no
postinst_hook =
postrm_hook   =

And then make /boot/grub/menu.lst with this:

default         0
timeout         2
 
title           Marauder
root            (hd0,0)
kernel          /vmlinuz root=/dev/xvda2 ro
initrd          /initrd.img

Then uninstall grub. This way, you always boot the new kernel. (actually, I found that you can’t always uninstall grub. I now have machines with grub-pc and grub-common installed, but have an empty /boot/grub with only a menu.lst there. This will keep aptitude from complaining about grub-pc not being able to configure, because it’s unable to detect the bios device for /dev/xvda2 (or whatever)

CUPS printer driver for my HP4050

My HP4050 has a million different drivers to choose from. Some don’t have the maximum DPI, some don’t have other options, like selecting paper output. The one that works the best seems to be “HP LaserJet 4050 Series Postscript (recommended) (grayscale, 2-sided printing)”. However, I have no idea anymore if that came from hplip, foomatic, gutenprint, or whatever.

Converting a MySQL database from latin1 to utf8

mysqldump dbname > dbname_bak_before_messing_with_it.sql
mysqldump --default-character-set=latin1 --skip-set-charset dbname > dump.sql
sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql
mysql --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;" 
mysql --default-character-set=utf8 dbname < dump_utf.sql

Setting up a postfix fallback MX

These are some short stepts and pointers to setup a postfix fallback MX. It does not describe postfix basics, nor how to install Mailscanner (a necessisity, because spammers like sending spam to fallback MX’s because they’re usually not as well protected).

First, postfix has a special transport, called ‘relay’, which is the same as SMTP, except that it doesn’t send to the backup MX. In this case, that means it doesn’t send to itself, avoiding loops. The relay transport is the default for all the domains you specify in the relay_domains parameter.

Should the machine have a relayhost defined, you need to disable that for each of the domains you’re a backup MX for. You can do that with transport_maps. In my case, the transport_maps is hash:/etc/postfix/transport_maps. That is a key-value file on which you run ‘postmap’ after you’ve edited it. In it, specify this (the comments explain it):

# When you specify a transport without nexthop, it resets the relay to the
# recipient domain (see man 5 transport). And, when the transport is relay,
# postfix will not relay to the backup MX, to prevent loops back to itself. So,
# because this host has a default relayhost, use the folowwing when you want to
# specify a domain for which we are backup MX: 
# example.com            relay 

The next step you want to do is take measures to prevent mails to non-existent users piling up in the queue. Because spam is sent to the backup MX all the time, it will relay it to the primary, which rejects it. Your backup host will then bounce all kinds of spam mails…

To fix that, we instruct postfix to use recipient address verification. This causes it to probe the primary host to check the address exists (and caches that info) before relaying.

To enable it, do this:

smtpd_recipient_restrictions = permit_mynetworks, reject_unauth_destination, reject_unknown_recipient_domain, reject_unverified_recipient
unverified_recipient_reject_reason = Address lookup failed
# Prevent caching non-existent users, to prevent delivery failures when new users are unknown still.
address_verify_negative_cache = no
# Setting to 550 to make a reject a fatal error, not a defer.
unverified_recipient_reject_code = 550
# The point of a backup MX is to accept mail when the primary is down, so setting this prevents incoming mail being deferred when the address probe cannot be done.
# TODO: find out how to actually implement it, because it doesn't work; permit it not a valid option here.
#unverified_recipient_tempfail_action = permit 

The reject_unknown_recipient_domain prevents probes to domains that don’t exist.

Some extra options, not strictly related to being a backup MX:

smtpd_sender_restrictions = reject_unknown_sender_domain, reject_unknown_helo_hostname
unknown_address_reject_code = 550
# Override the default of 5 days, because the point of a backup MX is to keep it around for a while.
maximal_queue_lifetime = 21d

« Older posts Newer posts »

© 2025 BigSmoke

Theme by Anders NorenUp ↑