Dominic Cleal's Blog

mdadm + dm-multipath = Device or resource busy

Recently I've been working on a different type of cloud infrastructure with new storage options. One of the changes is to use remote storage over iSCSI rather than directly attached FibreChannel storage.

On some of our CentOS 5.4 systems, we have the following layers above the iSCSI block devices:

  • iSCSI block devices /dev/sdc and /dev/sdd managed with iscsiadm
  • dm-multipath devices /dev/mapper/mpath1 for /dev/sdc and mpath2 for sdd
  • Linux software RAID0 at /dev/md0 over /dev/mapper/mpath1 and mpath2
  • LVM2 VG vgexample on /dev/md0
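
For reference, a stack like this might be brought up roughly as follows. The portal address and the mdadm/LVM parameters here are illustrative rather than the exact commands used on these systems:
# iscsiadm -m discovery -t sendtargets -p 192.168.0.10    # illustrative portal address
# iscsiadm -m node --login                                # log in, exposing /dev/sdc and /dev/sdd
# multipath -v2                                           # build /dev/mapper/mpath1 and mpath2
# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/mapper/mpath1 /dev/mapper/mpath2
# pvcreate /dev/md0                                       # LVM2 PV on the array
# vgcreate vgexample /dev/md0                             # vgexample VG on top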

On one particular system, the setup had already been run once without the dm-multipath layer and had been working fine. When trying to enable the dm-multipath layer on this system (well, actually, when Puppet ran), it was unable to bring up the software RAID device /dev/md0.

Running the RAID assembly command gave the following output:

# mdadm -A /dev/md0 /dev/mapper/mpath[12]
mdadm: Cannot open /dev/mapper/mpath1: Device or resource busy
mdadm: create aborted

Strangely, the array could be assembled with the second device, but not the first (even though it wouldn't start):
# mdadm -A /dev/md0 /dev/mapper/mpath2
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

Running mdadm under strace showed that it was unable to get an exclusive file handle on the device:
15176 open("/dev/mapper/mpath1", O_RDONLY|O_EXCL) = -1 EBUSY (Device or resource busy)
15176 write(2, "mdadm: cannot open device /dev/m"..., 70) = 70
15176 write(2, "mdadm: /dev/mapper/mpath1 has no"..., 63) = 63
15176 exit_group(1)                     = ?

Compare this to the mdadm assemble with the one working device:
15180 open("/dev/mapper/mpath2", O_RDONLY|O_EXCL) = 4
15180 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0
15180 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0
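
(For completeness: the traces above were captured by running mdadm under strace with something along these lines, where -f follows child processes and -o writes the trace to a file; the file name is just an example.)
# strace -f -o /tmp/mdadm.trace mdadm -A /dev/md0 /dev/mapper/mpath[12]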

It turned out to be caused by a partition table still present on the first disk, sdc (left over from a previous life in which it had been partitioned). This could be seen by looking at the devices generated by dm-multipath and by running fdisk -l:
# ls -l /dev/mapper/mpath*
total 0
brw-rw---- 1 root disk 253,  0 Jan  2 17:39 mpath1
brw-rw---- 1 root disk 253,  2 Jan  2 17:39 mpath1p1
brw-rw---- 1 root disk 253,  1 Jan  2 17:39 mpath2
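
The same leftover partition table shows up if fdisk -l is pointed at the underlying path (output not reproduced here):
# fdisk -l /dev/sdc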

By bad luck, the partition table on this disk was still intact underneath the RAID0, so it was being read and a device was created for the partition, though multipath itself won't show you this:
# multipath -l
mpath2 (1494554000000000000000000010000006a09000011000000) dm-1 IET,VIRTUAL-DISK
[size=232G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:0:0 sdd 8:48  [active][undef]
mpath1 (1494554000000000000000000010000005e09000011000000) dm-0 IET,VIRTUAL-DISK
[size=232G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:0:0 sdc 8:32  [active][undef] 
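
The hidden partition mapping does show up through the lower-level device-mapper tools, though. Something along these lines (commands only, output omitted) would reveal the mpath1p1 device:
# dmsetup ls                        # every device-mapper node, including mpath1p1
# kpartx -l /dev/mapper/mpath1      # partition mappings sitting on top of mpath1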

If a partition exists, dm-multipath creates a device for it and blocks exclusive access to the main device. The easiest way to fix this was to wipe the boot sector from the drive and recreate the RAID on top. It may be possible to wipe the boot sector without recreating the RAID, but I wasn't going to risk it. To wipe the boot sector, simply zero the first 512 bytes of the drive (at your own risk...):
# dd if=/dev/zero of=/dev/sdc bs=512 count=1
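
If you did want to try keeping the array intact, it should also be possible to make device-mapper drop the stale mpath1p1 mapping once the sector has been zeroed; something like the following (a sketch only, I didn't try it) ought to do it:
# kpartx -d /dev/mapper/mpath1      # remove the now-stale mpath1p1 partition mapping
# multipath -F && multipath         # or flush and rebuild the multipath maps entirely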

BBS2 for a home NAS, part 5: OpenSolaris to FreeBSD

While updating my blog earlier, I noticed a comment stuck in the moderation queue of another article that I hadn't seen until now. Apologies to Steven, who has probably long since given up hope of a reply, but here it is anyway.

Sorry to hijack this comment, but I was wondering what the state of your quest for zfs on the BBS2 is? I'm considering buying one myself, for the exact same setup, OpenSolaris + zfs (the alternative is a more traditional NAS like the QNAP 439). I'm especially interested in the performance of the system (as a NAS), and whether you made any progress on the problem with the network card drivers. Have you tried/considered using Nexenta iso OpenSolaris?

thanks!

Steven

I mentioned earlier last year that a kernel engineer was looking into my nasty Realtek RTL8111/8168B issues under OpenSolaris on the BBS2, but I didn't provide an update.

At the time, a kind engineer at Sun had seen this post and passed my contact details on to somebody who could help. For a number of weeks I had the BBS2 set up with a serial console via another box and a small GigE network between the two machines that the Sun developer could use to try to trigger the bug.

This was fairly successful: after a little time he had managed to get the delay for a network restart down to about 20-30 seconds with modifications to the rge kernel module, and was confident that it could be reduced to just seconds. I had on a couple of occasions seen complete network dropouts where the kernel didn't seem to notice that the chipset had died and so didn't reset it. Unfortunately, I couldn't reproduce these at all, so the developer couldn't provide any more help.

Since that fizzled out in April/May, I hope the fixes that Sun came up with have made their way into the OpenSolaris source tree, but I haven't checked the change logs recently. Thanks to both Sébastien and Winson for their efforts!

After leaving the BBS2 alone for a few months, I decided to install FreeBSD 7.2 on it after replacing my desktop in October. At first I ran into a few issues with lockups while trying to copy data onto it, but with some tweaks, including recompiling the kernel to increase the available kernel address space, the system has been almost perfectly stable since.
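
For anyone wanting to follow the same route on 7.x, the tweaks were along the lines of the usual ZFS tuning recipe for that release; the figures below are illustrative rather than the exact values I settled on. On i386 the kernel address space is enlarged by rebuilding with a bigger KVA_PAGES, and the kernel memory and ARC sizes are capped in /boot/loader.conf:
options KVA_PAGES=512              # custom kernel config: enlarges the kernel address space (i386 default is 256)
vm.kmem_size="1024M"               # /boot/loader.conf: cap kernel memory...
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"             # ...and the ZFS ARC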

I'll be upgrading the box to FreeBSD 8.0 soon, on the understanding that it has improved memory handling for ZFS and won't need the tweaking and recompiling that was necessary to run ZFS under 7.x. FreeBSD is definitely recommended if you're looking for ZFS, though it isn't for the faint-hearted!
