Dominic Cleal's Blog

Cron, local time and daylight savings (DST)

Like many others, I prefer servers to run in UTC rather than the local time zone. Particularly with the Java services we run, I find that some logs always remain in UTC, while others observe the daylight savings (DST) of the local time zone (probably due to the multitude of logging APIs).

Sticking with UTC makes everything unambiguous, but it can cause problems when scheduling cron jobs that depend on the local time. Rather than change systems to run in local time, cron jobs that are tied to local time are now scheduled an hour earlier and run a script that sleeps if the local time hasn't yet been reached.

A job that needs to run at 7am UK time is now scheduled for 6am UTC via cron. In the summer, while the UK is observing DST (the clocks have "sprung forward" to UTC+1), 6am UTC is already 7am UK time, so the job can run straight away. Once DST is over, a job scheduled at 6am UTC needs to wait an hour before starting, so that it runs at 7am UK time (which is then also 7am UTC).

This is fairly trivial in a shell script; see my cron_tz.sh. Once I'd written this, I discovered a Python version that does pretty much the same thing.

Now my crontab simply looks like this:

15 06 * * *   cron_tz.sh Europe/London 07 /my/command
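
The cron_tz.sh script itself isn't reproduced here, so the following is only a hypothetical POSIX sh reconstruction of the same idea. The file name cron_tz_sketch.sh and the function names cron_tz_delay and cron_tz are my own, and it assumes the zone's offset changes by whole hours and that the target hour falls later on the same local day. It reads the current hour in the given zone with TZ=... date +%H and, if that hour is still earlier than the requested one, sleeps for the missing whole hours before running the command, which preserves the minutes from the crontab entry (the 15 in "15 06").

```shell
#!/bin/sh
# cron_tz_sketch.sh: hypothetical reconstruction of the "schedule an hour
# early, then sleep" idea; NOT the original cron_tz.sh.
# Usage: cron_tz_sketch.sh ZONE HOUR COMMAND [ARGS...]

# Print how many seconds to sleep so a job started at LOCAL_HOUR:MM runs at
# TARGET_HOUR:MM. Assumes whole-hour offsets and a target later the same day.
cron_tz_delay() {
    target=${1#0}; target=${target:-0}   # "07" -> 7, avoids octal parsing
    now=${2#0}; now=${now:-0}
    if [ "$now" -lt "$target" ]; then
        echo $(( (target - now) * 3600 ))
    else
        echo 0
    fi
}

# Sleep if needed, then run COMMAND once HOUR has been reached in ZONE.
cron_tz() {
    if [ "$#" -lt 3 ]; then
        echo "usage: cron_tz ZONE HOUR COMMAND [ARGS...]" >&2
        return 2
    fi
    zone=$1
    hour=$2
    shift 2
    # Current hour in the requested zone, e.g. Europe/London
    delay=$(cron_tz_delay "$hour" "$(TZ="$zone" date +%H)")
    if [ "$delay" -gt 0 ]; then
        sleep "$delay"    # e.g. 3600 in winter, when UK time == UTC
    fi
    "$@"
}

# Run only when called with arguments (e.g. from the crontab entry above)
if [ "$#" -gt 0 ]; then
    cron_tz "$@"
fi
```

Called from cron as "15 06 * * * cron_tz_sketch.sh Europe/London 07 /my/command", this would sleep 3600 seconds in winter and run the command at 07:15 GMT, while in summer it would run it immediately at 07:15 BST. Sleeping for whole hours rather than polling the clock keeps the sketch short, at the cost of the assumptions above.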

Interestingly, on the same ServerFault question, SAnnukka mentions that Cfengine has a way of automating this with calendars. Could the same be done with Puppet? Perhaps a function that takes a time in a given time zone and converts it to UTC (or another TZ) that could then be used for a cron resource? If Puppet's being run regularly anyway, then the cron jobs would be quickly updated and kept to the correct time.
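
The core of such a function is a single conversion per date. As a rough illustration outside Puppet, GNU date can already do it: the hypothetical helper below (utc_hour_for is my own name, and it requires GNU coreutils date with its TZ="..." prefix in -d strings) prints the UTC hour corresponding to a given local hour in a zone on a given date, i.e. the value a cron resource's hour would need to be set to on that day.

```shell
#!/bin/sh
# utc_hour_for ZONE HOUR DATE: print the UTC hour (00-23) at which HOUR:00
# local time in ZONE falls on DATE (YYYY-MM-DD).
# Hypothetical helper; relies on GNU date's TZ="..." prefix in -d strings.
utc_hour_for() {
    TZ=UTC date -d "TZ=\"$1\" $3 $2:00" +%H
}

# 7am London time in winter (GMT, UTC+0) and in summer (BST, UTC+1),
# assuming the tz database is installed
utc_hour_for Europe/London 07 2024-01-15   # 07
utc_hour_for Europe/London 07 2024-07-15   # 06
```

A Puppet function doing the same would pass its result to the cron resource's hour parameter; because the answer changes at each DST transition, it only stays correct if Puppet runs regularly.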

BBS2 for a home NAS, part 5: OpenSolaris to FreeBSD

While updating my blog earlier, I noticed a comment stuck in the moderation queue of another article that I hadn't seen until now. Apologies to Steven who has probably long since given up hope of a reply, but here it is anyway.

Sorry to hijack this comment, but I was wondering what the state of your quest for zfs on the BBS2 is? I'm considering buying one myself, for the exact same setup, OpenSolaris + zfs (the alternative is a more traditional NAS like the QNAP 439). I'm especially interested in the performance of the system (as a NAS), and whether you made any progress on the problem with the network card drivers. Have you tried/considered using Nexenta instead of OpenSolaris?

thanks!

Steven

I mentioned earlier last year that a kernel engineer was looking into my nasty Realtek RTL8111/8168B issues under OpenSolaris on the BBS2, but I didn't provide an update.

At the time, a kind engineer at Sun had seen this post and passed my contact details on to somebody who could help. For a number of weeks, I ran the BBS2 with a serial console (via another box) and a small GigE network between the two machines, which the Sun developer could use to try to trigger the bug.

This was fairly successful, and after a little time he had managed to get the delay for a network restart down to about 20-30 seconds with modifications to the rge kernel module, and was confident that it could be reduced to just seconds. On a couple of occasions I had seen complete network dropouts where the kernel didn't seem to notice that the chipset had died and so didn't reset it. Unfortunately, I couldn't reproduce these at all, so the developer couldn't provide any more help.

Since that fizzled out in April/May, I hope the fixes that Sun came up with have made their way into the OpenSolaris source tree, but I haven't checked the change logs recently. Thanks to both Sébastien and Winson for their efforts!

After leaving the BBS2 for a few months, I decided to install FreeBSD 7.2 on it after replacing my desktop in October. At first I ran into a few lockups while trying to copy data on, but after some tweaks, including recompiling the kernel to increase the kernel address space, the system has been pretty much perfectly stable since.

I'll be upgrading the box to FreeBSD 8.0 soon, on the understanding that it has improved memory handling for ZFS and won't need the tweaking and recompiling that was necessary to run ZFS under 7.x. FreeBSD is definitely recommended if you're looking for ZFS, though it isn't for the faint-hearted!

mdadm + dm-multipath = Device or resource busy

Recently I've been working on a different type of cloud infrastructure with new storage options. One of the changes is that we now use remote storage over iSCSI rather than directly attached Fibre Channel storage.

On some of our CentOS 5.4 systems, we have the following layers above the iSCSI block devices:

  • iSCSI block devices /dev/sdc and /dev/sdd managed with iscsiadm
  • dm-multipath devices /dev/mapper/mpath1 for /dev/sdc and mpath2 for sdd
  • Linux software RAID0 at /dev/md0 over /dev/mapper/mpath1 and mpath2
  • LVM2 VG vgexample on /dev/md0
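
To check how these layers actually stack up on a running system, the kernel lists the devices each block device is built on under /sys/block/<dev>/slaves. The POSIX sh function below is a small sketch of my own, not something from the original setup: show_slaves is a hypothetical name, and its optional third argument (an alternative sysfs root) exists only so it can be tried against a fake directory tree. It walks that directory recursively from the top of the stack; starting at md0 covers the RAID, multipath and iSCSI layers beneath the LVM VG.

```shell
#!/bin/sh
# show_slaves DEV [INDENT] [SYSFS]: print DEV, then (indented beneath it)
# each device listed in SYSFS/block/DEV/slaves, recursively.
# The body is a subshell so the variables stay local during recursion.
show_slaves() (
    dev=$1
    indent=${2:-}
    sysfs=${3:-/sys}
    echo "$indent$dev"
    for s in "$sysfs/block/$dev/slaves"/*; do
        [ -e "$s" ] || continue          # no slaves: the glob didn't match
        show_slaves "${s##*/}" "$indent  " "$sysfs"
    done
)

# Example on a real system: show_slaves md0
```

With the layout above, show_slaves md0 would be expected to print md0, then dm-0 (mpath1) over sdc and dm-1 (mpath2) over sdd, each indented one level further.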

On one particular system, the setup had previously been running without the dm-multipath layer and had been working fine. When trying to enable the dm-multipath layer on this system (well, actually, when Puppet ran), it was unable to bring up the software RAID device /dev/md0.

Running the RAID assembly command gave the following output:

# mdadm -A /dev/md0 /dev/mapper/mpath[12]
mdadm: Cannot open /dev/mapper/mpath1: Device or resource busy
mdadm: create aborted

Strangely, the array could be assembled with the second device, but not the first (even though it wouldn't start):

# mdadm -A /dev/md0 /dev/mapper/mpath2
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

Running mdadm under strace showed that it was unable to get an exclusive file handle on the device:

15176 open("/dev/mapper/mpath1", O_RDONLY|O_EXCL) = -1 EBUSY (Device or resource busy)
15176 write(2, "mdadm: cannot open device /dev/m"..., 70) = 70
15176 write(2, "mdadm: /dev/mapper/mpath1 has no"..., 63) = 63
15176 exit_group(1)                     = ?

Compare that with the same part of the trace when assembling with only the working device:

15180 open("/dev/mapper/mpath2", O_RDONLY|O_EXCL) = 4
15180 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0
15180 fstat(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0

It turned out to be caused by a partition table that existed on the first disk, mpath1 (/dev/sdc), from a previous life where it had been partitioned. This could be seen by looking at the devices generated by dm-multipath and with fdisk -l:

# ls -l /dev/mapper/mpath*
total 0
brw-rw---- 1 root disk 253,  0 Jan  2 17:39 mpath1
brw-rw---- 1 root disk 253,  2 Jan  2 17:39 mpath1p1
brw-rw---- 1 root disk 253,  1 Jan  2 17:39 mpath2

By bad luck, the partition table on this disk was still intact underneath the RAID0, so it was being read and a device was created for the partition, though multipath itself won't show you this:

# multipath -l
mpath2 (1494554000000000000000000010000006a09000011000000) dm-1 IET,VIRTUAL-DISK
[size=232G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 6:0:0:0 sdd 8:48  [active][undef]
mpath1 (1494554000000000000000000010000005e09000011000000) dm-0 IET,VIRTUAL-DISK
[size=232G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:0:0 sdc 8:32  [active][undef] 
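
The claim on the device is visible in sysfs, though: /sys/block/<dev>/holders lists whatever has claimed a device, and a device-mapper device's name is in /sys/block/<dev>/dm/name. The helper below is a hypothetical sketch of my own, not something used in the original troubleshooting; show_holders is an invented name, and the optional sysfs root argument exists only for testing against a fake tree. Something like it would have pointed straight at the partition mapping holding dm-0, which is why mdadm's O_EXCL open fails with EBUSY.

```shell
#!/bin/sh
# show_holders DEV [SYSFS]: list what has claimed block device DEV,
# using SYSFS/block/DEV/holders and, for dm devices, their dm/name.
show_holders() {
    dev=$1
    sysfs=${2:-/sys}
    for h in "$sysfs/block/$dev/holders"/*; do
        [ -e "$h" ] || continue          # nothing holds DEV
        holder=${h##*/}
        # dm devices expose their mapper name; fall back to the kernel name
        name=$(cat "$sysfs/block/$holder/dm/name" 2>/dev/null) || name=$holder
        echo "$dev is held by $holder ($name)"
    done
}

# Example on the affected system: show_holders dm-0
```

On the system above, show_holders dm-0 would be expected to report the mpath1p1 mapping, while dm-1 (mpath2) would show no holders until md0 is assembled on it.
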

If a partition exists, dm-multipath creates a device for it, which blocks exclusive access to the main device. The easiest fix was to wipe the boot sector from the drive and recreate the RAID on top. It may be possible to wipe the boot sector without recreating the RAID, but I wasn't going to risk it. To wipe the boot sector, simply zero the first 512 bytes of the underlying drive (at your own risk!):
# dd if=/dev/zero of=/dev/sdc bs=512 count=1
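
Before zeroing anything, it may be worth confirming that the drive really does start with an MBR. The sketch below uses a hypothetical helper of my own, has_mbr_signature, which reads only bytes 510 and 511, where an MBR boot sector carries the 0x55 0xAA signature, and writes nothing to the disk. The signature only shows that an MBR is present, not which partitions it defines, so fdisk -l is still the way to see those.

```shell
#!/bin/sh
# has_mbr_signature DEV: succeed if bytes 510-511 of DEV are 0x55 0xAA,
# the boot signature at the end of an MBR. Read-only.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example: has_mbr_signature /dev/sdc && echo "MBR present on /dev/sdc"
```

After the dd above, the first sector is all zeros, so the same check should then fail.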

Silent calls from Phones 4U

This week I've been getting repeated calls from Phones 4U on 08450264628, which has an entry on whocallsme.com with many other unhappy people.

Out of five calls in the past few days, two have been silent calls - thanks Ofcom. Three have been put through after a few seconds to a sales drone who delights in telling me that the call is coming directly from the Phones 4U head office - fantastic!

No matter how many times I've told them not to call me again, they keep on trying. Hopefully this will stop when my contract finally expires!

BBS2 for a home NAS, part 4: OpenSolaris + RTL8111/8168B

As mentioned in part 2, I've been experiencing network dropouts when using the Tranquil PC BBS2 heavily with OpenSolaris 2008.11. It's been mostly fine and speedy when my home directory's mounted from it to my desktop, except when doing a demanding operation, like extracting multi-GB archives to and from the NAS over NFS.

The system would simply stop responding to any network traffic, including ARP, so it wouldn't even answer pings. When logged into the system itself, I couldn't use the network either, and a mirrored switch port confirmed that no network traffic (not even ARP requests) was leaving the machine.

After about five or six minutes, the network would simply start working again. There was no obvious reason why: no kernel messages, and no trigger that seemed to restore it.

I've managed to reproduce this reliably using iperf to generate traffic back and forth over the interface. On the NAS, run the server portion:

iperf -s

And on another machine (mine's connected over GigE):

iperf -c server -d -t 180

This runs iperf with simultaneous send and receive streams (-d) to stress the network for three minutes (-t 180). Normally, this is enough to trigger the bug after a couple of minutes. I've yet to reproduce it with one-way tests only; perhaps the heavy duplex traffic is the trigger?

The network chipset is, unfortunately, a Realtek. A prtconf -v lists it as a Realtek RTL8102, lspci under Linux lists it as RTL8111/8168B (rev 02) and lastly the motherboard specs list it as an RTL8111C. Under Solaris, this is all handled by the rge module/driver.

There are two mentions of this elsewhere. Daz writes about the same issue and the various fixes he's tried applying to get the RTL8111 card running properly under OpenSolaris, though concludes by purchasing an Intel card. Unfortunately in the BBS2, there's only one PCI slot, which contains the SATA controller, so there's no option to replace it (plus the PCI backplane isn't visible through the case).

Secondly, there's a bug report at opensolaris.org which describes the same issue (currently accepted as a problem). Interestingly, it lists the bug as a regression since Solaris 10, so I may try installing that and testing it.

Update for concerned Linux users: the system seems to run fine with Linux, so I wouldn't worry. The only thing I'm told to watch out for is that this chipset is revision 2, so you need to make sure you have a very recent kernel (i.e. 2.6.26 or above). Older kernels apparently may detect the card but not pass any traffic.

Update for concerned OpenSolaris users: an engineer from the kernel team is currently investigating this issue. Will post more when there's news/progress.

BBS2 for a home NAS, part 3: sharing configuration

One of the niceties of the OpenSolaris ZFS integration is how easy it was to set up NFS and CIFS network file sharing. I started off with a ZFS dataset layout like this (yeah, yeah, 'tank'):

$ zfs list -r tank
NAME                USED  AVAIL  REFER  MOUNTPOINT
tank                359G  2.32T  26.9K  none
tank/home           359G  2.32T  28.4K  /export/home
tank/home/dominic   359G  2.32T   354G  /export/home/dominic

As you can see above, the mountpoint of tank/home has been changed (zfs set mountpoint=/export/home tank/home) to replace the default rpool/export/home dataset (which I destroyed). I wanted any dataset under tank/home to be shared automatically over NFS, with file compression enabled and with non-blocking mandatory locking (nbmand) for cross-protocol file locking with CIFS:

$ pfexec zfs set sharenfs=on tank/home
$ pfexec zfs set compression=on tank/home
$ pfexec zfs set nbmand=on tank/home

And then tank/home/dominic will inherit these properties:

$ zfs get sharenfs,compression,nbmand tank/home/dominic
NAME               PROPERTY     VALUE  SOURCE
tank/home/dominic  sharenfs     on     inherited from tank/home
tank/home/dominic  compression  on     inherited from tank/home
tank/home/dominic  nbmand       on     inherited from tank/home

For my Linux desktop, I used autofs to mount the NFS filesystem automatically from the server ("argon") as it was needed. This simply needed two config changes (plus the installation of NFS utils and autofs itself). To /etc/auto.master, I added one line to define this set of automounts:

/home     /etc/auto.home  --timeout=60

And then the referenced /etc/auto.home config looks like:

*        -fstype=nfs,rw,nosuid,soft,intr        argon:/export/home/&

This matches any request for /home/<username> and automounts the NFS share argon:/export/home/<username>.
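
The & in the map entry stands for whatever key matched the * wildcard. The function below is only an illustration of that substitution, written by me with sed and not part of autofs; expand_map_entry is a hypothetical name, and it assumes the key contains no characters special to sed (&, | or \).

```shell
#!/bin/sh
# expand_map_entry KEY LOCATION: mimic how autofs substitutes the matched
# key for every '&' in a wildcard ('*') map entry. Illustration only;
# assumes KEY contains no characters special to sed ('&', '|', '\').
expand_map_entry() {
    printf '%s\n' "$2" | sed "s|&|$1|g"
}

# /home/dominic -> argon:/export/home/dominic, mounted roughly as if by:
#   mount -t nfs -o rw,nosuid,soft,intr argon:/export/home/dominic /home/dominic
expand_map_entry dominic 'argon:/export/home/&'
```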

Lastly, I wanted the files available over CIFS as well. This was simply a matter of following these docs to get SMB connectivity working, except that the svc:/network/smb/server:default service only needs to be enabled; there's no need to import it.

Once the SMB server was running, I used the autohome share feature in Solaris to make each user's home directory available when they identify themselves and log in to the server with their username and password. To do this, /etc/smbautohome simply contains:

*       /export/home/&

Amazingly, that's about the extent of the configuration on the system!

Created by Chronicle v3.5
