[illumos-Developer] Important - time sensitive: Drive failures and infinite waits

Alasdair Lumsden alasdairrr at gmail.com
Thu May 26 06:41:37 PDT 2011


Hi All,

Twice in the past two weeks we've suffered a drive failure that caused an entire storage node to lock up, not responding to I/O, with iostat showing 100% busy against a single disk whilst the others sat idle. The only resolution was to yank the drive out.
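
For reference, the view I've been watching is just plain iostat; the stuck disk pins the %b column while r/s and w/s stay at zero:

  # iostat -xnz 1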

These were two completely different machines as well: one a pair of Dell R710s attached to LSI SAS 6Gbps disk shelves via an LSI 9200-8e card using the mpt_sas driver, with 36 Seagate Constellation ES SAS disks; the other a custom build with a Supermicro motherboard, LSI 3801E-R cards using the mpt driver, and 48 Western Digital SATA drives.

So this is two different machines, different RAID cards, different drivers, different disks, exhibiting exactly the same failure mode.

On the storage array where this happened today, I had already adjusted the sd timeout to 7 seconds with 3 retries, using:

set sd:sd_io_time=7 (/etc/system)
sd-config-list = "ATA     WDC WD7501AALS-0", "retries-timeout:3"; (/kernel/drv/sd.conf)

So in theory, when a disk stalls, sd should give up on it after 21 seconds (7 second timeout x 3 retries). Instead, it has been over 30 minutes now whilst the machine sits there attempting to write to the pool.
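
For whoever gets on the box, the live timeout can be double-checked with mdb (a quick sketch; sd_io_time is the global that the /etc/system line above sets):

  # echo "sd_io_time/D" | mdb -k

This should print 7 if the tunable took effect.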

The good news is that this SAN wasn't in production and has nothing on it (yet). I need to return it to service within the next 48 hours, but in the meantime this is an ideal opportunity for one of the illumos kernel developers to get on the box and do some diagnosing.

This is one of the biggest and most serious issues I've seen with using ZFS in SAN/NAS environments - when a drive fails, it doesn't get taken out of service - and I've hit it quite a few times before.

I'm hoping that, now it can be reproduced, the devs can nail this once and for all. Please contact me off-list and I'll provide SSH access details to get onto the box.
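
If it saves whoever logs in some time, these are the sorts of things I'd expect to be worth capturing while the hang is in progress (suggestions only, not a diagnosis):

  # echo "::stacks -m sd" | mdb -k      (kernel threads blocked in sd)
  # echo "::stacks -m zfs" | mdb -k     (ZFS I/O pipeline threads)
  # echo "::spa -v" | mdb -k            (pool and vdev state)
  # fmdump -eV | tail                   (recent FMA error telemetry)
  # iostat -En                          (per-device error counters)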

But this disk may fail completely soon, so please act quickly, otherwise the window of opportunity may be lost.

Cheers,

Alasdair
