[illumos-Developer] Important - time sensitive: Drive failures and infinite waits
Haudy Kazemi
kaze0010 at umn.edu
Thu May 26 14:46:16 PDT 2011
On 5/26/2011 2:10 PM, Alasdair Lumsden wrote:
> Hi Garrett,
>
> I've collected together all the info as best I could here:
>
> https://www.illumos.org/issues/1069
>
> I'm going to send another email with login details so if you find an opportunity to take a look it would of course be much appreciated. It sounds like quite a few other people have been bitten by this over the years. George Wilson believes he's seen it before, as does estibi, and a few others.
>
> Thanks,
>
> Alasdair
If anyone experiences this scenario while running the whole system in a
virtual machine, perhaps you can use your virtual machine software to
save a state snapshot for later analysis.
There have been sporadic reports sounding similar to Alasdair's over the
years. The zfs-discuss list has records of some of them and related
discussion (for example, see the archives from 2010-07-22 through
2010-07-24 for posts in the thread called '1tb SATA drives' by Miles
Nordin, myself, and others.)
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg40826.html
IIRC, many discussions ended with "use enterprise drives, equipment, and
TLER" instead of less expensive equipment. That was not a satisfactory
response to discussion participants who held the view that ZFS should be
able to smoothly recover from failure of drives/controllers/storage
drivers, whether or not those subsystems behaved as expected during a
failure. To do otherwise means ZFS is making storage-subsystem
behavior and reliability assumptions that are not in tune with its motto
of being the "last word in filesystems" and its goal of achieving high
reliability and high performance, even on less expensive storage devices.
The slow-death failure scenario comes to mind, where a particular drive
still works but is much slower than usual. No error is reported,
but response time is severely impacted. Borderline sectors that fail
slowly can be a trigger. Given appropriate redundancy, and an
expected maximum response time threshold, ZFS could reconstruct the
needed data using the other devices any time the maximum response time
threshold was exceeded. E.g., if a device access request is not
answered within 7 seconds (configurable to allow for different pool
architectures), attempt to reconstruct the data; no need for TLER
support in the drive, and no need for the storage drivers to respond in
a particular way. Monitoring device performance changes over time
and observed delays could also serve as a warning that a device may
be failing via slow-death.
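The deadline-plus-fallback idea above can be sketched in a few lines. This is purely illustrative pseudocode in Python, not real ZFS internals: the device dicts, latencies, and the read functions are invented for the example, and the 7-second default mirrors the configurable threshold suggested above.

```python
# Hypothetical sketch of timeout-driven recovery: if a replica does not
# answer within a configurable deadline, satisfy the read from another
# redundant copy instead of waiting indefinitely. Device names, latencies,
# and read_block() are illustrative only, not real ZFS code.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

DEADLINE_SECS = 7.0  # configurable per pool, as suggested above

def read_block(device, offset):
    """Simulated device read; a slow-death drive responds very late."""
    time.sleep(device["latency"])
    return device["data"][offset]

def redundant_read(devices, offset, deadline=DEADLINE_SECS):
    """Try each replica in turn; fall back when one exceeds the deadline."""
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        for dev in devices:
            future = pool.submit(read_block, dev, offset)
            try:
                return future.result(timeout=deadline)
            except TimeoutError:
                # Device exceeded the response threshold: flag it as a
                # possible slow-death candidate and try the next replica.
                print("slow device:", dev["name"])
    raise IOError("no replica answered within the deadline")

if __name__ == "__main__":
    mirror = [
        {"name": "c0t0d0", "latency": 2.0, "data": {0: b"payload"}},   # ailing
        {"name": "c0t1d0", "latency": 0.01, "data": {0: b"payload"}},  # healthy
    ]
    # The slow side of the mirror misses a short deadline, so the read is
    # satisfied from the healthy replica without an error from the drive.
    print(redundant_read(mirror, 0, deadline=0.5))
```

A real implementation would of course issue the fallback reads in parallel and feed the per-device latency history into a health monitor; the point here is just that no TLER support or driver cooperation is required for the fallback itself.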
IIRC, ZFS built on top of iSCSI devices was also impacted by timeout
issues. I don't know whether those iSCSI timeouts were ever addressed
or made configurable.