[illumos-Developer] Important - time sensitive: Drive failures and infinite waits

Piotr Jasiukajtis estseg at gmail.com
Thu May 26 07:26:36 PDT 2011


From my old notes:

> ::walk zio_root | ::zio -r
ADDRESS                                  TYPE  STAGE            WAITER
ffffff0710074648                         NULL  OPEN             -
ffffff08f8076348                         NULL  CHECKSUM_VERIFY  ffffff07043f2080
 ffffff071d71a690                        READ  VDEV_IO_START    -
  ffffff08ed7f8680                       READ  VDEV_IO_START    -
   ffffff08ecf45678                      READ  VDEV_IO_DONE     -
    ffffff08c827c650                     READ  VDEV_IO_START    -
ffffff097a1ca020                         NULL  CHECKSUM_VERIFY  ffffff0030090c40
 ffffff08c5baa320                        WRITE VDEV_IO_START    -
  ffffff097acdb660                       WRITE VDEV_IO_START    -
   ffffff097a944648                      WRITE VDEV_IO_START    -
  ffffff097a7dd058                       WRITE VDEV_IO_START    -
   ffffff097a2f0680                      WRITE VDEV_IO_START    -
  ffffff085746b640                       WRITE VDEV_IO_START    -
   ffffff097ab43978                      WRITE VDEV_IO_START    -
    ffffff097a6da320                     WRITE VDEV_IO_START    -
   ffffff097a7a4688                      WRITE VDEV_IO_START    -
    ffffff0979f14340                     WRITE VDEV_IO_START    -
     ffffff097aba9988                    WRITE VDEV_IO_START    -
    ffffff097a1f89a8                     WRITE VDEV_IO_START    -
     ffffff08ad0ee328                    WRITE VDEV_IO_START    -
    ffffff08cc8f8350                     WRITE VDEV_IO_START    -
     ffffff097a574990                    WRITE VDEV_IO_START    -
   ffffff097a3cd380                      WRITE VDEV_IO_START    -
    ffffff097a147660                     WRITE VDEV_IO_START    -
     ffffff097a179038                    WRITE VDEV_IO_START    -
    ffffff097a196cc8                     WRITE VDEV_IO_START    -
     ffffff097aa789a8                    WRITE VDEV_IO_START    -
    ffffff071d126058                     WRITE VDEV_IO_START    -
     ffffff097aae79a8                    WRITE VDEV_IO_START    -
    ffffff097aa25688                     WRITE VDEV_IO_START    -
     ffffff097a585ca0                    WRITE VDEV_IO_START    -
      ffffff097a0d6018                   WRITE VDEV_IO_START    -
     ffffff08b71af338                    WRITE VDEV_IO_START    -
      ffffff097a355010                   WRITE VDEV_IO_START    -
     ffffff097abf29a0                    WRITE VDEV_IO_START    -
      ffffff097a859978                   WRITE VDEV_IO_START    -
     ffffff097acec970                    WRITE VDEV_IO_START    -
      ffffff0718201cd0                   WRITE VDEV_IO_START    -
       ffffff07100c8320                  WRITE VDEV_IO_START    -
      ffffff097aca0668                   WRITE VDEV_IO_START    -
       ffffff0781117030                  WRITE VDEV_IO_START    -
      ffffff0979fc8018                   WRITE VDEV_IO_START    -
       ffffff071d664340                  WRITE VDEV_IO_START    -
    ffffff0782094028                     WRITE VDEV_IO_START    -
     ffffff08ee9f9c88                    WRITE VDEV_IO_START    -
      ffffff086a0c4990                   WRITE VDEV_IO_START    -
     ffffff08952209b8                    WRITE VDEV_IO_START    -
      ffffff086ed02960                   WRITE VDEV_IO_START    -
     ffffff097a62ecb0                    WRITE VDEV_IO_START    -
      ffffff097ac9c048                   WRITE VDEV_IO_START    -
     ffffff097ad0e340                    WRITE VDEV_IO_START    -
      ffffff07820c0040                   WRITE VDEV_IO_START    -
       ffffff097a566370                  WRITE VDEV_IO_START    -
      ffffff0871a39ce0                   WRITE VDEV_IO_START    -
       ffffff08bdb66650                  WRITE VDEV_IO_START    -
      ffffff097abea9a0                   WRITE VDEV_IO_START    -
       ffffff08aa151cb8                  WRITE VDEV_IO_START    -
   ffffff08aa151678                      WRITE VDEV_IO_START    -
    ffffff097a498020                     WRITE VDEV_IO_START    -
     ffffff097a261378                    WRITE VDEV_IO_START    -
    ffffff097a1959b0                     WRITE VDEV_IO_START    -
     ffffff0871a37330                    WRITE VDEV_IO_START    -
    ffffff08b92249b0                     WRITE VDEV_IO_START    -
     ffffff097a657010                    WRITE VDEV_IO_START    -
    ffffff0979f65060                     WRITE VDEV_IO_START    -
     ffffff097a0fbc98                    WRITE VDEV_IO_START    -
      ffffff085e644c80                   WRITE VDEV_IO_START    -
     ffffff097a6ab9c0                    WRITE VDEV_IO_START    -
      ffffff097a6016a0                   WRITE VDEV_IO_START    -
     ffffff097a5d5c88                    WRITE VDEV_IO_START    -
      ffffff097ad04c80                   WRITE VDEV_IO_START    -
     ffffff097aadf970                    WRITE VDEV_IO_START    -
      ffffff097a911040                   WRITE VDEV_IO_START    -
       ffffff071d721cd8                  WRITE VDEV_IO_START    -
      ffffff097a50d688                   WRITE VDEV_IO_START    -
       ffffff07814ca018                  WRITE VDEV_IO_START    -
      ffffff097a199998                   WRITE VDEV_IO_START    -
       ffffff097a1cf648                  WRITE VDEV_IO_START    -
      ffffff097a4d0340                   WRITE VDEV_IO_START    -
       ffffff08b807a980                  WRITE VDEV_IO_START    -
        ffffff086214fcc0                 WRITE VDEV_IO_START    -
       ffffff097a03c348                  WRITE VDEV_IO_START    -
        ffffff08c9c3a368                 WRITE VDEV_IO_START    -
       ffffff097a8be9b0                  WRITE VDEV_IO_START    -
        ffffff097abf8668                 WRITE VDEV_IO_START    -
   ffffff08c8eac340                      WRITE VDEV_IO_START    -
    ffffff097a191cd8                     WRITE VDEV_IO_START    -
     ffffff0979fee9b8                    WRITE VDEV_IO_START    -
    ffffff097a06e9c0                     WRITE VDEV_IO_START    -
     ffffff086f9dd060                    WRITE VDEV_IO_START    -
    ffffff08aecf5968                     WRITE VDEV_IO_START    -
     ffffff08b4701ca0                    WRITE VDEV_IO_START    -
    ffffff088f131370                     WRITE VDEV_IO_START    -
     ffffff097a4b5658                    WRITE VDEV_IO_START    -
      ffffff08b1a9e008                   WRITE VDEV_IO_START    -
    ffffff097acaacd0                     WRITE VDEV_IO_START    -
     ffffff097a633678                    WRITE VDEV_IO_START    -
      ffffff088ebdd048                   WRITE VDEV_IO_START    -
     ffffff097a381008                    WRITE VDEV_IO_START    -
      ffffff097a361360                   WRITE VDEV_IO_START    -
       ffffff097a31e380                  WRITE VDEV_IO_START    -
      ffffff097ac436a0                   WRITE VDEV_IO_START    -
       ffffff08eead7350                  WRITE VDEV_IO_START    -
      ffffff08f42cf350                   WRITE VDEV_IO_START    -
       ffffff08a46fd338                  WRITE VDEV_IO_START    -
      ffffff097a5d9cd8                   WRITE VDEV_IO_START    -
       ffffff097a867ca8                  WRITE VDEV_IO_START    -
        ffffff097a42f370                 WRITE VDEV_IO_START    -
       ffffff08ecf6e320                  WRITE VDEV_IO_START    -
        ffffff07820aa980                 WRITE VDEV_IO_START    -
       ffffff0909da9cc8                  WRITE VDEV_IO_START    -
        ffffff097ab46648                 WRITE VDEV_IO_START    -
ffffff097a232ca0                         NULL  CHECKSUM_VERIFY  ffffff06ffd87bc0
 ffffff08b4701660                        READ  VDEV_IO_START    -
  ffffff097acf9698                       READ  VDEV_IO_START    -
   ffffff097a661cb0                      READ  VDEV_IO_DONE     -
    ffffff097a457040                     READ  VDEV_IO_START    -
ffffff086237dcd8                         NULL  CHECKSUM_VERIFY  ffffff070004f840
 ffffff08ed7e8698                        READ  VDEV_IO_START    -
  ffffff08ee8d8020                       READ  VDEV_IO_START    -
   ffffff0979ed4010                      READ  VDEV_IO_DONE     -
    ffffff0880e5b360                     READ  VDEV_IO_START    -
ffffff097a3decd8                         NULL  CHECKSUM_VERIFY  ffffff0702428b80
 ffffff088e270360                        READ  VDEV_IO_START    -
  ffffff0979fe2030                       READ  VDEV_IO_START    -
   ffffff097a168960                      READ  VDEV_IO_DONE     -
    ffffff097ac82040                     READ  VDEV_IO_START    -
   ffffff0979ee6ca8                      READ  VDEV_IO_DONE     -
    ffffff08bb44a658                     READ  VDEV_IO_START    -
ffffff06fc641c88                         NULL  OPEN             -
ffffff097a680368                         NULL  CHECKSUM_VERIFY  ffffff0a51b28860
 ffffff0979f1c058                        READ  VDEV_IO_START    -
  ffffff097a48c650                       READ  VDEV_IO_START    -
   ffffff078111d358                      READ  VDEV_IO_DONE     -
    ffffff097a514380                     READ  VDEV_IO_START    -
ffffff06fc8f2380                         NULL  OPEN             -
ffffff097a963358                         NULL  CHECKSUM_VERIFY  ffffff0710147820
 ffffff089c1f2338                        READ  VDEV_IO_START    -
  ffffff097a214060                       READ  VDEV_IO_START    -
   ffffff078208a038                      READ  VDEV_IO_DONE     -
    ffffff097a457040                     READ  VDEV_IO_START    -
ffffff097a9c29c0                         NULL  CHECKSUM_VERIFY  ffffff07023ddb40
 ffffff097a682cb0                        READ  VDEV_IO_START    -
  ffffff08af8c5040                       READ  VDEV_IO_START    -
   ffffff08738cf038                      READ  VDEV_IO_DONE     -
    ffffff097a90fcc8                     READ  VDEV_IO_START    -
ffffff0979ff2988                         NULL  CHECKSUM_VERIFY  ffffff0773cb4400
 ffffff097a9a4048                        READ  VDEV_IO_START    -
  ffffff08f8071050                       READ  VDEV_IO_START    -
   ffffff08cc8f9c80                      READ  VDEV_IO_DONE     -
    ffffff097a457040                     READ  VDEV_IO_START    -
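
With a tree this large it helps to tally it rather than read it line by line. Below is a minimal Python sketch that counts zios per (TYPE, STAGE) and lists the root zios that have a waiting thread. It assumes the four-column layout shown above, with indentation indicating depth; the SAMPLE text is an abbreviated copy of the output and the function name summarize is made up for illustration.

```python
from collections import Counter

# Abbreviated sample in the same column layout as the ::zio -r output.
SAMPLE = """\
ADDRESS                                  TYPE  STAGE            WAITER
ffffff0710074648                         NULL  OPEN             -
ffffff08f8076348                         NULL  CHECKSUM_VERIFY  ffffff07043f2080
 ffffff071d71a690                        READ  VDEV_IO_START    -
  ffffff08ed7f8680                       READ  VDEV_IO_START    -
   ffffff08ecf45678                      READ  VDEV_IO_DONE     -
    ffffff08c827c650                     READ  VDEV_IO_START    -
ffffff097a1ca020                         NULL  CHECKSUM_VERIFY  ffffff0030090c40
 ffffff08c5baa320                        WRITE VDEV_IO_START    -
  ffffff097acdb660                       WRITE VDEV_IO_START    -
"""


def summarize(text):
    """Return (Counter of (type, stage), list of (root address, waiter thread))."""
    stages = Counter()
    waited_roots = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a 4-column row.
        if len(fields) != 4 or fields[0] == "ADDRESS":
            continue
        addr, ztype, stage, waiter = fields
        stages[(ztype, stage)] += 1
        # Leading spaces give the depth in the tree; depth 0 is a root zio.
        depth = len(line) - len(line.lstrip(" "))
        if depth == 0 and waiter != "-":
            waited_roots.append((addr, waiter))
    return stages, waited_roots


stages, waited_roots = summarize(SAMPLE)
for (ztype, stage), count in sorted(stages.items()):
    print(f"{ztype:5} {stage:16} {count}")
for addr, waiter in waited_roots:
    print(f"root {addr} is waited on by thread {waiter}")
```

Feeding it the full output instead of the sample should make it obvious how many WRITE zios sit in VDEV_IO_START under a single root.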



> ffffff097a1ca020::zio
ADDRESS                                  TYPE  STAGE            WAITER
ffffff097a1ca020                         NULL  CHECKSUM_VERIFY  ffffff0030090c40



> ffffff097a1ca020::zio -r
ADDRESS                                  TYPE  STAGE            WAITER
ffffff097a1ca020                         NULL  CHECKSUM_VERIFY  ffffff0030090c40
 ffffff08c5baa320                        WRITE VDEV_IO_START    -
  ffffff097acdb660                       WRITE VDEV_IO_START    -
   ffffff097a944648                      WRITE VDEV_IO_START    -
  ffffff097a7dd058                       WRITE VDEV_IO_START    -
   ffffff097a2f0680                      WRITE VDEV_IO_START    -
  ffffff085746b640                       WRITE VDEV_IO_START    -
   ffffff097ab43978                      WRITE VDEV_IO_START    -
    ffffff097a6da320                     WRITE VDEV_IO_START    -
   ffffff097a7a4688                      WRITE VDEV_IO_START    -
    ffffff0979f14340                     WRITE VDEV_IO_START    -
     ffffff097aba9988                    WRITE VDEV_IO_START    -
    ffffff097a1f89a8                     WRITE VDEV_IO_START    -
     ffffff08ad0ee328                    WRITE VDEV_IO_START    -
    ffffff08cc8f8350                     WRITE VDEV_IO_START    -
     ffffff097a574990                    WRITE VDEV_IO_START    -
   ffffff097a3cd380                      WRITE VDEV_IO_START    -
    ffffff097a147660                     WRITE VDEV_IO_START    -
     ffffff097a179038                    WRITE VDEV_IO_START    -
    ffffff097a196cc8                     WRITE VDEV_IO_START    -
     ffffff097aa789a8                    WRITE VDEV_IO_START    -
    ffffff071d126058                     WRITE VDEV_IO_START    -
     ffffff097aae79a8                    WRITE VDEV_IO_START    -
    ffffff097aa25688                     WRITE VDEV_IO_START    -
     ffffff097a585ca0                    WRITE VDEV_IO_START    -
      ffffff097a0d6018                   WRITE VDEV_IO_START    -
     ffffff08b71af338                    WRITE VDEV_IO_START    -
      ffffff097a355010                   WRITE VDEV_IO_START    -
     ffffff097abf29a0                    WRITE VDEV_IO_START    -
      ffffff097a859978                   WRITE VDEV_IO_START    -
     ffffff097acec970                    WRITE VDEV_IO_START    -
      ffffff0718201cd0                   WRITE VDEV_IO_START    -
       ffffff07100c8320                  WRITE VDEV_IO_START    -
      ffffff097aca0668                   WRITE VDEV_IO_START    -
       ffffff0781117030                  WRITE VDEV_IO_START    -
      ffffff0979fc8018                   WRITE VDEV_IO_START    -
       ffffff071d664340                  WRITE VDEV_IO_START    -
    ffffff0782094028                     WRITE VDEV_IO_START    -
     ffffff08ee9f9c88                    WRITE VDEV_IO_START    -
      ffffff086a0c4990                   WRITE VDEV_IO_START    -
     ffffff08952209b8                    WRITE VDEV_IO_START    -
      ffffff086ed02960                   WRITE VDEV_IO_START    -
     ffffff097a62ecb0                    WRITE VDEV_IO_START    -
      ffffff097ac9c048                   WRITE VDEV_IO_START    -
     ffffff097ad0e340                    WRITE VDEV_IO_START    -
      ffffff07820c0040                   WRITE VDEV_IO_START    -
       ffffff097a566370                  WRITE VDEV_IO_START    -
      ffffff0871a39ce0                   WRITE VDEV_IO_START    -
       ffffff08bdb66650                  WRITE VDEV_IO_START    -
      ffffff097abea9a0                   WRITE VDEV_IO_START    -
       ffffff08aa151cb8                  WRITE VDEV_IO_START    -
   ffffff08aa151678                      WRITE VDEV_IO_START    -
    ffffff097a498020                     WRITE VDEV_IO_START    -
     ffffff097a261378                    WRITE VDEV_IO_START    -
    ffffff097a1959b0                     WRITE VDEV_IO_START    -
     ffffff0871a37330                    WRITE VDEV_IO_START    -
    ffffff08b92249b0                     WRITE VDEV_IO_START    -
     ffffff097a657010                    WRITE VDEV_IO_START    -
    ffffff0979f65060                     WRITE VDEV_IO_START    -
     ffffff097a0fbc98                    WRITE VDEV_IO_START    -
      ffffff085e644c80                   WRITE VDEV_IO_START    -
     ffffff097a6ab9c0                    WRITE VDEV_IO_START    -
      ffffff097a6016a0                   WRITE VDEV_IO_START    -
     ffffff097a5d5c88                    WRITE VDEV_IO_START    -
      ffffff097ad04c80                   WRITE VDEV_IO_START    -
     ffffff097aadf970                    WRITE VDEV_IO_START    -
      ffffff097a911040                   WRITE VDEV_IO_START    -
       ffffff071d721cd8                  WRITE VDEV_IO_START    -
      ffffff097a50d688                   WRITE VDEV_IO_START    -
       ffffff07814ca018                  WRITE VDEV_IO_START    -
      ffffff097a199998                   WRITE VDEV_IO_START    -
       ffffff097a1cf648                  WRITE VDEV_IO_START    -
      ffffff097a4d0340                   WRITE VDEV_IO_START    -
       ffffff08b807a980                  WRITE VDEV_IO_START    -
        ffffff086214fcc0                 WRITE VDEV_IO_START    -
       ffffff097a03c348                  WRITE VDEV_IO_START    -
        ffffff08c9c3a368                 WRITE VDEV_IO_START    -
       ffffff097a8be9b0                  WRITE VDEV_IO_START    -
        ffffff097abf8668                 WRITE VDEV_IO_START    -
   ffffff08c8eac340                      WRITE VDEV_IO_START    -
    ffffff097a191cd8                     WRITE VDEV_IO_START    -
     ffffff0979fee9b8                    WRITE VDEV_IO_START    -
    ffffff097a06e9c0                     WRITE VDEV_IO_START    -
     ffffff086f9dd060                    WRITE VDEV_IO_START    -
    ffffff08aecf5968                     WRITE VDEV_IO_START    -
     ffffff08b4701ca0                    WRITE VDEV_IO_START    -
    ffffff088f131370                     WRITE VDEV_IO_START    -
     ffffff097a4b5658                    WRITE VDEV_IO_START    -
      ffffff08b1a9e008                   WRITE VDEV_IO_START    -
    ffffff097acaacd0                     WRITE VDEV_IO_START    -
     ffffff097a633678                    WRITE VDEV_IO_START    -
      ffffff088ebdd048                   WRITE VDEV_IO_START    -
     ffffff097a381008                    WRITE VDEV_IO_START    -
      ffffff097a361360                   WRITE VDEV_IO_START    -
       ffffff097a31e380                  WRITE VDEV_IO_START    -
      ffffff097ac436a0                   WRITE VDEV_IO_START    -
       ffffff08eead7350                  WRITE VDEV_IO_START    -
      ffffff08f42cf350                   WRITE VDEV_IO_START    -
       ffffff08a46fd338                  WRITE VDEV_IO_START    -
      ffffff097a5d9cd8                   WRITE VDEV_IO_START    -
       ffffff097a867ca8                  WRITE VDEV_IO_START    -
        ffffff097a42f370                 WRITE VDEV_IO_START    -
       ffffff08ecf6e320                  WRITE VDEV_IO_START    -
        ffffff07820aa980                 WRITE VDEV_IO_START    -
       ffffff0909da9cc8                  WRITE VDEV_IO_START    -
        ffffff097ab46648                 WRITE VDEV_IO_START    -



> ffffff097a1ca020::print -t struct zio
struct zio {
    zbookmark_t io_bookmark = {
        uint64_t zb_objset = 0
        uint64_t zb_object = 0
        int64_t zb_level = 0
        uint64_t zb_blkid = 0
    }
    zio_prop_t io_prop = {
        enum zio_checksum zp_checksum = 0 (ZIO_CHECKSUM_INHERIT)
        enum zio_compress zp_compress = 0 (ZIO_COMPRESS_INHERIT)
        dmu_object_type_t zp_type = 0 (DMU_OT_NONE)
        uint8_t zp_level = 0
        uint8_t zp_copies = 0
        uint8_t zp_dedup = 0
        uint8_t zp_dedup_verify = 0
    }
    zio_type_t io_type = 0 (ZIO_TYPE_NULL)
    enum zio_child io_child_type = 3 (ZIO_CHILD_LOGICAL)
    int io_cmd = 0
    uint8_t io_priority = 0
    uint8_t io_reexecute = 0
    uint8_t [2] io_state = [ 0x1, 0 ]
    uint64_t io_txg = 0
    spa_t *io_spa = 0xffffff06fc5bdb00
    blkptr_t *io_bp = 0
    blkptr_t *io_bp_override = 0
    blkptr_t io_bp_copy = {
        dva_t [3] blk_dva = [
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
        ]
        uint64_t blk_prop = 0
        uint64_t [2] blk_pad = [ 0, 0 ]
        uint64_t blk_phys_birth = 0
        uint64_t blk_birth = 0
        uint64_t blk_fill = 0
        zio_cksum_t blk_cksum = {
            uint64_t [4] zc_word = [ 0, 0, 0, 0 ]
        }
    }
    list_t io_parent_list = {
        size_t list_size = 0x30
        size_t list_offset = 0x10
        struct list_node list_head = {
            struct list_node *list_next = 0xffffff097a1ca110
            struct list_node *list_prev = 0xffffff097a1ca110
        }
    }
    list_t io_child_list = {
        size_t list_size = 0x30
        size_t list_offset = 0x20
        struct list_node list_head = {
            struct list_node *list_next = 0xffffff097a3f0d18
            struct list_node *list_prev = 0xffffff097a3f0d18
        }
    }
    zio_link_t *io_walk_link = 0
    zio_t *io_logical = 0
    zio_transform_t *io_transform_stack = 0
    zio_done_func_t *io_ready = 0
    zio_done_func_t *io_done = 0
    void *io_private = 0
    int64_t io_prev_space_delta = 0
    blkptr_t io_bp_orig = {
        dva_t [3] blk_dva = [
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            },
        ]
        uint64_t blk_prop = 0
        uint64_t [2] blk_pad = [ 0, 0 ]
        uint64_t blk_phys_birth = 0
        uint64_t blk_birth = 0
        uint64_t blk_fill = 0
        zio_cksum_t blk_cksum = {
            uint64_t [4] zc_word = [ 0, 0, 0, 0 ]
        }
    }
    void *io_data = 0
    void *io_orig_data = 0
    uint64_t io_size = 0
    uint64_t io_orig_size = 0
    vdev_t *io_vd = 0
    void *io_vsd = 0
    const zio_vsd_ops_t *io_vsd_ops = 0
    uint64_t io_offset = 0
    uint64_t io_deadline = 0
    avl_node_t io_offset_node = {
        struct avl_node *[2] avl_child = [ 0, 0 ]
        uintptr_t avl_pcb = 0
    }
    avl_node_t io_deadline_node = {
        struct avl_node *[2] avl_child = [ 0, 0 ]
        uintptr_t avl_pcb = 0
    }
    avl_tree_t *io_vdev_tree = 0
    enum zio_flag io_flags = 0 (0)
    enum zio_stage io_stage = 0x80000 (ZIO_STAGE_CHECKSUM_VERIFY)
    enum zio_stage io_pipeline = 0x108000 (ZIO_STAGE_{READY|DONE})
    enum zio_flag io_orig_flags = 0 (0)
    enum zio_stage io_orig_stage = 0x1 (ZIO_STAGE_OPEN)
    enum zio_stage io_orig_pipeline = 0x108000 (ZIO_STAGE_{READY|DONE})
    int io_error = 0
    int [4] io_child_error = [ 0, 0, 0, 0 ]
    unsigned long [4][2] io_children = [
        unsigned long [2] [ 0, 0 ]
        unsigned long [2] [ 0, 0 ]
        unsigned long [2] [ 0, 0 ]
        unsigned long [2] [ 0, 0x1 ]
    ]
    uint64_t io_child_count = 0x1
    uint64_t io_parent_count = 0
    uint64_t *io_stall = 0xffffff097a1ca2e0
    zio_t *io_gang_leader = 0
    zio_gang_node_t *io_gang_tree = 0
    void *io_executor = 0xffffff0030090c40
    void *io_waiter = 0xffffff0030090c40
    kmutex_t io_lock = {
        void *[1] _opaque = [ 0 ]
    }
    kcondvar_t io_cv = {
        ushort_t _opaque = 0x1
    }
    zio_cksum_report_t *io_cksum_report = 0
    uint64_t io_ena = 0
}





> ffffff097a1ca020::print -t struct zio io_waiter
void *io_waiter = 0xffffff0030090c40



> ::walk thread ! grep ffffff0030090c40
ffffff0030090c40



> ffffff0030090c40::findstack
stack pointer for thread ffffff0030090c40: ffffff00300909b0
[ ffffff00300909b0 _resume_from_idle+0xf1() ]
  ffffff00300909e0 swtch+0x145()
  ffffff0030090a10 cv_wait+0x61()
  ffffff0030090a50 zio_wait+0x5d()
  ffffff0030090ad0 dsl_pool_sync+0x2bc()
  ffffff0030090b80 spa_sync+0x38d()
  ffffff0030090c20 txg_sync_thread+0x247()
  ffffff0030090c30 thread_start+8()
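
The stack shows the txg sync thread parked in cv_wait() underneath zio_wait(). As a rough user-space model of that pattern, here is a Python sketch. It assumes the wait has no timeout and only returns once the zio is marked complete, which is consistent with the stack above but is not taken from the kernel source. FakeZio, complete and wait are hypothetical names, and the timeouts exist only so the demonstration terminates.

```python
import threading


class FakeZio:
    """Hypothetical stand-in for a root zio: a done flag guarded by a condition variable."""

    def __init__(self):
        self.cv = threading.Condition()
        self.done = False

    def complete(self):
        # What a finishing child I/O would ultimately trigger: mark done and wake waiters.
        with self.cv:
            self.done = True
            self.cv.notify_all()

    def wait(self, timeout=None):
        # timeout=None models an untimed wait: it returns only after complete() runs.
        # A finite timeout is used below only so the demonstration can finish.
        with self.cv:
            return self.cv.wait_for(lambda: self.done, timeout=timeout)


# Case 1: the "disk" answers, so the waiter is woken.
ok = FakeZio()
threading.Timer(0.05, ok.complete).start()
result_ok = ok.wait(timeout=2.0)
print("completed:", result_ok)

# Case 2: the "disk" never answers; an untimed wait would block here forever.
stuck = FakeZio()
result_stuck = stuck.wait(timeout=0.2)
print("completed:", result_stuck)
```

In the second case nothing ever calls complete(), so the only thing that gets the waiter out is the timeout, which the model adds and the stack above gives no sign of.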



The output of iostat -xen 1 looks similar to this:

    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t12d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t13d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t14d0
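
The pattern to look for is a device that is 100% busy with commands outstanding while completing no reads or writes. A minimal Python sketch of that check follows. It assumes each device's row is on a single line and that the columns follow the usual iostat -xen order (r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, s/w, h/w, trn, tot, device), since the header is not included here. The function name stalled_devices and the threshold parameter are made up for illustration.

```python
# Column order assumed for `iostat -xen` device rows (the header is not shown in the post).
COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv", "wsvc_t", "asvc_t",
           "%w", "%b", "s/w", "h/w", "trn", "tot", "device"]

SAMPLE = """\
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t12d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t13d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c7t14d0
"""


def stalled_devices(text, busy_threshold=100.0):
    """Return devices that look busy but complete no I/O in the sampled interval."""
    stalled = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines and wrapped fragments
        row = dict(zip(COLUMNS, fields))
        try:
            completed = float(row["r/s"]) + float(row["w/s"])
            busy = float(row["%b"])
            active = float(row["actv"])
        except ValueError:
            continue  # non-numeric row such as a header
        if busy >= busy_threshold and active > 0 and completed == 0:
            stalled.append(row["device"])
    return stalled


print(stalled_devices(SAMPLE))
```

On the sample above only c7t13d0 matches: four commands active, 100% busy, and nothing completing.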

No, I can't provide a core dump.


On Thu, May 26, 2011 at 4:18 PM, Garrett D'Amore <garrett at damore.org> wrote:
> Please supply your zpool status -v output so that we can see your pools.
>
>  -- Garrett D'Amore
>
> On May 26, 2011, at 5:41 PM, Alasdair Lumsden <alasdairrr at gmail.com> wrote:
>
>> Hi All,
>>
>> Twice in the past 2 weeks we've suffered a drive failure which caused an entire storage node to lock up and stop responding to I/O, with iostat showing 100% busy against a single disk whilst the others sat idle. The only resolution was to yank the drive out.
>>
>> These were two completely different machines as well, one a pair of Dell R710s attached to LSI SAS 6Gbps disk shelves via an LSI 9200-8e card using the mpt_sas driver, with 36 Seagate Constellation ES SAS disks. The other machine is a custom build with a Supermicro motherboard, LSI 3801E-R cards using the mpt driver, and 48 Western Digital SATA drives.
>>
>> So this is two different machines, different RAID cards, different drivers, different disks, exhibiting exactly the same failure mode.
>>
>> On the storage array this happened on today, I had already adjusted the sd timeout to 7 seconds, with 3 retries, using:
>>
>> set sd:sd_io_time=7 (/etc/system)
>> sd-config-list = "ATA     WDC WD7501AALS-0", "retries-timeout:3"; (/kernel/drv/sd.conf)
>>
>> So in theory, when a disk stalls, it should get removed by sd after 21 seconds (7 seconds x 3 retries). It has now been over 30 minutes, and the machine is still sitting there attempting to write to the pool.
>>
>> The good news is that this SAN wasn't in production and has nothing on it (yet). I need to return it to service within the next 48 hours, but in the meantime this is an ideal opportunity for one of the Illumos kernel developers to get on the box and do some diagnosing.
>>
>> This is one of the biggest and most serious issues I've seen with using ZFS in SAN/NAS environments: when a drive fails, it doesn't get taken out of service. I've seen it quite a few times before.
>>
>> I'm hoping that now that it can be reproduced, the devs can nail this once and for all. Please contact me off-list and I'll provide SSH access details to get on it.
>>
>> But this disk may fail completely soon, so please act quickly, otherwise the window of opportunity may be lost.
>>
>> Cheers,
>>
>> Alasdair
>>
>>
>>
>>
>>
>> _______________________________________________
>> Developer mailing list
>> Developer at lists.illumos.org
>> http://lists.illumos.org/m/listinfo/developer



-- 
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com


