Failover tests
We used Bonnie++ to generate traffic on the Fibre Channel link. The following command creates a temporary file of about 10 GB:
bonnie -s 10000 -d /mount/point
This gives you enough time to run a couple of failover tests. Once bonnie is busy writing and reading, start
iostat 1
in another terminal to see which paths are currently active and in use. Another window could run
tail -f /var/log/messages
Then unplug the active path (or one of them). A burst of SCSI errors will be logged to syslog, and iostat should show the traffic dropping on the failed device. After a couple of seconds, traffic should rebound on the functioning device. Note that in the example below, device sda is missing from all multipath maps. This is due to a wrongly configured blacklist.
Syslog when path 1 fails
First path is unplugged
12:57:29 host kernel: qla2300 0000:07:09.0: LOOP DOWN detected (4).
12:57:30 host kernel: SCSI error : <1 0 1 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdi, sector 69439
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69447): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005

Failing first device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:128
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69439): Requeued sector as #1
12:57:30 host multipathd: 1HITACHI_D60H58500479: event checker started
12:57:30 host kernel: SCSI error : <1 0 0 2> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdh, sector 1744
12:57:30 host kernel: device-mapper: dm-multipath: 8:112 (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005

Failing second device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:112
12:57:30 host kernel: device-mapper: dm-multipath: 8:112 (#1744): Requeued sector as #1
12:57:30 host kernel: SCSI error : <1 0 0 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdf, sector 1744
12:57:30 host kernel: device-mapper: dm-multipath: 8:80 (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005

Failing third device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:80
12:57:30 host kernel: device-mapper: dm-multipath: 8:80 (#1744): Requeued sector as #1
12:57:30 host kernel: SCSI error : <1 0 1 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdi, sector 1775
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#1775): Requeued sector as #2
12:57:30 host kernel: SCSI error : <1 0 1 1> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdj, sector 1775
12:57:30 host kernel: device-mapper: dm-multipath: 8:144 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005

Failing fourth device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:144
12:57:30 host kernel: device-mapper: dm-multipath: 8:144 (#1775): Requeued sector as #1
12:57:30 host kernel: device-mapper: dm-multipath: NULL (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 10 - bi_error: 01000005
12:57:30 host kernel: device-mapper: dm-multipath: NULL (#1746): no valid paths left, failing IO
12:57:30 host multipathd: 1HITACHI_D60H58500424: event checker started
12:57:30 host multipathd: 360060e80042ae00000002ae000000443: event checker started
12:57:30 host multipathd: 360060e80042ae00000002ae000000442: event checker started
12:57:30 host multipathd: path checkers start up
12:57:30 host multipathd: 8:128: mark as failed
12:57:30 host multipathd: 8:112: mark as failed
12:57:30 host multipathd: 8:144: mark as failed
12:57:30 host multipathd: 8:80: mark as failed
12:57:54 host kernel: SCSI error : <1 0 0 0> return code = 0x10000
12:57:54 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:57:54 host multipathd: 8:96: tur checker reports path is down
12:57:54 host multipathd: checker failed path 8:96 in map 1HITACHI_D60H58500434

Failing fifth device
12:57:54 host kernel: device-mapper: dm-multipath: Failing path 8:96
12:57:54 host kernel: SCSI error : <1 0 0 2> return code = 0x10000
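The kernel messages identify failed paths only by major:minor number (for example 8:128). As a rough aid for reading such logs, the following Python sketch collects the "Failing path" events and translates them to sd device names. It assumes the classic static numbering for SCSI disks with major number 8, where each disk gets 16 minors (8:0 is sda, 8:16 is sdb, and so on). The function names are made up for this illustration, and the sample lines are copied from the log above.

```python
import re

# Matches the kernel message "dm-multipath: Failing path MAJOR:MINOR".
FAILING = re.compile(r"dm-multipath: Failing path (\d+):(\d+)")


def sd_name(major, minor):
    """Map a major:minor pair to an sd name (assumes major 8, 16 minors per disk)."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    return "sd" + chr(ord("a") + minor // 16)


def failed_paths(lines):
    """Return the sd names of all paths the kernel reported as failing, in log order."""
    names = []
    for line in lines:
        match = FAILING.search(line)
        if match:
            names.append(sd_name(int(match.group(1)), int(match.group(2))))
    return names


sample = [
    "12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:128",
    "12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69439): Requeued sector as #1",
    "12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:112",
    "12:57:54 host kernel: device-mapper: dm-multipath: Failing path 8:96",
]
print(failed_paths(sample))  # ['sdi', 'sdh', 'sdg']
```

The resulting names can be cross-checked against the "end_request: I/O error, dev ..." lines, which name the same devices directly.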
Multipath map after path 1 failed
dm names N
dm table 360060e80042ae00000002ae000000442p1 N
dm table 360060e80042ae00000002ae000000443p1 N
dm table 1HITACHI_D60H58500434 N
dm table 1HITACHI_D60H58500434 N
dm status 1HITACHI_D60H58500434 N
dm info 1HITACHI_D60H58500434 O
dm table system-lv_tmp N
dm table 1HITACHI_D60H58500479 N
dm table 1HITACHI_D60H58500479 N
dm status 1HITACHI_D60H58500479 N
dm info 1HITACHI_D60H58500479 O
dm table 1HITACHI_D60H58500434p1 N
dm table system-lv_usr N
dm table system-lv_var N
dm table system-lv_home N
dm table 1HITACHI_D60H58500424p1 N
dm table 1HITACHI_D60H58500479p1 N
dm table 1HITACHI_D60H58500424 N
dm table 1HITACHI_D60H58500424 N
dm status 1HITACHI_D60H58500424 N
dm info 1HITACHI_D60H58500424 O
dm table 360060e80042ae00000002ae000000443 N
dm table 360060e80042ae00000002ae000000443 N
dm status 360060e80042ae00000002ae000000443 N
dm info 360060e80042ae00000002ae000000443 O
dm table 360060e80042ae00000002ae000000442 N
dm table 360060e80042ae00000002ae000000442 N
dm status 360060e80042ae00000002ae000000442 N
dm info 360060e80042ae00000002ae000000442 O
dm table system-lv_opt N

1HITACHI_D60H58500434 [size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdb 8:16 [failed][faulty]
\_ round-robin 0 [active]
  \_ 1:0:0:1 sdg 8:96 [active][ready]
1HITACHI_D60H58500479 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 1:0:0:2 sdh 8:112 [active][ready]
\_ round-robin 0 [enabled]
  \_ 0:0:0:2 sdc 8:32 [active][ready]
1HITACHI_D60H58500424 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 1:0:0:0 sdf 8:80 [active][ready]
360060e80042ae00000002ae000000443 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:1:1 sde 8:64 [active][ready]
  \_ 1:0:1:1 sdj 8:144 [active][ready]
360060e80042ae00000002ae000000442 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:1:0 sdd 8:48 [failed][faulty]
  \_ 1:0:1:0 sdi 8:128 [active][ready]

Depending on how multipathd is configured, the device should automatically fail back once the path is plugged in again. How quickly multipathd notices the restored path is governed by the polling_interval parameter in /etc/multipath.conf; whether and when it switches back to the preferred path group is controlled by the failback setting.
Syslog when path 1 is restored
Path plugged-in again
12:59:47 host kernel: qla2300 0000:07:09.0: LOOP UP detected (2 Gbps).
12:59:55 host multipathd: 8:80: tur checker reports path is up
12:59:55 host multipathd: 8:80: reinstated
12:59:55 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:59:55 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:59:55 host multipathd: 8:96: tur checker reports path is up
12:59:55 host multipathd: 8:96: reinstated
12:59:56 host multipathd: 8:112: tur checker reports path is up
12:59:56 host multipathd: 8:112: reinstated
12:59:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
12:59:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
12:59:56 host multipathd: 8:128: tur checker reports path is up
12:59:56 host multipathd: 8:128: reinstated
12:59:56 host multipathd: 8:144: tur checker reports path is up
12:59:56 host multipathd: 8:144: reinstated
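To put a number on how long a path takes to come back, one can compare the timestamp of the "LOOP UP" message with that of the first "reinstated" message. The Python sketch below does this for syslog lines in the format shown above. It assumes that every line starts with an HH:MM:SS timestamp and that both events fall on the same day; the function names are illustrative only.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds from one HH:MM:SS timestamp to another (same day assumed)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def recovery_delay(lines):
    """Seconds between the first 'LOOP UP' and the first later 'reinstated' line."""
    loop_up = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp = line.split()[0]
        if loop_up is None:
            if "LOOP UP detected" in line:
                loop_up = stamp
        elif line.endswith("reinstated"):
            return seconds_between(loop_up, stamp)
    return None  # link never came up, or no path was reinstated


sample = [
    "12:59:47 host kernel: qla2300 0000:07:09.0: LOOP UP detected (2 Gbps).",
    "12:59:55 host multipathd: 8:80: tur checker reports path is up",
    "12:59:55 host multipathd: 8:80: reinstated",
]
print(recovery_delay(sample))  # 8
```

For the log above, the first path is reinstated 8 seconds after the link comes up.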
Syslog when path 2 fails
Second path is unplugged
13:01:07 host kernel: qla2300 0000:05:07.0: LOOP DOWN detected (4).
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 167
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#175): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005

Failing device
13:01:09 host kernel: device-mapper: dm-multipath: Failing path 8:48
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#167): Requeued sector as #1
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10233415
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10233423): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10233415): Requeued sector as #2
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 32929615
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#32929623): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#32929615): Requeued sector as #3
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 36642879
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#36642887): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#36642879): Requeued sector as #4
13:01:09 host kernel: SCSI error : <0 0 0 1> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdb, sector 1734
13:01:09 host kernel: device-mapper: dm-multipath: 8:16 (#1736): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005

Failing device
13:01:09 host kernel: device-mapper: dm-multipath: Failing path 8:16
13:01:09 host kernel: device-mapper: dm-multipath: 8:16 (#1734): Requeued sector as #1
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 1775
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#1775): Requeued sector as #5
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 8013775
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013783): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013775): Requeued sector as #6
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 8013783
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013791): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013783): Requeued sector as #7
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 9046111
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046119): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046111): Requeued sector as #8
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 9046119
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046127): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046119): Requeued sector as #9
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10046159
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046167): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046159): Requeued sector as #10
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10046167
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046175): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046167): Requeued sector as #11
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000

Failing device
13:01:09 host multipathd: 8:48: mark as failed
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 11109615
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#11109623): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#11109615): Requeued sector as #12

Failing device
13:01:09 host multipathd: 8:16: mark as failed
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 11109623
Multipath map after path 2 failed
dm names N
dm table 360060e80042ae00000002ae000000442p1 N
dm table 360060e80042ae00000002ae000000443p1 N
dm table 1HITACHI_D60H58500434 N
dm table 1HITACHI_D60H58500434 N
dm status 1HITACHI_D60H58500434 N
dm info 1HITACHI_D60H58500434 O
dm table system-lv_tmp N
dm table 1HITACHI_D60H58500479 N
dm table 1HITACHI_D60H58500479 N
dm status 1HITACHI_D60H58500479 N
dm info 1HITACHI_D60H58500479 O
dm table 1HITACHI_D60H58500434p1 N
dm table system-lv_usr N
dm table system-lv_var N
dm table system-lv_home N
dm table 1HITACHI_D60H58500424p1 N
dm table 1HITACHI_D60H58500479p1 N
dm table 1HITACHI_D60H58500424 N
dm table 1HITACHI_D60H58500424 N
dm status 1HITACHI_D60H58500424 N
dm info 1HITACHI_D60H58500424 O
dm table 360060e80042ae00000002ae000000443 N
dm table 360060e80042ae00000002ae000000443 N
dm status 360060e80042ae00000002ae000000443 N
dm info 360060e80042ae00000002ae000000443 O
dm table 360060e80042ae00000002ae000000442 N
dm table 360060e80042ae00000002ae000000442 N
dm status 360060e80042ae00000002ae000000442 N
dm info 360060e80042ae00000002ae000000442 O
dm table system-lv_opt N

1HITACHI_D60H58500434 [size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdg 8:96 [failed][faulty]
1HITACHI_D60H58500479 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [active]
  \_ 0:0:0:2 sdc 8:32 [active][ready]
1HITACHI_D60H58500424 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
  \_ 1:0:0:0 sdf 8:80 [failed][faulty]
sda is missing due to a wrong blacklist configuration
360060e80042ae00000002ae000000443 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:1:1 sde 8:64 [active][ready]
  \_ 1:0:1:1 sdj 8:144 [failed][faulty]
360060e80042ae00000002ae000000442 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:1:0 sdd 8:48 [active][ready]
  \_ 1:0:1:0 sdi 8:128 [failed][faulty]
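When several maps are involved, it can help to reduce the map listing to just the failed devices per map. The Python sketch below is one way to do that. It assumes the layout used in this document, with the map name and its [size=...] attributes on one line and each path on a line of the form "\_ H:C:T:L sdX major:minor [state][state]"; other multipath-tools versions may print the map differently. The function name is hypothetical, and the sample is an excerpt of the map above.

```python
import re

# A path line looks like: \_ 1:0:0:1 sdg 8:96 [failed][faulty]
PATH_LINE = re.compile(r"\\_ (\d+:\d+:\d+:\d+) (\w+) (\d+:\d+)\s+\[(\w+)\]\[(\w+)\]")


def failed_devices(map_text):
    """Return {map name: [failed sd devices]} for a multipath map listing."""
    result = {}
    current = None
    for line in map_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PATH_LINE.search(line)
        if match:
            # Group 2 is the sd name, group 4 the first state flag.
            if current is not None and match.group(4) == "failed":
                result[current].append(match.group(2))
        elif "[size=" in line:
            # Header line: the map name is the first field.
            current = line.split()[0]
            result[current] = []
    return result


sample = r"""
1HITACHI_D60H58500434 [size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
  \_ 1:0:0:1 sdg 8:96 [failed][faulty]
360060e80042ae00000002ae000000443 [size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 0:0:1:1 sde 8:64 [active][ready]
  \_ 1:0:1:1 sdj 8:144 [failed][faulty]
"""
for name, devices in failed_devices(sample).items():
    print(name, devices)
```

Path group lines ("\_ round-robin 0 [...]") and annotation lines are ignored because they match neither pattern.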
Syslog when path 2 is restored
13:03:39 host kernel: qla2300 0000:05:07.0: LOOP UP detected (2 Gbps).
13:05:11 host multipathd: 8:16: tur checker reports path is up
13:05:11 host multipathd: 8:16: reinstated
13:05:11 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:11 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:11 host multipathd: 8:48: tur checker reports path is up
13:05:11 host multipathd: 8:48: reinstated
13:05:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
13:07:13 host multipathd: 1HITACHI_D60H58500434: switch to path group #1