Failover tests

We used Bonnie++ to generate traffic on the Fibre Channel links. The following command creates a temporary file of roughly 10 GB:
bonnie -s 10000 -d /mount/point
This gives you enough time to run a couple of failover tests. Once bonnie is busy writing and reading, you can start
iostat 1
in another terminal to find out which paths are currently active and carrying traffic. In another window you could follow the syslog with
tail -f /var/log/messages
Then unplug the active path (or one of them, if several are active). A burst of SCSI errors will be logged to syslog, and iostat should show the traffic dropping on the failed device. After a couple of seconds, traffic should resume on the remaining working device. Note that in the example below, device sda is missing from all multipath maps. This is due to a misconfigured blacklist.

Syslog when path 1 fails

 First path is unplugged
12:57:29 host kernel: qla2300 0000:07:09.0: LOOP DOWN detected (4).
12:57:30 host kernel: SCSI error : <1 0 1 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdi, sector 69439
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69447): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
 Failing first device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:128
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69439): Requeued sector as #1
12:57:30 host multipathd: 1HITACHI_D60H58500479: event checker started
12:57:30 host kernel: SCSI error : <1 0 0 2> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdh, sector 1744
12:57:30 host kernel: device-mapper: dm-multipath: 8:112 (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
 Failing second device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:112
12:57:30 host kernel: device-mapper: dm-multipath: 8:112 (#1744): Requeued sector as #1
12:57:30 host kernel: SCSI error : <1 0 0 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdf, sector 1744
12:57:30 host kernel: device-mapper: dm-multipath: 8:80 (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
 Failing third device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:80
12:57:30 host kernel: device-mapper: dm-multipath: 8:80 (#1744): Requeued sector as #1
12:57:30 host kernel: SCSI error : <1 0 1 0> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdi, sector 1775
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#1775): Requeued sector as #2
12:57:30 host kernel: SCSI error : <1 0 1 1> return code = 0x10000
12:57:30 host kernel: end_request: I/O error, dev sdj, sector 1775
12:57:30 host kernel: device-mapper: dm-multipath: 8:144 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
 Failing fourth device
12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:144
12:57:30 host kernel: device-mapper: dm-multipath: 8:144 (#1775): Requeued sector as #1
12:57:30 host kernel: device-mapper: dm-multipath: NULL (#1746): IO error - error: -5 - bi_rw: 8 - bi_flags: 10 - bi_error: 01000005
12:57:30 host kernel: device-mapper: dm-multipath: NULL (#1746): no valid paths left, failing IO
12:57:30 host multipathd: 1HITACHI_D60H58500424: event checker started
12:57:30 host multipathd: 360060e80042ae00000002ae000000443: event checker started
12:57:30 host multipathd: 360060e80042ae00000002ae000000442: event checker started
12:57:30 host multipathd: path checkers start up
12:57:30 host multipathd: 8:128: mark as failed
12:57:30 host multipathd: 8:112: mark as failed
12:57:30 host multipathd: 8:144: mark as failed
12:57:30 host multipathd: 8:80: mark as failed
12:57:54 host kernel: SCSI error : <1 0 0 0> return code = 0x10000
12:57:54 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:57:54 host multipathd: 8:96: tur checker reports path is down
12:57:54 host multipathd: checker failed path 8:96 in map 1HITACHI_D60H58500434
 Failing fifth device
12:57:54 host kernel: device-mapper: dm-multipath: Failing path 8:96
12:57:54 host kernel: SCSI error : <1 0 0 2> return code = 0x10000
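
The kernel messages above identify paths only by major:minor number (8:128, 8:112, ...), while iostat and the multipath map use device names. As a minimal sketch, assuming all paths are SCSI disks on major 8 (sda to sdp, 16 minors per disk), the following Python snippet pulls the "Failing path" lines out of the log and translates them to device names. The function names are our own and not part of any multipath tool.

```python
import re

# Matches the kernel line that marks a path as failed, e.g.
# "device-mapper: dm-multipath: Failing path 8:128"
FAILING = re.compile(r"dm-multipath: Failing path (\d+):(\d+)")

def sd_name(major, minor):
    """Translate a major:minor pair into an sd device name.

    Assumption: only SCSI disk major 8 is handled, which covers
    sda..sdp with 16 minors per disk (minor % 16 is the partition).
    """
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")

def failed_paths(lines):
    """Return the device names of all paths reported as failing, in log order."""
    names = []
    for line in lines:
        match = FAILING.search(line)
        if match:
            names.append(sd_name(int(match.group(1)), int(match.group(2))))
    return names

sample = [
    "12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:128",
    "12:57:30 host kernel: device-mapper: dm-multipath: 8:128 (#69439): Requeued sector as #1",
    "12:57:30 host kernel: device-mapper: dm-multipath: Failing path 8:112",
    "12:57:54 host kernel: device-mapper: dm-multipath: Failing path 8:96",
]
print(failed_paths(sample))  # ['sdi', 'sdh', 'sdg']
```

The result matches the end_request lines in the log, which name sdi and sdh directly for the same errors.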

Multipath map after path 1 failed

dm names   N
dm table 360060e80042ae00000002ae000000442p1  N
dm table 360060e80042ae00000002ae000000443p1  N
dm table 1HITACHI_D60H58500434  N
dm table 1HITACHI_D60H58500434  N
dm status 1HITACHI_D60H58500434  N
dm info 1HITACHI_D60H58500434  O
dm table system-lv_tmp  N
dm table 1HITACHI_D60H58500479  N
dm table 1HITACHI_D60H58500479  N
dm status 1HITACHI_D60H58500479  N
dm info 1HITACHI_D60H58500479  O
dm table 1HITACHI_D60H58500434p1  N
dm table system-lv_usr  N
dm table system-lv_var  N
dm table system-lv_home  N
dm table 1HITACHI_D60H58500424p1  N
dm table 1HITACHI_D60H58500479p1  N
dm table 1HITACHI_D60H58500424  N
dm table 1HITACHI_D60H58500424  N
dm status 1HITACHI_D60H58500424  N
dm info 1HITACHI_D60H58500424  O
dm table 360060e80042ae00000002ae000000443  N
dm table 360060e80042ae00000002ae000000443  N
dm status 360060e80042ae00000002ae000000443  N
dm info 360060e80042ae00000002ae000000443  O
dm table 360060e80042ae00000002ae000000442  N
dm table 360060e80042ae00000002ae000000442  N
dm status 360060e80042ae00000002ae000000442  N
dm info 360060e80042ae00000002ae000000442  O
dm table system-lv_opt  N
1HITACHI_D60H58500434
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 0:0:0:1 sdb 8:16  [failed][faulty]
\_ round-robin 0 [active]
 \_ 1:0:0:1 sdg 8:96  [active][ready]

1HITACHI_D60H58500479
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 1:0:0:2 sdh 8:112 [active][ready]
\_ round-robin 0 [enabled]
 \_ 0:0:0:2 sdc 8:32  [active][ready]

1HITACHI_D60H58500424
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 1:0:0:0 sdf 8:80  [active][ready]

360060e80042ae00000002ae000000443
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:1 sde 8:64  [active][ready]
 \_ 1:0:1:1 sdj 8:144 [active][ready]

360060e80042ae00000002ae000000442
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sdd 8:48  [failed][faulty]
 \_ 1:0:1:0 sdi 8:128 [active][ready]
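Depending on how you configured multipathd, the device should automatically fail back once it is plugged in again. How quickly the restored path is noticed depends on the polling_interval parameter in /etc/multipath.conf; whether the map then switches back to its preferred path group is controlled by the failback parameter.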

Syslog when path 1 is restored

Path plugged-in again
12:59:47 host kernel: qla2300 0000:07:09.0: LOOP UP detected (2 Gbps).
12:59:55 host multipathd: 8:80: tur checker reports path is up
12:59:55 host multipathd: 8:80: reinstated
12:59:55 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:59:55 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
12:59:55 host multipathd: 8:96: tur checker reports path is up
12:59:55 host multipathd: 8:96: reinstated
12:59:56 host multipathd: 8:112: tur checker reports path is up
12:59:56 host multipathd: 8:112: reinstated
12:59:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
12:59:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
12:59:56 host multipathd: 8:128: tur checker reports path is up
12:59:56 host multipathd: 8:128: reinstated
12:59:56 host multipathd: 8:144: tur checker reports path is up
12:59:56 host multipathd: 8:144: reinstated

Syslog when path 2 fails

 Second path is unplugged
13:01:07 host kernel: qla2300 0000:05:07.0: LOOP DOWN detected (4).
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 167
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#175): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
 Failing device
13:01:09 host kernel: device-mapper: dm-multipath: Failing path 8:48
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#167): Requeued sector as #1
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10233415
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10233423): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10233415): Requeued sector as #2
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 32929615
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#32929623): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#32929615): Requeued sector as #3
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 36642879
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#36642887): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#36642879): Requeued sector as #4
13:01:09 host kernel: SCSI error : <0 0 0 1> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdb, sector 1734
13:01:09 host kernel: device-mapper: dm-multipath: 8:16 (#1736): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
 Failing device
13:01:09 host kernel: device-mapper: dm-multipath: Failing path 8:16
13:01:09 host kernel: device-mapper: dm-multipath: 8:16 (#1734): Requeued sector as #1
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 1775
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#1777): IO error - error: -5 - bi_rw: 8 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#1775): Requeued sector as #5
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 8013775
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013783): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013775): Requeued sector as #6
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 8013783
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013791): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#8013783): Requeued sector as #7
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 9046111
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046119): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046111): Requeued sector as #8
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 9046119
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046127): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#9046119): Requeued sector as #9
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10046159
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046167): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046159): Requeued sector as #10
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 10046167
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046175): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#10046167): Requeued sector as #11
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
 Failing device
13:01:09 host multipathd: 8:48: mark as failed
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 11109615
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#11109623): IO error - error: -5 - bi_rw: 9 - bi_flags: 18 - bi_error: 01000005
13:01:09 host kernel: device-mapper: dm-multipath: 8:48 (#11109615): Requeued sector as #12
 Failing device
13:01:09 host multipathd: 8:16: mark as failed
13:01:09 host kernel: SCSI error : <0 0 1 0> return code = 0x10000
13:01:09 host kernel: end_request: I/O error, dev sdd, sector 11109623

Multipath map after path 2 failed

dm names   N
dm table 360060e80042ae00000002ae000000442p1  N
dm table 360060e80042ae00000002ae000000443p1  N
dm table 1HITACHI_D60H58500434  N
dm table 1HITACHI_D60H58500434  N
dm status 1HITACHI_D60H58500434  N
dm info 1HITACHI_D60H58500434  O
dm table system-lv_tmp  N
dm table 1HITACHI_D60H58500479  N
dm table 1HITACHI_D60H58500479  N
dm status 1HITACHI_D60H58500479  N
dm info 1HITACHI_D60H58500479  O
dm table 1HITACHI_D60H58500434p1  N
dm table system-lv_usr  N
dm table system-lv_var  N
dm table system-lv_home  N
dm table 1HITACHI_D60H58500424p1  N
dm table 1HITACHI_D60H58500479p1  N
dm table 1HITACHI_D60H58500424  N
dm table 1HITACHI_D60H58500424  N
dm status 1HITACHI_D60H58500424  N
dm info 1HITACHI_D60H58500424  O
dm table 360060e80042ae00000002ae000000443  N
dm table 360060e80042ae00000002ae000000443  N
dm status 360060e80042ae00000002ae000000443  N
dm info 360060e80042ae00000002ae000000443  O
dm table 360060e80042ae00000002ae000000442  N
dm table 360060e80042ae00000002ae000000442  N
dm status 360060e80042ae00000002ae000000442  N
dm info 360060e80042ae00000002ae000000442  O
dm table system-lv_opt  N
1HITACHI_D60H58500434
[size=6 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sdb 8:16  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdg 8:96  [failed][faulty]

1HITACHI_D60H58500479
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:2 sdh 8:112 [failed][faulty]
\_ round-robin 0 [active]
 \_ 0:0:0:2 sdc 8:32  [active][ready]

1HITACHI_D60H58500424
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdf 8:80  [failed][faulty]
 sda is missing from this map due to a misconfigured blacklist

360060e80042ae00000002ae000000443
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:1 sde 8:64  [active][ready]
 \_ 1:0:1:1 sdj 8:144 [failed][faulty]

360060e80042ae00000002ae000000442
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sdd 8:48  [active][ready]
 \_ 1:0:1:0 sdi 8:128 [failed][faulty]
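
With several maps on screen it is easy to overlook one that has lost all of its paths. The following Python sketch parses a map listing in the format shown above and reports every map that has no path left in the active state. It assumes exactly this output layout (a map name on a line of its own, path lines of the form "\_ H:C:T:L sdX MAJ:MIN [state][checker]"); other multipath-tools versions print a different layout, and the function names are hypothetical.

```python
import re

# A path line looks like: " \_ 1:0:1:0 sdi 8:128 [failed][faulty]"
PATH = re.compile(r"\\_ (\d+:\d+:\d+:\d+) (\S+)\s+(\d+:\d+)\s+\[(\w+)\]\[(\w+)\]")

def parse_maps(text):
    """Return {map name: [(device, dm state), ...]} for a map listing."""
    maps = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = PATH.search(stripped)
        if match:
            if current is not None:
                maps[current].append((match.group(2), match.group(4)))
        elif stripped and " " not in stripped and not stripped.startswith("["):
            # A single token on its own line is taken to be a map name.
            current = stripped
            maps[current] = []
    return maps

def maps_without_active_path(maps):
    """Names of maps in which no path is in the 'active' state."""
    return sorted(
        name for name, paths in maps.items()
        if not any(state == "active" for _, state in paths)
    )

sample = r"""
1HITACHI_D60H58500424
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdf 8:80  [failed][faulty]

360060e80042ae00000002ae000000442
[size=27 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:1:0 sdd 8:48  [active][ready]
 \_ 1:0:1:0 sdi 8:128 [failed][faulty]
"""
print(maps_without_active_path(parse_maps(sample)))
```

In the listing above this flags 1HITACHI_D60H58500424, the map whose second path (sda) was lost to the blacklist.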

Syslog when path 2 is restored

13:03:39 host kernel: qla2300 0000:05:07.0: LOOP UP detected (2 Gbps).
13:05:11 host multipathd: 8:16: tur checker reports path is up
13:05:11 host multipathd: 8:16: reinstated
13:05:11 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:11 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:11 host multipathd: 8:48: tur checker reports path is up
13:05:11 host multipathd: 8:48: reinstated
13:05:56 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500424: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
13:05:58 host multipathd: 1HITACHI_D60H58500479: switch to path group #1
13:07:13 host multipathd: 1HITACHI_D60H58500434: switch to path group #1
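
The two restore logs differ noticeably in how long the paths took to come back: about 8 seconds after LOOP UP for path 1, but about 92 seconds for path 2. A small Python sketch like the following can measure this from the syslog. It assumes the log format shown above (HH:MM:SS timestamps, all on the same day) and is only an illustration, not part of multipath-tools.

```python
import re
from datetime import datetime

LOOP_UP = re.compile(r"^(\d\d:\d\d:\d\d) .*LOOP UP detected")
REINSTATED = re.compile(r"^(\d\d:\d\d:\d\d) \S+ multipathd: (\d+:\d+): reinstated$")

def parse_time(stamp):
    return datetime.strptime(stamp, "%H:%M:%S")

def reinstate_delays(lines):
    """Seconds between the last LOOP UP line and each 'reinstated' line.

    Returns {major:minor: seconds}. Assumes no midnight rollover.
    """
    loop_up = None
    delays = {}
    for line in lines:
        up = LOOP_UP.match(line)
        if up:
            loop_up = parse_time(up.group(1))
            continue
        back = REINSTATED.match(line)
        if back and loop_up is not None:
            elapsed = parse_time(back.group(1)) - loop_up
            delays[back.group(2)] = int(elapsed.total_seconds())
    return delays

sample = [
    "13:03:39 host kernel: qla2300 0000:05:07.0: LOOP UP detected (2 Gbps).",
    "13:05:11 host multipathd: 8:16: tur checker reports path is up",
    "13:05:11 host multipathd: 8:16: reinstated",
    "13:05:11 host multipathd: 8:48: reinstated",
]
print(reinstate_delays(sample))  # {'8:16': 92, '8:48': 92}
```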