Log report In node 1:

Jan 25 11:25:33 wsguardian1 kernel: igb: eth4 NIC Link is Down
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: asender terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Terminating asender thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Creating new current UUID
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Connection closed
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jan 25 11:25:39 wsguardian1 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: fence-peer helper broken, returned 255
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: receiver terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Restarting receiver thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: receiver (re)started
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: asender terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Terminating asender thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Creating new current UUID
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Connection closed
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Jan 25 11:25:39 wsguardian1 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 255 (0xff00)
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: fence-peer helper broken, returned 255
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: receiver terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Restarting receiver thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: receiver (re)started
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: conn( Unconnected -> WFConnection )
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] The token was lost in the OPERATIONAL state.
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] entering GATHER state from 2.

Log report In node 2

Jan 25 11:25:34 wsguardian2 kernel: igb: eth4 NIC Link is Down
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: asender terminated
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Terminating asender thread
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Creating new current UUID
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Connection closed
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jan 25 11:25:39 wsguardian2 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: PingAck did not arrive in time.
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: asender terminated
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Terminating asender thread
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: short read expecting header on sock: r=-512
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Creating new current UUID
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Connection closed
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Jan 25 11:25:40 wsguardian2 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] The token was lost in the OPERATIONAL state.
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] entering GATHER state from 2.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering GATHER state from 0.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Creating commit token because I am the rep.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Storing new sequence id for ring 128
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering COMMIT state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering RECOVERY state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] position [0] member 192.168.253.2:
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] previous ring seq 292 rep 192.168.253.1
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] aru bb high delivered bb received flag 1
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Did not need to originate any messages in recovery.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Sending initial ORF token
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] CLM CONFIGURATION CHANGE
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] New Configuration:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.2)
Jan 25 11:25:45 wsguardian2 kernel: dlm: closing connection to node 1
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Left:
Jan 25 11:25:45 wsguardian2 fenced[2875]: wsguardian1 not a cluster member after 0 sec post_fail_delay
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.1)
Jan 25 11:25:45 wsguardian2 fenced[2875]: fencing node "wsguardian1"
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Joined:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] CLM CONFIGURATION CHANGE
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] New Configuration:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.2)
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Left:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Joined:
Jan 25 11:25:45 wsguardian2 openais[2854]: [SYNC ] This node is within the primary component and will provide service.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering OPERATIONAL state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] got nodejoin message 192.168.253.2
Jan 25 11:25:45 wsguardian2 openais[2854]: [CPG ] got joinlist message from node 2
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: fence-peer helper broken, returned 255
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: receiver terminated
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Restarting receiver thread
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: receiver (re)started
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 255 (0xff00)
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: fence-peer helper broken, returned 255
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: receiver terminated
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Restarting receiver thread
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: receiver (re)started
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: conn( Unconnected -> WFConnection )

After reboot (fencing) drbd report on node1

cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16384
 1: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:167936

After manual startup drbd report on node2

cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
 0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:20480
 1: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12288