Log report on node 1:
Jan 25 11:25:33 wsguardian1 kernel: igb: eth4 NIC Link is Down
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: asender terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Terminating asender thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Creating new current UUID
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Connection closed
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jan 25 11:25:39 wsguardian1 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: fence-peer helper broken, returned 255
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: receiver terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Restarting receiver thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: receiver (re)started
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: asender terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Terminating asender thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Creating new current UUID
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Connection closed
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Jan 25 11:25:39 wsguardian1 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 255 (0xff00)
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: fence-peer helper broken, returned 255
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: receiver terminated
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Restarting receiver thread
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: receiver (re)started
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:39 wsguardian1 kernel: block drbd1: conn( Unconnected -> WFConnection )
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] The token was lost in the OPERATIONAL state.
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jan 25 11:25:42 wsguardian1 openais[2880]: [TOTEM] entering GATHER state from 2.
Log report on node 2:
Jan 25 11:25:34 wsguardian2 kernel: igb: eth4 NIC Link is Down
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: PingAck did not arrive in time.
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: asender terminated
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Terminating asender thread
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: short read expecting header on sock: r=-512
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Creating new current UUID
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: Connection closed
Jan 25 11:25:39 wsguardian2 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jan 25 11:25:39 wsguardian2 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: PingAck did not arrive in time.
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: asender terminated
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Terminating asender thread
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: short read expecting header on sock: r=-512
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Creating new current UUID
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: Connection closed
Jan 25 11:25:40 wsguardian2 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Jan 25 11:25:40 wsguardian2 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] The token was lost in the OPERATIONAL state.
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jan 25 11:25:43 wsguardian2 openais[2854]: [TOTEM] entering GATHER state from 2.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering GATHER state from 0.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Creating commit token because I am the rep.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Storing new sequence id for ring 128
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering COMMIT state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering RECOVERY state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] position [0] member 192.168.253.2:
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] previous ring seq 292 rep 192.168.253.1
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] aru bb high delivered bb received flag 1
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Did not need to originate any messages in recovery.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] Sending initial ORF token
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] CLM CONFIGURATION CHANGE
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] New Configuration:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.2)
Jan 25 11:25:45 wsguardian2 kernel: dlm: closing connection to node 1
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Left:
Jan 25 11:25:45 wsguardian2 fenced[2875]: wsguardian1 not a cluster member after 0 sec post_fail_delay
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.1)
Jan 25 11:25:45 wsguardian2 fenced[2875]: fencing node "wsguardian1"
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Joined:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] CLM CONFIGURATION CHANGE
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] New Configuration:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] r(0) ip(192.168.253.2)
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Left:
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] Members Joined:
Jan 25 11:25:45 wsguardian2 openais[2854]: [SYNC ] This node is within the primary component and will provide service.
Jan 25 11:25:45 wsguardian2 openais[2854]: [TOTEM] entering OPERATIONAL state.
Jan 25 11:25:45 wsguardian2 openais[2854]: [CLM ] got nodejoin message 192.168.253.2
Jan 25 11:25:45 wsguardian2 openais[2854]: [CPG ] got joinlist message from node 2
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: fence-peer helper broken, returned 255
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: receiver terminated
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Restarting receiver thread
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: receiver (re)started
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 255 (0xff00)
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: fence-peer helper broken, returned 255
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: receiver terminated
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Restarting receiver thread
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: receiver (re)started
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
Jan 25 11:25:49 wsguardian2 kernel: block drbd1: conn( Unconnected -> WFConnection )
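Both nodes log the fence-peer helper returning 255, which the kernel reports as "fence-peer helper broken". To compare the two logs side by side, the helper results can be pulled out with a short script. This is only a sketch: the regular expression is written against the syslog layout shown above, and the names fence_peer_exits and SAMPLE_LOG are made up for illustration.

```python
import re

# Matches only the "helper command: ... exit code N" lines, in the
# syslog layout seen in the excerpts above (an assumption; other
# syslog formats would need a different pattern).
HELPER_EXIT_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<time>[\d:]+)\s+(?P<host>\S+)\s+kernel: "
    r"block drbd(?P<minor>\d+): helper command: \S+ fence-peer \S+ "
    r"exit code (?P<code>\d+)"
)


def fence_peer_exits(log_text):
    """Return (time, host, minor, exit_code) for each fence-peer result line."""
    results = []
    for line in log_text.splitlines():
        m = HELPER_EXIT_RE.match(line.strip())
        if m:
            results.append(
                (m.group("time"), m.group("host"),
                 int(m.group("minor")), int(m.group("code")))
            )
    return results


# Three lines copied from the logs above: one helper start, two helper results.
SAMPLE_LOG = """\
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jan 25 11:25:39 wsguardian1 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
Jan 25 11:25:48 wsguardian2 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 255 (0xff00)
"""

if __name__ == "__main__":
    for time, host, minor, code in fence_peer_exits(SAMPLE_LOG):
        print(f"{time} {host} drbd{minor}: fence-peer exit code {code}")
```

On the sample lines this shows the helper finishing within the same second on wsguardian1 (11:25:39), while on wsguardian2 the result for drbd0 arrives at 11:25:48, nine seconds after the helper was started.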
After reboot (fencing), DRBD status on node 1:
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16384
1: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:167936
After manual startup, DRBD status on node 2:
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:20480
1: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12288
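Both reports show every device in cs:StandAlone with ro:Secondary/Unknown, so neither node is trying to reconnect. A quick way to check this across nodes is to parse the status lines of /proc/drbd. The following is a minimal sketch that assumes the 8.3-style layout shown above; parse_proc_drbd and standalone_minors are hypothetical helper names, not part of DRBD.

```python
import re

# One status line per device, e.g.
# " 0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_proc_drbd(text):
    """Parse the per-device status lines of /proc/drbd into dicts."""
    devices = []
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if not m:
            continue  # version, GIT-hash and counter lines are skipped
        local_role, peer_role = m.group("ro").split("/")
        local_disk, peer_disk = m.group("ds").split("/")
        devices.append({
            "minor": int(m.group("minor")),
            "cs": m.group("cs"),
            "local_role": local_role,
            "peer_role": peer_role,
            "local_disk": local_disk,
            "peer_disk": peer_disk,
        })
    return devices


def standalone_minors(devices):
    """Return the minors whose connection state is StandAlone."""
    return [d["minor"] for d in devices if d["cs"] == "StandAlone"]


# Status copied from the node 1 report above.
SAMPLE = """\
version: 8.3.8 (api:88/proto:86-94)
0: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16384
1: cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown s----
ns:0 nr:0 dw:0 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:167936
"""

if __name__ == "__main__":
    print(standalone_minors(parse_proc_drbd(SAMPLE)))
```

On a live node the same function could be fed the contents of /proc/drbd instead of the pasted sample.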