- *************************************************************************************************************************
- PHASE 1: VERIFICATION of HOST1
- First verification after "data-integrity-alg" enabled: ALL GOOD
- ...
- Mar 24 17:31:22 host1 kernel: block drbd0: conn( Connected -> VerifyS )
- Mar 24 17:31:22 host1 kernel: block drbd0: Starting Online Verify from sector 0
- Mar 24 19:33:31 host1 kernel: block drbd0: Online verify done (total 7328 sec; paused 0 sec; 119932 K/sec)
- Mar 24 19:33:31 host1 kernel: block drbd0: conn( VerifyS -> Connected )
- Mar 24 19:33:31 host1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
- Mar 24 19:33:31 host1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
- Second verification after "data-integrity-alg" enabled: BOTH SERVERS HUNG
- Mar 25 00:42:01 host1 kernel: block drbd0: conn( Connected -> VerifyS )
- Mar 25 00:42:01 host1 kernel: block drbd0: Starting Online Verify from sector 0
- Mar 25 01:07:07 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967295
- Mar 25 01:07:13 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967294
- Mar 25 01:07:19 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967293
- ...
- PHASE 1: VERIFICATION of HOST2
- First verification after "data-integrity-alg" enabled: ALL GOOD
- ...
- Mar 24 17:31:22 host2 kernel: block drbd0: conn( Connected -> VerifyT )
- Mar 24 17:31:22 host2 kernel: block drbd0: Online Verify start sector: 0
- Mar 24 19:33:31 host2 kernel: block drbd0: Online verify done (total 7328 sec; paused 0 sec; 119932 K/sec)
- Mar 24 19:33:31 host2 kernel: block drbd0: conn( VerifyT -> Connected )
- Mar 24 19:33:31 host2 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
- Mar 24 19:33:31 host2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
- Second verification after "data-integrity-alg" enabled: BOTH SERVERS HUNG
- Mar 25 00:42:01 host2 kernel: block drbd0: conn( Connected -> VerifyT )
- Mar 25 00:42:01 host2 kernel: block drbd0: Online Verify start sector: 0
- Mar 25 01:06:58 host2 kernel: block drbd0: kvm[172358] Concurrent local write detected! [DISCARD L] new: 989901215s +4096; pending: 989901215s +4096
- Mar 25 01:11:08 host2 kernel: block drbd0: [drbd0_worker/3754] sock_sendmsg time expired, ko = 4294967295
- Mar 25 01:11:14 host2 kernel: block drbd0: [drbd0_worker/3754] sock_sendmsg time expired, ko = 4294967294
- Mar 25 01:11:20 host2 kernel: block drbd0: [drbd0_worker/3754] sock_sendmsg time expired, ko = 4294967293
- ...
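A note on the ko values in the sock_sendmsg lines above: 4294967295 is 2^32 - 1, so the sequence looks like a counter that started at 0, was decremented on each timeout, and was printed as an unsigned 32-bit number. That reading is an assumption based only on the numbers in the log. A minimal sketch that reinterprets the values as signed (decode_ko is a made-up helper name, not part of DRBD):

```python
import re

# Matches the timeout lines exactly as they appear in the log above.
KO_RE = re.compile(r"sock_sendmsg time expired, ko = (\d+)")

def decode_ko(line):
    """Return the ko value of a log line as a signed 32-bit int, or None."""
    match = KO_RE.search(line)
    if match is None:
        return None
    value = int(match.group(1))
    # Two's-complement reinterpretation of an unsigned 32-bit value.
    return value - (1 << 32) if value >= (1 << 31) else value

sample = [
    "Mar 25 01:07:07 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967295",
    "Mar 25 01:07:13 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967294",
    "Mar 25 01:07:19 host1 kernel: block drbd0: [drbd0_worker/3644] sock_sendmsg time expired, ko = 4294967293",
]
decoded = [decode_ko(line) for line in sample]
print(decoded)
```

Read this way, the three lines are a count of -1, -2, -3, one step roughly every 6 seconds on both hosts.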
- *************************************************************************************************************************
- PHASE 2: HOST1
- Resync after host2 rebooted: everything seems okay
- Mar 25 09:01:00 host1 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
- Mar 25 09:01:00 host1 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
- Mar 25 09:01:00 host1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
- Mar 25 09:01:00 host1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [136758])
- Mar 25 09:01:00 host1 kernel: block drbd0: data-integrity-alg: crc32c
- Mar 25 09:01:00 host1 kernel: block drbd0: drbd_sync_handshake:
- Mar 25 09:01:00 host1 kernel: block drbd0: self 5114F9434B703F5D:9C28E5306F77E971:4ED5A202300A9955:4ED4A202300A9955 bits:17622 flags:0
- Mar 25 09:01:00 host1 kernel: block drbd0: peer 9C28E5306F77E970:0000000000000000:4ED5A202300A9954:4ED4A202300A9955 bits:130048 flags:2
- Mar 25 09:01:00 host1 kernel: block drbd0: uuid_compare()=1 by rule 70
- Mar 25 09:01:00 host1 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
- Mar 25 09:01:00 host1 kernel: block drbd0: peer( Secondary -> Primary )
- Mar 25 09:01:00 host1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
- Mar 25 09:01:00 host1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
- Mar 25 09:01:00 host1 kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
- Mar 25 09:01:00 host1 kernel: block drbd0: Began resync as SyncSource (will sync 590684 KB [147671 bits set]).
- Mar 25 09:01:00 host1 kernel: block drbd0: updated sync UUID 5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971:4ED5A202300A9955
- Mar 25 09:01:07 host1 kernel: block drbd0: Resync done (total 6 sec; paused 0 sec; 98444 K/sec)
- Mar 25 09:01:07 host1 kernel: block drbd0: updated UUIDs 5114F9434B703F5D:0000000000000000:9C29E5306F77E971:9C28E5306F77E971
- Mar 25 09:01:07 host1 kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
- Mar 25 09:01:07 host1 kernel: block drbd0: bitmap WRITE of 6375 pages took 55 jiffies
- Mar 25 09:01:07 host1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
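The resync figures above can be cross-checked with a little arithmetic. Each bit in the bitmap covers 4 KiB, so 147671 bits is 590684 KB, as logged. The logged rate of 98444 K/sec also matches if the rate is computed as whole bits per second and only then converted to KB; that formula is my assumption, chosen because it reproduces the logged number. The device-size estimate from the verify run applies the same assumed formula and is only a rough lower bound.

```python
KB_PER_BIT = 4  # one bitmap bit covers 4 KiB

def bits_to_kb(bits):
    """Convert a count of bitmap bits to KB."""
    return bits * KB_PER_BIT

def rate_kb_per_sec(bits, seconds):
    """Assumed formula: integer bits/sec first, then converted to KB."""
    return (bits // seconds) * KB_PER_BIT

# Resync on Mar 25 09:01: 147671 bits set, 6 seconds.
resync_kb = bits_to_kb(147671)
resync_rate = rate_kb_per_sec(147671, 6)

# Verify on Mar 24: 7328 seconds at 119932 K/sec.
# Working backwards gives a lower bound on the verified size.
verify_bits_min = (119932 // KB_PER_BIT) * 7328
verify_gib_min = bits_to_kb(verify_bits_min) / (1024 * 1024)

print(resync_kb, resync_rate, round(verify_gib_min, 1))
```

Under these assumptions the verified device comes out at roughly 838 GiB.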
- Split brain after 10 minutes of normal work
- Mar 25 09:11:43 host1 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 1274046648s +4096
- Mar 25 09:11:43 host1 kernel: block drbd0: sock was shut down by peer
- Mar 25 09:11:43 host1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
- Mar 25 09:11:43 host1 kernel: block drbd0: short read expecting header on sock: r=0
- Mar 25 09:11:43 host1 kernel: block drbd0: new current UUID 72BF71273D52849B:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971
- Mar 25 09:11:43 host1 kernel: block drbd0: meta connection shut down by peer.
- Mar 25 09:11:43 host1 kernel: block drbd0: asender terminated
- Mar 25 09:11:43 host1 kernel: block drbd0: Terminating asender thread
- Mar 25 09:11:43 host1 kernel: block drbd0: Connection closed
- Mar 25 09:11:43 host1 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
- Mar 25 09:11:43 host1 kernel: block drbd0: receiver terminated
- Mar 25 09:11:43 host1 kernel: block drbd0: Restarting receiver thread
- Mar 25 09:11:43 host1 kernel: block drbd0: receiver (re)started
- Mar 25 09:11:43 host1 kernel: block drbd0: conn( Unconnected -> WFConnection )
- Mar 25 09:11:43 host1 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
- Mar 25 09:11:43 host1 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
- Mar 25 09:11:43 host1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
- Mar 25 09:11:43 host1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [136758])
- Mar 25 09:11:43 host1 kernel: block drbd0: data-integrity-alg: crc32c
- Mar 25 09:11:43 host1 kernel: block drbd0: drbd_sync_handshake:
- Mar 25 09:11:43 host1 kernel: block drbd0: self 72BF71273D52849B:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971 bits:175 flags:0
- Mar 25 09:11:43 host1 kernel: block drbd0: peer 0225AA4EFAB3BE37:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971 bits:0 flags:0
- Mar 25 09:11:43 host1 kernel: block drbd0: uuid_compare()=100 by rule 90
- Mar 25 09:11:43 host1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
- Mar 25 09:11:43 host1 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
- Mar 25 09:11:43 host1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
- Mar 25 09:11:43 host1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
- Mar 25 09:11:43 host1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
- Mar 25 09:11:43 host1 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
- Mar 25 09:11:43 host1 kernel: block drbd0: error receiving ReportState, l: 4!
- Mar 25 09:11:43 host1 kernel: block drbd0: asender terminated
- Mar 25 09:11:43 host1 kernel: block drbd0: Terminating asender thread
- Mar 25 09:11:43 host1 kernel: block drbd0: Connection closed
- Mar 25 09:11:43 host1 kernel: block drbd0: conn( Disconnecting -> StandAlone )
- Mar 25 09:11:43 host1 kernel: block drbd0: receiver terminated
- Mar 25 09:11:43 host1 kernel: block drbd0: Terminating receiver thread
- I then had to disable data-integrity-alg and resolve the split brain to get the servers working again
- PHASE 2: HOST2
- Resync after host2 rebooted: everything seems okay
- Mar 25 09:01:00 host2 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
- Mar 25 09:01:00 host2 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
- Mar 25 09:01:00 host2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
- Mar 25 09:01:00 host2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])
- Mar 25 09:01:00 host2 kernel: block drbd0: data-integrity-alg: crc32c
- Mar 25 09:01:00 host2 kernel: block drbd0: drbd_sync_handshake:
- Mar 25 09:01:00 host2 kernel: block drbd0: self 9C28E5306F77E970:0000000000000000:4ED5A202300A9954:4ED4A202300A9955 bits:130048 flags:0
- Mar 25 09:01:00 host2 kernel: block drbd0: peer 5114F9434B703F5D:9C28E5306F77E971:4ED5A202300A9955:4ED4A202300A9955 bits:17622 flags:0
- Mar 25 09:01:00 host2 kernel: block drbd0: uuid_compare()=-1 by rule 50
- Mar 25 09:01:00 host2 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
- Mar 25 09:01:00 host2 kernel: block drbd0: role( Secondary -> Primary )
- Mar 25 09:01:00 host2 kernel: DLM (built Mar 18 2013 06:28:24) installed
- Mar 25 09:01:00 host2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
- Mar 25 09:01:01 host2 kernel: block drbd0: updated sync uuid 9C29E5306F77E971:0000000000000000:4ED5A202300A9954:4ED4A202300A9955
- Mar 25 09:01:01 host2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
- Mar 25 09:01:01 host2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
- Mar 25 09:01:01 host2 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
- Mar 25 09:01:01 host2 kernel: block drbd0: Began resync as SyncTarget (will sync 590684 KB [147671 bits set]).
- Mar 25 09:01:07 host2 kernel: block drbd0: Resync done (total 6 sec; paused 0 sec; 98444 K/sec)
- Mar 25 09:01:07 host2 kernel: block drbd0: updated UUIDs 5114F9434B703F5D:0000000000000000:9C29E5306F77E971:9C28E5306F77E971
- Mar 25 09:01:07 host2 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
- Mar 25 09:01:07 host2 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
- Mar 25 09:01:07 host2 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
- Mar 25 09:01:07 host2 kernel: block drbd0: bitmap WRITE of 6375 pages took 22 jiffies
- Mar 25 09:01:07 host2 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
- Split brain after 10 minutes of normal work
- Mar 25 09:11:43 host2 kernel: block drbd0: Digest integrity check FAILED: 1274046648s +4096
- Mar 25 09:11:43 host2 kernel: block drbd0: error receiving Data, l: 4124!
- Mar 25 09:11:43 host2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
- Mar 25 09:11:43 host2 kernel: block drbd0: new current UUID 0225AA4EFAB3BE37:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971
- Mar 25 09:11:43 host2 kernel: block drbd0: asender terminated
- Mar 25 09:11:43 host2 kernel: block drbd0: Terminating asender thread
- Mar 25 09:11:43 host2 kernel: block drbd0: Connection closed
- Mar 25 09:11:43 host2 kernel: block drbd0: conn( ProtocolError -> Unconnected )
- Mar 25 09:11:43 host2 kernel: block drbd0: receiver terminated
- Mar 25 09:11:43 host2 kernel: block drbd0: Restarting receiver thread
- Mar 25 09:11:43 host2 kernel: block drbd0: receiver (re)started
- Mar 25 09:11:43 host2 kernel: block drbd0: conn( Unconnected -> WFConnection )
- Mar 25 09:11:44 host2 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
- Mar 25 09:11:44 host2 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
- Mar 25 09:11:44 host2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
- Mar 25 09:11:44 host2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [2450])
- Mar 25 09:11:44 host2 kernel: block drbd0: data-integrity-alg: crc32c
- Mar 25 09:11:44 host2 kernel: block drbd0: drbd_sync_handshake:
- Mar 25 09:11:44 host2 kernel: block drbd0: self 0225AA4EFAB3BE37:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971 bits:0 flags:0
- Mar 25 09:11:44 host2 kernel: block drbd0: peer 72BF71273D52849B:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971 bits:175 flags:0
- Mar 25 09:11:44 host2 kernel: block drbd0: uuid_compare()=100 by rule 90
- Mar 25 09:11:44 host2 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
- Mar 25 09:11:44 host2 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
- Mar 25 09:11:44 host2 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
- Mar 25 09:11:44 host2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
- Mar 25 09:11:44 host2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
- Mar 25 09:11:44 host2 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
- Mar 25 09:11:44 host2 kernel: block drbd0: error receiving ReportState, l: 4!
- Mar 25 09:11:44 host2 kernel: block drbd0: asender terminated
- Mar 25 09:11:44 host2 kernel: block drbd0: Terminating asender thread
- Mar 25 09:11:44 host2 kernel: block drbd0: Connection closed
- Mar 25 09:11:44 host2 kernel: block drbd0: conn( Disconnecting -> StandAlone )
- Mar 25 09:11:44 host2 kernel: block drbd0: receiver terminated
- Mar 25 09:11:44 host2 kernel: block drbd0: Terminating receiver thread
- I then had to disable data-integrity-alg and resolve the split brain to get the servers working again
- *************************************************************************************************************************
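Why the 09:11 handshake ended in split brain can be read off the self/peer UUID lines. After the disconnect, each host generated a new current UUID (72BF71273D52849B on host1, 0225AA4EFAB3BE37 on host2), while both still carry the same second UUID, 5114F9434B703F5D, which was the shared current UUID before the disconnect. So both sides moved on independently from the same point. The sketch below is a simplified illustration of that check only, not DRBD's real uuid_compare() code. The function names are made up, and masking the lowest bit is an assumption based on the log showing values that differ only there (...970 vs ...971).

```python
def parse_uuids(field):
    """Split a 'current:bitmap:history:history' field into integers."""
    return [int(part, 16) for part in field.split(":")]

def same_uuid(a, b):
    """Compare two UUIDs, ignoring the lowest bit (assumed to be a flag)."""
    return (a & ~1) == (b & ~1)

def looks_like_split_brain(self_field, peer_field):
    """True if both sides diverged from the same earlier data generation."""
    self_current, self_bitmap = parse_uuids(self_field)[:2]
    peer_current, peer_bitmap = parse_uuids(peer_field)[:2]
    if same_uuid(self_current, peer_current):
        return False  # same generation, nothing diverged
    # Different current UUIDs but the same non-zero second UUID.
    return (self_bitmap & ~1) != 0 and same_uuid(self_bitmap, peer_bitmap)

# Handshake at 09:11:43 on host1 (logged as rule 90, split brain).
after_disconnect = looks_like_split_brain(
    "72BF71273D52849B:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971",
    "0225AA4EFAB3BE37:5114F9434B703F5D:9C29E5306F77E971:9C28E5306F77E971",
)

# Handshake at 09:01:00 on host1 (logged as rule 70, normal resync).
after_reboot = looks_like_split_brain(
    "5114F9434B703F5D:9C28E5306F77E971:4ED5A202300A9955:4ED4A202300A9955",
    "9C28E5306F77E970:0000000000000000:4ED5A202300A9954:4ED4A202300A9955",
)

print(after_disconnect, after_reboot)
```

The first call uses the UUIDs logged at 09:11:43 and the second those from the 09:01:00 handshake, which DRBD resolved as an ordinary resync.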