- After the cluster-wide stop and restart:
- On dnds1-13:
- -------------------
- 2013-03-14 00:11:37,993 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving dnds1-4,60020,1363219866063's hlogs to my queue
- 2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-4%2C60020%2C1363219866063.1363219868868 with data
- 2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 5
- 2013-03-14 00:11:38,068 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
- org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
- at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
- at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
- at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1436)
- at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:590)
- at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
- at java.lang.Thread.run(Thread.java:662)
- 2013-03-14 00:11:38,070 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
- 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
- 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-6,60020,1359508078741
- 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-3,60020,1359508079058
- 2013-03-14 00:11:39,080 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
- 2013-03-14 00:11:39,081 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-4%2C60020%2C1363219866063.1363219868868 at 0
- 2013-03-14 00:11:39,090 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:3 and size: 0
- 2013-03-14 00:11:39,090 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #dnds1-4%2C60020%2C1363219866063.1363219868868 for position 1004 in hdfs://cluster/hbase/.oldlogs/dnds1-4%2C60020%2C1363219866063.1363219868868
- 2013-03-14 00:11:39,139 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server dnds1-13,60020,1363219887385: Writing replication status
- org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/dnds1-13,60020,1363219887385/1-dnds1-4,60020,1363219866063/dnds1-4%2C60020%2C1363219866063.1363219868868
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
- at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
- at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:349)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:848)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:900)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:894)
- at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:155)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:379)
- 2013-03-14 00:11:39,140 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
- -------------------
- Another one on dnds1-12:
- -------------------
- 2013-03-14 00:11:35,905 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving dnds1-8,60020,1363219865904's hlogs to my queue
- 2013-03-14 00:11:35,909 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-8%2C60020%2C1363219865904.1363219868852 with data
- 2013-03-14 00:11:35,937 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363217800957 with data 7470
- 2013-03-14 00:11:35,972 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363102598299 with data null
- 2013-03-14 00:11:35,973 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363109798542 with data null
- 2013-03-14 00:11:35,974 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363124199049 with data null
- 2013-03-14 00:11:35,975 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363116998811 with data null
- 2013-03-14 00:11:35,977 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363098998059 with data null
- 2013-03-14 00:11:35,978 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363026996488 with data null
- 2013-03-14 00:11:35,979 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363127799218 with data null
- 2013-03-14 00:11:35,991 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363034196762 with data null
- 2013-03-14 00:11:35,992 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1362533787045 with data null
- 2013-03-14 00:11:35,993 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363023396288 with data null
- 2013-03-14 00:11:35,993 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 29
- 2013-03-14 00:11:36,018 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
- org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
- at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
- at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
- at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1436)
- at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:590)
- at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
- at java.lang.Thread.run(Thread.java:662)
- 2013-03-14 00:11:36,019 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
- 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
- 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
- 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-3,60020,1359508079058
- 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-11,60020,1359508114550
- 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
- 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-10,60020,1359508114634
- 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-6,60020,1359508078741
- 2013-03-14 00:11:37,028 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
- 2013-03-14 00:11:37,030 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-8%2C60020%2C1363219865904.1363219868852 at 0
- 2013-03-14 00:11:37,036 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
- 2013-03-14 00:11:37,038 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-9%2C60020%2C1362533781275.1362533787045 at 0
- 2013-03-14 00:11:37,046 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:0 and size: 0
- 2013-03-14 00:11:37,046 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Done with the recovered queue 1-dnds1-8,60020,1363219865904
- 2013-03-14 00:11:37,048 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Finished recovering the queue
- 2013-03-14 00:11:37,048 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Source exiting 1
- 2013-03-14 00:11:37,055 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:1 and size: 0
- 2013-03-14 00:11:37,055 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #dnds1-9%2C60020%2C1362533781275.1362533787045 for position 372 in hdfs://cluster/hbase/.oldlogs/dnds1-9%2C60020%2C1362533781275.1362533787045
- 2013-03-14 00:11:37,074 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server dnds1-12,60020,1363219887328: Writing replication status
- org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/dnds1-12,60020,1363219887328/1-dnds1-9,60020,1362533781275-dnds1-11,60020,1362533806866-dnds1-8,60020,1363219865904/dnds1-9%2C60020%2C1362533781275.1362533787045
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
- at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
- at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
- at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:349)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:848)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:900)
- at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:894)
- at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:155)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:379)
- 2013-03-14 00:11:37,074 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
- -------------------------
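The znode paths in the two FATAL lines above are easier to compare once they are split into their parts. The following is a minimal sketch, not anything taken from HBase itself: the function name `parse_replication_znode` and the regular expression are hypothetical, and it assumes (based only on the paths shown in this paste) that the layout is `/hbase/replication/rs/<owning RS>/<peer id>-<dead RS>[-<dead RS>...]/<URL-encoded hlog name>`, that the peer id contains no hyphen, and that each server name has the form `host,port,startcode`.

```python
import re
from urllib.parse import unquote

# Hypothetical helper: matches one "host,port,startcode" server name inside
# a recovered-queue znode name. Hostnames may contain hyphens (e.g. dnds1-9),
# so the host part is matched lazily up to the first comma.
SERVER_RE = re.compile(r"([A-Za-z0-9.\-]+?),(\d+),(\d+)(?:-|$)")


def parse_replication_znode(path):
    """Split a replication znode path into owner, peer id, dead servers, hlog."""
    parts = path.strip("/").split("/")
    # Assumed layout: .../rs/<owner>/<queue>/<hlog>
    owner, queue, hlog = parts[-3], parts[-2], parts[-1]
    # Assumption: the peer id is everything before the first hyphen.
    peer_id, _, chain = queue.partition("-")
    dead_servers = [",".join(m) for m in SERVER_RE.findall(chain)]
    # hlog names are URL-encoded in the logs (%2C is a comma).
    return {
        "owner": owner,
        "peer_id": peer_id,
        "dead_servers": dead_servers,
        "hlog": unquote(hlog),
    }


if __name__ == "__main__":
    # Path copied from the dnds1-12 abort above.
    example = (
        "/hbase/replication/rs/dnds1-12,60020,1363219887328/"
        "1-dnds1-9,60020,1362533781275-dnds1-11,60020,1362533806866-"
        "dnds1-8,60020,1363219865904/"
        "dnds1-9%2C60020%2C1362533781275.1362533787045"
    )
    for key, value in parse_replication_znode(example).items():
        print(key, "=", value)
```

Run against the dnds1-12 path, this lists three server names in the queue name, while the dnds1-13 path lists only one (dnds1-4).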
- We then restarted all region servers a second time.
- This time they stayed up, but the logs are now filled with messages like these:
- ------------------------
- 2013-03-14 01:00:37,998 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-12%2C60020%2C1363220608780.1363220609572 at 0
- 2013-03-14 01:00:38,001 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1 Got:
- java.io.EOFException
- at java.io.DataInputStream.readFully(DataInputStream.java:180)
- at java.io.DataInputStream.readFully(DataInputStream.java:152)
- at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
- at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
- at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
- at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
- at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
- at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
- at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:507)
- at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313)
- 2013-03-14 01:00:38,001 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
- 2013-03-14 01:00:38,001 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 10