- I am testing a small HDFS cluster of 4 machines. I successfully
- installed CDH3 (and now CDH4), configured and started HDFS, etc.
- As part of my testing, I filled HDFS with files to full capacity (100%
- full, cannot write any data into HDFS, the data disks on all 4
- machines are 100% full) and hit an anomaly and a showstopper.
- The anomaly has to do with reported disk space. "df" is expected to
- report HDFS 100% full; instead it reports 47% used. ("dfsadmin
- -report" is correct: "DFS Used: 99.9%, DFS remaining: 1.34GB" (out of
- 2.7TB)).
- The showstopper has to do with the balancer. The balancer thinks that
- one of the nodes is only 8% used and thinks that data should be moved
- to it from other nodes. Of course this 8% number is wrong - all data
- disks are 100% full, and as expected, all data movement fails.
- This is somewhat consistent with the "live nodes" report for the
- supposedly 8% full node:
- Node    Last Contact  Admin State  Configured Capacity (TB)  Used (TB)  Non DFS Used (TB)  Remaining (TB)  Used (%)  Remaining (%)  Blocks  Block Pool Used (TB)  Block Pool Used (%)  Failed Volumes
- iris02  0             In Service   1.61                      0.13       1.47               0               8.19      0.01           2164    0.13                  8.19                 1
- ladd02  1             In Service   0.26                      0.26       0                  0               99.99     0.01           4321    0.26                  99.99                1
- ladd06  1             In Service   0.49                      0.49       0                  0               99.95     0.05           7973    0.49                  99.95                1
- ladd12  1             In Service   0.37                      0.38       0                  0               103.31    0.05           6236    0.38                  103.31               0
- Observe iris02 is reported as "8% used", "0 TB remaining", "0%
- remaining".
- Also observe that iris02 has 1.47TB of "non DFS" data reported, which
- is correct - there is about 1.5 TB of non-DFS data on that disk.
- This is consistent with "dfsadmin -report", which shows 1.47TB "Non DFS
- Used", 91MB "DFS Remaining", 8% "DFS Used", 0.01% "DFS Remaining".
- So what is going on here?
- Did "df" and the balancer "forget" to account for the "non-DFS" data?
- Does the HDFS balancer only work correctly if "non DFS" disk use is zero
- on every node?
- Attached is the output of "df", output of "dfsadmin -report" and
- output of the balancer.
- > df /hdfs
- Filesystem 1K-blocks Used Available Use% Mounted on
- fuse_dfs 2925264896 1353711616 1571553280 47% /hdfs
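As a quick check of the fuse_dfs line above, here is a minimal sketch of how the Use% figure can be reproduced. It assumes that df computes Use% as used / (used + available), rounded up to a whole percent; the function name is made up for illustration.

```python
import math


def df_use_percent(used_kb: int, avail_kb: int) -> int:
    """Use% under the assumed rule: used / (used + available), rounded up."""
    return math.ceil(100 * used_kb / (used_kb + avail_kb))


# Used and Available (in 1K-blocks) from the fuse_dfs line above.
print(df_use_percent(1353711616, 1571553280))
```

Under that assumption the result is 47, i.e. roughly the cluster-wide ratio of DFS Used to Configured Capacity, with non-DFS data not counted as used.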
- > hdfs dfsadmin -report
- Configured Capacity: 2995494459813 (2.72 TB)
- Present Capacity: 1387596803939 (1.26 TB)
- DFS Remaining: 1433567075 (1.34 GB)
- DFS Used: 1386163236864 (1.26 TB)
- DFS Used%: 99.9%
- Under replicated blocks: 3867
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 4 (4 total, 0 dead)
- Live datanodes:
- Name: 142.90.111.96:50010 (ladd06.triumf.ca)
- Hostname: ladd06.triumf.ca
- Rack: /detfac
- Decommission Status : Normal
- Configured Capacity: 535521764641 (498.74 GB)
- DFS Used: 534397218816 (497.7 GB)
- Non DFS Used: 0 (0 KB)
- DFS Remaining: 1124545825 (1.05 GB)
- DFS Used%: 99.79%
- DFS Remaining%: 0.21%
- Last contact: Sat Jun 23 18:35:16 PDT 2012
- Name: 142.90.119.126:50010 (ladd12.triumf.ca)
- Hostname: ladd12.triumf.ca
- Rack: /isac2
- Decommission Status : Normal
- Configured Capacity: 404426435138 (376.65 GB)
- DFS Used: 417828085760 (389.13 GB)
- Non DFS Used: 0 (0 KB)
- DFS Remaining: 189199649 (180.43 MB)
- DFS Used%: 103.31%
- DFS Remaining%: 0.05%
- Last contact: Sat Jun 23 18:35:18 PDT 2012
- Name: 142.90.111.72:50010 (ladd02.triumf.ca)
- Hostname: ladd02.triumf.ca
- Rack: /detfac
- Decommission Status : Normal
- Configured Capacity: 289398961441 (269.52 GB)
- DFS Used: 289375211520 (269.5 GB)
- Non DFS Used: 0 (0 KB)
- DFS Remaining: 23749921 (22.65 MB)
- DFS Used%: 99.99%
- DFS Remaining%: 0.01%
- Last contact: Sat Jun 23 18:35:18 PDT 2012
- Name: 142.90.119.162:50010 (iris02.triumf.ca)
- Hostname: iris02.triumf.ca
- Rack: /isac2
- Decommission Status : Normal
- Configured Capacity: 1766147298593 (1.61 TB)
- DFS Used: 144562720768 (134.63 GB)
- Non DFS Used: 1621488506145 (1.47 TB)
- DFS Remaining: 96071680 (91.62 MB)
- DFS Used%: 8.19%
- DFS Remaining%: 0.01%
- Last contact: Sat Jun 23 18:35:17 PDT 2012
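The per-node percentages in the report above can be reproduced with a small sketch. It assumes (inferred from the numbers, not taken from the HDFS source) that "DFS Used%" is DFS Used divided by Configured Capacity, and compares that with DFS Used divided by the space actually left for HDFS, i.e. Configured Capacity minus Non DFS Used. The function names are hypothetical.

```python
def reported_used_pct(dfs_used: int, configured: int) -> float:
    """DFS Used as a percentage of Configured Capacity (assumed report formula)."""
    return 100.0 * dfs_used / configured


def effective_used_pct(dfs_used: int, configured: int, non_dfs_used: int) -> float:
    """DFS Used as a percentage of the capacity not taken by non-DFS data."""
    return 100.0 * dfs_used / (configured - non_dfs_used)


# (Configured Capacity, DFS Used, Non DFS Used) in bytes, from the report above.
nodes = {
    "ladd06": (535521764641, 534397218816, 0),
    "ladd12": (404426435138, 417828085760, 0),
    "ladd02": (289398961441, 289375211520, 0),
    "iris02": (1766147298593, 144562720768, 1621488506145),
}

for name, (configured, dfs_used, non_dfs) in nodes.items():
    reported = reported_used_pct(dfs_used, configured)
    effective = effective_used_pct(dfs_used, configured, non_dfs)
    print(f"{name}: reported {reported:.2f}%, effective {effective:.2f}%")
```

For iris02 the first figure comes out near 8.19%, matching the report, while the second is above 99.9%: the node is effectively full once the non-DFS data is taken into account.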
- > hdfs balancer
- 12/06/23 18:40:24 INFO balancer.Balancer: namenodes = [hdfs://ladd12.triumf.ca/]
- 12/06/23 18:40:24 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
- Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
- 12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /detfac/142.90.111.96:50010
- 12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /detfac/142.90.111.72:50010
- 12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /isac2/142.90.119.126:50010
- 12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /isac2/142.90.119.162:50010
- 12/06/23 18:40:25 INFO balancer.Balancer: 3 over-utilized: [Source[142.90.111.96:50010, utilization=99.91803337474168], Source[142.90.111.72:50010, utilization=99.99179336343097], Source[142.90.119.126:50010, utilization=103.31374249000984]]
- 12/06/23 18:40:25 INFO balancer.Balancer: 1 underutilized: [BalancerDatanode[142.90.119.162:50010, utilization=8.185201816584936]]
- 12/06/23 18:40:25 INFO balancer.Balancer: Need to move 512.4 GB to make the cluster balanced.
- 12/06/23 18:40:25 INFO balancer.Balancer: Decided to move 91.62 MB bytes from 142.90.111.96:50010 to 142.90.119.162:50010
- 12/06/23 18:40:25 INFO balancer.Balancer: Will move 91.62 MB in this iteration
- Jun 23, 2012 6:40:25 PM  0           0 KB                 512.4 GB            91.62 MB
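The over-/under-utilized split in the balancer log is consistent with the balancer's threshold rule: a node counts as over-utilized when its utilization exceeds the cluster average by more than the threshold, and under-utilized when it falls below the average by more than the threshold. A minimal sketch, assuming the cluster average is total DFS Used divided by total Configured Capacity (taken from the dfsadmin report), and collapsing the in-between cases into one label; `classify` is a hypothetical helper, not a Hadoop API:

```python
def classify(utilization: float, average: float, threshold: float = 10.0) -> str:
    """Classify a datanode relative to the cluster average utilization (percent)."""
    if utilization > average + threshold:
        return "over-utilized"
    if utilization < average - threshold:
        return "under-utilized"
    return "within threshold"


# Cluster totals in bytes, from "hdfs dfsadmin -report".
total_dfs_used = 1386163236864
total_capacity = 2995494459813
average = 100.0 * total_dfs_used / total_capacity

# Per-node utilization (percent) as printed in the balancer log.
utilization = {
    "142.90.111.96:50010": 99.91803337474168,
    "142.90.111.72:50010": 99.99179336343097,
    "142.90.119.126:50010": 103.31374249000984,
    "142.90.119.162:50010": 8.185201816584936,
}

print(f"cluster average: {average:.2f}%")
for node, value in utilization.items():
    print(node, classify(value, average))
```

With these inputs the average lands between 46% and 47%, so the three full nodes are over-utilized and iris02 (142.90.119.162) is the only target, even though it has just 91.62 MB of DFS space remaining.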
- All disks are 100% full:
- > df /home1 /ladd/data2 /ladd/data6 /ladd/data12 /ladd/iris_data2
- Filesystem 1K-blocks Used Available Use% Mounted on
- /dev/md2 96132876 84858660 6390864 93% /home1
- ladd02:/data2 307355648 306812928 542720 100% /ladd/data2
- ladd06:/data6 547709952 522897408 24812544 96% /ladd/data6
- /data12 348293764 323569928 24723836 93% /ladd/data12
- iris02:/iris_data2 1749492736 1749464064 28672 100% /ladd/iris_data2