Recommendations - May-16-2017 @ 11:00 AM
INFRA
- Generally accepted oversubscription ratios are around 4:1 at the server access layer and 2:1 between the access layer and the aggregation layer or core.
- Schedule migration from EXT3 to XFS for all servers
- Separate service DBs once load justifies it.
- Best practice on larger clusters
- Validate that HDFS disks use RAID-0 per spindle (single stripe) and are not bypassing the controller in JBOD mode
- Linux Settings
- net.core.somaxconn = 4096 [currently 4000; DataNode Max Threads for Transfer is 4096]
- net.ipv4.tcp_fin_timeout = 10
- vm.dirty_background_ratio = 20
- vm.dirty_ratio = 50
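A minimal sketch for checking a host against the Linux settings above, assuming the current values are available as "key = value" text such as the output of `sysctl -a`. The function names `parse_sysctl` and `diff_sysctl` and the sample input are illustrative and not part of the original notes.

```python
# Recommended values taken from the Linux Settings list above.
RECOMMENDED = {
    "net.core.somaxconn": "4096",
    "net.ipv4.tcp_fin_timeout": "10",
    "vm.dirty_background_ratio": "20",
    "vm.dirty_ratio": "50",
}


def parse_sysctl(text):
    """Parse 'key = value' lines (as printed by `sysctl -a`) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(current, recommended=RECOMMENDED):
    """Return {key: (current_value_or_None, recommended_value)} for mismatches."""
    return {
        key: (current.get(key), want)
        for key, want in recommended.items()
        if current.get(key) != want
    }


if __name__ == "__main__":
    # Hypothetical sample; on a real host, feed in the output of `sysctl -a`.
    sample = """
    net.core.somaxconn = 4000
    net.ipv4.tcp_fin_timeout = 10
    vm.dirty_background_ratio = 20
    """
    for key, (have, want) in sorted(diff_sysctl(parse_sysctl(sample)).items()):
        print(f"{key}: current={have} recommended={want}")
```

A missing key is reported with a current value of None, so settings that were never applied show up alongside settings with the wrong value.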
Ambari // SmartSense
- Deploy Ambari Views server to start migrating users away from HUE
- Set up the next version of SmartSense's Small File Report or build a script for it
- LOTS of platform issues are due to App Teams not handling small files at all! This is a MAJOR PROBLEM.
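If a script ends up being built for the small file report, one possible starting point is sketched below. It assumes the input is text in the format printed by `hdfs dfs -ls -R <path>` (permissions, replication, owner, group, size, date, time, path). The 16 MiB threshold, the function name `small_files_by_dir`, and the sample paths are assumptions for illustration only.

```python
import posixpath
from collections import Counter

# Assumption: "small" means under 16 MiB; pick a threshold that suits the cluster.
SMALL_FILE_BYTES = 16 * 1024 * 1024


def small_files_by_dir(ls_output, threshold=SMALL_FILE_BYTES):
    """Count files smaller than `threshold` per parent directory.

    `ls_output` is text in the format printed by `hdfs dfs -ls -R <path>`:
    permissions, replication, owner, group, size, date, time, path.
    """
    counts = Counter()
    for line in ls_output.splitlines():
        fields = line.split(None, 7)  # keep spaces inside the path intact
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip directories, "Found N items" headers, blank lines
        try:
            size = int(fields[4])
        except ValueError:
            continue  # not a listing line
        if size < threshold:
            counts[posixpath.dirname(fields[7])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical listing; in practice, pipe in `hdfs dfs -ls -R /some/path`.
    sample = """\
drwxr-xr-x   - etl hdfs          0 2017-05-16 11:00 /data/events
-rw-r--r--   3 etl hdfs       2048 2017-05-16 11:00 /data/events/part-00000
-rw-r--r--   3 etl hdfs       4096 2017-05-16 11:00 /data/events/part-00001
-rw-r--r--   3 etl hdfs  268435456 2017-05-16 11:00 /data/events/part-00002
"""
    for directory, count in small_files_by_dir(sample).most_common():
        print(f"{count:8d}  {directory}")
```

Grouping by parent directory keeps the report short enough to hand to the App Team that owns each path.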
HDFS
- Add an additional dfs.namenode.name.dir mount (minimum of 2 isolated mounts)
- Best Practice
- dfs.namenode.checkpoint.period = 3600
- The current checkpoint period is very high
- Increase DataNode Heaps to 2GB
- NameNode Thread Pool - Suggested NN server thread count is ln(number of DataNodes) * 20
- Ensure Safemode threshold != 1.
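The NameNode thread pool suggestion above can be worked through as a small calculation. This is a sketch only: rounding to the nearest integer is a choice made here, the node counts are hypothetical, and the notes do not name the property (it is commonly set through dfs.namenode.handler.count, so treat that mapping as an assumption).

```python
import math


def suggested_nn_threads(num_datanodes):
    """Suggested NameNode server thread count: ln(number of DataNodes) * 20."""
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    # Rounding to the nearest integer is an assumption, not from the notes.
    return int(round(math.log(num_datanodes) * 20))


if __name__ == "__main__":
    # Hypothetical cluster sizes, for illustration only.
    for nodes in (100, 500, 1000):
        print(f"{nodes} DataNodes -> {suggested_nn_threads(nodes)} threads")
```

Because the formula is logarithmic, growing the cluster tenfold from 100 to 1000 DataNodes raises the suggestion only from 92 to 138 threads.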
YARN
- yarn.timeline-service.generic-application-history.save-non-am-container-meta-info = false
- Enable FAIRWEIGHT for DZ Queue
- Enable Preemption on only DZ Queue
Hive
- Convert DEFAULT Hive Engine to Tez
- Set the Tez session timeout to roughly 10-20 seconds, with a low number of containers. We want REUSE!
- tez.session.am.dag.submit.timeout.secs
- hive.plan.serialization.format = kryo
- Complete Hive HA Setup
- Move prod apps onto a dedicated HiveServer for Prod Apps
- Make ORC the default file format for all new tables
- Hive Specific Properties
- hive.exec.compress.intermediate = true
- hive.exec.compress.output = true
- hive.vectorized.execution.enabled = true
- hive.vectorized.execution.reduce.enabled = true
- hive.exec.parallel = true
- hive.optimize.bucketmapjoin.sortedmerge = true
- hive.exec.dynamic.partition.mode = nonstrict
- hive.groupby.orderby.position.alias = true
- hive.enforce.bucketing = true
- hive.support.concurrency = true
- hive.optimize.ppd = true
- hive.optimize.ppd.storage = true
- hive.cbo.enable = true
- hive.compute.query.using.stats = true
- hive.stats.fetch.column.stats = true
- hive.stats.fetch.partition.stats = true
- hive.tez.auto.reducer.parallelism = true
- hive.tez.max.partition.factor = 20
- hive.exec.reducers.bytes.per.reducer = 128000000
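One way to trial the Hive properties above in a single session before making them cluster defaults is to render them as `SET key=value;` statements. The sketch below covers only a subset of the list, and the helper name `to_set_statements` is illustrative. Some deployments restrict which properties can be changed per session, so treat this as a starting point rather than a guaranteed workflow.

```python
# A subset of the Hive properties listed above; extend as needed.
HIVE_PROPS = {
    "hive.exec.compress.intermediate": "true",
    "hive.vectorized.execution.enabled": "true",
    "hive.cbo.enable": "true",
    "hive.tez.auto.reducer.parallelism": "true",
    "hive.tez.max.partition.factor": "20",
    "hive.exec.reducers.bytes.per.reducer": "128000000",
}


def to_set_statements(props):
    """Render a property dict as Hive `SET key=value;` statements."""
    return [f"SET {key}={value};" for key, value in props.items()]


if __name__ == "__main__":
    # Paste the printed lines at the top of a test query or a Beeline session.
    print("\n".join(to_set_statements(HIVE_PROPS)))
```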
Tez
- Increase Tez AM Size so it can compile jobs with many files
- tez.am.resource.memory.mb=4096
Spark
- Reconsider enabling platform-wide dynamic allocation
Oozie
- Complete Oozie HA Setup
- Separate Oozie for Prod Apps and Analysts
MAPREDUCE
- mapreduce.input.fileinputformat.split.minsize = 104857600
- Right now MapReduce hands super tiny files to each mapper. This is not acceptable; we would rather use the network to aggregate input and reduce compute overhead.
- Enable Mapper Compression
Other
- Decouple user home DIRs from NAS
- Use Dedicated Disks for Zookeeper
- Use Dedicated Disks for QJM
- Remove all QJM from NameNodes
- Implement the MasterServers Topology from the Platform Guide. Due to its age we may need to refresh it slightly.
- Consider increasing the client heap (HADOOP_CLIENT_OPTS=-Xmx4g), as we have tables with many files that are creating problems compiling jobs
- This would be largely resolved by having AppTeams handle their small files issues