Hadoop parameters

Published 2017.7.28 · Updated 2025.7.19 · Hadoop · 1181 words · 3 min read

YARN

Parameter	Default	Value	Notes
yarn.nodemanager.container-metrics.enable		false	Disabled to keep the NodeManager from running out of memory (OOM); see http://hackershell.cn/?p=993
yarn.resourcemanager.recovery.enabled		true	Enables ResourceManager recovery
yarn.scheduler.fair.continuous-scheduling-enabled		true	Enables continuous scheduling in the Fair Scheduler
mapreduce.reduce.shuffle.memory.limit.percent		0.5
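yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage		95.0	A percentage in the range 0.0 to 100.0, so 95% is written as 95.0; a value of 0.95 would mean 0.95% and mark almost every disk as bad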
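mapreduce.task.userlog.limit.kb		102400	Caps each container's log output at 100 MB (102400 KB). Also set yarn.app.mapreduce.task.container.log.backups; otherwise the capped log is held in memory
yarn.app.mapreduce.task.container.log.backups		1	Number of rolled log backup files to keep
yarn.scheduler.fair.max.assign	-1	10	Set on the ResourceManager. Maximum number of containers assigned to a single node in one heartbeat; only takes effect when yarn.scheduler.fair.assignmultiple is true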
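As a minimal sketch, the values in the YARN table can be turned into Hadoop property entries with Python's standard library. It assumes the usual split where mapreduce.* and yarn.app.mapreduce.* properties live in mapred-site.xml and everything else in yarn-site.xml; that rule and the helper names target_file and render are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Values copied from the YARN table. Which *-site.xml each property goes in is
# an assumption based on its prefix (see target_file), not something Hadoop enforces.
SETTINGS = {
    "yarn.nodemanager.container-metrics.enable": "false",
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.scheduler.fair.continuous-scheduling-enabled": "true",
    "yarn.scheduler.fair.max.assign": "10",
    "mapreduce.reduce.shuffle.memory.limit.percent": "0.5",
    "mapreduce.task.userlog.limit.kb": "102400",
    "yarn.app.mapreduce.task.container.log.backups": "1",
}


def target_file(name: str) -> str:
    """Hypothetical rule: pick the site file from the property prefix."""
    if name.startswith(("mapreduce.", "yarn.app.mapreduce.")):
        return "mapred-site.xml"
    return "yarn-site.xml"


def render(settings: dict[str, str]) -> dict[str, str]:
    """Group settings by site file and render each group as a <configuration> element."""
    roots: dict[str, ET.Element] = {}
    for name, value in settings.items():
        root = roots.setdefault(target_file(name), ET.Element("configuration"))
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    out = {}
    for fname, root in roots.items():
        ET.indent(root)  # pretty-print; needs Python 3.9+
        out[fname] = ET.tostring(root, encoding="unicode")
    return out


if __name__ == "__main__":
    # mapreduce.task.userlog.limit.kb is in KB: 102400 KB / 1024 = 100 MB.
    assert int(SETTINGS["mapreduce.task.userlog.limit.kb"]) // 1024 == 100
    for fname, xml in render(SETTINGS).items():
        print(f"<!-- {fname} -->")
        print(xml)
```

Copy each generated property into the matching site file and restart the daemons that read it; per the table, yarn.scheduler.fair.max.assign is read by the ResourceManager.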

HDFS

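Several values in the HDFS table are raw byte or hour counts. For the block-scanner advice (make sure scanning a disk is not too slow), a rough check is to compare the time needed to read one volume at dfs.block.scanner.volume.bytes.per.second against dfs.datanode.scan.period.hours. This is a sketch under stated assumptions: the 4 TB volume size and the helper names are hypothetical, and it ignores everything except raw read time.

```python
# Rough check: can one DataNode volume be fully scanned within
# dfs.datanode.scan.period.hours at the block scanner's throttle?
# DISK_BYTES is a hypothetical volume size, not a value from this post.

MIB = 1024 * 1024

DISK_BYTES = 4 * 10**12          # assumption: one 4 TB volume
SCAN_PERIOD_HOURS = 504          # dfs.datanode.scan.period.hours: 3 weeks = 21 * 24


def hours_to_scan(disk_bytes: int, bytes_per_second: int) -> float:
    """Hours needed to read the whole volume at the scanner's throttle."""
    return disk_bytes / bytes_per_second / 3600


def fits_in_period(disk_bytes: int, bytes_per_second: int, period_hours: float) -> bool:
    """True if a full scan finishes within one scan period."""
    return hours_to_scan(disk_bytes, bytes_per_second) <= period_hours


if __name__ == "__main__":
    for label, rate in (("default 1 MiB/s", 1 * MIB), ("tuned 4194304 (4 MiB/s)", 4 * MIB)):
        h = hours_to_scan(DISK_BYTES, rate)
        ok = fits_in_period(DISK_BYTES, rate, SCAN_PERIOD_HOURS)
        print(f"{label}: {h:.0f} h ({h / 24:.1f} days), within 3 weeks: {ok}")
```

Under these assumptions the 1 MiB/s default needs about 44 days for one pass, longer than the 3-week period, while 4194304 bytes/s needs about 11 days.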
Parameter	Default	Value	Where to configure	Notes
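fs.du.interval		1200000		Interval between du runs on each disk, in ms (1200000 ms = 20 minutes); du is fairly heavy on disk I/O
dfs.blockreport.initialDelay		180		Delays the first block report (in seconds) so that block reports are not all sent at once after a restart
dfs.block.scanner.cursor.save.interval.ms	600000 (10 min)	(default)		Saves the scan cursor every 10 minutes
dfs.block.scanner.volume.bytes.per.second	1048576 (1 MB/s)	4194304 (4 MB/s)		Throttle for disk scanning. Check that scanning a whole disk at this rate does not take too long, but setting it too high also takes disk I/O away from other work
dfs.datanode.scan.period.hours		504 (3 weeks)		Interval between regular block scans
dfs.namenode.checkpoint.txns		10000000	NameNode hdfs-site.xml	Set it high to avoid frequent checkpoint transfers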
~~hadoop.user.group.static.mapping.overrides~~	~~dr.who=;~~	~~dr.who=;yarn=yarn,hadoop,supergroup;mapred=mapred,hadoop,supergroup~~	~~core-site.xml~~	~~Overrides group mappings; requires a NameNode restart~~
dfs.namenode.posix.acl.inheritance.enabled	false	true	NameNode hdfs-site.xml	Once the HDFS-6962 patch is applied, the ACL mask permission can be inherited
dfs.datanode.balance.max.concurrent.moves	5	50	hdfs-site.xml on the DataNodes and on the Balancer	Number of concurrent block moves (threads) used for balancing; raising it speeds up balancing. Requires a DataNode restart
dfs.datanode.balance.bandwidthPerSec	10MB	30MB		Balancing bandwidth per DataNode
ha.failover-controller.new-active.rpc-timeout.ms	60000	300000	Global core-site.xml (read by both clients and the failover controller)	How long the HDFS failover controller waits for the new active NameNode to finish the transition. If it is too short, timeout errors appear in the failover controller log; replaying the edit log during the transition can also be slow (HDFS-11254, https://issues.apache.org/jira/browse/HDFS-11254). Restart the standby's failover controller first; restarting the active controller first makes the NameNode fail over
dfs.image.transfer.bandwidthPerSec		41943040 (40 MB/s)	NameNode hdfs-site.xml	Throttle for fsimage transfers; using all available bandwidth hurts NameNode RPC handling. Takes effect only after the active NameNode is restarted

HIVE

Parameter	Default	Recommended	Notes
hive.metastore.failure.retries	1	3	Number of retries when a metastore call fails partway through. The default was 1 in older versions and was later changed to 3
hive.metastore.try.direct.sql	false		Whether the Hive Metastore should try direct SQL queries instead of DataNucleus for certain read paths. This can make the metastore orders of magnitude faster when fetching many partitions. Before enabling it, make sure patch HIVE-15551 is applied; otherwise it leaks memory

HBASE

https://github.com/mattshma/bigdata/blob/master/hbase/docs/hbase_rpc.md
hbase.ipc.server.listen.queue.size (default 128)
hbase.ipc.server.read.threadpool.size (default 10)
hbase.regionserver.handler.count
hbase.regionserver.metahandler.count

Author: fatkun
Link: https://fatkun.github.io/2017/07/hadoop-config.html
License: CC BY-NC-SA 4.0