Hive on Tez (Cloudera 5.4)

Configure tez-site.xml:
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez-0.8.5/,${fs.defaultFS}/apps/tez-0.8.5/lib/</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.runtime.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
Tuning parameters: tez.grouping.min-size sets the lower bound on the size of a grouped split.
Error handling: lzo cannot be found……
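To make the effect of tez.grouping.min-size concrete, here is a minimal Python sketch of how a lower and an upper bound on group size can clamp the number of grouped splits. It is a simplified model, not the Tez implementation. The function name `grouped_split_count` is hypothetical, and the 50 MiB and 1 GiB values are assumed defaults for tez.grouping.min-size and tez.grouping.max-size that the excerpt above does not state.

```python
# Simplified model of split grouping bounds (not the actual Tez code).
# Assumed defaults, not taken from the excerpt above:
MIN_SIZE = 50 * 1024 * 1024      # tez.grouping.min-size (assumed 50 MiB)
MAX_SIZE = 1024 * 1024 * 1024    # tez.grouping.max-size (assumed 1 GiB)


def grouped_split_count(total_bytes, desired_splits,
                        min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Return the number of grouped splits after applying the size bounds."""
    if total_bytes <= 0 or desired_splits <= 0:
        return max(desired_splits, 0)
    per_group = total_bytes // desired_splits
    if per_group > max_size:
        # Each group would be too large: use more, smaller groups.
        return total_bytes // max_size + 1
    if per_group < min_size:
        # Each group would be too small: use fewer, larger groups.
        return total_bytes // min_size + 1
    return desired_splits
```

Under this model, raising tez.grouping.min-size reduces the number of tasks for a given input size, while lowering it allows more parallelism at the cost of more, smaller tasks.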

JobHistory issues to watch

JobHistory cache issue: https://issues.apache.org/jira/browse/MAPREDUCE-6436 (may produce so much log output that performance suffers)
Job history server scans can become blocked on a single, slow entry: https://issues.apache.org/jira/browse/MAPREDUCE-6797
High contention on scanning of user directory under immediate_done in……

Kafka issues to watch

BrokerChangeListener computes inconsistent live/dead broker list: https://issues.apache.org/jira/browse/KAFKA-3085
BrokerChangeListener missed broker id path ephemeral node deletion event: https://issues.apache.org/jira/browse/KAFKA-2448
Controller could miss a broker state change: https://issues.apache.org/jira/browse/KAFKA-1120

OOM in InMemoryMapOutput during reduce

2017-03-14 15:41:52,724 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)……
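The trace shows the heap running out while a fetcher allocates an in-memory buffer for a map output. As a rough aid for reasoning about this, the Python sketch below estimates how much of the reducer heap the in-memory shuffle may claim. It assumes the Hadoop 2.x settings mapreduce.reduce.shuffle.input.buffer.percent (default 0.70) and mapreduce.reduce.shuffle.memory.limit.percent (default 0.25), neither of which appears in the excerpt above. It also assumes that a reservation is admitted whenever current usage is not yet above the limit, so usage can overshoot by one maximum-size segment. The function name `shuffle_memory_budget` is hypothetical.

```python
def shuffle_memory_budget(heap_bytes,
                          input_buffer_percent=0.70,
                          memory_limit_percent=0.25):
    """Estimate in-memory shuffle usage for a reducer heap (simplified).

    Returns (memory_limit, max_single_shuffle, worst_case) in bytes.
    The percentages are assumed defaults, not values from the log above.
    """
    # Total heap share the shuffle may use for in-memory map outputs.
    memory_limit = int(heap_bytes * input_buffer_percent)
    # Largest single map output that is still fetched into memory.
    max_single_shuffle = int(memory_limit * memory_limit_percent)
    # Assumption: one more segment can be admitted just below the limit,
    # so peak usage may exceed memory_limit by up to one segment.
    worst_case = memory_limit + max_single_shuffle
    return memory_limit, max_single_shuffle, worst_case


if __name__ == "__main__":
    limit, single, worst = shuffle_memory_budget(1024 * 1024 * 1024)
    print(limit, single, worst)
```

Under these assumptions the shuffle alone can approach 87.5% of the heap, which leaves little room for everything else the reduce task allocates. Lowering either percentage or giving the reducer a larger heap shrinks that share, though whether that resolves this particular failure is not confirmed by the excerpt.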

Merging small files in Hive

2. Output merging
set hive.merge.mapfiles = true #merge small files at the end of a map-only job (enabled by default)
set hive.merge.mapredfiles = true #……
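The two switches above only enable merging. Whether a merge job actually runs also depends on the average size of the output files. The Python sketch below models that decision in simplified form. It assumes the threshold comes from hive.merge.smallfiles.avgsize with a default of 16000000 bytes, and that hive.merge.mapredfiles defaults to false. Neither detail is stated in the excerpt above, and the function name `needs_merge_job` is hypothetical.

```python
def needs_merge_job(file_sizes, map_only,
                    merge_mapfiles=True,
                    merge_mapredfiles=False,
                    avg_size_threshold=16_000_000):
    """Decide whether an extra merge job would be launched (simplified).

    file_sizes: sizes in bytes of the files a job wrote.
    map_only:   True for a map-only job, False for a map-reduce job.
    The threshold and the merge_mapredfiles default are assumptions.
    """
    # Pick the switch that applies to this kind of job.
    enabled = merge_mapfiles if map_only else merge_mapredfiles
    if not enabled or not file_sizes:
        return False
    # Merge only when the outputs are small on average.
    average = sum(file_sizes) / len(file_sizes)
    return average < avg_size_threshold
```

For example, a map-only job that writes one hundred 1 MB files would trigger a merge under these assumptions, while a map-reduce job with the same output would not unless hive.merge.mapredfiles is set to true.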