Background
Hadoop was installed with Cloudera Manager, but since we did not dare to upgrade CM itself, we upgraded Impala separately using yum. Impala 1.1 was already installed; here we upgrade it to Impala 2.0. Some configuration steps are therefore not described; refer to the official documentation for those.
Set up a local mirror
Download the RPMs you will need:
wget -r -np -k 'http://archive.cloudera.com/impala/redhat/6/x86_64/impala/2.0.0/'
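For yum to use this mirror as a baseurl later, the repodata/ directory must have come down along with the RPMs; a quick check (path assumed from the wget command above):
ls archive.cloudera.com/impala/redhat/6/x86_64/impala/2.0.0/repodata/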
Set up nginx to serve the downloads. nginx configuration:
server {
    listen 9500;
    server_name www.lnmp.org;
    index index.html index.htm index.php;
    root /home/kpi/impala/archive.cloudera.com;
    autoindex on;              # enable directory listing
    autoindex_exact_size off;  # show human-readable file sizes
    autoindex_localtime on;    # show times in server local time
}
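After reloading nginx, a quick curl against the mirror (host and port taken from the config above; adjust to your machine) should return the directory index:
curl http://localhost:9500/impala/redhat/6/x86_64/impala/2.0.0/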
Configure the repo
Save the following as impala.repo in /etc/yum.repos.d/. The archive.cloudera.com address below can be replaced with the address of your local mirror.
[cloudera-impala]
name=Impala
baseurl=http://archive.cloudera.com/impala/redhat/6/x86_64/impala/2.0.0/
gpgkey = http://archive.cloudera.com/impala/redhat/6/x86_64/impala/RPM-GPG-KEY-cloudera
gpgcheck = 1
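After adding the repo file, refresh the yum metadata and confirm the repo and the 2.0.0 packages are visible (standard yum commands):
yum clean all
yum repolist | grep -i impala
yum list 'impala*'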
Install
Install the service packages on every machine:
yum install impala impala-server impala-shell
On one machine, install the state-store and catalog packages (these provide the corresponding init scripts):
yum install impala-state-store impala-catalog
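Since this is an upgrade from 1.1, it is worth confirming that the 2.0.0 packages are the ones actually installed:
rpm -qa | grep impala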
Configuration
ln -s /etc/hive/conf/hive-site.xml /etc/impala/conf/
ln -s /etc/hive/conf/hive-env.sh /etc/impala/conf/
ln -s /etc/hadoop/conf/core-site.xml /etc/impala/conf/
cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
The client's hdfs-site.xml cannot be used as-is: it lacks a setting, and without it impala-server will not start. If the server fails to start, check Impala's logs. Add the following to hdfs-site.xml:
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
For the full HDFS configuration, see the page below; pay particular attention to the Short-Circuit Reads option, which affects performance:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_config_performance.html?scroll=config_performance
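As a sketch, the short-circuit read and block location tracking settings described on that page look roughly like this in hdfs-site.xml (the socket path is only an example; use a directory that exists on the DataNodes, and restart them afterwards):
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>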
Change the default Impala user to kpi
We do this only because our data is read with the kpi account; the default impala user works just as well. Either way, make sure the chosen user is allowed to read HDFS directly.
Edit the files /etc/init.d/impala-server, /etc/init.d/impala-state-store and /etc/init.d/impala-catalog:
Find SVC_USER="impala" and change it to SVC_USER="kpi"
Find install -d -m 0755 -o impala -g impala /var/run/impala and change it to install -d -m 0755 -o kpi -g kpi /var/run/impala
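The same edits can be applied to all three init scripts with sed (a sketch; the .bak backups guard against mistakes, and a later package upgrade may overwrite these scripts anyway):
sed -i.bak 's/SVC_USER="impala"/SVC_USER="kpi"/' /etc/init.d/impala-server /etc/init.d/impala-state-store /etc/init.d/impala-catalog
sed -i 's/-o impala -g impala/-o kpi -g kpi/' /etc/init.d/impala-server /etc/init.d/impala-state-store /etc/init.d/impala-catalog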
Create the log directory:
mkdir -p /home/hadoop/log/impala/
chown kpi:kpi /home/hadoop/log/impala/
Edit Impala's defaults file, /etc/default/impala:
IMPALA_CATALOG_SERVICE_HOST=<hostname or IP of the catalog machine>
IMPALA_STATE_STORE_HOST=<hostname or IP of the statestore machine>
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/home/hadoop/log/impala
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} "
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
-mem_limit=4294967296 \
-enable_webserver=true \
-webserver_port=25000 \
-beeswax_port=21000 \
-hs2_port=21050 \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
ENABLE_CORE_DUMPS=false
# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/hadoop/conf
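For reference, -mem_limit=4294967296 bytes is 4 × 1024³, i.e. a 4 GiB memory limit per impalad; size it to whatever each node can spare.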
Start
service impala-state-store start
service impala-catalog start
service impala-server start
The error /etc/init.d/impala-state-store: line 35: /etc/default/hadoop: No such file or directory is harmless; the script just cannot find a Hadoop defaults file.
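Once the services are up, the daemons' web UIs give a quick sanity check (25000 is the impalad webserver_port set above; 25010 is the statestore's usual default UI port; replace localhost with the right hosts):
curl -s http://localhost:25000/ >/dev/null && echo impalad web UI ok
curl -s http://localhost:25010/ >/dev/null && echo statestore web UI ok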
Troubleshooting
On RedHat 5.x, starting impala-shell reports a sasl error; install python26-libs and sasl with pip.
Error:
ERROR: block location tracking is not properly enabled because
- dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.
This means the HDFS configuration is wrong: add the dfs.client.file-block-storage-locations.timeout property to hdfs-site.xml as shown above.
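For other startup failures, the glog files in the log directory configured earlier usually name the offending setting (impalad.ERROR is the glog symlink to the newest error log):
tail -n 50 /home/hadoop/log/impala/impalad.ERROR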
Check the processes
Make sure the daemons are actually running:
ps auxf|grep state
ps auxf|grep catalog
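Beyond checking the processes, a trivial query through impala-shell confirms an impalad can reach the statestore and catalog (host is a placeholder; 21000 is the beeswax_port configured above):
impala-shell -i localhost:21000 -q 'show databases;'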
Install impala-lzo (optional)
Add cloudera-gplextras5.repo:
[cloudera-gplextras5]
# Packages for Cloudera's GPLExtras, Version 5, on RedHat or CentOS 6 x86_64
name=Cloudera's GPLExtras, Version 5
baseurl=http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/4.3.0/
gpgkey = http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/RPM-GPG-KEY-cloudera
gpgcheck = 1
Install:
yum install impala-lzo-2.0.0
References
http://blog.javachen.com/2013/03/29/install-impala/
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_noncm_installation.html