Hadoop version: CDH 4.3. After creating a Parquet table with Impala, querying it fails:
[impala:21000] > select * from foo;
Query: select * from foo
ERROR: AnalysisException: Failed to load metadata for table: default.foo
CAUSED BY: TableLoadingException: Failed to load metadata for table: foo
CAUSED BY: MetaException: org.apache.hadoop.hive.serde2.SerDeException SerDe parquet.hive.serde.ParquetHiveSerDe does not exist
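The cause is that Hive does not ship with the Parquet libraries. Download them and put them in /opt/cloudera/parcels/CDH/lib/hive/lib (I deployed with Cloudera Manager, hence the parcel path). Create a script to do the download: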
#!/bin/sh
# Modules not needed here:
# parquet-pig parquet-scrooge parquet-test-hadoop2 parquet-thrift parquet-avro parquet-cascading
for f in parquet-column parquet-common parquet-encoding parquet-generator parquet-hadoop parquet-hive
do
    curl -O http://repo1.maven.org/maven2/com/twitter/${f}/1.2.4/${f}-1.2.4.jar
    # Alternative repository:
    #curl -O http://oss.sonatype.org/service/local/repositories/releases/content/com/twitter/${f}/1.2.4/${f}-1.2.4.jar
done
# parquet-format is versioned separately from the other modules
curl -O http://repo1.maven.org/maven2/com/twitter/parquet-format/1.0.0/parquet-format-1.0.0.jar
Then copy them into the Hive lib directory:
cp parquet-* /opt/cloudera/parcels/CDH/lib/hive/lib
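Before restarting anything, it may be worth confirming that all seven jars actually landed in the Hive lib directory. The following is only a sketch: the function name missing_parquet_jars is hypothetical (my own, not part of any tool), and the jar list simply mirrors the download script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive or Impala): print the name of every
# expected Parquet jar that is missing from the directory given as $1.
# The names and versions are assumed to match the download script.
missing_parquet_jars() {
    dir=$1
    for f in parquet-column parquet-common parquet-encoding \
             parquet-generator parquet-hadoop parquet-hive
    do
        [ -f "$dir/$f-1.2.4.jar" ] || echo "$f-1.2.4.jar"
    done
    [ -f "$dir/parquet-format-1.0.0.jar" ] || echo "parquet-format-1.0.0.jar"
    return 0
}

# Usage (path assumed from the Cloudera Manager parcel layout):
# missing_parquet_jars /opt/cloudera/parcels/CDH/lib/hive/lib
```

If the function prints nothing, every expected jar is present; any names it prints are the ones to download or copy again.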
You may need to restart the Hive metastore. Then refresh the metadata from within Impala:
INVALIDATE METADATA;
Create a Parquet table in Impala:
create table test2 (name STRING) STORED AS PARQUETFILE;
Insert data:
insert into test2 select * from test;