本文只是分析和代码的关联,并没有解决慢的问题。 有些语句在impala查询很慢,尝试把日志和代码关联起来,便于之后的分析。由于impala涉及到多个对象,有些代码搞不清楚是如何调用的。 当前发现在获取元数据的过程很慢。 代码中有fe和be两个目录,be是c++写的,fe是java写的。 c++通过jni调用java代码(或许还有rpc调用) 在be\src\service\impala-hs2-server.cc里是我们使用的hiveserver2协议的服务端
开始
先把impalad的debug日志记录下来,修改/etc/default/impala里,增加一行export GLOG_v=2 通过impala-shell查询 select distinct(disp_id) from default_impala.kpi_disp_user_info_stat_tb where pt=’2014-04-08′ 等查询完后的时间分析为:
Query Timeline: 2m34s - Start execution: 77.818us (77.818us) - Planning finished: 2m33s (2m33s) - Ready to start remote fragments: 2m33s (4.473ms) - Remote fragments started: 2m34s (580.800ms) - Rows available: 2m34s (353.883ms) - First row fetched: 2m34s (51.765ms) - Unregister query: 2m34s (13.51ms)
其中大部分时间在Planning里面
在日志可以看到这两条
I1228 17:22:53.467937 11982 rpc-trace.cc:133] RPC call: BeeswaxService.query(from ::ffff:IP:39274)
I1228 17:22:53.467978 11982 impala-beeswax-server.cc:170] query(): query=select distinct(disp_id) from default_impala.kpi_disp_user_info_stat_tb where pt=’2014-04-08′
已经开始查询
I1228 17:22:53.471019 11982 Frontend.java:760] analyze query select distinct(disp_id) from default_impala.kpi_disp_user_info_stat_tb where pt=’2014-04-08′
I1228 17:22:53.472192 11982 Frontend.java:709] Requesting prioritized load of table(s): default_impala.kpi_disp_user_info_stat_tb
I1228 17:24:54.335124 11982 Frontend.java:779] Missing tables were not received in 120000ms. Load request will be retried.
I1228 17:24:54.335906 11982 Frontend.java:709] Requesting prioritized load of table(s): default_impala.kpi_disp_user_info_stat_tb
I1228 17:25:27.136531 11982 Frontend.java:833] create plan
impala-beeswax-server.cc中通过jni的方式调用Frontend.java的代码,这里日志中有一次超时了(120秒),重试了一次。第一次花了120秒,第二次重试后,在17:25:27后获取成功,第二次花了30秒。
private AnalysisContext.AnalysisResult analyzeStmt(TQueryCtx queryCtx)
throws AnalysisException, InternalException, AuthorizationException {
AnalysisContext analysisCtx = new AnalysisContext(impaladCatalog_, queryCtx,
authzConfig_);
LOG.debug("analyze query " + queryCtx.request.stmt);
// Run analysis in a loop until it any of the following events occur:
// 1) Analysis completes successfully.
// 2) Analysis fails with an AnalysisException AND there are no missing tables.
// 3) Analysis fails with an AuthorizationException.
try {
while (true) {
try {
analysisCtx.analyze(queryCtx.request.stmt);
Preconditions.checkState(analysisCtx.getAnalyzer().getMissingTbls().isEmpty());
return analysisCtx.getAnalysisResult();
} catch (AnalysisException e) {
Set<TableName> missingTbls = analysisCtx.getAnalyzer().getMissingTbls();
// Only re-throw the AnalysisException if there were no missing tables.
if (missingTbls.isEmpty()) throw e;
// Some tables/views were missing, request and wait for them to load.
if (!requestTblLoadAndWait(missingTbls, MISSING_TBL_LOAD_WAIT_TIMEOUT_MS)) {
LOG.info(String.format("Missing tables were not received in %dms. Load " +
"request will be retried.", MISSING_TBL_LOAD_WAIT_TIMEOUT_MS));
}
}
}
} finally {
// Authorize all accesses.
// AuthorizationExceptions must take precedence over any AnalysisException
// that has been thrown, so perform the authorization first.
analysisCtx.getAnalyzer().authorize(getAuthzChecker());
}
}
进去看requestTblLoadAndWait这个方法
/**
* Requests the catalog server load the given set of tables and waits until
* these tables show up in the local catalog, or the given timeout has been reached.
* The timeout is specified in milliseconds, with a value <= 0 indicating no timeout.
* The exact steps taken are:
* 1) Collect the tables that are missing (not yet loaded locally).
* 2) Make an RPC to the CatalogServer to prioritize the loading of these tables.
* 3) Wait until the local catalog contains all missing tables by (re)checking the
* catalog each time a new catalog update is received.
*
* Returns true if all missing tables were received before timing out and false if
* the timeout was reached before all tables were received.
*/
private boolean requestTblLoadAndWait(Set<TableName> requestedTbls, long timeoutMs)
throws InternalException {
Set<TableName> missingTbls = getMissingTbls(requestedTbls);
// There are no missing tables, return and avoid making an RPC to the CatalogServer.
if (missingTbls.isEmpty()) return true;
// Call into the CatalogServer and request the required tables be loaded.
LOG.info(String.format("Requesting prioritized load of table(s): %s",
Joiner.on(", ").join(missingTbls)));
TStatus status = FeSupport.PrioritizeLoad(missingTbls);
if (status.getStatus_code() != TStatusCode.OK) {
throw new InternalException("Error requesting prioritized load: " +
Joiner.on("\n").join(status.getError_msgs()));
}
long startTimeMs = System.currentTimeMillis();
// Wait until all the required tables are loaded in the Impalad's catalog cache.
while (!missingTbls.isEmpty()) {
// Check if the timeout has been reached.
if (timeoutMs > 0 && System.currentTimeMillis() - startTimeMs > timeoutMs) {
return false;
}
LOG.trace(String.format("Waiting for table(s) to complete loading: %s",
Joiner.on(", ").join(missingTbls)));
getCatalog().waitForCatalogUpdate(MAX_CATALOG_UPDATE_WAIT_TIME_MS);
missingTbls = getMissingTbls(missingTbls);
// TODO: Check for query cancellation here.
}
return true;
}
这里没看懂怎么调用的,调用的是FeSupport.PrioritizeLoad,是放到一个SET里面,并且移到第一位。猜测是impalad自己有个catalog的缓存,如果在缓存找不到(missingTbls),会通过rpc调用中心的catalogd服务。
看中心catalogd的日志,可以看到被调用了两次。时间上和impalad的日志一致。(因为在线上测试,catalogd没有打开更详细日志)
I1228 17:22:53.448292 9209 TableLoader.java:60] Loading metadata for: default_impala.kpi_disp_user_info_stat_tb
I1228 17:22:53.448468 9209 HiveMetaStoreClient.java:238] Trying to connect to metastore with URI thrift://vdc22:9083
I1228 17:22:53.449069 9209 HiveMetaStoreClient.java:326] Connected to metastore.
I1228 17:23:07.808981 15118 HiveMetaStoreClient.java:238] Trying to connect to metastore with URI thrift://vdc22:9083
I1228 17:23:07.809939 15118 HiveMetaStoreClient.java:326] Connected to metastore.
I1228 17:23:20.419278 26716 HiveMetaStoreClient.java:238] Trying to connect to metastore with URI thrift://vdc22:9083
I1228 17:23:20.419970 26716 HiveMetaStoreClient.java:326] Connected to metastore.
I1228 17:24:54.308159 3044 TableLoader.java:60] Loading metadata for: default_impala.kpi_disp_user_info_stat_tb
这里日志的代码在这里com.cloudera.impala.catalog.TableLoader
后面的步骤花费的时间不多,不继续分析了。