[HADOOP-11238] HDFS NameNode getGroups latency

The following warning shows up in the NameNode log:

2016-05-11 01:00:26,360 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=xxx) took 5046 milliseconds

getGroups is the method Hadoop uses to look up which groups a given user belongs to.

Analyzing the patch

Patch: https://issues.apache.org/jira/browse/HADOOP-11238
Here is the source code before the change:

public List<String> getGroups(String user) throws IOException {
    // No need to lookup for groups of static users
    List<String> staticMapping = staticUserToGroupsMap.get(user);
    if (staticMapping != null) {
      return staticMapping;
    }
    // Return cached value if available
    CachedGroups groups = userToGroupsMap.get(user);
    long startMs = timer.monotonicNow();
    if (!hasExpired(groups, startMs)) {
      if(LOG.isDebugEnabled()) {
        LOG.debug("Returning cached groups for '" + user + "'");
      }
      if (groups.getGroups().isEmpty()) {
        // Even with enabling negative cache, getGroups() has the same behavior
        // that throws IOException if the groups for the user is empty.
        throw new IOException("No groups found for user " + user);
      }
      return groups.getGroups();
    }

    // Create and cache user's groups
    List<String> groupList = impl.getGroups(user);
    long endMs = timer.monotonicNow();
    long deltaMs = endMs - startMs ;
    UserGroupInformation.metrics.addGetGroups(deltaMs);
    if (deltaMs > warningDeltaMs) {
      LOG.warn("Potential performance problem: getGroups(user=" + user +") " +
          "took " + deltaMs + " milliseconds.");
    }
    groups = new CachedGroups(groupList, endMs);
    if (groups.getGroups().isEmpty()) {
      if (isNegativeCacheEnabled()) {
        userToGroupsMap.put(user, groups);
      }
      throw new IOException("No groups found for user " + user);
    }
    userToGroupsMap.put(user, groups);
    if(LOG.isDebugEnabled()) {
      LOG.debug("Returning fetched groups for '" + user + "'");
    }
    return groups.getGroups();
  }

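The slowdown happens when a cache entry expires and many threads call impl.getGroups(user) at the same time. By default we use the org.apache.hadoop.security.ShellBasedUnixGroupsMapping implementation, which actually runs the system command id to fetch the user's groups. Under concurrent calls, dozens of processes may be started at once, so every lookup returns slowly.

My first idea for a fix was thread synchronization, so that multiple threads cannot refresh the cache at the same time. A search of the issue tracker showed that someone had indeed implemented this, but that patch was submitted long ago and did not handle NegativeCache, so what upstream merged is the patch linked above.

HADOOP-11238 uses the third-party Google Guava library for the cache. I have not studied the implementation in detail, but according to the comments it gives the same guarantee: when several threads request an entry that is not in the cache, only one thread loads it and the others wait for that load to finish. If the entry is in the cache, it is returned directly. If the entry has timed out, the other threads keep getting the old value while the refresh is running and do not wait for it.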
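The two behaviors described above (one loader per missing key, and stale reads during a refresh) can be sketched with the JDK alone. This is a minimal illustration, not the code from HADOOP-11238 and not Guava's implementation: the class name SingleLoaderGroupCache, the loader function and the injected clock are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch; names and structure are assumptions, not Hadoop code. */
class SingleLoaderGroupCache {
  private static final class Entry {
    final List<String> groups;
    final long loadedAtMs;
    // Set by the one thread that wins the right to refresh this entry.
    final AtomicBoolean refreshing = new AtomicBoolean(false);

    Entry(List<String> groups, long loadedAtMs) {
      this.groups = groups;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, List<String>> loader; // stands in for impl.getGroups
  private final long refreshAfterMs;
  private final LongSupplier clockMs;

  SingleLoaderGroupCache(Function<String, List<String>> loader,
                         long refreshAfterMs, LongSupplier clockMs) {
    this.loader = loader;
    this.refreshAfterMs = refreshAfterMs;
    this.clockMs = clockMs;
  }

  List<String> getGroups(String user) {
    // Missing key: ConcurrentHashMap applies the function at most once per key,
    // and concurrent callers for that key block until it finishes.
    Entry entry = cache.computeIfAbsent(
        user, u -> new Entry(loader.apply(u), clockMs.getAsLong()));

    boolean stale = clockMs.getAsLong() - entry.loadedAtMs >= refreshAfterMs;
    if (stale && entry.refreshing.compareAndSet(false, true)) {
      // Only the thread that wins the compare-and-set reloads.
      try {
        Entry fresh = new Entry(loader.apply(user), clockMs.getAsLong());
        cache.put(user, fresh);
        return fresh.groups;
      } catch (RuntimeException e) {
        entry.refreshing.set(false); // let a later call retry the refresh
        return entry.groups;         // keep serving the old value meanwhile
      }
    }
    // Fresh entry, or another thread is refreshing: return what we have.
    return entry.groups;
  }
}
```

With a loader that shells out to id, this keeps the number of concurrently spawned processes per user at one, which is the property the patch is after.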
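NegativeCache caches the empty-group case (a user that does not belong to any group). Its timeout differs from that of the regular cache, which is why an extra variable was added.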
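To make the separate timeout concrete, here is a small sketch of a cache in which empty results expire sooner than non-empty ones. It is an illustration under assumptions, not the patched Hadoop code: the class NegativeCachingGroups, the lookup function and the clock are hypothetical. As far as I know, the two timeouts correspond to the settings hadoop.security.groups.cache.secs and hadoop.security.groups.negative-cache.secs.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a group cache with a shorter TTL for empty results. */
class NegativeCachingGroups {
  private static final class Cached {
    final List<String> groups;
    final long timestampMs;

    Cached(List<String> groups, long timestampMs) {
      this.groups = groups;
      this.timestampMs = timestampMs;
    }
  }

  private final Map<String, Cached> cache = new ConcurrentHashMap<>();
  private final Function<String, List<String>> lookup; // stands in for impl.getGroups
  private final long cacheTimeoutMs;
  private final long negativeTimeoutMs;
  private final LongSupplier clockMs;

  NegativeCachingGroups(Function<String, List<String>> lookup, long cacheTimeoutMs,
                        long negativeTimeoutMs, LongSupplier clockMs) {
    this.lookup = lookup;
    this.cacheTimeoutMs = cacheTimeoutMs;
    this.negativeTimeoutMs = negativeTimeoutMs;
    this.clockMs = clockMs;
  }

  private boolean hasExpired(Cached cached, long nowMs) {
    if (cached == null) {
      return true;
    }
    // An empty group list is a negative entry and uses the shorter timeout.
    long timeout = cached.groups.isEmpty() ? negativeTimeoutMs : cacheTimeoutMs;
    return cached.timestampMs + timeout <= nowMs;
  }

  List<String> getGroups(String user) throws IOException {
    Cached cached = cache.get(user);
    if (hasExpired(cached, clockMs.getAsLong())) {
      cached = new Cached(lookup.apply(user), clockMs.getAsLong());
      cache.put(user, cached); // empty results are cached too
    }
    if (cached.groups.isEmpty()) {
      // Same visible behavior as the original method: no groups means an exception.
      throw new IOException("No groups found for user " + user);
    }
    return cached.groups;
  }
}
```

The point of the short negative timeout is that a user who is added to a group later is picked up quickly, while repeated lookups for an unknown user still do not spawn a process every time.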

Pitfalls

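In Cloudera Manager, the hadoop.security.group.mapping setting is service-wide, which means it applies to both clients and servers. After switching it to the JNI-based implementation, clients may fail to load the native library, which makes their jobs fail.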
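One way to see whether a client JVM would hit this is to try loading the native library directly. The helper below is a hypothetical sketch, not a Hadoop API. It assumes the library is resolved under the name "hadoop" (libhadoop.so on Linux) through java.library.path.

```java
/** Hypothetical helper: checks whether this JVM can load a native library. */
class NativeLibCheck {
  static boolean canLoad(String libName) {
    try {
      // Resolves the platform-specific file name via java.library.path.
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError e) {
      // Library not found, or built for another platform.
      return false;
    }
  }
}
```

Calling NativeLibCheck.canLoad("hadoop") on a client host with the same JVM options as the job gives a quick yes or no before the JNI-based mapping is rolled out.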
