子查询
子查询需要一个别名
SELECT * FROM (SELECT A,B FROM atable) alaisName
group by
这样的语句会报错,提示a不在group by里面
select a,sum(b) from atable group by b
hive Expression Not In Group By Key
via:来源
解决方法是,使用 max、min、或者collect_set,如:
select max(a),sum(b) from atable group by b
group by 之后字段可以作为条件再继续查找
SELECT * FROM (SELECT SUM(A) ASUM FROM ATABLE GROUP BY A) SUBQ1 WHERE ASUM > 10
Join
默认join是内连接,还有left outer join 、right outer join和 full outer join
可以用case when … then … else …
语法: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END via:HIVE UDF整理(九) via:数据倾斜总结
select *
from log a
left outer join users b
on case when a.user_id is null then concat(‘hive’,rand() ) else a.user_id end = b.user_id;
Not in
not in 在Hive0.8才支持,在之前的版本可以用left outer join代替 Hive使用LEFT OUTER JOIN 实现not in-- 要实现这个,key不在B表
select a.key from a where key not in(select key from b)
-- 改写为 左连接key,找出B表key为null的
select a.key from a left outer join b on a.key=b.key where b.key1 is null