Hive查询用到的语句记录

子查询

子查询需要一个别名

SELECT * FROM (SELECT A,B FROM atable) alaisName

group by

这样的语句会报错,提示a不在group by里面

select a,sum(b) from atable group by b

hive Expression Not In Group By Key
via:来源
解决方法是,使用 max、min、或者collect_set,如:

select max(a),sum(b) from atable group by b

group by 之后字段可以作为条件再继续查找

SELECT * FROM (SELECT SUM(A) ASUM FROM ATABLE GROUP BY A) SUBQ1 WHERE ASUM > 10

Join

默认join是内连接,还有left outer join 、right outer join和 full outer join

可以用case when … then … else …

语法: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END via:HIVE UDF整理(九) via:数据倾斜总结

select *
  from log a
  left outer join users b
  on case when a.user_id is null then concat(‘hive’,rand() ) else a.user_id end = b.user_id;

Not in

not in 在Hive0.8才支持,在之前的版本可以用left outer join代替 Hive使用LEFT OUTER JOIN 实现not in 
-- 要实现这个,key不在B表
select a.key from a where key not in(select key from b)
-- 改写为 左连接key,找出B表key为null的
select a.key from a left outer join b on a.key=b.key where b.key1 is null
 
updatedupdated2024-08-302024-08-30