天天看点

hive—尽量少用表连接JOIN,多用UNION ALL+group by +计算函数

hive表连接没有SQL强,所以hive多构造大宽表,而不是,多个小表之间的表连接。

hive表连接  join可以用,但是,效率低。

下面,举一个可以用UNION ALL+group by +计算函数,代替表连接的例子。

- 需求:2019年每个用户的支付和退款金额汇总

--union all

select a.user_name,

sum(a.total_amount),

sum(a.refund_amount)

from

(select user_name,

sum(pay_amount) total_amount,

0 as refund_amount

from user_trade

where year(dt)=2019

group by user_name

union all

select user_name,

0 as total_amount,

sum(refund_amount) refund_amount

from user_refund

group by user_name)a

group by a.user_name;

-- full join(表连接也可以,但是效率低)

select coalesce(a.user_name,b.user_name), 

if(a.total_amount is null, 0,a.total_amount),

if(b.refund_amount is null,0,b.refund_amount)

full join

group by user_name)b

on a.user_name=b.user_name;

PS:解释一下coalesce()函数

hive—尽量少用表连接JOIN,多用UNION ALL+group by +计算函数

生活其实也很简单,喜欢的就去争取、得到的就珍惜、失去的就忘记。