0%

Hive之窗口函数

一、常见窗口函数函数

  1. OVER()
  • 指定分析函数工作的数据窗口大小,这个数据窗口的大小可能会随着行的变化而变化。
  1. CURRENT ROW
  • 当前行
  1. n PRECEDING
  • 往前n行数据
  1. n FOLLOWING
  • 往后n行数据
  1. UNBOUNDED
  • 起点,UNBOUNDED PRECEDING表示从前面的起点,UNBOUNDED FOLLOWING表示到后面的终点
  1. LAG(col, n)
  • 往前第n行数据
  1. LEAD(col, n)
  • 往后第n行数据
  1. NTILE(n)
  • 把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。

二、案例实操

2.1 数据准备 business表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
name,orderdate,cost
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94

2.2 查询在2017年4月份购买过的顾客及总人数

1
2
3
4
select name, count(*) over()
from business
where substring(ordergate, 1, 7) = '2017-04'
group by name;

2.3 查询顾客的购买明细及月购买总额

1
2
select name, orderdate, cost, sum(cost) over(partition by substring(ordergate, 1, 7))
from business

2.4 上述的场景,要将cost按照日期进行累加

1
2
3
4
5
6
7
8
9
select name,orderdate,cost,
sum(cost) over() as sample1,--所有行相加
sum(cost) over(partition by name) as sample2,--按name分组,组内 数据相加
sum(cost) over(partition by name order by orderdate) as sample3,--按name分组,组内数据累加
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一样,由起点到当前行的聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行
from business;

2.5 查询顾客上次的购买时间

1
2
select name, orderdate, cost, lag(orderdate, 1, '1900-01-01') over(partition by name, order by orderdate) as time1
from business;

2.6 查询前20%时间的订单信息

1
2
3
select *
from (select name,orderdate,cost, ntile(5) over(order by orderdate) sorted from business)t
where sorted = 1;