窗口函数实战(三)
本篇文章是笔者在牛客网上摘选的几道比较有挑战性的SQL窗口函数编程题。
1.近三个月未完成试卷数为0的用户完成情况【中等】(原题链接)
首先来分析一下题目的查询要求。
- 用户近三个有试卷作答记录的月份(可以用窗口函数解决)。
- 并且这三个月中,不能有试卷是未完成状态
- 统计这三个月的试卷完成数
下面先给出一种解法。
with a as (
select uid, exam_id, start_time, score,
dense_rank() over (partition by uid order by date_format(start_time,'%Y-%m') desc) as start_time_rank
from exam_record
)
select uid, count(score) as exam_complete_cnt
from a
where uid not in (
select uid
from (
select *
from a
where start_time_rank <= 3
) b
where score is null
) and start_time_rank <= 3
group by uid
order by exam_complete_cnt desc, uid desc
可以看到,上面这种解法在寻找满足条件的 uid 比较麻烦。其实,一个 having count(score) = count(uid) 就可以解决。
select uid, count(score) as exam_complete_cnt
from (
select uid, start_time, score,
dense_rank() over(partition by uid order by date_format(start_time, '%Y%m') desc) as start_time_rank
from exam_record
) a
where start_time_rank <= 3
group by uid
having count(score) = count(uid)
order by exam_complete_cnt desc, uid desc
2.连续两次作答试卷的最大时间窗【较难】(原题链接)
我们来分析一下题目的要求。
- 2021年,至少有 2 天作答过试卷的用户。
- 该年,连续两次作答试卷的最大时间窗
days_window 。 - 根据该年的历史规律,他在
days_window 里平均会做多少套试卷。
用户 1006 分别在 20210901、20210906、20210907 作答过 3 次试卷,连续两次作答最大时间窗为 6 天(1 号到 6 号),他 1 号到 7 号这 7 天里共做了 3 张试卷,平均每天 3/7=0.428571 张,那么 6 天里平均会做 0.428571*6=2.57 张试卷(保留两位小数);
下面给出一种解法。
with a as (
select uid, start_time,
DATEDIFF( LAG(start_time, 1) over ( partition by uid order by start_time DESC ), start_time ) + 1 as days_win
from exam_record
where year(start_time) = 2021
)
select uid, days_window, round(days_window * avg_exam, 2) as avg_exam_cnt
from (
select uid, max(days_win) as days_window,
count(start_time) / (DATEDIFF(max(start_time),min(start_time)) + 1) as avg_exam
from a
group by uid
having days_window >= 2
) b
order by days_window desc, avg_exam_cnt desc
- 首先利用
LAG 函数得到上一次答题的时间; - 再利用
DATEDIFF 得到两次答题之间的窗口大小(注意 +1); - 剩下的计算就比较简单了。
3.第二快/慢用时之差大于试卷时长一半的试卷【较难】(原题链接)
题目的要求很简单:找到 第二快和第二慢用时之差 大于 试卷时长的一半 的试卷信息,按试卷ID降序排序。
以下是我给出的一种解法。
SELECT distinct exam_id, duration, release_time
from (
select exam_id, duration, release_time,
nth_value(duration_time, 2) over (partition by exam_id order by duration_time desc) as max2_duration_time,
nth_value(duration_time, 2) over (partition by exam_id order by duration_time) as min2_duration_time
from (
select exam_record.exam_id, duration, release_time,
(unix_timestamp(submit_time) - unix_timestamp(start_time)) / 60 as duration_time
from exam_record
left join examination_info using(exam_id)
where submit_time is not null
) a
) b
where max2_duration_time is not null and min2_duration_time is not NULL
and (max2_duration_time - min2_duration_time) > 0.5 * duration
order by exam_id desc
另外给出一种解法。
SELECT exam_id, duration, release_time
from (
select exam_id, duration, release_time,
sum( case when duration_time_asc = 2 then duration_time
when duration_time_desc = 2 then -duration_time
else 0 end ) as sub
from (
select exam_record.exam_id, duration, release_time,
timestampdiff(second, start_time, submit_time) as duration_time,
row_number() over (partition by exam_id order by timestampdiff(second, start_time, submit_time) desc) as duration_time_asc,
row_number() over (partition by exam_id order by timestampdiff(second, start_time, submit_time)) as duration_time_desc
from exam_record
left join examination_info using(exam_id)
where submit_time is not null
) a
group by exam_id
) b
where sub > duration * 60 * 0.5
order by exam_id desc
- 对比第一种方法用的
nth_value ,上面这种做法使用的是 row_number 。 - 在计算
第二快和第二慢用时之差 的处理也不一样。
4.每类试卷得分前3名【中等】(原题链接)
此题难度不大,要求也不复杂:找到每类试卷得分的前 3 名,如果两人最大分数相同,选择最小分数大者,如果还相同,选择 uid 大者。
有作答得分记录的试卷 tag 有 SQL 和算法,SQL 试卷用户 1001、1002、1003、1004 有作答得分,最高得分分别为 81、81、89、85,最低得分分别为 78、81、86、40,因此先按最高得分排名再按最低得分排名取前三为 1003、1004、1002。
select *
from (
select tag as tid, uid,
row_number() over (partition by tag order by tag, max(score) desc, min(score) desc, uid desc) as ranking
from exam_record
left join examination_info on exam_record.exam_id = examination_info.exam_id
group by examination_info.tag, exam_record.uid
) a
where ranking <=3
|