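Introduction
Goal: scrape Weibo videos with Python
Weibo pitches itself as the place to discover what's new anytime, anywhere, to catch every memorable moment and the story behind it, and to share what you want to say with the whole world.
Today we'll use Python to scrape some of the good-looking videos on Weibo.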
Key points:
requests, a third-party module (pip install requests)
The POST request method
Using the browser developer tools
Development environment:
Version: Python 3.8
Editor: PyCharm 2021.2
Analysis
https://www.weibo.com/tv/api/component?page=%2Ftv%2Fchannel%2F4379160563414111%2Feditor
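The `page` query parameter in this URL is percent-encoded. As a quick check using only the standard library (this is not part of the original script), decoding it gives the channel path that also appears in the `page-referer` header:

```python
from urllib.parse import urlsplit, parse_qs

api_url = 'https://www.weibo.com/tv/api/component?page=%2Ftv%2Fchannel%2F4379160563414111%2Feditor'

# parse_qs splits the query string and percent-decodes each value
query = parse_qs(urlsplit(api_url).query)
page_path = query['page'][0]
print(page_path)  # /tv/channel/4379160563414111/editor
```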
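Implementation steps
1. Send the request
2. Get the data
3. Parse the data
4. Send a request to the endpoint that returns the video links
5. Get the data
6. Parse the data
7. Save the data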
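Steps 1 and 4 are both POST requests whose form field `data` carries a JSON string. Here is a minimal sketch of building those two payloads with `json.dumps` instead of string concatenation. The helper names `channel_payload` and `playinfo_payload` are hypothetical, the component names are the ones the script uses, and the `oid` value in the usage line is a made-up placeholder.

```python
import json

def channel_payload(cid, next_cursor=None):
    """Form data for the video-list request (step 1)."""
    params = {'cid': cid}
    if next_cursor is not None:
        # Only later pages carry a cursor
        params['next_cursor'] = next_cursor
    return {'data': json.dumps({'Component_Channel_Subchannel': params})}

def playinfo_payload(oid):
    """Form data for the video-link request (step 4)."""
    return {'data': json.dumps({'Component_Play_Playinfo': {'oid': oid}})}

first_page = channel_payload('4379160563414139')
play_info = playinfo_payload('example-oid')  # placeholder oid, not a real video
```

Letting `json.dumps` do the quoting means a title or id containing a quote character cannot break the JSON string.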
Let's start coding
Import the modules
import requests
import re
Add request headers as a light disguise
headers = {
    'cookie': 'SUB=_2AkMWuiaof8NxqwJRmfEcxW7kZYV1zQHEieKg5tdzJRMxHRl-yT8XqmlbtRB6PToIR8vzOUazMyBaDx1yoAhoGvmhBh2R; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WFhP5UbeyRGEMWCEO66rKKN; SINAGLOBAL=4378435525987.705.1642506657635; UOR=,,www.baidu.com; YF-V-WEIBO-G0=35846f552801987f8c1e8f7cec0e2230; _s_tentry=www.baidu.com; Apache=3198609812447.024.1647671292904; ULV=1647671293014:4:2:2:3198609812447.024.1647671292904:1647496624245; XSRF-TOKEN=ZPnKMpYcxCvUsgDmUWvm7Jwi',
    'origin': 'https://www.weibo.com',
    'page-referer': '/tv/channel/4379160563414111/editor',
    'referer': 'https://www.weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36',
    'x-xsrf-token': 'ZPnKMpYcxCvUsgDmUWvm7Jwi',
}
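In the headers above, `x-xsrf-token` has the same value as the `XSRF-TOKEN` entry inside the cookie string. Assuming the server expects the two to match (the values suggest it, but the article does not say so), the header can be derived from the cookie rather than copied by hand. `xsrf_from_cookie` is a hypothetical helper, and the sample cookie is a shortened copy of the one above.

```python
def xsrf_from_cookie(cookie_header):
    """Return the XSRF-TOKEN value from a raw Cookie header, or None."""
    for pair in cookie_header.split(';'):
        # partition splits on the first '=' only, so values may contain '='
        name, _, value = pair.strip().partition('=')
        if name == 'XSRF-TOKEN':
            return value
    return None

sample_cookie = 'UOR=,,www.baidu.com; _s_tentry=www.baidu.com; XSRF-TOKEN=ZPnKMpYcxCvUsgDmUWvm7Jwi'
token = xsrf_from_cookie(sample_cookie)
print(token)  # ZPnKMpYcxCvUsgDmUWvm7Jwi
```

When the cookie is refreshed from the browser, only one string then needs updating.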
def get_json(next_cursor):
    url = 'https://www.weibo.com/tv/api/component?page=%2Ftv%2Fchannel%2F4379160563414111%2F4379160563414139'
    # First page: the payload carries no cursor
    data = {
        'data': '{"Component_Channel_Subchannel":{"cid":"4379160563414139"}}'
    }
    if next_cursor != -1:
        # Later pages: pass the cursor returned by the previous page
        data = {
            'data': '{"Component_Channel_Subchannel":{"next_cursor":' + str(next_cursor) + ', "cid":"4379160563414139"}}'
        }
    # 1. Send the request
    response = requests.post(url, headers=headers, data=data)
    # 2. Get the data
    json_data = response.json()
    if json_data['data']['Component_Channel_Subchannel'] is not None:
        next_cursor = json_data['data']['Component_Channel_Subchannel']['next_cursor']
        if next_cursor is None:
            return 0
    else:
        return 0
    # 3. Parse the data
    data_list = json_data['data']['Component_Channel_Subchannel']['list']
    for item in data_list:
        title = item['title'] + str(item['media_id'])
        # Strip characters that are not allowed in Windows file names
        title = re.sub(r'[\\/:*?"<>|]', '', title)
        oid = item['oid']
        # print(title, oid)
        info_url = 'https://www.weibo.com/tv/api/component?page=' + oid
        data_1 = {
            'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'
        }
        # 4. Send a request to the endpoint that returns the video links
        response_1 = requests.post(info_url, headers=headers, data=data_1)
        # 5. Get the data
        json_data_1 = response_1.json()
        print(json_data_1)
        # 6. Parse the data
        if json_data_1['data']['Component_Play_Playinfo'] is not None:
            dict_urls = json_data_1['data']['Component_Play_Playinfo']['urls']
            # dict_urls.keys(): all the key names
            # list(dict_urls.keys())[0]: the first key, e.g. '高清 1080P'
            # dict_urls[list(dict_urls.keys())[0]]: the link for the highest quality
            video_sub = dict_urls[list(dict_urls.keys())[0]]
            video_url = 'https:' + video_sub
            print(title, video_url)
            # 7. Save the data (the video/ directory must already exist)
            video_data = requests.get(url=video_url).content
            with open(f'video/{title}.mp4', mode='wb') as f:
                f.write(video_data)
    # Move on to the next page
    get_json(next_cursor)

# Start from the first page (-1 means there is no cursor yet)
get_json(-1)
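`get_json` calls itself once per page, so a very long channel could in principle run into Python's recursion limit. Below is a sketch of the same cursor loop written iteratively. `iter_videos` and `fetch_page` are hypothetical names: `fetch_page` stands in for the POST request plus `response.json()`, which lets the loop be tried without network access. The response shape is assumed to match the one used above. Unlike the script, this version also yields the entries of the final page. Whether that page actually carries entries is an assumption.

```python
def iter_videos(fetch_page):
    """Yield list entries page by page until the cursor runs out."""
    cursor = -1  # same "no cursor yet" marker as get_json(-1)
    while True:
        json_data = fetch_page(cursor)
        component = json_data['data']['Component_Channel_Subchannel']
        if component is None:
            return
        yield from component['list']
        cursor = component['next_cursor']
        if cursor is None:
            return

# Fake two-page response, keyed by cursor, purely for illustration
fake_pages = {
    -1: {'data': {'Component_Channel_Subchannel': {'next_cursor': 20, 'list': [{'oid': 'a'}, {'oid': 'b'}]}}},
    20: {'data': {'Component_Channel_Subchannel': {'next_cursor': None, 'list': [{'oid': 'c'}]}}},
}
oids = [item['oid'] for item in iter_videos(fake_pages.get)]
print(oids)  # ['a', 'b', 'c']
```

In real use, `fetch_page` would wrap the `requests.post` call from step 1 and return the parsed JSON.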
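Closing
That's it for this article!
If you have suggestions or questions, leave a comment or send me a private message. Let's keep at it together!
If you liked it, follow the blog, or like, bookmark, and comment on the article!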