
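[Artificial Intelligence] [Low-Effort Hack] Notes on Using the Stanford Parser Library + Downloading the Latest April-Season Anime (Kaguya and Anya as Examples)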

Author: token comment

Preface

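A heads-up: this post is a bit of a bait-and-switch. The real content is in Part 2; Part 1 is only here to get through moderation and pad the word count.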



1 Overview of Using the Stanford Parser Library (Parse Trees and Dependency Graphs)

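NLTK's Stanford parsing modules have recently started emitting warnings that they will soon be deprecated; newer versions replace them with `nltk.parse.corenlp.CoreNLPParser`. The CoreNLP JAR packages can be downloaded from the Stanford Software page. So far, at least dependency parsing and constituency parsing (parse trees) work, and these two are also the most useful. NER works as well. Word segmentation and POS tagging raise errors, but there is no need to insist on Stanford's versions of those anyway, since plenty of other resources exist: for Chinese you can use jieba, and for English NLTK ships its own tokenizers and POS taggers. I have not yet worked out exactly how to use CoreNLPParser; a detailed tutorial on using the Stanford JAR packages will follow soon.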

The JAR packages downloaded from the link above are shown in the figure below:

[Figure: the downloaded Stanford JAR packages]

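Of these, stanford-parser-full-2020-11-17 is the most important package: it can generate both parse trees and dependency graphs. stanford-corenlp-4.4.0 seems to bundle the other packages together, but as far as I can tell it is missing many models. For example, its parser models cover English only, whereas stanford-parser-full-2020-11-17 includes parser models for many languages, Chinese among them. The code for using these packages is shown below; part of it is adapted from https://www.cnblogs.com/baiboy/p/nltk1.html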
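Several functions in the code that follows pass `model_path=...chinesePCFG.ser.gz`, a file that (per the comments there) must first be extracted from stanford-parser-4.2.0-models.jar. Since a .jar file is just a ZIP archive, a minimal sketch of doing that extraction with the standard library is below. The helper name `extract_model_from_jar` is hypothetical, and it assumes the model can be identified by its file name alone, whatever directory it sits in inside the JAR.

```python
import zipfile
from pathlib import Path


def extract_model_from_jar(models_jar: str, model_name: str, out_dir: str) -> Path:
    """Extract the JAR member whose base name equals `model_name` into `out_dir`.

    A .jar file is an ordinary ZIP archive, so Python's zipfile can read it
    without Java. Raises FileNotFoundError if no member matches.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(models_jar) as jar:
        for member in jar.namelist():
            # ZIP member names always use '/', e.g. 'edu/.../lexparser/chinesePCFG.ser.gz'
            if member.rsplit("/", 1)[-1] == model_name:
                target = out_path / model_name
                target.write_bytes(jar.read(member))
                return target
    raise FileNotFoundError(f"{model_name} not found in {models_jar}")


# Hypothetical usage with the paths used in the demos (adjust to your machine):
# extract_model_from_jar(
#     r"D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser-4.2.0-models.jar",
#     "chinesePCFG.ser.gz",
#     r"D:\data\stanford\software\stanford-parser-full-2020-11-17",
# )
```

Matching on the base name rather than a full internal path avoids hard-coding the package layout inside the JAR, at the cost of picking the first match if two members share a name.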

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@163.sufe.edu.cn

# 2022/06/10 13:16:34 Currently using NLTK 3.3.0
def segmenter_demo():
	# 2022/06/10 13:16:51 Fails to run; cause unknown
	from nltk.tokenize.stanford_segmenter import StanfordSegmenter
	segmenter = StanfordSegmenter(
		path_to_jar=r'D:\data\stanford\software\stanford-segmenter-2020-11-17\stanford-segmenter-4.2.0.jar',
		# The slf4j jar is not in stanford-segmenter-2020-11-17, but both stanford-parser-full-2020-11-17 and stanford-corenlp-4.4.0 contain it
		path_to_slf4j=r'D:\data\stanford\software\stanford-parser-full-2020-11-17\slf4j-api.jar',
		path_to_sihan_corpora_dict=r'D:\data\stanford\software\stanford-segmenter-2020-11-17\data',
		path_to_model=r'D:\data\stanford\software\stanford-segmenter-2020-11-17\data\pku.gz',
		path_to_dict=r'D:\data\stanford\software\stanford-segmenter-2020-11-17\data\dict-chris6.ser.gz',
	)
	string = u'我在博客园开了一个博客,我的博客名叫伏草惟存,写了一些自然语言处理的文章。'
	result = segmenter.segment(string)
	print(result)
	return result
	
def tokenizer_demo():
	# 2022/06/10 13:15:03 Fails to run: StanfordTokenizer in nltk.tokenize has been deprecated
	from nltk.tokenize import StanfordTokenizer
	tokenizer = StanfordTokenizer(path_to_jar=r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser.jar')
	sent = 'Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\nThanks.'
	result = tokenizer.tokenize(sent)
	return result


def ner_tagger_demo():
	# 2022/06/10 13:16:56 Works for English, but the model jar for Chinese is missing
	from nltk.tag import StanfordNERTagger
	eng_tagger = StanfordNERTagger(model_filename=r'D:\data\stanford\software\stanford-ner-2020-11-17\classifiers\english.all.3class.distsim.crf.ser.gz',
								   path_to_jar=r'D:\data\stanford\software\stanford-ner-2020-11-17\stanford-ner.jar')

	result = eng_tagger.tag('Rami Eid is studying at Stony Brook University in NY'.split())
	print(result)
	# chi_tagger = StanfordNERTagger(model_filename=r'D:\data\stanford\software\stanford-ner-2020-11-17\classifiers\chinese.misc.distsim.crf.ser.gz',
								   # path_to_jar=r'D:\data\stanford\software\stanford-ner-2020-11-17\stanford-ner.jar')
	# for word, tag in  chi_tagger.tag(result.split()):
		# print(word,tag)
	return result

def pos_tagger_demo():
	# 2022/06/10 13:17:35 Passes testing
	from nltk.tag import StanfordPOSTagger
	eng_tagger = StanfordPOSTagger(model_filename=r'D:\data\stanford\software\stanford-postagger-full-2020-11-17\models\english-bidirectional-distsim.tagger',
								   path_to_jar=r'D:\data\stanford\software\stanford-postagger-full-2020-11-17\stanford-postagger.jar')
	print(eng_tagger.tag('What is the airspeed of an unladen swallow ?'.split()))
	
	chi_tagger = StanfordPOSTagger(model_filename=r'D:\data\stanford\software\stanford-postagger-full-2020-11-17\models\chinese-distsim.tagger',
								   path_to_jar=r'D:\data\stanford\software\stanford-postagger-full-2020-11-17\stanford-postagger.jar')
	result = '四川省 成都 信息 工程 大学 我 在 博客 园 开 了 一个 博客 , 我 的 博客 名叫 伏 草 惟 存 , 写 了 一些 自然语言 处理 的 文章 。\r\n'
	print(chi_tagger.tag(result.split()))
	
def dependency_demo():
	# 2022/06/10 13:21:17 Passes testing
	from nltk.parse.stanford import StanfordDependencyParser
	eng_parser = StanfordDependencyParser(r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser.jar',
										  r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser-4.2.0-models.jar',
										  r'D:\data\stanford\software\stanford-parser-full-2020-11-17\englishPCFG.ser.gz')
	res = list(eng_parser.parse('the quick brown fox jumps over the lazy dog'.split()))
	for row in res[0].triples():
		print(row)

	chi_parser = StanfordDependencyParser(r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser.jar',
										  r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser-4.2.0-models.jar',
										  model_path=r'D:\data\stanford\software\stanford-parser-full-2020-11-17\chinesePCFG.ser.gz')		# This file has to be extracted from stanford-parser-4.2.0-models.jar
	res = list(chi_parser.parse('我 和 他 是 朋友'.split()))		# Parse the Chinese sentence with the Chinese model
	print(list(res[0].triples()))
	print('#' * 64)
	for row in res[0].triples():
		print(row)
	
def parse_tree_demo():	
	# 2022/06/10 13:21:17 Passes testing
	from nltk.parse.stanford import StanfordParser	
	parser = StanfordParser(r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser.jar',
							r'D:\data\stanford\software\stanford-parser-full-2020-11-17\stanford-parser-4.2.0-models.jar',
							model_path=r'D:\data\stanford\software\stanford-parser-full-2020-11-17\chinesePCFG.ser.gz')		# This file has to be extracted from stanford-parser-4.2.0-models.jar
	parse_tree = list(parser.parse(['我', '和', '他', '是', '朋友']))
	print(parse_tree)
	return parse_tree

# segmenter_demo()
# tokenizer_demo()
# ner_tagger_demo()
# pos_tagger_demo()
# dependency_demo()
# parse_tree_demo()
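In `dependency_demo`, each item printed from `res[0].triples()` has the shape `((head_word, head_tag), relation, (dependent_word, dependent_tag))`, which is how NLTK's `DependencyGraph.triples()` yields them. A flat list of triples is awkward to read for longer sentences, so here is a minimal sketch that regroups them into a head-to-dependents map. The function name `children_by_head` is hypothetical, and the sample triples in the test are hand-written for illustration, not real parser output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One item from nltk's DependencyGraph.triples():
# ((head_word, head_tag), relation, (dependent_word, dependent_tag))
Triple = Tuple[Tuple[str, str], str, Tuple[str, str]]


def children_by_head(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group dependency triples into {head_word: [(relation, dependent_word), ...]}."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for (head_word, _head_tag), relation, (dep_word, _dep_tag) in triples:
        grouped[head_word].append((relation, dep_word))
    return dict(grouped)


# Usage inside dependency_demo():
#     print(children_by_head(res[0].triples()))
```

Keying by the word string is a simplification: if the same word occurs twice in a sentence (e.g. "the" in the English demo), its dependents are merged under one key. Key by node address from the `DependencyGraph` instead if that matters.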

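After updating NLTK to the latest version (3.7.0), the corenlp module can be used. However, it does not call the JAR files itself: `CoreNLPParser` is an HTTP client that sends each request to a CoreNLP server (by default `http://localhost:9000`, as the docstring below shows). So no JAR is needed on the client side, but parsing fails whenever that server cannot be reached. My impression was that Stanford would rather offer its parsers as a service than as standalone libraries, although the server itself ships with the stanford-corenlp download and can be run locally. The example in the docstring looks quite fancy: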

class CoreNLPParser(GenericCoreNLPParser)
 |  CoreNLPParser(url='http://localhost:9000', encoding='utf8', tagtype=None)
 |
 |  >>> parser = CoreNLPParser(url='http://localhost:9000')
 |
 |  >>> next(
 |  ...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
 |  ... ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
 |                       ROOT
 |                        |
 |                        S
 |         _______________|__________________________
 |        |                         VP               |
 |        |                _________|___             |
 |        |               |             PP           |
 |        |               |     ________|___         |
 |        NP              |    |            NP       |
 |    ____|__________     |    |     _______|____    |
 |   DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |   |    |     |    |    |    |    |       |    |   |
 |  The quick brown fox jumps over the     lazy dog  .
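Because `CoreNLPParser` only talks to a server at the given `url`, a cheap way to avoid confusing connection errors is to check that something is listening there before parsing. A minimal sketch, assuming the default `http://localhost:9000` from the docstring above; the function names `corenlp_server_reachable` and `demo_parse` are hypothetical, and the check only proves a port is open, not that a CoreNLP server is behind it.

```python
import socket
from urllib.parse import urlparse


def corenlp_server_reachable(url: str = "http://localhost:9000", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the host/port in `url` succeeds.

    This only checks that *something* is listening on that port; it does not
    verify that it is actually a CoreNLP server.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def demo_parse(url: str = "http://localhost:9000") -> None:
    """Parse one sentence with CoreNLPParser, but only if the server is reachable."""
    if not corenlp_server_reachable(url):
        print(f"Nothing is listening at {url}; start a CoreNLP server first.")
        return
    from nltk.parse.corenlp import CoreNLPParser  # imported lazily: needs nltk installed
    parser = CoreNLPParser(url=url)
    next(parser.raw_parse("The quick brown fox jumps over the lazy dog.")).pretty_print()
```

To run a server locally, the CoreNLP documentation describes starting it from the unpacked stanford-corenlp directory with `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000`; `demo_parse()` should then print the same tree as the docstring.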

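Likewise, the stanza package depends on network access: its models have to be downloaded from remote servers before it can be used. The API documentation is at https://stanfordnlp.github.io/stanza/index.html. Personally, I think the parser package above is more or less sufficient; stanza also fails frequently without a proxy/VPN.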


2 A Low-Effort Hack (Possibly Useful for Anime Fans)

Stealing a moment from a busy schedule to share a low-effort hack.

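Lately I've been following Kaguya-sama: Love Is War Season 3 and Spy × Family on Bilibili. To be honest, Bilibili used to release new anime in sync with the distributors, and non-paying viewers were only one week (one episode) behind VIP members, which was bearable. Now Bilibili pulls all kinds of stunts: updates are painfully slow, and on top of that there are censoring light beams, blacked-out frames and cuts, and some sensitive scenes are even redrawn in-house. That is really hard to accept. If it weren't for what's left of the danmaku (bullet-comment) atmosphere, who the hell would still watch anime on Bilibili?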

Then I found this: 蚂蚁Tube (anime section).

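Almost all of the April-season anime are being updated there, and older series are fairly well covered too. Besides anime, it also has movies, TV series and variety shows, so it's pretty nice.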

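Anyone who frequents free sites like this knows their common problem: videos load extremely slowly and often stall completely halfway through, which is really frustrating. So I wondered whether I could simply download the videos and watch them locally.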
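As the step-by-step walkthrough later shows, the player on this site uses hls.js: each video is described by an .m3u8 playlist listing short .ts segments. So "downloading the video" boils down to reading the segment URLs from the playlist and fetching them in order. A minimal sketch, assuming an unencrypted HLS media playlist (no `#EXT-X-KEY` tag and not a master playlist of variant streams); the function names are hypothetical and this is not the author's script.

```python
import urllib.request
from typing import List
from urllib.parse import urljoin


def parse_m3u8_segments(playlist_text: str, base_url: str = "") -> List[str]:
    """Return segment URIs from an HLS media playlist, in playback order.

    In an .m3u8 file, lines starting with '#' are tags (e.g. #EXTINF) and the
    remaining non-empty lines are segment URIs; relative URIs are resolved
    against `base_url` (the playlist's own URL).
    """
    segments = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            segments.append(urljoin(base_url, line))
    return segments


def download_segments(segment_urls: List[str], out_path: str) -> None:
    """Fetch each .ts segment and append it to one file.

    MPEG-TS segments can usually be concatenated byte-wise into a playable file.
    """
    with open(out_path, "wb") as out:
        for url in segment_urls:
            req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                out.write(resp.read())
```

Usage would be along the lines of `download_segments(parse_m3u8_segments(playlist_text, playlist_url), "episode.ts")`. Segments are fetched sequentially with no retries, which keeps the sketch short but makes it slow on an unreliable host.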

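This is actually not complicated, and much simpler than scraping Bilibili videos. While I'm at it, here is my Bilibili video crawler script. I adapted it from someone else's code; try running the examples in the main block, which should be clear enough. It still works as of publication and the comments are fairly detailed. It can download an entire anime series directly from its episode ID. Anything that requires VIP membership needs a VIP account, of course. The Cookie used here is from my own account and has probably expired by now, so if you need one, log in on the web version and copy your own Cookie: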

# -*- coding: utf-8 -*-
# @author: caoyang
# @email: caoyang@163.sufe.edu.cn
# https://github.com/iawia002/annie

import os
import re
import json
import requests
from tqdm import tqdm

class BiliBiliCrawler(object):
	
	def __init__(self) -> None:				
		self.user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
		self.video_webpage_link = 'https://www.bilibili.com/video/{}'.format
		self.video_detail_api = 'https://api.bilibili.com/x/player/pagelist?bvid={}&jsonp=jsonp'.format						
		self.video_playurl_api = 'https://api.bilibili.com/x/player/playurl?cid={}&bvid={}&qn=64&type=&otype=json'.format	
		self.episode_playurl_api = 'https://api.bilibili.com/pgc/player/web/playurl?ep_id={}&jsonp=jsonp'.format			
		self.episode_webpage_link = 'https://www.bilibili.com/bangumi/play/ep{}'.format
		self.anime_webpage_link = 'https://www.bilibili.com/bangumi/play/ss{}'.format
		self.chunk_size = 1024
		self.regexs = {
			'host': r'https://([^/]+)',		# host part of the URL: everything up to the first '/'
			'episode_name': r'meta name="keywords" content="(.*?)"',
			'initial_state': r'<script>window.__INITIAL_STATE__=(.*?);',
			'playinfo': r'<script>window.*?__playinfo__=(.*?)</script>',	
		}

	def easy_download_video(self, bvid, save_path=None) -> bool:
		"""Tricky method with available api"""
		
		# Request for detail information of video
		response = requests.get(self.video_detail_api(bvid), headers={'User-Agent': self.user_agent})
		json_response = response.json()
		
		cid = json_response['data'][0]['cid']
		video_title = json_response['data'][0]['part']
		if save_path is None:
			save_path = f'{video_title}.mp4'		

		print(f'Video title: {video_title}')
		
		# Request for playurl and size of video
		response = requests.get(self.video_playurl_api(cid, bvid), headers={'User-Agent': self.user_agent})
		json_response = response.json()
		video_playurl = json_response['data']['durl'][0]['url']
		# video_playurl = json_response['data']['durl'][0]['backup_url'][0]
		video_size = json_response['data']['durl'][0]['size']
		total = video_size // self.chunk_size

		print(f'Video size: {video_size}')
		
		# Download video
		headers = {
			'User-Agent': self.user_agent,
			'Origin'	: 'https://www.bilibili.com',
			'Referer'	: 'https://www.bilibili.com',			
		}
		headers['Host'] = re.findall(self.regexs['host'], video_playurl, re.I)[0]
		headers['Range'] = f'bytes=0-{video_size}'
		response = requests.get(video_playurl, headers=headers, stream=True, verify=False)
		tqdm_bar = tqdm(response.iter_content(self.chunk_size), desc='Download process', total=total)
		with open(save_path, 'wb') as f:
			for byte in tqdm_bar:
				f.write(byte)
		return True

	def easy_download_episode(self, epid, save_path=None) -> bool:
		"""Tricky method with available api"""
		
		# Request for playurl and size of episode
		
		# temp_headers = {
			# "Host": "api.bilibili.com",
			# "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0",
			# "Accept": "application/json, text/plain, */*",
			# "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
			# "Accept-Encoding": "gzip, deflate, br",
			# "Referer": "https://www.bilibili.com/bangumi/play/ep234407?spm_id_from=333.337.0.0",
			# "Origin": "https://www.bilibili.com",
			# "Connection": "keep-alive",
			# "Cookie": "innersign=0; buvid3=3D8F234E-5DAF-B5BD-1A26-C7CDE57C21B155047infoc; i-wanna-go-back=-1; b_ut=7; b_lsid=1047C7449_1808035E0D6; _uuid=A4884E3F-BF68-310101-E5E6-10EBFDBCC10CA456283infoc; buvid_fp=82c49016c72d24614786e2a9e883f994; buvid4=247E3498-6553-51E8-EB96-C147A773B34357718-022050123-7//HOhRX5o4Xun7E1GZ2Vg%3D%3D; fingerprint=1b7ad7a26a4a90ff38c80c37007d4612; sid=jilve18q; buvid_fp_plain=undefined; SESSDATA=f1edfaf9%2C1666970475%2Cf281c%2A51; bili_jct=de9bcc8a41300ac37d770bca4de101a8; DedeUserID=130321232; DedeUserID__ckMd5=42d02c72aa29553d; nostalgia_conf=-1; CURRENT_BLACKGAP=1; CURRENT_FNVAL=4048; CURRENT_QUALITY=0; rpdid=|(u~||~uukl)0J'uYluRu)l|J",
			# "Sec-Fetch-Dest": "empty",
			# "Sec-Fetch-Mode": "cors",
			# "Sec-Fetch-Site": "same-site",
			# "TE": "trailers",
		# }
		# response = requests.get(self.episode_playurl_api(epid), headers=temp_headers)
		
		# 2022/05/01 23:31:08 The commented-out request above is the VIP variant; it can download anime that require VIP membership
		response = requests.get(self.episode_playurl_api(epid))
		json_response = response.json()
		# episode_playurl = json_response['result']['durl'][0]['url']
		episode_playurl = json_response['result']['durl'][0]['backup_url'][0]
		episode_size = json_response['result']['durl'][0]['size']
		total = episode_size // self.chunk_size

		print(f'Episode size: {episode_size}')
		
		# Download episode
		# 2022/05/01 23:31:41 For VIP content it's best to add the Cookie below, but I'm not sure whether it still works without it
		headers = {
			'User-Agent': self.user_agent,
			'Origin'	: 'https://www.bilibili.com',
			'Referer'	: 'https://www.bilibili.com',	
			# 'Cookie'	: "innersign=0; buvid3=3D8F234E-5DAF-B5BD-1A26-C7CDE57C21B155047infoc; i-wanna-go-back=-1; b_ut=7; b_lsid=1047C7449_1808035E0D6; _uuid=A4884E3F-BF68-310101-E5E6-10EBFDBCC10CA456283infoc; buvid_fp=82c49016c72d24614786e2a9e883f994; buvid4=247E3498-6553-51E8-EB96-C147A773B34357718-022050123-7//HOhRX5o4Xun7E1GZ2Vg%3D%3D; fingerprint=1b7ad7a26a4a90ff38c80c37007d4612; sid=jilve18q; buvid_fp_plain=undefined; SESSDATA=f1edfaf9%2C1666970475%2Cf281c%2A51; bili_jct=de9bcc8a41300ac37d770bca4de101a8; DedeUserID=130321232; DedeUserID__ckMd5=42d02c72aa29553d; nostalgia_conf=-1; CURRENT_BLACKGAP=1; CURRENT_FNVAL=4048; CURRENT_QUALITY=0; rpdid=|(u~||~uukl)0J'uYluRu)l|J",
		}
		headers['Host'] = re.findall(self.regexs['host'], episode_playurl, re.I)[0]
		headers['Range'] = f'bytes=0-{episode_size}'
		response = requests.get(episode_playurl, headers=headers, stream=True, verify=False)
		tqdm_bar = tqdm(response.iter_content(self.chunk_size), desc='Download process', total=total)
		if save_path is None:
			save_path = f'ep{epid}.mp4'
		with open(save_path, 'wb') as f:
			for byte in tqdm_bar:
				f.write(byte)
		return True

	def download(self, bvid, video_save_path=None, audio_save_path=None) -> bool:
		"""General method by parsing page source"""
		
		if video_save_path is None:
			video_save_path = f'{bvid}.m4s'
		if audio_save_path is None:
			audio_save_path = f'{bvid}.mp3'
		
		common_headers = {
			'Accept'			: '*/*',
			'Accept-encoding'	: 'gzip, deflate, br',
			'Accept-language'	: 'zh-CN,zh;q=0.9,en;q=0.8',
			'Cache-Control'		: 'no-cache',
			'Origin'			: 'https://www.bilibili.com',
			'Pragma'			: 'no-cache',
			'Host'				: 'www.bilibili.com',
			'User-Agent'		: self.user_agent,
		}

		# In fact we only need bvid
		# Each episode of an anime also has a bvid and a corresponding bvid-URL which is redirected to another episode link
		# e.g. https://www.bilibili.com/video/BV1rK4y1b7TZ is redirected to https://www.bilibili.com/bangumi/play/ep322903
		response = requests.get(self.video_webpage_link(bvid), headers=common_headers)
		html = response.text
		playinfos = re.findall(self.regexs['playinfo'], html, re.S)
		if not playinfos:
			raise Exception(f'No playinfo found in bvid {bvid}\nPerhaps VIP required')
		playinfo = json.loads(playinfos[0])
		
		# There exist four different URL keys, observed as follows
		# `baseUrl` is the same as `base_url` with string value
		# `backupUrl` is the same as `backup_url` with array value
		# Here hard code is employed to select playurl
		def _select_video_playurl(_videoinfo):
			if 'backupUrl' in _videoinfo:
				return _videoinfo['backupUrl'][-1]
			if 'backup_url' in _videoinfo:
				return _videoinfo['backup_url'][-1]
			if 'baseUrl' in _videoinfo:
				return _videoinfo['baseUrl']
			if 'base_url' in _videoinfo:
				return _videoinfo['base_url']	
			raise Exception(f'No video URL found\n{_videoinfo}')	
			
		def _select_audio_playurl(_audioinfo):
			if 'backupUrl' in _audioinfo:
				return _audioinfo['backupUrl'][-1]
			if 'backup_url' in _audioinfo:
				return _audioinfo['backup_url'][-1]
			if 'baseUrl' in _audioinfo:
				return _audioinfo['baseUrl']
			if 'base_url' in _audioinfo:
				return _audioinfo['base_url']
			raise Exception(f'No audio URL found\n{_audioinfo}')
		
		# with open(f'playinfo-{bvid}.js', 'w') as f:
			# json.dump(playinfo, f)

		if 'durl' in playinfo['data']:
			video_playurl = playinfo['data']['durl'][0]['url']
			# video_playurl = playinfo['data']['durl'][0]['backup_url'][1]
			print(video_playurl)
			video_size = playinfo['data']['durl'][0]['size']
			total = video_size // self.chunk_size
			print(f'Video size: {video_size}')
			headers = {
				'User-Agent': self.user_agent,
				'Origin'	: 'https://www.bilibili.com',
				'Referer'	: 'https://www.bilibili.com',			
			}
			headers['Host'] = re.findall(self.regexs['host'], video_playurl, re.I)[0]
			headers['Range'] = f'bytes=0-{video_size}'
			# headers['Range'] = f'bytes={video_size + 1}-{video_size + video_size + 1}'
			response = requests.get(video_playurl, headers=headers, stream=True, verify=False)
			tqdm_bar = tqdm(response.iter_content(self.chunk_size), desc='Download process', total=total)
			with open(video_save_path, 'wb') as f:
				for byte in tqdm_bar:
					f.write(byte)
			return True

		elif 'dash' in playinfo['data']:
			videoinfo = playinfo['data']['dash']['video'][0]
			audioinfo = playinfo['data']['dash']['audio'][0]
			video_playurl = _select_video_playurl(videoinfo)
			audio_playurl = _select_audio_playurl(audioinfo)

		else:
			raise Exception(f'No data found in playinfo\n{playinfo}')

		# First make a fake request to get the `Content-Range` params in response headers
		fake_headers = {
			'Accept'			: '*/*',
			'Accept-Language'	: 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
			'Accept-Encoding'	: 'gzip, deflate, br',
			'Cache-Control'		: 'no-cache',
			'Origin'			: 'https://www.bilibili.com',
			'Pragma'			: 'no-cache',
			'Range'				: 'bytes=0-299',
			'Referer'			: self.video_webpage_link(bvid),
			'User-Agent'		: self.user_agent,
			'Connection'		: 'keep-alive',
		}
		response = requests.get(video_playurl, headers=fake_headers, stream=True)
		video_size = int(response.headers['Content-Range'].split('/')[-1])
		total = video_size // self.chunk_size
		
		# Next make a real request to download full video
		real_headers = {
			'Accept'			: '*/*',
			'Accept-Language'	: 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
			'Accept-Encoding'	: 'gzip, deflate, br',
			'Cache-Control'		: 'no-cache',
			'Origin'			: 'https://www.bilibili.com',
			'Pragma'			: 'no-cache',
			'Range'				: f'bytes=0-{video_size}',
			'Referer'			: self.video_webpage_link(bvid),
			'User-Agent'		: self.user_agent,
			'Connection'		: 'keep-alive',
		}
		response = requests.get(video_playurl, headers=real_headers, stream=True)
		tqdm_bar = tqdm(response.iter_content(self.chunk_size), desc='Download video', total=total)
		with open(video_save_path, 'wb') as f:
			for byte in tqdm_bar:
				f.write(byte)
				
		# The same way for downloading audio
		response = requests.get(audio_playurl, headers=fake_headers, stream=True)
		audio_size = int(response.headers['Content-Range'].split('/')[-1])
		total = audio_size // self.chunk_size // 2
		
		# Confusingly downloading full audio at one time is forbidden
		# We have to download audio in two parts
		with open(audio_save_path, 'wb') as f:
			audio_part = 0
			for (_from, _to) in [[0, audio_size // 2], [audio_size // 2 + 1, audio_size]]:
				headers = {
					'Accept': '*/*',
					'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
					'Accept-Encoding': 'gzip, deflate, br',
					'Cache-Control': 'no-cache',
					'Origin': 'https://www.bilibili.com',
					'Pragma': 'no-cache',
					'Range': f'bytes={_from}-{_to}',
					'Referer': self.video_webpage_link(bvid),
					'User-Agent': self.user_agent,
					'Connection': 'keep-alive',
				}
				audio_part += 1
				response = requests.get(audio_playurl, headers=headers, stream=True)
				tqdm_bar = tqdm(response.iter_content(self.chunk_size), desc=f'Download audio part{audio_part}', total=total)
				for byte in tqdm_bar:
					f.write(byte)
		return True

	def easy_download(self, url) -> bool:
		"""
		Download with page URL as below:
		>>> url = 'https://www.bilibili.com/video/BV1jf4y1h73r'
		>>> url = 'https://www.bilibili.com/bangumi/play/ep399420'
		"""

		headers = {
			'Accept': '*/*',
			'Accept-Encoding': 'gzip, deflate, br',
			'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
			'Cache-Control': 'no-cache',
			'Origin': 'https://www.bilibili.com',
			'Pragma': 'no-cache',
			'Host': 'www.bilibili.com',
			'User-Agent': self.user_agent,
		}		
		response = requests.get(url, headers=headers)
		html = response.text
		initial_states = re.findall(self.regexs['initial_state'], html, re.S)
		if not initial_states:
			raise Exception('No initial states found in page source')
		initial_state = json.loads(initial_states[0])
		
		# Download anime with several episodes
		episode_list = initial_state.get('epList')
		if episode_list is not None:
			name = re.findall(self.regexs['episode_name'], html, re.S)[0].strip()
			for episode in episode_list:
				if episode['badge'] != '会员':							 # No VIP required
					if not os.path.exists(name):
						os.mkdir(name)
					self.download(
						bvid=str(episode['bvid']),
						video_save_path=os.path.join(name, episode['titleFormat'] + episode['longTitle'] + '.m4s'),
						audio_save_path=os.path.join(name, episode['titleFormat'] + episode['longTitle'] + '.mp3'),
					)
				else:													 # Unable to download VIP anime
					continue
		
		# Download common videos
		else:
			video_data = initial_state['videoData']
			name = video_data['tname'].strip()
			if not os.path.exists(name):
				os.mkdir(name)
			self.download(
				bvid=str(video_data['bvid']),
				video_save_path=os.path.join(name, video_data['title'] + '.m4s'),
				audio_save_path=os.path.join(name, video_data['title'] + '.mp3'),
			)
		return True


if __name__ == '__main__':
	
	bb = BiliBiliCrawler()

	# bb.easy_download_video('BV14T4y1u7ST', 'temp/BV14T4y1u7ST.mp4')
	# bb.easy_download_video('BV1z5411W7tX', 'temp/BV1z5411W7tX.mp4')
	# bb.easy_download_video('BV1HX4y1T7Bz', 'temp/BV1HX4y1T7Bz.mp4')
	
	os.makedirs('temp', exist_ok=True)		# the example save paths write into ./temp
	bb.easy_download_episode('234407', 'temp/ep234407.mp4')
	# bb.easy_download_episode('321808', 'temp/ep321808.mp4')
	
	# bb.download('BV1PT4y137CA')
	# bb.download('BV14T4y1u7ST')
	
	# bb.easy_download('https://www.bilibili.com/video/BV1jf4y1h73r')
	# bb.easy_download('https://www.bilibili.com/bangumi/play/ep399420')
	# bb.easy_download('https://www.bilibili.com/bangumi/play/ss12548/')
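The `download` method above fetches the DASH audio track in two hand-coded halves, `[0, size // 2]` and `[size // 2 + 1, size]`. HTTP `Range` end offsets are inclusive, so the true last byte is `size - 1`; asking for `size` still works because a server treats an end offset past the last byte as "to the end". A minimal sketch of a general helper that splits a resource into N non-overlapping inclusive ranges, assuming the total size was already read from the `Content-Range` header as the script does; the names `split_byte_ranges` and `range_header` are hypothetical.

```python
from typing import Dict, List, Tuple


def split_byte_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes [0, total_size) into `parts` contiguous inclusive (start, end) ranges.

    HTTP Range end offsets are inclusive, so the last byte index is total_size - 1.
    Earlier ranges absorb the remainder, so sizes differ by at most one byte.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never produce empty ranges
    step, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        end = start + step + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build the Range header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}
```

In the audio loop, `for (_from, _to) in split_byte_ranges(audio_size, 2):` could replace the hard-coded pair, and raising the part count is then a one-argument change if the server starts rejecting larger chunks.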

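Back to the main topic. Taking episode 10 of Kaguya-sama: Love Is War Season 3 as an example (the latest episode at the time of writing; Bilibili had only reached episode 7), here is a brief walkthrough of downloading videos from 蚂蚁Tube (anime section):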

  1. Open the video link: https://mayitube.com/v_O5vUcy/9

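  2. Press F12 to open the developer tools and refresh the page. Under the Network tab, filter for XHR requests and be sure to find a GET request whose initiator is hls.js (the red arrow in the figure below), then look at the URL in its response (the red box on the right of the figure).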

    [Figure: the hls.js GET request and its response in the Network tab]

  3. Open that URL in a new tab:

    [Figure: the playlist content displayed in the new tab]

  4. Copy the page content into the corresponding place in the code below (lines 7-290):

    # -*- coding: UTF-8 -*-
    # @author: caoyang
    # @email: caoyang@163.sufe.edu.cn
    
    import requests
    
    string = """#EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:11
    #EXT-X-PLAYLIST-TYPE:VOD
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.474,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0licGJGWFlvLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL291RW5WV1I1LnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvZk1WYTZFYnoudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvTUFkZWM2ODMudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzRqWm9laWxkLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0RLdUpObnA1LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvajFNeE9ndmUudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvV0VrNnU3M3YudHM=.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3hsY2hBZW1LLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzVzZktXOTF5LnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvVm5PM3RYbHEudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvcXhkMUlDS3cudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0tzb1VjaE4xLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2tNZzVFd1FILnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaFN6SWdtZEMudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvRDY2anZBOXkudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1JKVE5zZ1JoLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzZjTUExVkFiLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvVnRTZUVwY08udHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUEMyZzRWblIudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0xZUWk3c3pxLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0JWMmVVblJzLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvNVBxOHB5c0kudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvNnMyYVl0cE8udHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzdYN21jMlM0LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzhlMWJGMzQ5LnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvb0ZPYnhEZjkudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMveDI0UG5rV1oudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0xhYkJzRWoyLnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2hEOFJBN0lRLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaGxmdTZwSGMudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvZ21MdjRUajkudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2NFVnRldlh1LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1FvS2VNblNlLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvN1N6N25kU0wudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvMGhTQ1NvMjUudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2V0cDAwb1BpLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2R4MmZlS3RRLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvcWhSdzRzR3IudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvMjkwdzBkRkYudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2RyS29PVzNULnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0Q0ZU93THJDLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvOVRtSXFBYTcudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvTXQ1QW05V2MudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0FQYmpuNlhjLnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzV6c0lIbTNoLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvZkZJb0FHdXEudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvOEI4MzdwRzUudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL21wUjgxbzZQLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzdhT1BlQXFNLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvVE9xR1BlZUgudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvSmtXNmF4Rm0udHM=.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzUxOFJQT1JDLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1JQQjhiMG9KLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvQ1BJOGx3V1AudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvS2dGVUdTdW8udHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3NWYnMwRUNhLnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1pEZG1MWVBlLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUTBwTkFBb2cudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaXpCM3ZrcVMudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL09BUmRZRGtNLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3h5WTRWNkR0LnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvWVNHYTNPdjQudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvMzRhNmNmZEcudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzdIbXVTbXhDLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0VNZGN1S0p4LnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaUtoUkxwNmwudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaFBtMXMzWHUudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2pLOXQ5ckpkLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzhOVlQzWk9jLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvc3BxN2dGdlMudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvWWM2cm8xbVIudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3JpaDBha0ZPLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3RPMXp5NVo2LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvc092Vko5TWMudHM=.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvWWJOQVlNQXcudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2xxSG05cmhDLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzZ4dE44TER0LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUmg3Z2pkSTAudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvRHJKTkJETG4udHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzExVHRqVE96LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0FZeUp4M093LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvbEtzSE9qTGgudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvc2xMa09uZTYudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2QzTFVhakFmLnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1R4MkRCRXhRLnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMva2tqMjlwUEQudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvQnJQNFBSYUEudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3h6ZTh4QTlHLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2JWMHhqUUJBLnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUUR5eG1IdmUudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaDZLUURKdEUudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3BPNFBMRktZLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2lIeFBZdFJSLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvQXk3dU9MREwudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvYTJQUVkxU00udHM=.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0FZb1B0blM1LnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1c1ME9Ba1RFLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvSW5mekVTSmYudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUDVNU2c3UkgudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3R2N0hIano2LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3o4U2tPbFRLLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvWmJjcWpZWDIudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvT25kRENlY2IudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL25mSFpCWnN0LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2VlTTR0TW91LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvWkxodEc0c2EudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvSGZjcktLc1EudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzhvN1NHQ3VULnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2d1bWJTWnBYLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUGRaRVFRSXQudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvMGZVN1pUVFMudHM=.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL3RFSURHOFNoLnRz.ts
    #EXTINF:10.428,
    https://server6.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0RnTXNnZHFGLnRz.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvNGYydUU5OTQudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvUmw0c3dYZjgudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1NDWkZzaXZILnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0RVTElFT2g4LnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvZG5TbjVqS2cudHM=.ts
    #EXTINF:10.428,
    https://server8.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvQXVUaFNJZnIudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2I2aWQwTDVuLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0RNR1dUZ3lvLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvaVg0Y21wRGIudHM=.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvbnNpYzJFcEcudHM=.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0JYNkd6RzFpLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzLzhzZkUyTm4xLnRz.ts
    #EXTINF:10.428,
    https://server3.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvYTNjcWpnTEIudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvYjZ4TmRxUWwudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL21tMXQxQmtlLnRz.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL1JDQWN6bUNrLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvZmozd01LSHIudHM=.ts
    #EXTINF:10.428,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvTVVLZHQ1Z1MudHM=.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL01QSk1Ja2NWLnRz.ts
    #EXTINF:10.428,
    https://server7.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2lNUU5ORnY0LnRz.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvOHF6cDZRVDAudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly92LmR1Ym9rdS5jby8yMDIyMDYxMS9mVzhnVnlCQy9obHMvdG92bVkzN1kudHM=.ts
    #EXTINF:10.428,
    https://server2.mayitube.com/video_source/aHR0cHM6Ly90cy52Ym9rdS5jb20vMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL0pva1RIM3diLnRz.ts
    #EXTINF:10.428,
    https://server5.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5uZXQvMjAyMjA2MTEvZlc4Z1Z5QkMvaGxzL2FFa1N5d3ROLnRz.ts
    #EXTINF:1.252,
    https://server4.mayitube.com/video_source/aHR0cHM6Ly93LmR1Ym9rdS5tZS8yMDIyMDYxMS9mVzhnVnlCQy9obHMvVUZPRjRIWWcudHM=.ts
    #EXT-X-ENDLIST"""
    
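    import requests
    
    # Utility that converts request headers given as "Key: value" lines into a dict
    def parse_headers(headers):
    	new_headers = {}
    	for _line in headers.splitlines():
    		if not _line.strip():
    			continue  # skip blank lines
    		key, value = _line.split(':', 1)
    		new_headers[key.strip()] = value.strip()
    	return new_headers
    
    headers = parse_headers("""Host: server8.mayitube.com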
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
    Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
    Accept-Encoding: gzip, deflate, br
    Connection: keep-alive
    Cookie: _ga_NPW9Q6R88F=GS1.1.1655300402.2.1.1655300758.0; _ga=GA1.1.1556103515.1655291716; fpestid=OSMZ1DfZDFyJ5jijIXL1GWjkbvAbAsrZTrP91V-uIE25Nkd0zrEwhdZ0T9KIlsMTX1md5Q; _clck=1cp07q7|1|f2c|0; _clsk=1vuu0zr|1655300760540|4|1|f.clarity.ms/collect; bfp_sn_rf_8b2087b102c9e3e5ffed1c1478ed8b78=Direct; bfp_sn_rt_8b2087b102c9e3e5ffed1c1478ed8b78=1655293718453; bafp=1cadc1a0-ec9e-11ec-a6f4-25fd0ad0ea7e
    Upgrade-Insecure-Requests: 1
    Sec-Fetch-Dest: document
    Sec-Fetch-Mode: navigate
    Sec-Fetch-Site: none
    Sec-Fetch-User: ?1
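    TE: trailers""")
    # The segments are spread over server2 to server8, so drop the fixed Host header
    # and let requests set the correct Host from each segment URL
    headers.pop('Host', None)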
    
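    with open('video.ts', 'wb') as video_file:
    	count = 0
    	for line in string.splitlines():
    		if line.startswith('http'):
    			url = line.strip()
    			count += 1
    			print(count, url)
    			# Retry until the segment downloads successfully. A non-2xx status also
    			# counts as a failure, so error pages are never written into the video;
    			# catching only network errors keeps Ctrl+C working.
    			while True:
    				try:
    					response = requests.get(url, headers=headers, timeout=60)
    					response.raise_for_status()
    					break
    				except requests.exceptions.RequestException as e:
    					print('error', e)
    			video_file.write(response.content)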
    
  5. The video is saved to the file video.ts.

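A brief explanation of the download logic. Everything hinges on the page content from step 3: it contains a series of URLs, each pointing to a segment of roughly 10 seconds of video (the #EXTINF:10.428, line before each URL gives that segment's duration in seconds). All we need to do is write the response bytes of these URLs, in order, into video.ts. Requests to these URLs fail fairly often (which is also why the video often breaks when watched in the browser), so the code retries each segment until it succeeds.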
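The page from step 3 is an HLS media playlist (M3U8): each #EXTINF:<seconds>, tag gives the duration of the segment whose URL follows it. Below is a minimal sketch, assuming a plain, unencrypted playlist like the one above, that pairs every URL with its duration instead of only keeping lines that start with http. The function name parse_media_playlist and the example.com URLs are hypothetical; tags such as #EXT-X-KEY are simply ignored, and relative URIs would still need urllib.parse.urljoin against the playlist URL.

```python
from typing import List, Optional, Tuple


def parse_media_playlist(text: str) -> List[Tuple[Optional[float], str]]:
    """Return (duration_seconds, uri) pairs in playlist order.

    Sketch only: understands #EXTINF followed by a URI line; every other
    tag (#EXT-X-TARGETDURATION, #EXT-X-ENDLIST, ...) is skipped.
    """
    segments = []
    pending_duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#EXTINF:'):
            # "#EXTINF:10.428," -> "10.428"
            pending_duration = float(line[len('#EXTINF:'):].split(',', 1)[0])
        elif not line.startswith('#'):
            # A URI line: pair it with the duration announced just before it
            segments.append((pending_duration, line))
            pending_duration = None
    return segments


sample = """#EXTM3U
#EXT-X-TARGETDURATION:11
#EXTINF:10.428,
https://example.com/seg0.ts
#EXTINF:1.252,
https://example.com/seg1.ts
#EXT-X-ENDLIST"""

segs = parse_media_playlist(sample)
total = sum(d for d, _ in segs)
print(len(segs), round(total, 3))  # 2 11.68
```

Summing the durations this way also gives the episode length, which is a quick check that the whole playlist was copied.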

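As for playing .ts videos, I use PotPlayer (highly recommended, it's a really nice player), which plays .ts files directly. If you want to convert them to ordinary mp4, you'll need to find another method.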
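Writing the segments back to back produces a playable file because MPEG-TS is a stream of fixed 188-byte packets, each starting with the sync byte 0x47, so concatenating whole segments yields another valid stream. A hedged sanity-check sketch follows; the helper names are hypothetical, and the check is only a heuristic: it does not validate packet contents, and a site that wraps its segments in extra bytes (or a segment that is really an HTML error page) will make it return False.

```python
TS_PACKET_SIZE = 188  # MPEG-TS packets are always 188 bytes long
TS_SYNC_BYTE = 0x47   # and every packet begins with this sync byte


def looks_like_mpeg_ts(data: bytes) -> bool:
    """Heuristic: True if data is a whole number of packets, each starting with 0x47."""
    if not data or len(data) % TS_PACKET_SIZE != 0:
        return False
    return all(data[i] == TS_SYNC_BYTE for i in range(0, len(data), TS_PACKET_SIZE))


def check_file(path: str, chunk_packets: int = 4096) -> bool:
    """Stream the file in packet-aligned chunks so a large video need not fit in memory."""
    chunk_size = TS_PACKET_SIZE * chunk_packets
    seen_any = False
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return seen_any  # an empty file is not a valid stream
            if not looks_like_mpeg_ts(chunk):
                return False
            seen_any = True
```

For example, check_file('video.ts') should return True for a cleanly downloaded episode.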

Testing shows that this method also works for other anime series.

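Some may feel this is still too cumbersome. Did you notice the GET request shown in red in step 2 whose initiator is hls.js? If we knew how that request's URL is obtained, the download could be fully automated. In fact, I spent ten-odd minutes going through hls.js (filter the Network tab for JS requests to find it), but it is simply too long (about 15,000 lines), and I couldn't work out how the file ID in that request's URL is constructed. Setting breakpoints in the browser's debugger would certainly work, but that is far too much hassle, so I'm stopping here and don't plan to dig further. The main reason: while I was reading the JS, episodes 7-10 of 《辉夜大小姐》 (Kaguya-sama: Love Is War) had already finished downloading, so why bother reading the code at all?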

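The download speed is not especially fast, but not terribly slow either. For batch downloads, I suggest rewriting the script to use multiple processes, or simply opening several windows and running it in each at the same time.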
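Since downloading segments is I/O-bound, a thread pool is a lighter alternative to the multiple processes suggested above (swapped in here; this is not the author's method). Below is a minimal sketch, assuming fetch is any function that returns a segment's bytes. It also replaces the unbounded while True retry with a bounded exponential backoff. The helper names and retry limits are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List


def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     max_attempts: int = 5, base_delay: float = 1.0) -> bytes:
    """Call fetch(url); on any exception, retry with exponential backoff."""
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up and propagate the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f'attempt {attempt} failed for {url}: {exc!r}; retrying in {delay}s')
            time.sleep(delay)
    raise AssertionError('unreachable')


def download_segments(urls: List[str], out: BinaryIO,
                      fetch: Callable[[str], bytes], max_workers: int = 8,
                      max_attempts: int = 5, base_delay: float = 1.0) -> int:
    """Fetch segments concurrently, write them to out in playlist order,
    and return the number of bytes written."""
    def task(url: str) -> bytes:
        return fetch_with_retry(fetch, url, max_attempts, base_delay)

    written = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of urls, no matter which
        # download finishes first, so the output stream stays sequential.
        for chunk in pool.map(task, urls):
            out.write(chunk)
            written += len(chunk)
    return written
```

To plug this into the script above, pass a fetch that calls requests.get(url, headers=headers, timeout=60), then raise_for_status(), and returns .content, with out opened as open('video.ts', 'wb'). Executor.map submits every URL up front and keeps finished segments in memory until the earlier ones are written, which is acceptable for a single episode.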

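In short, if no other source is available, I'd consider the anime section of 蚂蚁Tube (MayiTube) a stopgap. The downloaded videos do carry a watermark, but as long as they play, what more could you ask for?!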

Hack job done; signing off.

Added: 2022-06-16 21:42:28  Updated: 2022-06-16 21:44:46
 