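Music Player
Introduction: System Components
The system comprises modules for user security management, song management, favorites management, and song recommendation.
Summary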
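- The front-end UI is built with PyQt5.
- The database is stored as CSV text, which is read into Python lists and accessed by index (this could be improved).
- The music library is the Chinese-language subset of NetEase Cloud Music playlists (50,000+ distinct songs); the dataset was split to improve speed. The full library was crawled from NetEase Cloud Music playlists covering 800,000 songs and 4,000,000+ favorites, and the recommendation data is the Chinese-language subset split from it.
- Playback uses network MP3 resources (NetEase Cloud Music external link + song id) plus local MP3 files, through PyQt's media playback module, which supports network MP3 resources: `from PyQt5.QtMultimedia import QMediaPlayer, QMediaContent`. (In an Anaconda virtual environment there is a bug where the module cannot be loaded; upgrading or downgrading the PyQt version resolves it.)
- The recommender is built on users' favorites data and implements user-based collaborative filtering to produce a recommendation list. Similarity is computed with the Jaccard coefficient (binary data), using an inverted index.
- Favorites management supports adding and removing songs; all data is stored in playlist_1.csv and playlist_2.csv.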
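To make the recommendation step above more concrete, here is a minimal sketch of user-based collaborative filtering with Jaccard similarity and an inverted index (song to users), so that only users sharing at least one song are compared. The function names and the toy `favorites` data are hypothetical and are not taken from the project's code.

```python
from collections import defaultdict


def build_inverted_index(favorites):
    """Map each song id to the set of users who favorited it."""
    index = defaultdict(set)
    for user, songs in favorites.items():
        for song in songs:
            index[song].add(user)
    return index


def jaccard_neighbors(target, favorites, index):
    """Return {other_user: jaccard} for users sharing a song with target."""
    overlap = defaultdict(int)
    for song in favorites[target]:
        for other in index[song]:
            if other != target:
                overlap[other] += 1
    sims = {}
    for other, common in overlap.items():
        union = len(favorites[target]) + len(favorites[other]) - common
        sims[other] = common / union
    return sims


def recommend(target, favorites, top_n=5):
    """Rank songs the target has not favorited by summed neighbor similarity."""
    index = build_inverted_index(favorites)
    sims = jaccard_neighbors(target, favorites, index)
    scores = defaultdict(float)
    for other, sim in sims.items():
        for song in favorites[other] - favorites[target]:
            scores[song] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


# Toy data (hypothetical): user id -> set of favorited song ids
favorites = {
    "u1": {"s1", "s2", "s3"},
    "u2": {"s2", "s3", "s4"},
    "u3": {"s5"},
}
```

With this toy data, `u1` and `u2` share two of four distinct songs, so their similarity is 0.5, and `u3` is never compared with `u1` because the inverted index shows no song in common.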
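Development Notes
json: dict, set, list, tuple
pandas: DataFrame, pd.DataFrame()
numpy: ndarray, np.array()
- Example: loading playlist.csv as a DataFrame makes insert, delete, update, and lookup operations awkward (columns have to be expanded).
- Some users have only 3 favorite songs while others have more than 1,000, so the table is far too sparse.
- The file is therefore read row by row into individual lists, which are easy to index, read, and write, and is written back at the end. Splitting the data into several tables noticeably improves efficiency.
- Another option is binary storage: save the file data in the form of a dictionary.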
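The row-wise approach and the binary alternative described above could look roughly like the sketch below. It assumes each CSV row holds a user id followed by a variable number of song ids; the actual layout of playlist_1.csv is not shown in these notes, and all file and function names here are hypothetical.

```python
import csv
import os
import pickle
import tempfile


def load_rows(path):
    """Read a ragged CSV file into a list of lists, one list per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f) if row]


def save_rows(path, rows):
    """Write the rows back; rows may have different lengths."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def add_favorite(rows, user_id, song_id):
    """Append song_id to the user's row, creating the row if needed."""
    for row in rows:
        if row[0] == user_id:
            if song_id not in row[1:]:
                row.append(song_id)
            return
    rows.append([user_id, song_id])


def remove_favorite(rows, user_id, song_id):
    """Remove song_id from the user's row if it is present."""
    for row in rows:
        if row[0] == user_id and song_id in row[1:]:
            del row[row.index(song_id, 1)]  # search after the user id column


# Demo on a temporary file (hypothetical data)
demo_dir = tempfile.mkdtemp()
csv_path = os.path.join(demo_dir, "playlist_demo.csv")
save_rows(csv_path, [["u1", "s1", "s2", "s3"], ["u2", "s9"]])

rows = load_rows(csv_path)
add_favorite(rows, "u2", "s1")
remove_favorite(rows, "u1", "s2")
save_rows(csv_path, rows)

# Binary alternative: store the same data as a pickled dictionary
pkl_path = os.path.join(demo_dir, "playlist_demo.pkl")
with open(pkl_path, "wb") as f:
    pickle.dump({row[0]: row[1:] for row in rows}, f)
```

Because every row is an ordinary list, users with 3 favorites and users with 1,000 favorites are handled the same way, with no padding of empty columns.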
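Classes can call one another only if they are defined together, while UI code is kept separate from functional code.
UI resources are initialized through inheritance. All window classes are defined in a single .py file; attributes that must be shared between classes are declared with `global xxx`.
Example: the user input from the login window (class ui_1) has to be passed to the player window (class ui_2).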
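The sharing pattern described above can be illustrated without any GUI code. The sketch below uses plain classes in place of the PyQt windows; `LoginWindow`, `PlayerWindow`, and `current_user` are hypothetical stand-ins for ui_1, ui_2, and the shared attribute.

```python
# Module-level state shared by every window class defined in this file
current_user = None


class LoginWindow:
    """Stand-in for the login window (ui_1)."""

    def submit(self, username):
        global current_user  # write to the module-level variable
        current_user = username


class PlayerWindow:
    """Stand-in for the player window (ui_2)."""

    def title(self):
        # Reading a module-level variable needs no global declaration
        return f"Now playing for {current_user}"


LoginWindow().submit("alice")
```

This only works because both classes live in the same module, which is why the notes keep all window classes in one .py file.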
eyeD3 is used to add, delete, and modify song metadata.
Jupyter code link?
Code on the local machine, with GitHub support.
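I. Using the Recommender System
- Reading user data: the favorites module's data is self-contained, so all of the data in the playlist.csv database can be read directly.
- Convert the data into a binary sparse matrix (rows: user UID; columns: song UID; values: 0 = not favorited, 1 = favorited). Songs could be split into many tables by genre, but because recommendations here are made per user, the song columns are not split into multiple tables. The procedure is: read the raw crawled playlist data (JSON) and select the needed fields (playlist data [id, name], the list of song IDs under each playlist ID [1, 2, 3, 4, …], and playlist popularity), as in the sample below.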
Format of each playlist
Each playlist contains a great deal of information (genre, artist, play count, song duration, release date, …).
`
{
"result": {
"id": 111450065,
"status": 0,
"commentThreadId": "A_PL_0_111450065",
"trackCount": 120,
"updateTime": 1460164523907,
"commentCount": 227,
"ordered": true,
"anonimous": false,
"highQuality": false,
"subscribers": [],
"playCount": 687070,
"trackNumberUpdateTime": 1460164523907,
"createTime": 1443528317662,
"name": "带本书去旅行吧,人生最美好的时光在路上。",
"cloudTrackCount": 0,
"shareCount": 149,
"adType": 0,
"trackUpdateTime": 1494134249465,
"userId": 39256799,
"coverImgId": 3359008023885470,
"coverImgUrl": "http://p1.music.126.net/2ZFcuSJ6STR8WgzkIi2U-Q==/3359008023885470.jpg",
"artists": null,
"newImported": false,
"subscribed": false,
"privacy": 0,
"specialType": 0,
"description": "现在是一年中最美好的时节,世界上很多地方都不冷不热,有湛蓝的天空和清冽的空气,正是出游的好时光。长假将至,你是不是已经收拾行装准备出发了?行前焦虑症中把衣服、洗漱用品、充电器之类东西忙忙碌碌地丢进箱子,打进背包的时候,我打赌你肯定会留个位置给一位好朋友:书。不是吗?不管是打发时间,小读怡情,还是为了做好攻略备不时之需,亦或是为了小小地装上一把,你都得有一本书傍身呀。读大仲马,我是复仇的伯爵;读柯南道尔,我穿梭在雾都的暗夜;读村上春树,我是寻羊的冒险者;读马尔克斯,目睹百年家族兴衰;读三毛,让灵魂在撒哈拉流浪;读老舍,嗅着老北京的气息;读海茵莱茵,于科幻狂流遨游;读卡夫卡,在城堡中审判……读书的孩子不会孤单,读书的孩子永远幸福。",
"subscribedCount": 10882,
"totalDuration": 0,
"tags": [
"旅行",
"钢琴",
"安静"],
"creator": {
"followed": false,
"remarkName": null,
"expertTags": [
"古典",
"民谣",
"华语"
],
"userId": 39256799,
"authority": 0,
"userType": 0,
"gender": 1,
"backgroundImgId": 3427177752524551,
"city": 360600,
"mutual": false,
"avatarUrl": "http://p1.music.126.net/TLRTrJpOM5lr68qJv1IyGQ==/1400777825738419.jpg",
"avatarImgIdStr": "1400777825738419",
"detailDescription": "",
"province": 360000,
"description": "",
"birthday": 637516800000,
"nickname": "有梦人生不觉寒",
"vipType": 0,
"avatarImgId": 1400777825738419,
"defaultAvatar": false,
"djStatus": 0,
"accountStatus": 0,
"backgroundImgIdStr": "3427177752524551",
"backgroundUrl": "http://p1.music.126.net/LS96S_6VP9Hm7-T447-X0g==/3427177752524551.jpg",
"signature": "漫无目的的乱听,听着,听着,竟然灵魂出窍了。更多精品音乐美图分享请加我微信hu272367751。微信是我的精神家园,有我最真诚的分享。",
"authStatus": 0},
"tracks": [{song 1}, {song 2}, ...]
}
}
Format of each song
{
"id": 29738501,
"name": "跟着你到天边 钢琴版",
"duration": 174001,
"hearTime": 0,
"commentThreadId": "R_SO_4_29738501",
"score": 40,
"mvid": 0,
"hMusic": null,
"disc": "",
"fee": 0,
"no": 1,
"rtUrl": null,
"ringtone": null,
"rtUrls": [],
"rurl": null,
"status": 0,
"ftype": 0,
"mp3Url": "http://m2.music.126.net/vrVa20wHs8iIe0G8Oe7I9Q==/3222668581877701.mp3",
"audition": null,
"playedNum": 0,
"copyrightId": 0,
"rtype": 0,
"crbt": null,
"popularity": 40,
"dayPlays": 0,
"alias": [],
"copyFrom": "",
"position": 1,
"starred": false,
"starredNum": 0,
"bMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": 0.0553125,
"sr": 44100,
"dfsId": 3222668581877701,
"playTime": 174001,
"bitrate": 96000,
"id": 52423394,
"size": 2089713
},
"lMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": 0.0553125,
"sr": 44100,
"dfsId": 3222668581877701,
"playTime": 174001,
"bitrate": 96000,
"id": 52423394,
"size": 2089713
},
"mMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": -0.000265076,
"sr": 44100,
"dfsId": 3222668581877702,
"playTime": 174001,
"bitrate": 128000,
"id": 52423395,
"size": 2785510
},
"artists": [
{
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "群星",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 122455
}
],
"album": {
"id": 3054006,
"status": 2,
"type": null,
"tags": "",
"size": 69,
"blurPicUrl": "http://p1.music.126.net/2XLMVZhzVZCOunaRCOQ7Bg==/3274345629219531.jpg",
"copyrightId": 0,
"name": "热门华语248",
"companyId": 0,
"songs": [],
"description": "",
"pic": 3274345629219531,
"commentThreadId": "R_AL_3_3054006",
"publishTime": 1388505600004,
"briefDesc": "",
"company": "",
"picId": 3274345629219531,
"alias": [],
"picUrl": "http://p1.music.126.net/2XLMVZhzVZCOunaRCOQ7Bg==/3274345629219531.jpg",
"artists": [
{
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "群星",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 122455
}
],
"artist": {
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 0
}
}
}
`
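Recommendation libraries: surprise and lightfm. Read their core code first, then write your own. The core parts are data processing (data), similarity computation (algorithm), and evaluation (model prediction and evaluation).
import surprise
import lightfm
project = offline modelling + online prediction
1) Offline: Python scripting
2) Online: efficiency comes first, so C++/Java
Principle: anything that can be computed offline in advance should be. Ideally, the online side is just a key-value dictionary.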
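The offline/online split above can be sketched as follows: an offline job precomputes a top-N list per user and stores it as a dictionary, and the online side only performs a lookup. The scoring here is a compact Jaccard-weighted vote chosen for illustration; the function names, file name, and toy data are hypothetical.

```python
import os
import pickle
import tempfile


def build_recommendation_table(favorites, top_n=3):
    """Offline step: precompute a top-N song list for every user."""
    table = {}
    for user, liked in favorites.items():
        scores = {}
        for other, other_liked in favorites.items():
            if other == user:
                continue
            common = len(liked & other_liked)
            if common == 0:
                continue
            sim = common / len(liked | other_liked)  # Jaccard similarity
            for song in other_liked - liked:
                scores[song] = scores.get(song, 0.0) + sim
        ranked = sorted(scores, key=lambda s: (-scores[s], s))
        table[user] = ranked[:top_n]
    return table


def serve(table, user_id):
    """Online step: a plain dictionary lookup with no computation."""
    return table.get(user_id, [])


# Offline: compute and persist the key-value table (toy data)
favorites = {
    "u1": {"s1", "s2"},
    "u2": {"s2", "s3"},
    "u3": {"s1", "s2", "s4"},
}
table = build_recommendation_table(favorites)
kv_path = os.path.join(tempfile.mkdtemp(), "recs_demo.pkl")
with open(kv_path, "wb") as f:
    pickle.dump(table, f)

# Online: load once at startup, then answer requests by key
with open(kv_path, "rb") as f:
    loaded = pickle.load(f)
```

The expensive pairwise comparison happens only in `build_recommendation_table`; `serve` stays constant-time regardless of how large the library is.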
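1. Recommendations for a user: NetEase Cloud Music (30 songs / 7 songs per day)?
2. Recommendations for a song: while you are listening to a song, find "similar songs".
The playlist.csv file is ultimately extracted from the JSON files (see figure):
Each line consists of a playlist segment (name##tags##UID##popularity) followed by song segments (SID:::song name:::artist:::song score).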
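As a rough illustration of this extraction, the sketch below turns one crawled playlist JSON string into a line of that shape. It assumes tab characters separate the segments (which matches how the parsing code later splits each line), that tags are joined with commas, that popularity is read from `subscribedCount`, and that the song score comes from the `popularity` field; the last three choices and the function name are assumptions, not confirmed by the notes.

```python
import json


def playlist_to_line(raw_json):
    """Turn one crawled playlist JSON string into a playlist.csv line."""
    result = json.loads(raw_json)["result"]
    header = "##".join([
        result["name"],
        ",".join(result["tags"]),        # assumption: comma-joined tags
        str(result["id"]),
        str(result["subscribedCount"]),  # assumption: used as popularity
    ])
    songs = []
    for track in result["tracks"]:
        artists = track.get("artists") or []
        artist = artists[0]["name"] if artists else ""
        songs.append(":::".join([
            str(track["id"]),
            track["name"],
            artist,
            str(track["popularity"]),    # assumption: used as song score
        ]))
    return "\t".join([header] + songs)


# Toy input with only the fields that are read (hypothetical values)
sample = json.dumps({"result": {
    "id": 111450065,
    "name": "demo playlist",
    "tags": ["travel", "piano"],
    "subscribedCount": 10882,
    "tracks": [{
        "id": 29738501,
        "name": "demo song",
        "artists": [{"name": "Various Artists"}],
        "popularity": 40,
    }],
}})
line = playlist_to_line(sample)
```

A playlist or song name that itself contains `##` or `:::` would break this format, so real data may need those sequences stripped first.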
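3. Convert each line into the following format so that it can be loaded into dictionaries (column 1: playlist UID; column 2: song SID; column 3: favorite status (1); column 4: timestamp):
popular_music_suprise_format.txt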
[Figure: image unavailable]
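Once the four-column file exists, it can be loaded back into a sparse binary structure in which only the 1 entries are stored and every missing pair counts as 0. The following is a minimal sketch under that assumption; the function names and sample lines are hypothetical.

```python
from collections import defaultdict


def load_binary_matrix(lines):
    """Build {playlist_id: {song_id, ...}} from 'playlist,song,rating,timestamp' lines."""
    matrix = defaultdict(set)
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        playlist_id, song_id, rating, _timestamp = parts
        if rating == "1":
            matrix[playlist_id].add(song_id)
    return dict(matrix)


def is_favorited(matrix, playlist_id, song_id):
    """Return True for a stored 1 entry, False for an implicit 0."""
    return song_id in matrix.get(playlist_id, set())


# Hypothetical sample lines in the four-column format
sample_lines = [
    "111450065,29738501,1,1300000",
    "111450065,123,1,1300000",
    "",
    "222,123,1,1300000",
]
matrix = load_binary_matrix(sample_lines)
```

Storing sets instead of a dense 0/1 table keeps memory proportional to the number of favorites rather than to users times songs.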
- After the binary sparse matrix has been generated, based on whether the user has favorited the song, the …
- 1
Code
1. playlist.csv --> popular_music_suprise_format.txt
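
    import os


    def is_null(s):
        """Despite its name, return True when the record is valid.

        A valid record has more than two comma-separated fields.
        """
        return len(s.split(",")) > 2


    def parse_song_info(song_info):
        """Turn 'SID:::name:::artist:::popularity' into 'SID,1,1300000'."""
        try:
            song_id, name, artist, popularity = song_info.split(":::")
            # The rating is fixed at 1 (favorited); 1300000 is a placeholder timestamp.
            return ",".join([song_id, '1', '1300000'])
        except ValueError:
            return ""


    def parse_playlist_line(in_line):
        """Convert one playlist.csv line into 'playlist,song,rating,timestamp' rows."""
        try:
            contents = in_line.strip().split("\t")
            name, tags, playlist_id, subscribed_count = contents[0].split("##")
            songs_info = (playlist_id + "," + parse_song_info(x) for x in contents[1:])
            # Drop songs that failed to parse (they leave only 'playlist_id,').
            return "\n".join(filter(is_null, songs_info))
        except Exception as e:
            print(e)
            return False


    def parse_file(in_file, out_file):
        with open(in_file, encoding='utf-8') as src, \
                open(out_file, 'w', encoding='utf-8') as out:
            for line in src:
                result = parse_playlist_line(line)
                if result:
                    out.write(result.strip() + "\n")


    path = "./data/output/popular/"
    os.makedirs(path, exist_ok=True)  # make sure the output directory exists
    parse_file("./data/playlist.csv", path + "popular_music_suprise_format.txt")
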
2. playlist.csv -->
popular_playlist.pkl  # mapping dictionary from playlist id to playlist name
popular_song.pkl  # mapping dictionary from song id to song name

    import os
    import pickle

    """
    playlist id --> playlist name
    song id --> song name
    playlist id --> sequence of all its song ids
    """
    path = "./data/output/popular/"


    def parse_playlist_get_info(in_line, playlist_dic, song_dic):
        contents = in_line.strip().split("\t")
        name, tags, playlist_id, subscribed_count = contents[0].split("##")
        playlist_dic[playlist_id] = name
        for song in contents[1:]:
            try:
                song_id, song_name, artist, popularity = song.split(":::")
                song_dic[song_id] = song_name + "\t" + artist
            except ValueError:
                print("song format error")
                print(song + "\n")


    def parse_file(in_file, out_playlist, out_song):
        playlist_dic = {}
        song_dic = {}
        with open(in_file, encoding='utf-8') as f:
            for line in f:
                parse_playlist_get_info(line, playlist_dic, song_dic)
        print(playlist_dic)
        # Save the mapping dictionaries as binary files; they can be reloaded with
        # playlist_dic = pickle.load(open("playlist.pkl", "rb"))
        with open(out_playlist, "wb") as f:
            pickle.dump(playlist_dic, f)
        with open(out_song, "wb") as f:
            pickle.dump(song_dic, f)


    os.makedirs(path, exist_ok=True)  # make sure the output directory exists
    parse_file("./data/playlist.csv", path + "popular_playlist.pkl", path + "popular_song.pkl")

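For completeness, here is a small sketch of how such a pickled mapping might be reloaded and used to turn recommended song ids back into readable names. The helper names and the demo file are hypothetical; the value format `name<TAB>artist` follows the script above.

```python
import os
import pickle
import tempfile


def load_mapping(path):
    """Reload a dictionary that was saved with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(song_ids, song_dic):
    """Translate song ids into 'name<TAB>artist' strings, skipping unknown ids."""
    return [song_dic[s] for s in song_ids if s in song_dic]


# Demo: write a tiny stand-in for popular_song.pkl, then read it back
demo_path = os.path.join(tempfile.mkdtemp(), "popular_song_demo.pkl")
with open(demo_path, "wb") as f:
    pickle.dump({"29738501": "demo song\tdemo artist"}, f)

song_dic = load_mapping(demo_path)
names = describe(["29738501", "0"], song_dic)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.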