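1. First, capture the ranking data from the Eastmoney Data Center. Starting from this page (http://data.eastmoney.com/zjlx/detail.html), use the Fiddler packet-capture tool to find the backend request that feeds the page's real-time updates. I won't go into how to do the capture here; being comfortable with Fiddler is a basic skill for anyone doing data mining. This is the link I captured (https://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112305306202746207478_1630325762044&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13). Opening it directly returns the raw response, and comparing it with the page confirms that this is the original data.

2. Process the data with Python and return it as a DataFrame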
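Before the full function, here is a small offline sketch of the parsing step on its own. The `cb=jQuery...` parameter in the captured link means the response is JSONP, that is, JSON wrapped in a callback call. The helper name `jsonp_to_records` and the sample string are made up for illustration, and the `data` -> `diff` layout is my assumption about the response shape, so check it against what you actually receive.

```python
import json
import re


def jsonp_to_records(text):
    """Strip the JSONP callback wrapper and return the list of row dicts.

    Assumes the rows sit in the array under data -> diff.
    """
    # Capture everything between the first "(" and the last ")"
    match = re.search(r'^[^(]*\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('not a JSONP response')
    payload = json.loads(match.group(1))
    return payload['data']['diff']


# Made-up sample in the assumed response shape (not real market data)
sample = ('jQuery123_456({"data":{"total":2,"diff":['
          '{"f12":"000001","f14":"AAA","f2":10.5,"f3":1.2,"f13":0},'
          '{"f12":"600000","f14":"BBB","f2":8.1,"f3":-0.4,"f13":1}]}});')

records = jsonp_to_records(sample)
print(records[0]['f12'], records[1]['f3'])
```

The returned list of dicts can be passed straight to `pd.DataFrame(records)`. Parsing with `json.loads` also avoids calling `eval` on text that came from the network.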
import json
import re
import time
import urllib.request

import pandas as pd


def get_flow_df(url='https://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112305306202746207478_1630325762044&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13',
                max_retry_times=3):
    html = None
    for _ in range(max_retry_times):
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode('utf-8')
            break
        except OSError:
            # URLError and socket timeouts are both subclasses of OSError
            print('Request failed, retrying')
            time.sleep(3)
    if html is None:
        # The original fell through here and crashed on an undefined variable
        raise RuntimeError('no response after %d attempts' % max_retry_times)

    # The response is JSONP: callback({...}); keep only the JSON inside the parentheses
    match = re.search(r'\((.*)\)', html, re.S)
    payload = json.loads(match.group(1))
    # The stock rows are the array under data -> diff
    df = pd.DataFrame(payload['data']['diff'])
    df = df.rename(columns={'f12': 'symbol', 'f14': 'name', 'f2': 'trade',
                            'f13': 'market', 'f3': 'c_ptg'})
    # Keep only the renamed columns
    new_df = df[['symbol', 'name', 'trade', 'market', 'c_ptg']].copy()
    print(new_df)
    return new_df
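The query string in the URL controls what comes back, and `pz` is the page size (50 in the captured link). Rather than editing that long string by hand, a small helper can swap the value. This is only a sketch: `set_page_size` is a name I made up, the demo URL is a shortened version of the captured one, and I have not checked how large a `pz` the server will actually honor.

```python
import re


def set_page_size(url, size):
    """Return url with its pz query parameter replaced by size."""
    # Only match pz when it starts a parameter (right after "?" or "&")
    new_url, count = re.subn(r'(?<=[?&])pz=\d+', 'pz=%d' % int(size), url)
    if count == 0:
        raise ValueError('url has no pz parameter')
    return new_url


# Shortened demo URL; the real one carries many more parameters
demo_url = 'https://push2.eastmoney.com/api/qt/clist/get?fid=f62&po=1&pz=50&pn=1'
bigger = set_page_size(demo_url, 100)
print(bigger)
```

The result can be passed as the `url` argument of `get_flow_df`. Because only the `pz=...` fragment is touched, every other parameter stays byte-for-byte as it was captured.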
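Note: pz=50 in the URL requests the top 50 stocks. If you need more, adjust it accordingly.

Execution result: the function prints the resulting DataFrame (columns symbol, name, trade, market, c_ptg) and returns it.

3. Disclaimer
If this article infringes on any rights, it will be removed immediately. For infringement concerns or other technical questions, please send me a private message.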