[Python Knowledge Base] Getting Started with Web Crawlers


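After studying and thinking it over, a good place to start is a basic question: what actually happens when we visit a web page?
  ① Open the browser, enter the URL, and send a request.
  ② Wait for the server to return data, which the browser then renders as a page.
  ③ Find the data we need on the page (text, images, files, and so on).
  ④ Save the data we need.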

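A crawler works in much the same way. It imitates the way a person requests a web page, with a few differences.
  First, corresponding to steps ① and ② above, we use Python to request a web page.
  Next, corresponding to step ③, we use Python to parse the page that was returned.
  Finally, corresponding to step ④, we use Python to save the data.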
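
The three stages above can be sketched as three small functions. This is only an illustrative outline, not the code used later in this article: it uses the standard library's urllib in place of requests so that it runs without extra packages, and the function names fetch, parse and save, as well as the link-extracting regex, are assumptions made for the example.

```python
import csv
import io
import re
import urllib.request


def fetch(url, timeout=10):
    """Steps 1 and 2: send the request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse(html):
    """Step 3: pull out the data we care about (here, every link target)."""
    return re.findall(r'href="([^"]+)"', html)


def save(items, fileobj):
    """Step 4: persist the extracted data, one item per CSV row."""
    writer = csv.writer(fileobj)
    for item in items:
        writer.writerow([item])
```

With a file opened for writing as f, the stages chain together as save(parse(fetch(url)), f). Keeping them separate makes it possible to exercise the parsing and saving steps on a saved page without touching the network.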
　　

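1. Start by scraping data from Eastmoney (东方财富网)

Press F12 to open the browser's developer tools and locate the data you need.

Begin with the imports: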

import requests
import re
import time
import random
import openpyxl

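2. Define a function and set the request headers, which helps avoid being blocked by the site's anti-scraping measures

Scrape pages 1 through 50.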
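
Before the full function, here are the two anti-blocking ideas in isolation: send browser-like headers, and pause for a random interval between pages. This is a minimal sketch using the standard library's urllib rather than requests; the names BROWSER_HEADERS, build_request and pick_delay are made up for illustration, and the shortened User-Agent string is a placeholder.

```python
import random
import urllib.request

# Headers that make the request look like it came from a regular browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "http://quote.eastmoney.com/",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Attach the browser-like headers to a request object (nothing is sent yet)."""
    return urllib.request.Request(url, headers=headers)


def pick_delay(low=2.0, high=4.0):
    """Choose a random pause, in seconds, to wait between two page requests."""
    return random.uniform(low, high)
```

Calling time.sleep(pick_delay()) after each page spaces the requests out irregularly, which is the same idea as the random.randint(2, 4) pause in the function below.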

def main():
    # Appends one row per stock to the module-level `sheet` created in the __main__ block.
    cookies = {
    'qgqp_b_id': '02d480cce140d4a420a0df6b307a945c',
    'cowCookie': 'true',
    'em_hq_fls': 'js',
    'intellpositionL': '1168.61px',
    'HAList': 'a-sz-300059-%u4E1C%u65B9%u8D22%u5BCC%2Ca-sz-000001-%u5E73%u5B89%u94F6%u884C',
    'st_si': '07441051579204',
    'st_asi': 'delete',
    'st_pvi': '34234318767565',
    'st_sp': '2021-09-28%2010%3A43%3A13',
    'st_inirUrl': 'http%3A%2F%2Fdata.eastmoney.com%2F',
    'st_sn': '31',
    'st_psi': '20211020210419860-113300300813-5631892871',
    'intellpositionT': '1007.88px',
    }

    headers = {
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36 Edg/94.0.992.50',
    'DNT': '1',
    'Accept': '*/*',
    'Referer': 'http://quote.eastmoney.com/',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    }
    for page in range(1, 51):  # pages 1 through 50 inclusive
        params = (
            ('cb', 'jQuery1124031167968836399784_1615878909521'),
            ('pn', str(page)),
            ('pz', '20'),
            ('po', '1'),
            ('np', '1'),
            ('ut', 'bd1d9ddb04089700cf9c27f6f7426281'),
            ('fltt', '2'),
            ('invt', '2'),
            ('fid', 'f3'),
            ('fs', 'm:0 t:6,m:0 t:13,m:0 t:80,m:1 t:2,m:1 t:23'),
            ('fields', 'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152'),
        )

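        # Plain http:// URL, so TLS verification does not apply; the timeout keeps a stalled request from hanging.
        response = requests.get('http://24.push2.eastmoney.com/api/qt/clist/get', headers=headers, params=params, cookies=cookies, timeout=10)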
        daimas = re.findall('"f12":(.*?),',response.text)
        names = re.findall('"f14":"(.*?)"',response.text)
        zuixinjias = re.findall('"f2":(.*?),',response.text)
        zhangdiefus = re.findall('"f3":(.*?),',response.text)
        zhangdiees = re.findall('"f4":(.*?),',response.text)
        chengjiaoliangs = re.findall('"f5":(.*?),',response.text)
        chengjiaoes = re.findall('"f6":(.*?),',response.text)
        zhenfus = re.findall('"f7":(.*?),',response.text)
        zuigaos = re.findall('"f15":(.*?),',response.text)
        zuidis = re.findall('"f16":(.*?),',response.text)
        jinkais = re.findall('"f17":(.*?),',response.text)
        zuoshous = re.findall('"f18":(.*?),',response.text)
        liangbis = re.findall('"f10":(.*?),',response.text)
        huanshoulvs = re.findall('"f8":(.*?),',response.text)
        shiyinglvs = re.findall('"f9":(.*?),',response.text)

        for i in range(len(daimas)):  # the last page may hold fewer than 20 rows
            sheet.append([daimas[i],names[i],zuixinjias[i],zhangdiefus[i],zhangdiees[i],
                          chengjiaoliangs[i],chengjiaoes[i],zhenfus[i],zuigaos[i],zuidis[i],
                          jinkais[i],zuoshous[i],liangbis[i],huanshoulvs[i],shiyinglvs[i]])

        time.sleep(random.randint(2,4))
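
Because the request passes a cb callback parameter, the response is presumably JSONP, that is, JSON wrapped in a function call. As an alternative to one regular expression per field, the wrapper can be stripped and the payload decoded with the json module. The sketch below is not the approach used above: parse_jsonp is a hypothetical helper, and the sample string, including the assumption that the records sit under data and diff, is invented for illustration and should be checked against a real response.

```python
import json
import re


def parse_jsonp(text):
    """Strip a callback(...) wrapper and decode the JSON inside it."""
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", text, re.S)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Made-up sample in the assumed shape; only the field names come from the code above.
sample = 'jQuery123_456({"data":{"diff":[{"f12":"000001","f14":"平安银行","f2":17.5}]}});'
rows = parse_jsonp(sample)["data"]["diff"]
for row in rows:
    print(row["f12"], row["f14"], row["f2"])
```

Decoding the payload keeps each record's fields together, so a missing field in one record cannot shift the values of the records that follow it, which can happen when fifteen separate lists are indexed in parallel.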

3. Label the columns of the scraped data and save it to Excel.

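if __name__ == '__main__':
    wb = openpyxl.Workbook()
    sheet = wb.active
    # Header row; the columns follow the order of the values appended in main().
    sheet['A1'] = 'Code'
    sheet['B1'] = 'Name'
    sheet['C1'] = 'Latest price'
    sheet['D1'] = 'Change (%)'
    sheet['E1'] = 'Change (amount)'
    sheet['F1'] = 'Volume'
    sheet['G1'] = 'Turnover'
    sheet['H1'] = 'Amplitude'
    sheet['I1'] = 'High'
    sheet['J1'] = 'Low'
    sheet['K1'] = 'Open'
    sheet['L1'] = 'Previous close'
    sheet['M1'] = 'Volume ratio'
    sheet['N1'] = 'Turnover rate'
    sheet['O1'] = 'P/E ratio'
    main()
    wb.save('股票数据.xlsx')  # the file name means "stock data"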
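
One thing to be aware of: re.findall returns strings, so every value written above lands in Excel as text, and a captured value may still carry its surrounding quotation marks. If numeric cells are wanted, the values can be converted before they are appended. The helper below is a hypothetical sketch; the name to_number is made up, and treating a quoted dash as a missing value is an assumption about how the API marks gaps.

```python
def to_number(raw):
    """Convert a captured field such as '17.5' to a float.

    Returns None when the text is not numeric, for example a '"-"' placeholder.
    """
    cleaned = raw.strip().strip('"')
    try:
        return float(cleaned)
    except ValueError:
        return None


print(to_number("17.5"))
print(to_number('"-"'))
```

Stock codes such as 000001 should stay as text, because converting them to numbers would drop the leading zeros.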

Posted: 2021-10-21 12:08:59  Updated: 2021-10-21 12:09:46
