[Python Knowledge Base] October 2021: scraping the Vmgirls (唯美女孩) site with Python, requests + re



No preamble, straight to the code.

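import os
import re
import time

import requests

headers = {
    "user-agent": "put your own browser's User-Agent string here"
}
# Only a detail page is scraped; replace the placeholder with the page's five digits.
response = requests.get('https://www.vmgirls.com/<five digits here>.html', headers=headers)
time.sleep(1)


def cunchu(data):
    """Save every image linked from the detail-page HTML into ./img/."""
    os.makedirs('./img', exist_ok=True)  # open() below fails if the folder is missing
    img_all = re.findall('<a href="(.*?)" alt=".*?" title=".*?">', data)
    for i in img_all:
        img_url = 'https:' + i  # the links are expected to be scheme-relative (//host/path)
        print(img_url)
        time.sleep(1)  # pause between image requests
        # The slice takes the file name out of the link, so it relies on a fixed link layout.
        file = './img/' + i[26:34] + '.jpg'
        img_res = requests.get(img_url, headers=headers)
        with open(file, "wb") as f:
            f.write(img_res.content)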


if re.findall('<a href="(.*?)" alt=".*?" title=".*?">', response.text):
    print("no redirect")
    cunchu(response.text)
else:
    # The response is a jump page: extract the real detail-page path from it.
    href = re.findall('.href ="(.*?)"; </s', response.text)[0]
    print("redirected")
    url = 'https://www.vmgirls.com' + href
    time.sleep(0.7)
    data_res = requests.get(url, headers=headers)
    cunchu(data_res.text)
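
The slice i[26:34] in cunchu only works while every image link has exactly the same layout. As a possible alternative, here is a minimal sketch that takes the file name from the last segment of the URL path instead. The helper name filename_from_url and its out_dir parameter are hypothetical, not part of the original script, and unlike the original it keeps the link's own extension rather than forcing .jpg.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(img_url, out_dir='./img'):
    """Build a local path from the last path segment of img_url (hypothetical helper)."""
    # urlsplit separates the path from the host and from any query string.
    name = os.path.basename(urlsplit(img_url).path)
    if not name:
        raise ValueError('URL has no file name: ' + img_url)
    return os.path.join(out_dir, name)
```

Inside the loop this would replace the slicing line, for example file = filename_from_url(img_url).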

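Key idea: a request for a detail page may return a jump page instead of the page itself. In that case, use re to extract the real detail-page URL from the response and send another GET.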
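
The same idea can be wrapped in one function so the branch is easy to try without touching the network. This is only a sketch: the two regular expressions and the base URL are copied from the script, while the names resolve_detail_html and fetch are hypothetical, with fetch standing for any callable that takes a URL and returns the page text.

```python
import re

# Patterns copied from the script above.
IMG_PATTERN = re.compile(r'<a href="(.*?)" alt=".*?" title=".*?">')
REDIRECT_PATTERN = re.compile(r'.href ="(.*?)"; </s')
BASE_URL = 'https://www.vmgirls.com'


def resolve_detail_html(html, fetch):
    """Return detail-page HTML, following one jump page if needed.

    fetch is a hypothetical callable: URL in, page text out.
    """
    if IMG_PATTERN.search(html):
        return html  # image links found, so this is already the detail page
    match = REDIRECT_PATTERN.search(html)
    if match is None:
        # Neither pattern matched; the page layout is not what the script expects.
        raise ValueError('neither image links nor a redirect target found')
    return fetch(BASE_URL + match.group(1))
```

With requests, fetch could be lambda url: requests.get(url, headers=headers).text. Raising an error when neither pattern matches replaces the IndexError that [0] would raise in the original else branch.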