I. Parsing data with BeautifulSoup
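# Step 1: fetch the page and look at the raw HTML string.
import requests
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
print(res.status_code)  # 200 means the request succeeded
html = res.text         # the page source as a plain string
print(html)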
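
# Step 2: hand the HTML string to BeautifulSoup for parsing.
import requests
from bs4 import BeautifulSoup
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
html = res.text
soup = BeautifulSoup(html, 'html.parser')  # 'html.parser' is Python's built-in parser
print(type(soup))  # the parsed document is a BeautifulSoup object
print(soup)        # printing it shows the same HTML as res.text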
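
To see the parsing step on its own, without depending on the network, here is a minimal sketch. The HTML string is a made-up stand-in for `res.text`; it is not the content of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.text, so the example runs offline.
html = """
<html><body>
  <div class="books"><h2>Sample category</h2></div>
</body></html>
"""

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, 'html.parser')

print(type(html).__name__)  # the input is a plain str
print(type(soup).__name__)  # the result is a BeautifulSoup object
print(soup.h2)              # the first <h2> element found in the document
```

Printing `soup` and printing `html` look almost identical, but only `soup` can be searched with methods such as `find()` and `find_all()`.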
II. Extracting data with BeautifulSoup
1. find() and find_all()
# find_all() by tag name: collect every <div> on the page.
import requests
from bs4 import BeautifulSoup
url = 'https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men0.0.html'
res = requests.get(url)
print(res.status_code)
soup = BeautifulSoup(res.text, 'html.parser')
items = soup.find_all('div')  # returns all matches, not just the first
print(type(items))            # a ResultSet, which behaves like a list
print(items)
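
# find_all() by attribute: collect every element whose class is 'books'.
import requests
from bs4 import BeautifulSoup
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
html = res.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all(class_='books')  # class_ has a trailing underscore because class is a Python keyword
print(items)
print(type(items))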
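
The difference between the two methods is easier to see side by side. The sketch below uses a small made-up HTML snippet (not the real page) so it runs offline: `find()` returns only the first match as a Tag, or `None` when nothing matches, while `find_all()` returns every match in a list-like ResultSet.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML with three <div> elements.
html = """
<div class="books"><h2>First</h2></div>
<div class="books"><h2>Second</h2></div>
<div class="other"><h2>Third</h2></div>
"""
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('div')                     # only the first <div>
all_divs = soup.find_all('div')              # every <div>
books = soup.find_all(class_='books')        # filter by class attribute
both = soup.find_all('div', class_='books')  # tag name and class together
missing = soup.find('table')                 # no match

print(type(first).__name__)       # Tag
print(type(all_divs).__name__)    # ResultSet
print(len(all_divs), len(books))  # 3 2
print(missing)                    # None
```

Because `find()` can return `None`, it is worth checking the result before reading `.text` or an attribute from it.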
2. Taking each value out of the list
import requests
from bs4 import BeautifulSoup
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
html = res.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all(class_='books')
# The result behaves like a list, so loop over it to handle one element at a time.
for item in items:
    print('All the data we want is contained here:\n', item)
3. Tag objects
import requests
from bs4 import BeautifulSoup
res = requests.get('https://localprod.pandateacher.com/python-manuscript/crawler-html/spider-men5.0.html')
html = res.text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all(class_='books')
for item in items:
    # Each item is a Tag, so it can be searched again with find().
    kind = item.find('h2')
    title = item.find(class_='title')
    brief = item.find(class_='info')
    print(kind, '\n', title, '\n', brief)
    print(type(kind), type(title), type(brief))
    # Tag.text gives the text content; Tag['attribute'] gives an attribute value.
    print(kind.text, '\n', title.text, '\n', title['href'], '\n', brief.text)
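
The three things a Tag can do here (search again with `find()`, return its text with `.text`, and return an attribute value with `['attribute']`) can be tried offline. The snippet below is a sketch on made-up HTML; the class names mirror the ones used above, but the content and the `example.com` link are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the structure searched above.
html = """
<div class="books">
  <h2>Sample category</h2>
  <a class="title" href="https://example.com/book">Sample title</a>
  <p class="info">Sample description.</p>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
item = soup.find(class_='books')   # a Tag

title = item.find(class_='title')  # Tag.find() searches inside this Tag only
print(title.text)                  # Sample title
print(title['href'])               # https://example.com/book
print(title.get('id'))             # None: .get() does not raise for a missing attribute

# find() returns None when nothing matches, so guard before indexing.
cover = item.find('img')
cover_src = cover['src'] if cover is not None else None
print(cover_src)                   # None
```

Indexing with `title['id']` would raise a `KeyError` here, which is why `.get()` is the safer choice for attributes that may be absent.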
III. How the objects change along the way
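
One way to follow the change is to print the type of the object at each stage. The sketch below starts from a made-up HTML string standing in for `res.text` (in the real script that string comes from the Response object returned by `requests.get()`), then applies the same calls used in the earlier sections.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.text; the real string comes from a Response object.
html = '<div class="books"><h2>Sample category</h2></div>'

soup = BeautifulSoup(html, 'html.parser')  # str -> BeautifulSoup
items = soup.find_all(class_='books')      # BeautifulSoup -> ResultSet
item = items[0]                            # ResultSet -> Tag
kind = item.find('h2')                     # Tag -> Tag
text = kind.text                           # Tag -> str

stages = [html, soup, items, item, kind, text]
names = [type(stage).__name__ for stage in stages]
print(' -> '.join(names))
# str -> BeautifulSoup -> ResultSet -> Tag -> Tag -> str
```

The chain starts and ends with a string: the first is the whole page source, the last is only the piece of data that was wanted.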