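This example scrapes the title, content, and time of each entry under a Jianshu user's latest comments, covering 5 pages. It uses selenium for the scraping and saves the results to an Excel file with the .xls extension.
The code is as follows: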
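from selenium import webdriver
from selenium.webdriver.common.by import By
import xlwt
import time

listdata = []
driver = webdriver.Chrome()
for i in range(1, 6):
    print("***** Scraping page " + str(i) + " *****")
    url = 'https://www.jianshu.com/u/9104ebf5e177?order_by=commented_at&page=' + str(i)
    driver.get(url)
    time.sleep(10)  # give the page time to finish loading
    datainfo = driver.find_elements(By.CSS_SELECTOR, 'li.have-img')
    for info in datainfo:
        # 'a.title', 'p.abstract' and 'span.time' are CSS selectors, not tag names,
        # so they must be looked up with By.CSS_SELECTOR
        title = info.find_element(By.CSS_SELECTOR, 'a.title')
        comment = info.find_element(By.CSS_SELECTOR, 'p.abstract')
        times = info.find_element(By.CSS_SELECTOR, 'span.time')
        listinfo = [title.text, comment.text, times.text]
        listdata.append(listinfo)
    time.sleep(3)  # pause before requesting the next page
driver.quit()  # close the browser once all pages are done
print("***** All pages scraped! *****")

print("Saving the scraped content to an xls file")
header = ["title", "comment", "time"]
book = xlwt.Workbook(encoding='utf-8')
sheet = book.add_sheet('Sheet1')
# Row 0 holds the column headers
for h, name in enumerate(header):
    sheet.write(0, h, name)
# Data rows start at row 1
for i, data in enumerate(listdata, start=1):
    for j, d in enumerate(data):
        sheet.write(i, j, d)
book.save('jianshuluopan.xls')
print("Saved successfully!")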
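The save step puts the header in row 0 and one scraped record in each row after it. Below is a minimal sketch of that layout, separated from Selenium and xlwt so it can be checked without a browser. The names `page_urls`, `rows_to_cells`, `write_rows`, and `RecordingSheet` are hypothetical helpers introduced only for illustration; `RecordingSheet` is a stand-in that assumes nothing about a sheet beyond a `write(row, col, value)` method like the one used above.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Same listing URL as in the script, without the page number
BASE_URL = "https://www.jianshu.com/u/9104ebf5e177?order_by=commented_at&page="


def page_urls(first: int = 1, last: int = 5) -> List[str]:
    """Return the listing URLs for pages first..last (inclusive)."""
    return [BASE_URL + str(page) for page in range(first, last + 1)]


def rows_to_cells(
    header: Sequence[str], rows: Iterable[Sequence[str]]
) -> Iterator[Tuple[int, int, str]]:
    """Yield (row, col, value) triples: header in row 0, data from row 1 on."""
    all_rows = [list(header)] + [list(row) for row in rows]
    for r, row in enumerate(all_rows):
        for c, value in enumerate(row):
            yield r, c, value


class RecordingSheet:
    """Hypothetical stand-in for a worksheet: it only records write() calls."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], str] = {}

    def write(self, row: int, col: int, value: str) -> None:
        self.cells[(row, col)] = value


def write_rows(sheet, header: Sequence[str], rows: Iterable[Sequence[str]]) -> None:
    """Write the header and all rows to any object with write(row, col, value)."""
    for r, c, value in rows_to_cells(header, rows):
        sheet.write(r, c, value)
```

In the real script, the sheet returned by `book.add_sheet('Sheet1')` could be passed to `write_rows` in place of `RecordingSheet`, since it is called with the same three arguments.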
Partial screenshot of the scraped results: