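I. Warm-up exercises
1. Fill in Baidu's search box automatically and run the search
Inspect the source of the Baidu homepage to find the id of the search box and the id of the search button.
Open the Baidu homepage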
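from selenium import webdriver
# Raw string, so the backslashes in the Windows path are not read as escapes.
# The positional driver path and the find_element_by_* helpers are Selenium 3
# APIs; Selenium 4.3+ removed the helpers.
driver = webdriver.Chrome(r"E:\GoogleDownload\chromedriver_win32\chromedriver.exe")
driver.get("https://www.baidu.com/")
Fill in the search box
search = driver.find_element_by_id("kw")
search.send_keys("醉意丶千层梦")
Simulate a click
send_button = driver.find_element_by_id("su")
send_button.click()
Result: the browser opens Baidu, types the keyword, and clicks the search button.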
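The find_element_by_id calls above only exist in older Selenium releases. As a rough sketch, the same search could be written in the Selenium 4 locator style as follows. The names search_baidu, BY_ID and main are made up for this illustration, and the ids "kw" and "su" are simply the ones quoted above, so they may not match the live page.

```python
# Locator strategy; selenium.webdriver.common.by.By.ID is this same string.
BY_ID = "id"


def search_baidu(driver, keyword):
    """Type `keyword` into the search box and click the search button.

    `driver` is any object with Selenium 4's find_element(by, value) method.
    """
    box = driver.find_element(BY_ID, "kw")    # search box id, as found in the page source
    box.clear()
    box.send_keys(keyword)
    driver.find_element(BY_ID, "su").click()  # search button id


def main():
    # Imported here so search_baidu can be exercised without a browser installed.
    from selenium import webdriver

    # Assumption: chromedriver is on PATH, or Selenium 4.6+ resolves it itself.
    driver = webdriver.Chrome()
    driver.get("https://www.baidu.com/")
    search_baidu(driver, "醉意丶千层梦")
```

Calling main() runs the search in a real browser. Keeping the page logic in a function that only needs a driver object makes it easy to try out with a stand-in driver first.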
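2. Scrape ten quotes from a given website
Inspecting the page shows that every element with the class quote holds one quotation. Inside it, the element with the class text is the quotation itself and the element with the class author is its author.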
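To make that structure concrete, here is a small sketch that pulls author and text out of a hand-written HTML fragment using only the standard library. The SAMPLE markup is a simplified assumption about what the rendered page looks like, not a copy of it, and QuoteParser is a name invented for this example. The real page builds its content with JavaScript, which is why the implementation in this article uses Selenium rather than a plain parser.

```python
from html.parser import HTMLParser

# Simplified, hand-written stand-in for the rendered page (an assumption).
SAMPLE = """
<div class="quote">
  <span class="text">"Quote one."</span>
  <span>by <small class="author">Author A</small></span>
</div>
<div class="quote">
  <span class="text">"Quote two."</span>
  <span>by <small class="author">Author B</small></span>
</div>
"""


class QuoteParser(HTMLParser):
    """Collect [author, text] rows from elements marked with those classes."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished [author, text] rows
        self._field = None   # field the current element feeds, if any
        self._current = {}   # fields gathered for the quote in progress

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "quote" in classes:
            self._current = {}
        elif "text" in classes:
            self._field = "text"
        elif "author" in classes:
            self._field = "author"

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if self._field:
            self._field = None
            # Emit a row once both fields of the current quote are known.
            if "text" in self._current and "author" in self._current:
                self.rows.append([self._current["author"], self._current["text"]])
                self._current = {}


parser = QuoteParser()
parser.feed(SAMPLE)
```

After feed() returns, parser.rows holds one [author, text] pair per quote block, in page order.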
Implementation
from selenium import webdriver
driver = webdriver.Chrome(r"E:\GoogleDownload\chromedriver_win32\chromedriver.exe")
driver.get("http://quotes.toscrape.com/js/")
csvHeaders = ['Author', 'Quote']
subjects = []
# Each element with class "quote" holds one quotation and its author.
res_list = driver.find_elements_by_class_name("quote")
for tmp in res_list:
    subject = [
        tmp.find_element_by_class_name("author").text,
        tmp.find_element_by_class_name("text").text,
    ]
    print(subject)
    subjects.append(subject)
Result: each [author, quote] pair is printed as it is collected.
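The snippet defines csvHeaders but stops at printing. One possible way to save the collected rows is sketched below. The function name save_quotes and the utf-8-sig encoding are choices made for this example, not part of the original code.

```python
import csv


def save_quotes(path, rows, headers=("Author", "Quote")):
    """Write [author, quote] rows to a CSV file under a header row.

    Returns the number of data rows written.
    """
    # utf-8-sig prepends a byte order mark, which helps Excel detect the
    # encoding; the quotes on the page contain non-ASCII quotation marks.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)
```

With the variables from the snippet above, the call would be save_quotes("quotes.csv", subjects, csvHeaders).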
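II. Scraping a specific product from JD.com
1. Analyzing the page
Locate the search input box and click the search button.
In the results, locate the list of books. For each book, read the title and the price, then move on to the next page.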
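The steps above leave open when the crawl should stop. A minimal sketch of that bookkeeping, kept separate from the browser, could look like this. The name collect_books and the idea of feeding it one list of (title, price) pairs per result page are assumptions made for illustration.

```python
def collect_books(pages, limit):
    """Merge (title, price) pairs page by page until `limit` unique titles are held.

    `pages` is any iterable that yields one list of (title, price) pairs per
    result page; in a real crawl each list would be read through Selenium.
    """
    books = {}
    for page in pages:
        for title, price in page:
            books[title] = price   # a repeated title overwrites, it does not add
            if len(books) >= limit:
                return books
    return books                   # ran out of pages before reaching the limit
```

Because titles are dictionary keys, a book listed twice counts once, so reaching the limit can take more pages than limit divided by page size would suggest. Iterating over a finite set of pages also means the loop ends when the results run out.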
2. Implementation
import csv
import time
from selenium import webdriver
driver = webdriver.Chrome(r"E:\GoogleDownload\chromedriver_win32\chromedriver.exe")
driver.set_window_size(1920, 1080)
driver.get("https://www.jd.com/")
# Type the keyword ("Python programming") into the search box and submit it.
driver.find_element_by_id("key").send_keys("python编程")
time.sleep(1)
driver.find_element_by_class_name("button").click()
time.sleep(1)
# Switch to the most recently opened window, in case the results opened in a new one.
windows = driver.window_handles
driver.switch_to.window(windows[-1])
time.sleep(1)
# Page height rounded down to a multiple of 1000 px, used as the scroll limit.
js = 'return document.body.scrollHeight'
max_height = driver.execute_script(js)
max_height = (int(max_height / 1000)) * 1000
res_dict = {}   # title -> price
num = 200       # number of books to collect
while len(res_dict) < num:
    # Scroll down in 1000 px steps so that items which load only on scroll are rendered.
    tmp_height = 1000
    while tmp_height < max_height:
        js = "window.scrollBy(0,1000)"
        driver.execute_script(js)
        tmp_height += 1000
    J_goodsList = driver.find_element_by_id("J_goodsList")
    ul = J_goodsList.find_element_by_tag_name("ul")
    res_list = ul.find_elements_by_tag_name("li")
    for res in res_list:
        title = res.find_element_by_class_name('p-name').find_element_by_tag_name('em').text
        # The leading "." keeps the XPath relative to this <li>; a bare "//"
        # searches the whole page and returns the first price every time.
        res_dict[title] = res.find_element_by_xpath(".//div[@class='p-price']//i").text
        if len(res_dict) == num:
            break
    time.sleep(2)
    if len(res_dict) == num:
        break
    # Go to the next page of results.
    J_bottomPage = driver.find_element_by_id("J_bottomPage")
    J_bottomPage.find_element_by_class_name("pn-next").click()
    windows = driver.window_handles
    driver.switch_to.window(windows[-1])
    time.sleep(3)
# One [title, price] row per book.
csvHeaders = ['Title', 'Price']
csvRows = [[key, value] for key, value in res_dict.items()]
# The ./output directory must already exist. utf-8-sig writes a byte order
# mark so that Excel recognizes the encoding of the Chinese titles.
with open('./output/jd_books.csv', 'w', newline='', encoding='utf-8-sig') as file:
    fileWriter = csv.writer(file)
    fileWriter.writerow(csvHeaders)
    fileWriter.writerows(csvRows)
Result
The scraped data is saved to ./output/jd_books.csv.
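The script measures the page height once and then scrolls a fixed number of times. If the page grows while scrolling, that count can fall short. A sketch of an alternative that checks after every step whether the bottom has been reached is given below. The name scroll_to_bottom and its parameters are invented for this example, and the one-pixel tolerance is a guess to absorb rounding.

```python
import time


def scroll_to_bottom(driver, step=1000, pause=0.5, max_scrolls=100):
    """Scroll down `step` pixels at a time until the bottom of the page is visible.

    `driver` is any object with Selenium's execute_script(script, *args).
    Returns the number of scroll calls made, capped at `max_scrolls`.
    """
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        scrolls += 1
        time.sleep(pause)  # give newly revealed items time to load
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.scrollY"
            " >= document.body.scrollHeight - 1;"
        )
        if at_bottom:
            break
    return scrolls
```

In the script above this would replace the inner while loop, called once per result page before the list items are read. The max_scrolls cap keeps the loop from running forever on a page that never stops growing.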
III. Summary
Crawlers are handy, and crawler tools such as Selenium are handier still.