[Development & Testing] Practice 1: Scraping university rankings and related data with basic Selenium usage


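While learning web scraping, I followed Professor Song Tian's (Beijing Institute of Technology) crawler course on MOOC and did some self-study, and completed the most basic task: scraping data from a university ranking website that requires no login. In the course, the instructor uses BeautifulSoup from the bs4 library together with regular expressions to scrape the rankings. This article instead uses webdriver from selenium to simulate a browser opening the page, scrape the required data, and then store and read back the scraped data.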

The complete code is as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--start-maximized')

# Launch the browser and open the target page.
# Selenium 4 style: the driver path is given through Service, and options must be passed in to take effect.
driver = webdriver.Chrome(service=Service(executable_path=r"E:\chromedriver.exe"), options=options)
driver.get('https://www.shanghairanking.cn/rankings/bcur/2021')

# Locate the data with XPath
names_tags = driver.find_elements(By.XPATH, "//a[@class='name-cn']")  # match by tag name and attribute
# Location and score cannot be found the same way, so use XPath's following-sibling axis instead
locations_tags = driver.find_elements(By.XPATH, "//td[@class='align-left']/following-sibling::td[1]")
scores_tags = driver.find_elements(By.XPATH, "//td[@class='align-left']/following-sibling::td[3]")
info_list = []
for name, location, score in zip(names_tags, locations_tags, scores_tags):
    row = [tag.get_attribute('textContent').strip() for tag in (name, location, score)]
    info_list.append(row)
    print(row)  # print each scraped record
driver.quit()

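# Store the data
path = 'E:/python学习文件/py_爬虫/projects/infor_of_unives.txt'
# Mode 'a' appends, so rerunning the script adds duplicate records.
# UTF-8 is set explicitly so the Chinese text is written and read back consistently.
with open(path, 'a', encoding='utf-8') as f:
    for univer in info_list:
        f.write(str(univer) + '\n')
# No f.close() is needed: the with block closes the file automatically.

# Read the data back
with open(path, 'r', encoding='utf-8') as f:
    print(f.read())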
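
The script writes each record as the str() of a Python list, one per line, so reading the file back with f.read() only gives plain text. As a minimal sketch, assuming the file was produced exactly this way, the lines can be turned back into lists with ast.literal_eval from the standard library. The helper names save_records and load_records, the temporary file, and the sample rows are made up for illustration and are not part of the original script.

```python
import ast
import os
import tempfile


def save_records(path, records):
    """Append each record as the str() of a list, one per line."""
    with open(path, 'a', encoding='utf-8') as f:
        for record in records:
            f.write(str(record) + '\n')


def load_records(path):
    """Parse each non-empty line back into a Python list."""
    with open(path, 'r', encoding='utf-8') as f:
        return [ast.literal_eval(line.strip()) for line in f if line.strip()]


# Made-up sample rows, for illustration only (not real scraped results).
sample = [['University A', 'Beijing', '100.0'],
          ['University B', 'Shanghai', '95.5']]

# A temporary file stands in for the real output path.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_records.txt')
save_records(demo_path, sample)
restored = load_records(demo_path)
print(restored == sample)
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for text read from a file. If the data is meant for other tools, a format such as CSV or JSON would probably be a better fit than str() of a list.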

Scraped data (stored here as lists):