Crawler Series: Logging In and Crawling Data with Selenium (webdriver)
Abstract
This article shows how to use Selenium with ChromeDriver to log in as a user and crawl data. Along the way, take care to use selenium.webdriver.support.expected_conditions and selenium.webdriver.support.ui.WebDriverWait properly.
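Before diving in, it helps to know what WebDriverWait.until actually does: it is a polling loop that repeatedly evaluates a condition (such as an expected_conditions predicate) until the condition returns a truthy value or the timeout expires. A minimal plain-Python sketch of that idea (the names `wait_until` and `fake_element_present` are illustrative helpers, not selenium APIs):

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    seconds pass, mimicking what WebDriverWait.until does internally."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll)

# Simulated condition: the "element" only appears on the third poll.
state = {"calls": 0}
def fake_element_present():
    state["calls"] += 1
    return "element" if state["calls"] >= 3 else None

print(wait_until(fake_element_present, timeout=5, poll=0.01))  # element
```

This is why expected_conditions predicates are passed to `until` uncalled: the wait object invokes them again on every poll.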
(1) Creating the browser object
chromedriver.exe can be downloaded from the official ChromeDriver download page.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from lxml import etree

class ChromeCrawl(object):
    def __init__(self):
        chrome_options = webdriver.ChromeOptions()
        # Skip image loading to speed up page loads
        prefs = {"profile.managed_default_content_settings.images": 2}
        chrome_options.add_experimental_option("prefs", prefs)
        # Hide the "Chrome is being controlled by automated software" banner
        chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-gpu')
        self.browser = webdriver.Chrome(executable_path="./tools/chromedriver.exe",
                                        chrome_options=chrome_options)
        self.browser.set_page_load_timeout(60)
        self.browser.set_script_timeout(60)
        self.wait = WebDriverWait(self.browser, 60)
(2) User login
    def login(self):
        username = "*****"
        passwd = "******"
        self.browser.get('https:********login')
        self.browser.implicitly_wait(60)
        elem = self.browser.find_element_by_id("username")
        elem.send_keys(username)
        elem = self.browser.find_element_by_id("password")
        elem.send_keys(passwd)
        # Set the XPath according to your own page
        button = self.wait.until(expected_conditions.element_to_be_clickable((By.XPATH, '//*****')))
        ActionChains(self.browser).click(button).perform()
        # Wait for an element that only appears after a successful login
        self.wait.until(expected_conditions.presence_of_element_located((By.CLASS_NAME, '******')))
(3) Crawling the data
    def crawl(self):
        self.browser.get('https:******')
        self.wait.until(expected_conditions.presence_of_element_located((By.CLASS_NAME, '******')))
        # Parse the fully rendered page with lxml
        html = etree.HTML(self.browser.page_source)
        tmp = html.xpath('//*****')
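Because the real site's class names and XPath expressions are masked above, here is a self-contained sketch of the lxml parsing step against a hypothetical page fragment (the `result-list`/`item` structure is invented for illustration; substitute your own page_source and XPath):

```python
from lxml import etree

# Hypothetical page fragment standing in for self.browser.page_source
page_source = """
<html><body>
  <ul class="result-list">
    <li class="item"><a href="/a">First</a></li>
    <li class="item"><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

html = etree.HTML(page_source)
# text() and @href return plain Python lists of strings
titles = html.xpath('//li[@class="item"]/a/text()')
links = html.xpath('//li[@class="item"]/a/@href')
print(titles)  # ['First', 'Second']
print(links)   # ['/a', '/b']
```

Note that etree.HTML parses a static snapshot: wait for the target element first (as `crawl` does with WebDriverWait), otherwise page_source may not yet contain the data rendered by JavaScript.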
by CyrusMay, 2022-01-25
How many twists and turns must a life take before it reaches the far shore of happiness. (Mayday, "青空未来")