Learning Python Web Scraping, Part 7
-
opener
- urlopen() uses an opener internally. Sometimes, instead of calling urlopen(), we build the opener ourselves.
import urllib.request
url = 'https://www.baidu.com'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36 SLBrowser/7.0.0.12151 SLBChan/11'}
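# Wrap the URL and headers in a Request object
req = urllib.request.Request(url, headers=headers)
# build_opener() with no arguments returns an opener with the default handlers
opener = urllib.request.build_opener()
resp = opener.open(req)
print(resp.read().decode('utf-8'))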
Output:
-
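As a small complement to the example above, the sketch below shows that the object returned by build_opener() is an OpenerDirector holding a list of handlers, and that default headers can be set once on the opener through its addheaders attribute instead of on every Request. The USER_AGENT value and the make_opener helper are hypothetical names chosen for illustration, and nothing here touches the network.

```python
import urllib.request

# Hypothetical User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; demo-crawler)"


def make_opener():
    """Build an opener that sends USER_AGENT with every request it opens."""
    opener = urllib.request.build_opener()
    # Replace the default ("User-agent", "Python-urllib/x.y") entry.
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener


opener = make_opener()
# List the handler classes that build_opener() installed by default.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

If other code should keep calling urlopen(), urllib.request.install_opener(opener) makes urlopen() use this opener globally.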
Using Cookies
import http.cookiejar,urllib.request
url = 'https://www.baidu.com'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36 SLBrowser/7.0.0.12151 SLBChan/11'}
req = urllib.request.Request(url,headers=headers)
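# CookieJar keeps the cookies in memory
cookie = http.cookiejar.CookieJar()
# HTTPCookieProcessor stores cookies from responses in the jar and sends them back on later requests
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
resp = opener.open(req)
# Print each cookie the server set
for i in cookie:
    print(i.name + '=' + i.value)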
Output:
import http.cookiejar,urllib.request
filename = 'cookie.txt'
url = 'https://www.baidu.com'
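def get_cookie():
    # MozillaCookieJar can write cookies to a file in the Netscape cookies.txt format
    cookie = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    resp = opener.open(url)
    # With default arguments, save() skips session cookies (those marked to be discarded)
    cookie.save()


if __name__ == '__main__':
    get_cookie()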
After running the script, a file named cookie.txt appears in the current directory. Opening it shows the following:
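The file uses the Netscape cookies.txt format: a comment header followed by one tab-separated line per cookie (domain, subdomain flag, path, secure flag, expiry time, name, value). The same MozillaCookieJar class can read such a file back with load(). The following is a minimal sketch of that round trip, assuming a hand-built cookie so that it runs without network access; make_cookie, save_and_reload, the "demo" cookie and the .example.com domain are all hypothetical and not part of the original script.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain=".example.com"):
    """Build a hypothetical cookie by hand so the demo needs no network."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True,
        secure=False,
        expires=int(time.time()) + 3600,  # valid for one hour
        discard=False,
        comment=None, comment_url=None, rest={},
    )


def save_and_reload(path):
    """Save one cookie to `path`, then load it into a fresh jar."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.set_cookie(make_cookie("demo", "123"))
    # ignore_discard / ignore_expires also keep session and expired cookies.
    jar.save(ignore_discard=True, ignore_expires=True)

    loaded = http.cookiejar.MozillaCookieJar()
    loaded.load(path, ignore_discard=True, ignore_expires=True)
    return {c.name: c.value for c in loaded}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save_and_reload(os.path.join(tmp, "cookie.txt")))
```

A jar loaded this way can be passed to HTTPCookieProcessor exactly like the in-memory CookieJar, so a later run can reuse cookies from an earlier one.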
To be continued tomorrow...