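Contents
1. Installing the requests module
1.1 pip install requests
1.2 Installing via PyCharm
2. requests in practice
2.1 Finding the request method
2.2 Adding request headers
3. Closing bonus: scraping beautiful images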
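The requests library uses blocking (synchronous) network requests: after a request is sent, the program waits until the response arrives before it continues with the next task.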
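The blocking behaviour can be observed without touching the network. The sketch below is only an illustration under stated assumptions: SlowHandler and the 0.5 second delay are hypothetical, and the local server built from the standard library stands in for a real website.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that waits half a second before answering."""

    def do_GET(self):
        time.sleep(0.5)  # simulate a slow server
        body = b"done"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

start = time.perf_counter()
response = requests.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
# The next line runs only after the whole response has arrived.
elapsed = time.perf_counter() - start

print(response.status_code, response.text)
print(f"requests.get blocked for about {elapsed:.2f} s")
server.shutdown()
```

The measured time should be at least as long as the server's delay, because requests.get does not return until the response is available.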
1. Installing the requests module
The development environment used here is PyCharm 2022.1.1.
1.1 pip install requests
Click Terminal.
Type pip install requests and press Enter. I had already installed it, so pip reports that the requirement is already satisfied.
1.2 Installing via PyCharm
When the installation finishes, a message similar to "successful" is displayed.
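Whichever way the module was installed, a quick check from Python confirms that it can be imported. This is a minimal sketch; the version printed depends on your environment.

```python
import requests

# If the import succeeds, the module is installed for this interpreter.
print("requests version:", requests.__version__)
```

If this raises ModuleNotFoundError, the package was probably installed for a different interpreter than the one the PyCharm project uses.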
2. requests in practice
Taking Sogou as an example:
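import requests  # import the module
url = 'https://www.sogou.com/'  # URL to request
response = requests.get(url)  # send a GET request and receive the response
response.encoding = 'utf-8'  # encoding used to decode response.text
print('Response content:', response.content)  # raw response body (bytes)
print('Response text:', response.text)  # response body decoded to str
print('Response headers:', response.headers)  # headers sent back by the server
print('Request method:', response.request)  # the PreparedRequest; response.request.method gives just the method string
print('Encoding:', response.encoding)  # encoding in use
print('Request URL:', response.url)  # URL that was requested
print('Cookies:', response.cookies)  # cookies set by the server
print('Status code:', response.status_code)  # 200 means success; 404 means the page was not found
print('Response type:', type(response))  # type of the response object
print('Type of response.content:', type(response.content))
print('Type of response.text:', type(response.text))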
The output is as follows:
Response content: b'<!DOCTYPE html><html lang="cn"><head><meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1,user-scalable=no"> [...] <title>\xe6\x90\x9c\xe7\x8b\x97\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e - \xe4\xb8\x8a\xe7\xbd\x91\xe4\xbb\x8e\xe6\x90\x9c\xe7\x8b\x97\xe5\xbc\x80\xe5\xa7\x8b</title> [...] </body></html><!--zly-->'
(The rest of the page HTML is omitted at each [...]. Non-ASCII characters appear as \x byte escapes because response.content is of type bytes.)
Response text: <!DOCTYPE html><html lang="cn"><head><meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1,user-scalable=no"> [...] <title>搜狗搜索引擎 - 上网从搜狗开始</title> [...] </body></html><!--zly-->
(The rest of the page HTML is omitted at each [...]. The Chinese characters are readable here because response.text is the body decoded to str.)
请求头为: {'Server': 'nginx', 'Date': 'Tue, 31 May 2022 03:15:08 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Set-Cookie': 'ABTEST=7|1653966908|v17; expires=Thu, 30-Jun-22 03:15:08 GMT; path=/, IPLOC=CN5000; expires=Wed, 31-May-23 03:15:08 GMT; domain=.sogou.com; path=/, SUID=82F4937B364A910A000000006295883C; expires=Mon, 26-May-2042 03:15:08 GMT; domain=.sogou.com; path=/, black_passportid=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT; domain=.sogou.com', 'P3P': 'CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR", CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR", CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"', 'Pragma': 'No-cache', 'Cache-Control': 'max-age=0', 'Expires': 'Tue, 31 May 2022 03:15:08 GMT', 'UUID': 'a09a4fc6-1144-4ddb-a028-df355ff57969', 'Content-Encoding': 'gzip'}
请求方式为: <PreparedRequest [GET]>
编码方式为: utf-8
请求网址url为: https://www.sogou.com/
cookies为: <RequestsCookieJar[<Cookie IPLOC=CN5000 for .sogou.com/>, <Cookie SUID=82F4937B364A910A000000006295883C for .sogou.com/>, <Cookie ABTEST=7|1653966908|v17 for www.sogou.com/>]>
状态码为: 200
响应类型为: <class 'requests.models.Response'>
内容响应类型为: <class 'bytes'>
文本响应类型为: <class 'str'>
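2.1 Finding the request method
Using Sogou as an example:
Right-click an empty area of the page and choose "Inspect", or press F12.
The developer tools open. (1) Click the Network tab, (2) refresh the page, and (3) select the first entry under Name, www.sogou.com.
From there you can view the URL, the request method, the request headers, and other details.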
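The same details can also be read from Python rather than from the browser. The sketch below is an illustration added under assumptions, not part of the original walkthrough: it prepares (but does not send) a GET request for the Sogou home page through a default `requests.Session` and prints the method, URL, and headers that requests would send.

```python
import requests

# Build the request through a Session so its default headers are merged in,
# but do not send it: no network access is needed to inspect it.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", "https://www.sogou.com/"))

print("Method:", prepared.method)  # GET
print("URL:", prepared.url)        # https://www.sogou.com/
for name, value in prepared.headers.items():
    # The defaults include a User-Agent of the form python-requests/<version>.
    print(f"{name}: {value}")
```

After a real call, the same kind of object is available as `response.request`, so `response.request.method` and `response.request.headers` show what was actually sent.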
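2.2 Adding request headers
Adding request headers makes the script look like a browser, which gets around a simple anti-scraping check.
Using Sogou as an example, type "成龙" (Jackie Chan) into the search box, run the search, and press F12 to open the following page:
Note the URL and the request method (GET), then write the scraper:
Without request headers: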
# Without request headers:
import requests
url = "https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653967846&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=674&sst0=1653967851220&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653967851220"
response = requests.get(url)
print(response.text)  # print the page source
The output is as follows:
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
<link rel="shortcut icon" href="//www.sogou.com/images/logo/new/favicon.ico?v=4" type="image/x-icon">
<title>搜狗搜索</title>
<link rel="stylesheet" href="static/css/anti.min.css?v=1"/>
<script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
<script src="static/js/antispider.min.js?v=3"></script>
<script>
var domain = getDomain();
window.imgCode = -1;
(function() {
function checkSNUID() {
var cookieArr = document.cookie.split('; '),
count = 0;
for(var i = 0, len = cookieArr.length; i < len; i++) {
if (cookieArr[i].indexOf('SNUID=') > -1) {
count++;
}
}
return count > 1;
}
if(checkSNUID()) {
var date = new Date(), expires;
date.setTime(date.getTime() -100000);
expires = date.toGMTString();
document.cookie = 'SNUID=1;path=/;expires=' + expires;
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.www.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.weixin.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.sogo.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.www.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.weixin.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.snapshot.sogoucdn.com';
/*document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.zhinan.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.gouwu.sogou.com';
document.cookie = 'SNUID=1;path=/;expires=' + expires + ';domain=.ishop.sogou.com';*/
sendLog('delSNUID');
}
if(getCookie('seccodeRight') === 'success') {
sendLog('verifyLoop');
setCookie('seccodeRight', 1, getUTCString(-1), location.hostname, '/');
}
if(getCookie('refresh')) {
sendLog('refresh');
}
})();
function setImgCode(code) {
try {
var t = new Date().getTime() - imgRequestTime.getTime();
sendLog('imgCost',"cost="+t);
} catch (e) {
}
window.imgCode = code;
}
sendLog('index');
function changeImg2() {
if(window.event) {
window.event.returnValue=false
}
}
var suuid = "9321d62d-f547-4a1e-a150-c9527e7c82de";var auuid = "c918ed45-9536-4d74-b27b-6ca583856d4a"; </script>
</head>
<body>
<div class="header">
<div class="logo">
<a href="/">
<img width="180" height="60" src="static/images/logo_180x60.png" srcset="static/images/logo_180x60@2x.png 2x">
</a>
</div>
<div class="other"><span class="s1">您的访问出错了</span><span class="s2"><a href="/">返回首页>></a></span></div>
</div>
<div class="content-box">
<p class="ip-time-p">IP:123.147.244.130<br>访问时间:2022.05.31 14:37:58<br>SourceVerifyCode:c9527e7c82de<br>From:www.sogou.com</p>
<p class="p2">用户您好,我们的系统检测到您网络中存在异常访问请求。<br>此验证码用于确认这些请求是您的正常行为而不是自动程序发出的,需要您协助验证。</p>
<p class="p3"><label for="seccodeInput">验证码:</label></p>
<form name="authform" method="POST" id="seccodeForm" action="/">
<p class="p4">
<input type=text name="c" value="" placeholder="请输入验证码" id="seccodeInput" autocomplete="off">
<input type="hidden" name="tc" id="tc" value="">
<input type="hidden" name="r" id="from" value="%2Fweb%3Fquery%3D%E6%88%90%E9%BE%99%26_ast%3D1653967846%26_asf%3Dwww.sogou.com%26w%3D01029901%26p%3D40040100%26dp%3D1%26cid%3D%26s_from%3Dresult_up%26sut%3D674%26sst0%3D1653967851220%26lkt%3D0%2C0%2C0%26sugsuv%3D1653292431916060%26sugtime%3D1653967851220" >
<input type="hidden" name="p" id="product" value="web_gd" >
<input type="hidden" name="m" value="f9ab5bf7a9587003b95025fada8f5ce5" > <span class="s1">
<script>imgRequestTime=new Date();</script>
<a onclick="changeImg2();" href="javascript:void(0)">
<img id="seccodeImage" onload="setImgCode(1)" onerror="setImgCode(0)" src="util/seccode.php?tc=1653979078" width="100" height="40" alt="请输入图中的验证码" title="请输入图中的验证码">
</a>
</span>
<a href="javascript:void(0);" id="change-img" onclick="changeImg2();" style="padding-left:50px;">换一张</a>
<span class="s2" id="error-tips" style="display: none;"></span>
</p>
</form>
<p class="p5">
<a href="javascript:void(0);" id="submit">提交</a>
<span>提交后没解决问题?欢迎<a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com" target="_blank">反馈</a>。</span>
<!--span>提交后没解决问题?欢迎<a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com&verifycode=c9527e7c82de" target="_blank">反馈</a>。</span-->
</p>
</div>
<div id="ft"><a href="http://fuwu.sogou.com/" target="_blank">企业推广</a><a href="http://corp.sogou.com/" target="_blank">关于搜狗</a><a href="/docs/terms.htm?v=1" target="_blank">免责声明</a><a href="http://fankui.help.sogou.com/index.php/web/web/index?type=10&anti_time=1653979078&domain=www.sogou.com" target="_blank">意见反馈</a><br> © 2022<span id="footer-year"></span> Sogou Inc. - <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a> - 京公网安备1100<span class="ba">00000025号</span></div>
<script src="static/js/index.min.js?v=0.1.5"></script>
</body>
</html>
<!--zly-->
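Something is clearly wrong: instead of results for "成龙", Sogou returned its anti-spider verification page, which asks for a CAPTCHA.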
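A script can detect this situation itself instead of relying on someone reading the HTML. The following is a minimal sketch, assuming that two markers visible in the output above (the `antispider.min.js` script and the `seccodeImage` element) reliably identify the verification page. `looks_blocked` is a hypothetical helper name, and the "normal" sample string is made up for illustration.

```python
# Markers copied from the verification page shown above; treating them as
# reliable indicators is an assumption, since Sogou may change the page.
BLOCK_MARKERS = ("antispider.min.js", "seccodeImage")


def looks_blocked(html: str) -> bool:
    """Return True if the HTML looks like Sogou's verification page."""
    return any(marker in html for marker in BLOCK_MARKERS)


# One line taken from the blocked response, and a made-up stand-in
# for the title of a normal results page.
blocked_sample = '<script src="static/js/antispider.min.js?v=3"></script>'
normal_sample = "<title>成龙 - 搜狗搜索</title>"

print(looks_blocked(blocked_sample))  # True
print(looks_blocked(normal_sample))   # False
```

Calling `looks_blocked(response.text)` right after a request lets the script stop or retry rather than silently processing the CAPTCHA page.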
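Solution:
1. Get the request headers (copy the User-Agent value from the Request Headers shown in the developer tools):
2. Add the request headers: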
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36'
}  # request headers that mimic a browser
url = "https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653967846&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=674&sst0=1653967851220&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653967851220"
response = requests.get(url=url, headers=headers)
print(response.text)  # print the page source
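The output is as follows:
The full output is too long to include, so only a screenshot of the result was provided:
The page source was retrieved successfully.
However, the URL is now very long. How can it be shortened?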
# Original URL:
url = 'https://www.sogou.com/web?query=%E6%88%90%E9%BE%99&_ast=1653985908&_asf=www.sogou.com&w=01029901&p=40040108&dp=1&cid=&s_from=result_up&sut=916&sst0=1653985949798&lkt=0%2C0%2C0&sugsuv=1653292431916060&sugtime=1653985949798'
# Shortened URL:
url = 'https://www.sogou.com/web?query=%E6%88%90%E9%BE%99'
# Or:
url = 'https://www.sogou.com/web?query=成龙'
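After trimming the extra parameters and pressing Enter in the browser's address bar, the same page loads, so the shortened URL works just as well.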
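The two shortened URLs are the same address written in two ways: `%E6%88%90%E9%BE%99` is the percent-encoded UTF-8 form of "成龙". The short sketch below, using only the standard library's `urllib.parse`, shows the conversion in both directions. The variable names are illustrative.

```python
from urllib.parse import quote, unquote

name = "成龙"
encoded = quote(name)    # percent-encode the UTF-8 bytes of the string
print(encoded)           # %E6%88%90%E9%BE%99
print(unquote(encoded))  # 成龙

# Rebuild the shortened URL from its parts.
short_url = "https://www.sogou.com/web?query=" + encoded
print(short_url)         # https://www.sogou.com/web?query=%E6%88%90%E9%BE%99
```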
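What if we want to search for someone else?
First, look at what the original request shows:
The response is determined by the query parameter ("query: 成龙"), so the code can be changed as follows: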
import requests
query = input('Enter a celebrity name: ')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36'
}  # request headers that mimic a browser
url = f"https://www.sogou.com/web?query={query}"  # the f-string inserts the value of query into the URL
response = requests.get(url=url, headers=headers)  # the headers get past the simple anti-scraping check
print(response)
print(response.text)  # print the page source
The output is as follows:
It works.
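An f-string is fine for simple names, but input containing spaces, `&`, or `#` would produce a malformed URL. One hedged alternative is to pass the value through the `params` argument of requests, which encodes it. In the sketch below, `build_search_url` and `search` are hypothetical helper names, and the 10-second timeout is an arbitrary choice rather than something from the original article.

```python
import requests

SEARCH_URL = "https://www.sogou.com/web"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/101.0.4951.41 Safari/537.36"
}


def build_search_url(query: str) -> str:
    """Return the URL requests would use, without sending anything."""
    request = requests.Request("GET", SEARCH_URL, params={"query": query})
    return request.prepare().url


def search(query: str) -> str:
    """Send the search and return the page source (requires network access)."""
    response = requests.get(
        SEARCH_URL, params={"query": query}, headers=HEADERS, timeout=10
    )
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response.text


print(build_search_url("成龙"))  # https://www.sogou.com/web?query=%E6%88%90%E9%BE%99
```

Calling `search(input('Enter a celebrity name: '))` would then replace the f-string version while keeping the same headers.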
3. Bonus: downloading a beautiful image
Hands-on practice:
First, get the image's download URL:
There is no second step. Finally:
import requests
url = 'https://i01piccdn.sogoucdn.com/a2df911ea958c157'
response = requests.get(url)
with open('liuyifei.jpg', 'wb') as f:  # create liuyifei.jpg in the current directory and open it for binary writing
    f.write(response.content)
print("Download complete!")
The result is as follows:
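The script above writes whatever the server returns, even if that is an error page. A slightly more defensive variant is sketched here under assumptions: `save_image` and `download_image` are hypothetical helper names, the timeout and chunk size are arbitrary, and the check presumes the server labels the file with an `image/...` Content-Type.

```python
import requests


def save_image(response, path: str) -> int:
    """Write a response body to path in chunks and return the bytes written."""
    response.raise_for_status()  # stop here on 4xx/5xx responses
    content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("image/"):
        # Assumption: a real image is served with an image/* Content-Type.
        raise ValueError(f"expected an image, got {content_type!r}")
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            written += f.write(chunk)
    return written


def download_image(url: str, path: str) -> int:
    """Fetch url and save it to path (requires network access)."""
    with requests.get(url, stream=True, timeout=10) as response:
        return save_image(response, path)
```

With the URL from the example, `download_image('https://i01piccdn.sogoucdn.com/a2df911ea958c157', 'liuyifei.jpg')` would save the same file, but fail loudly if the server returned an error or a non-image page.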