# Python Security Development Course, Chapter 3, Section 1 – Web

Note: the code in this lesson is for Python 2.7. In Python 3 there is no urllib2; its functionality moved into the standard library as urllib.request, while requests remains a third-party package.
## A brief look at the HTTP protocol
HTTP is the Hypertext Transfer Protocol. It defines the rules by which a browser and a server communicate: the client sends a request and gets back a response. The request and response formats are illustrated below.
>>> curl www.baidu.com -v
out: > GET / HTTP/1.1 // request line
Host: www.baidu.com // request header
User-Agent: curl/7.52.1 // the requesting client (we often need to spoof this as a browser)
// blank line: marks the end of the request headers
... // request body (may be empty)
HTTP/1.1 200 OK // status line
Server: bfe/1.0.8.18 // response headers; there are many, just look up any you don't recognize
Date: Sun, 04 Jun 2017 06:31:02 GMT
Content-Type: text/html
Content-Length: 2381
Last-Modified: Mon, 23 Jan 2017 13:27:36 GMT
Connection: Keep-Alive
ETag: "588604c8-94d"
Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
Pragma: no-cache
Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
Accept-Ranges: bytes
<html> // response body
...
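To make the format above concrete, here is a minimal sketch (Python 2.7; host taken from the curl example) that writes a raw HTTP request by hand over a socket and prints the start of the raw response:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('www.baidu.com', 80))
s.sendall('GET / HTTP/1.1\r\n'        # request line
          'Host: www.baidu.com\r\n'   # request headers
          'Connection: close\r\n'
          '\r\n')                     # the blank line that ends the headers
print s.recv(4096)                    # status line, response headers, start of the body
s.close()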
## Common web libraries
Python's web libraries are a bit of a mess: urllib, urllib2, urllib3, requests (install this last one yourself if you don't have it). Why the mess? Historical reasons. Here we focus on two of them, urllib2 and requests. urllib2 is in the standard library and ships with every Python 2; requests is the "HTTP for Humans" library (note that it stays a third-party package even in Python 3).
## urllib and urllib2 usage
urllib2 is the module we security people will use most often here; that said, when requests is available, prefer requests.

#### 1. A simple GET request
>>> import urllib2
>>> f = urllib2.urlopen('http://www.sina.com')
>>> f.read()       # read the whole response body
>>> f.readline()   # or read it one line at a time
>>> f.readlines()  # or read every line into a list (each read* call consumes the same stream, so pick one)
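The object urlopen returns carries more than the body; a few more of its accessors are worth knowing (these are standard on the urllib2 response object):

>>> f = urllib2.urlopen('http://www.sina.com')
>>> f.getcode()   # HTTP status code, e.g. 200
>>> f.geturl()    # the final URL, after any redirects
>>> f.info()      # the response headers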
#### 2. A POST request
import urllib
import urllib2

url = 'http://127.0.0.1/post.php'
header = {}
header['User-Agent'] = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
test_data = {'name': 'timo', 'address': 'china'}
test_data_urlencode = urllib.urlencode(test_data)   # form-encode the POST body
r = urllib2.Request(url, headers=header, data=test_data_urlencode)   # passing data= makes this a POST
f = urllib2.urlopen(r)
f.readlines()
f.close()   # close the response object
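For a GET request the same urlencode output goes into the URL itself instead of the request body — a minimal sketch, with the test URL assumed:

import urllib
import urllib2

params = urllib.urlencode({'name': 'timo', 'address': 'china'})
f = urllib2.urlopen('http://127.0.0.1/get.php?' + params)   # querystring GET
print f.read()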
A complex request header might look like this (note that the cookie value below embeds a time-based blind SQL injection payload — exactly the kind of request a security tester builds; `i` and `payload` come from the enclosing injection loop):
x = "ASPSESSIONIDQCCBRQDA=PAAEILKAFGNBCKPECJMAKPMM; readlist=196%2C25%2C294%2C340%2C307; orderlist=-1)%3Bif(ascii(substring(DB_NAME(),"+str(i)+",1))>"+str(ord(payload))+")%20waitfor%20delay%20'0:0:10'%20--%20; Hm_lvt_2fb608bf1ad8b9b8ce2e04b58003184e=1465491620,1465491620,1465491621,1465491621; Hm_lpvt_2fb608bf1ad8b9b8ce2e04b58003184e=1465491621; HMACCOUNT=CE7
7C9E3D0040FF7"
headers = {
    'Host': 'research.51job.com',
    'Accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,ja;q=0.4',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'Keep-alive',
    'Cookie': x
}
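Attaching such a dict to an actual request works exactly as before; a short sketch, with the URL assumed and x built by the enclosing injection loop that supplies i and payload:

import urllib2

req = urllib2.Request('http://research.51job.com/', headers=headers)
f = urllib2.urlopen(req)
print f.getcode()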
## Cookies: the cookielib library
Above we passed cookies through the request headers, but that approach requires us to already know the cookie values by some other means. There is in fact a dedicated library for handling cookies, cookielib; see its documentation for the details.
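A minimal cookielib sketch, with the target URL assumed: the CookieJar captures any Set-Cookie headers and replays them on later requests made through the same opener.

import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.open('http://www.baidu.com')   # any site that sets a cookie will do
for c in cj:
    print c.name, c.value             # cookies the server handed us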
#### URL error handling

In practice, network code runs into all kinds of connection failures. We can raise exceptions ourselves, or let the library raise them for us:

import urllib2

request = urllib2.Request('http://www.xxxxx.com')
try:
    urllib2.urlopen(request)
except urllib2.HTTPError, e:   # HTTPError is a subclass of URLError, so catch it first
    print e.code
    print e.reason
except urllib2.URLError, e:    # DNS failure, refused connection, and so on
    print e.reason
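urlopen also accepts a timeout in seconds; a timed-out connection surfaces as a URLError, so the handler above covers it:

try:
    urllib2.urlopen('http://www.xxxxx.com', timeout=5)
except urllib2.URLError, e:
    print e.reason   # e.g. 'timed out'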
## requests library usage
requests bills itself as the HTTP library built for humans (in the words of its own site: unprofessional use of other HTTP libraries can lead to dangerous side effects, including security vulnerabilities, redundant code, reinventing the wheel, constantly gnawing on documentation, depression, headaches, and even death). We suggest you use requests whenever you can.
#### 1. Standard requests usage
import requests

r = requests.get('http://www.sina.com')
r = requests.post('http://www.sina.com')
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")
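Each of these calls returns a Response object with the same interface; a few fields you will use constantly (httpbin.org used here as a harmless test target):

r = requests.get('http://httpbin.org/get')
print r.status_code   # numeric status code, e.g. 200
print r.headers       # response headers, a case-insensitive dict
print r.text          # body decoded to unicode
print r.content       # raw body bytes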
#### 2. A GET request with a custom header
>>> import requests
>>> url = 'http://www.sina.com'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
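requests will also build the query string for you through the params argument — a small sketch against httpbin:

>>> payload = {'name': 'timo', 'address': 'china'}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> r.url   # the encoded query string has been appended for you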
#### 3. A POST request
>>> payload = {'name': 'timo', 'address': 'china'}
>>> r = requests.post("http://127.0.0.1/post.php", data=payload)
>>> print r.text
#### 4. Using cookies with requests
Logging in with cookies is fairly simple in requests; it is more involved with urllib, and when something goes wrong you have to dig in and analyze it.
raw_cookies = 'PHPSESSID=xxxxxxx; security=low'   # the raw Cookie header string (placeholder values)
cookies = {}
for line in raw_cookies.split(';'):
    key, value = line.strip().split('=', 1)
    cookies[key] = value
cookie = requests.utils.cookiejar_from_dict(cookies, cookiejar=None, overwrite=True)
url = 'http://httpbin.org/cookies'
r = requests.get(url, cookies=cookie)
r.text
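In fact requests will happily take a plain dict for the cookies argument, so the cookiejar conversion above is optional:

>>> r = requests.get('http://httpbin.org/cookies', cookies={'security': 'low'})
>>> r.text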
requests is a fairly large library; for anything beyond this, look things up as you need them in the detailed requests documentation.
#### 5. A worked example of cookies with requests
>>> url = 'http://localhost/vulnerabilities/sqli/?id=1&Submit=Submit#'
>>> cookie_dict = {'PHPSESSID': 'idhe1rqalbukqufca2buqt72e7', 'security': 'low'}
>>> cookie = requests.utils.cookiejar_from_dict(cookie_dict, cookiejar=None, overwrite=True)
>>> r = requests.get(url, cookies=cookie)
>>> r.text
#### 6. A simulated login with a persistent session
This one is hard to demonstrate properly nowadays: most login forms now require a token, so a bare POST can no longer really log you in. Even when there is no token or other anti-scripting measure, a simulated POST login sometimes still fails to work; in those cases we recommend logging in with cookies instead. The example code below demonstrates a scripted login to 51job (前程无忧).
Template
import requests

s = requests.session()
data = {'email': 'username', 'password': 'password'}
s.post('http://www.renren.com/PLogin.do', data)   # log in; the session keeps the cookies
r = s.get("http://www.renren.com")                # subsequent requests reuse them
print r.text
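What the Session object buys you is cookie persistence: cookies set by one response are sent automatically on the next request. httpbin makes this easy to see:

import requests

s = requests.session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')   # server sets a cookie
r = s.get('http://httpbin.org/cookies')                           # the session sends it back
print r.text   # {"cookies": {"sessioncookie": "123456789"}}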
Example
Note: this is 51job login code from many years ago; it may no longer work today — study it for the approach.
import requests

LOGIN_URL = 'https://login.51job.com/login.php?lang=c'
headers = {
    'Host': 'login.51job.com',
    'Origin': 'https://login.51job.com',
    'Referer': 'https://login.51job.com/login.php?lang=c',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
}

if __name__ == '__main__':
    data = {
        'loginname': 'xxxxxxx@qq.com',
        'password': 'xxxxxxxxxxxxxxxx',
        'lang': 'c',
        'action': 'save',
        'from_domain': 'i',
        'isread': 'on',
        'verifycode': ''
    }
    s = requests.Session()
    res = s.post(LOGIN_URL, data=data, headers=headers)   # log in; the session keeps the cookies
    r = s.get(res.url)                                    # follow up inside the same session
    print(r.url)
    print(res.url)
#### 7. Uploading a file
You can send multipart-encoded data with the files parameter: a dict of the form {'name': file-like-object} (or {'name': ('filename', fileobj)}).
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
{
...
"files": {
"file": "<censored...binary...data>"
},
...
}
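You can also set the filename and content type explicitly by passing a tuple; same endpoint, with the content type assumed for the example:

>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel')}
>>> r = requests.post(url, files=files)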
#### 8. Uploading multiple files at once
>>> url = 'http://httpbin.org/post'
>>> multiple_files = [('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
...                   ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
...
'files': {'images': 'data:image/png;base64,iVBORw ....'}
'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
...
}
## A simulated login via a POST
This simulates a Douban login. At the time Douban used none of the usual anti-scripting defenses (CSRF tokens, JS checks), so it is easy to script. This is the Douban login flow from many years ago and may well no longer work.
import requests
from bs4 import BeautifulSoup
import re
from PIL import Image
import os

def loginin():
    global session
    session = requests.Session()
    url = 'https://www.douban.com/accounts/login'
    name = 'bazhinv000@126.com'
    psw = 'a1234567'
    headers = {
        "Host": "www.douban.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:53.0) Gecko/20100101 Firefox/53.0",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip,deflate",
        "Connection": "keep-alive"
    }
    data = {
        'form_email': name,
        'form_password': psw,
        'source': 'index_nav',
        'remember': 'on'
    }
    captcha = session.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(captcha.content, 'lxml')
    img = soup.find_all('img', id='captcha_image')
    print img
    if img:
        captcha_url = re.findall('src="(.*?)"', str(img))[0]
        print u'Captcha URL from the tag:', captcha_url
        a = captcha_url.split('&')[0]
        capid = a.split('=')[1]                  # extract the captcha id from the URL
        print capid
        cap = session.get(captcha_url, headers=headers).content
        with open('captcha.jpg', 'wb') as f:
            f.write(cap)                         # save the captcha image locally
        im = Image.open('captcha.jpg')
        im.show()                                # display it so a human can read it
        capimg = raw_input('Enter the captcha: ')
        newdata = {
            'captcha-solution': capimg,
            'captcha-id': capid
        }
        data.update(newdata)                     # add the captcha fields to the login form
        print data
        os.remove('captcha.jpg')
    else:
        print 'No captcha present; logging in directly'
    r = session.post(url, data=data, headers=headers, timeout=30)
    print r.content

if __name__ == '__main__':
    loginin()
## A simulated login with a token
Sites use many tricks to keep scripts and crawlers from logging in: basic User-Agent checks, tokens, and JS-based defenses (JiaSuLe, for example, uses JS to compute a value and then redirect), and so on. Below we simulate a login to DVWA.
import requests
from bs4 import BeautifulSoup

user = 'admin'
pwd = 'password'
cookies = ' '

def getToken():
    '''Fetch the user_token hidden field required by the login form'''
    global cookies
    url = 'http://localhost/login.php'
    r = requests.get(url=url)
    soup = BeautifulSoup(r.content, 'lxml')
    res = str(soup.find_all(name='input', attrs={'type': 'hidden'}))
    if 'Set-Cookie' in r.headers:
        sessionid = r.headers['Set-Cookie'][0:36]   # grab the PHPSESSID pair
        cookie = "security=low; " + sessionid
        cookies = cookie
    else:
        pass
    return res[-36:-4]                              # slice the token value out of the tag

def getHeaders(cookie):
    '''Build a headers dict that carries the PHPSESSID cookie'''
    global cookies
    headers = {
        'Host': 'localhost',
        'Content-Length': '89',
        'Origin': 'http://localhost',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Referer': 'http://localhost/login.php',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'close'
    }
    headers['Cookie'] = cookie
    return headers

def httpLogin(user, pwd, token, session):
    '''Send the login request and return the Location redirect target'''
    headers = getHeaders(session)
    url = 'http://localhost/login.php'
    payload = {'username': user, 'password': pwd, 'Login': 'Login', 'user_token': token}
    req = requests.post(url=url, data=payload, allow_redirects=False, headers=headers)
    addr = req.headers['Location']   # DVWA redirects to index.php on success, login.php on failure
    return addr

def main():
    token = getToken()
    addr = httpLogin(user, pwd, token, cookies)
    if addr == "login.php":
        print "username: %s password: %s is wrong!" % (user, pwd)
    else:
        print "username: %s password: %s is correct!" % (user, pwd)

if __name__ == '__main__':
    main()
### Homework

1. Read the complete official requests documentation.