Contents
I. Analyzing the target site
II. Code
Study notes:
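I. Analyzing the target site
Target site: 化妆品生产许可信息管理系统服务平台 (Cosmetics Production License Information Management System Service Platform). Open the site: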
data:image/s3,"s3://crabby-images/6c59e/6c59e15ef37a67171766b327c69f1eb127d75571" alt=""
Click any company and a new page opens. The data we want to collect is on that page.
data:image/s3,"s3://crabby-images/7a078/7a07883a73422f60b876aa9610f8bfbdc484d285" alt=""
First, let's look at the page source:
data:image/s3,"s3://crabby-images/b151d/b151d4f3d04c5c2592bbde97dd392ed9a81a22ed" alt=""
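Press Ctrl+F in the source and search for the data we want. It is not there, which means the page loads it dynamically with a separate request. We find the data here instead: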
data:image/s3,"s3://crabby-images/6585a/6585a5a752513f992182674a656822b5c6debaf3" alt=""
Inspect that request:
data:image/s3,"s3://crabby-images/d48c3/d48c3dd61e14051a1811a37cb6c4a2c13a0ef4b0" alt=""
The parameters, after comparing the requests for a few different pages:
data:image/s3,"s3://crabby-images/90597/90597d6cf4376d3db8777734e998be9f7a270b77" alt=""
Open the data the server returns:
data:image/s3,"s3://crabby-images/ed2f6/ed2f68ffa7730577afa3d2d8fa60046f65428504" alt=""
Now go to the detail page:
data:image/s3,"s3://crabby-images/a2143/a2143a120d4757d16ba7146d77efd24e52870ab3" alt=""
Copy out the JSON data it returns and pick out the fields:
data:image/s3,"s3://crabby-images/91545/91545f743807c825ada5029435334ba28c5cc684" alt=""
It turns out to be a dictionary (a JSON object):
data:image/s3,"s3://crabby-images/4e3f7/4e3f729f21a1d0bd6253f38aaebaf813b24bca73" alt=""
OK, time to write the code.
II. Code
1) Build the form parameters for the list page and fetch the JSON data with requests:
import requests
import csv
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
}
def get_down(page):
    url = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsList'
    data = {
        'on': 'true',
        'page': page,
        'pageSize': 15,
        'productName': '',
        'conditionType': 1,
        'applyname': '',
        'applysn': ''
    }
    resp = requests.post(url=url, headers=headers, data=data).json()
Then pull out the ID that each detail page needs:
    dict_list = resp['list']
    for ids in dict_list:
        _id = ids['ID']
2) Request the detail page:
        url2 = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'
        data2 = {
            'id': _id
        }
        resp2 = requests.post(url=url2, headers=headers, data=data2).json()
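Every company on a page triggers one more POST to `getXkzsById`, so one page of 15 companies costs 16 requests. A `requests.Session` can reuse the connection across those requests and send the headers automatically. The sketch below assumes the same endpoint and `id` form field as above. `make_session` and `fetch_detail` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import requests

DETAIL_URL = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'


def make_session(user_agent: str) -> requests.Session:
    """Session that sends the same User-Agent with every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


def fetch_detail(session, company_id: str, timeout: float = 10.0) -> dict:
    """POST one company ID to the detail endpoint and return the parsed JSON.

    raise_for_status() turns an HTTP 4xx/5xx response into an exception,
    instead of letting .json() fail later on an error page.
    """
    resp = session.post(DETAIL_URL, data={'id': company_id}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```

To use it in `get_down`, create the session once, for example `session = make_session(headers['User-Agent'])`. Then call `resp2 = fetch_detail(session, _id)` inside the loop.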
3) Extract the fields:
        com_name = resp2['epsName']  # company name
        number = resp2['productSn']  # license number
        allow_p = resp2['certStr']  # licensed production items
        epsAddress = resp2['epsAddress']  # company address
        epsProductAddress = resp2['epsProductAddress']  # production address
        businessLicenseNumber = resp2['businessLicenseNumber']  # credit code
        legalPerson = resp2['legalPerson']  # legal representative
        qfManagerName = resp2['qfManagerName']  # issuing authority
        xkDate = resp2['xkDate']  # valid until
        xkDateStr = resp2['xkDateStr']  # issue date
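The ten assignments above can also be driven by a single table of (JSON key, column label) pairs, so that the CSV column order and a header row come from the same place. In this sketch, `FIELDS` and `to_row` are hypothetical names, and the English labels are translations of the comments above. A missing key becomes an empty string instead of raising `KeyError`. The demo writes to an in-memory buffer rather than a real file.

```python
import csv
import io

# (JSON key, column label) in output order; labels follow the comments above
FIELDS = [
    ('epsName', 'Company name'),
    ('productSn', 'License number'),
    ('certStr', 'Licensed production items'),
    ('epsAddress', 'Company address'),
    ('epsProductAddress', 'Production address'),
    ('businessLicenseNumber', 'Credit code'),
    ('legalPerson', 'Legal representative'),
    ('qfManagerName', 'Issuing authority'),
    ('xkDate', 'Valid until'),
    ('xkDateStr', 'Issue date'),
]


def to_row(record: dict) -> list:
    """Pick the fields in FIELDS order; a missing key becomes ''."""
    return [record.get(key, '') for key, _ in FIELDS]


# Demo with a placeholder record, written to an in-memory "file"
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([label for _, label in FIELDS])  # header row
writer.writerow(to_row({'epsName': 'Example Co.', 'productSn': 'placeholder-no'}))
print(buf.getvalue())
```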
Complete code:
"""
2022
CSDN: 抄代码抄错的小牛马
"""
import csv

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
}


def get_down(page):
    # List endpoint: one page of companies (15 per page) as JSON
    url = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsList'
    data = {
        'on': 'true',
        'page': page,
        'pageSize': 15,
        'productName': '',
        'conditionType': 1,
        'applyname': '',
        'applysn': ''
    }
    resp = requests.post(url=url, headers=headers, data=data, timeout=10).json()
    dict_list = resp['list']
    # Detail endpoint: the full license record for one company ID
    url2 = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?method=getXkzsById'
    # Open the CSV once per page (not once per company); `with` closes it even on errors.
    # mode='a' appends across pages; newline='' avoids blank lines between rows.
    with open('数据.csv', mode='a', encoding='utf-8', newline='') as f:  # 数据 = "data"
        csvwriter = csv.writer(f)
        for ids in dict_list:
            _id = ids['ID']
            data2 = {
                'id': _id
            }
            resp2 = requests.post(url=url2, headers=headers, data=data2, timeout=10).json()
            com_name = resp2['epsName']  # company name
            number = resp2['productSn']  # license number
            allow_p = resp2['certStr']  # licensed production items
            epsAddress = resp2['epsAddress']  # company address
            epsProductAddress = resp2['epsProductAddress']  # production address
            businessLicenseNumber = resp2['businessLicenseNumber']  # credit code
            legalPerson = resp2['legalPerson']  # legal representative
            qfManagerName = resp2['qfManagerName']  # issuing authority
            xkDate = resp2['xkDate']  # valid until
            xkDateStr = resp2['xkDateStr']  # issue date
            csvwriter.writerow([com_name, number, allow_p, epsAddress,
                                epsProductAddress, businessLicenseNumber, legalPerson,
                                qfManagerName, xkDate, xkDateStr])  # one row per company


def main():
    start_page = int(input("Enter the start page: "))
    end_page = int(input("Enter the end page: "))
    for page in range(start_page, end_page + 1):  # +1 so end_page itself is scraped
        get_down(page)


if __name__ == '__main__':
    main()
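The script appends with `mode='a'` and never writes a header row, so the CSV has no column names. One way to write a header exactly once, even across repeated runs, is to check whether the file already has content before appending. In this sketch, `append_rows` is a hypothetical helper. The `utf-8-sig` encoding is my substitution, not the original choice: it writes a BOM so that Excel detects the encoding of the Chinese text. Python does not write a second BOM when appending to a non-empty file.

```python
import csv
import os
import tempfile


def append_rows(path: str, header: list, rows: list) -> None:
    """Append rows to a CSV file, writing `header` only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # utf-8-sig puts a BOM at the start of a new file so Excel detects UTF-8;
    # newline='' stops the csv module from producing blank lines on Windows
    with open(path, mode='a', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerows(rows)


# Two calls, as if the script ran twice: the header appears only once
path = os.path.join(tempfile.mkdtemp(), 'data.csv')
append_rows(path, ['Company name', 'License number'], [['A', '1']])
append_rows(path, ['Company name', 'License number'], [['B', '2']])
with open(path, encoding='utf-8-sig', newline='') as f:
    print(list(csv.reader(f)))
# [['Company name', 'License number'], ['A', '1'], ['B', '2']]
```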
Let's run it:
data:image/s3,"s3://crabby-images/80216/802166d7b36eaf6e7e8b20c5628930f86c401b65" alt=""
?data:image/s3,"s3://crabby-images/26cb0/26cb067ba303dc432ce04ef8e5f582dc74467555" alt=""
记录~~~
data:image/s3,"s3://crabby-images/19a8b/19a8b307a59b74f5e63c9bf551a9a56b35207b1e" alt=""