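My blogging has grown up a little: I can now say things more concisely, and I've made my code much easier for other people to reuse. That deserves a note, hahaha.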
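A while ago I wrote a post on OCR for VAT invoices. It's the most-liked of all my articles, hahaha, although that still only means a handful of likes. Recently some readers asked questions about it, such as how to obtain the access_token, so I went back and reread it. Honestly, it was too rough and not very friendly to beginners.

So today I've wrapped the code in a class and turned it into a foolproof version. The overall logic is unchanged; the improvements are all about usability. Previously readers had to track down three parameters, some of them hard to find; now only two are needed, the AK and the SK. The output file is nicer too: the cells are centered, the font is set, and so on. All in all, getting started is now fairly easy. I hope this helps anyone who needs it. Working more efficiently is something we all want, let's go!!!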
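Since the access_token came up in readers' questions: get_access_token in the code below obtains it by sending the AK and SK to https://aip.baidubce.com/oauth/2.0/token with grant_type=client_credentials. The token is only valid for a limited time, so if you ever keep the script running for a long while, a minimal sketch of an expiry-aware cache could look like this. It assumes the endpoint's JSON carries access_token plus an expires_in lifetime in seconds; CachedToken, fetch, clock and margin are my own hypothetical names, not part of any Baidu SDK.

```python
import time


class CachedToken:
    """Reuse an access_token until shortly before it expires.

    `fetch` is any zero-argument callable returning the token endpoint's JSON
    as a dict; the 'access_token' and 'expires_in' keys are an assumption.
    """

    def __init__(self, fetch, clock=time.monotonic, margin=60):
        self.fetch = fetch      # e.g. a wrapper around requests.get(...).json()
        self.clock = clock      # injectable so the cache can be tested offline
        self.margin = margin    # refresh this many seconds before the real expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Only hit the network when nothing is cached or the token is about to expire
        if self._token is None or self.clock() >= self._expires_at:
            result = self.fetch()
            if 'access_token' not in result:
                raise RuntimeError('token request failed: %r' % (result,))
            self._token = result['access_token']
            lifetime = int(result.get('expires_in', 0))
            self._expires_at = self.clock() + lifetime - self.margin
        return self._token
```

To plug it in, fetch could wrap the same requests.get(...).json() call that get_access_token already makes. The injectable clock is there only so the refresh logic can be checked without actually waiting.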
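I'll go straight to the code. If you want to see the logic and how to obtain the parameters, you'll still need to read the earlier post: 《python实现批量增值税发票文字识别(ocr)》 (Batch VAT invoice text recognition (OCR) with Python).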
Next to your code file you also need a folder named fapiao that holds the invoice images (.jpg or .png) to be recognized.
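If you want to check the folder before a run, here is a small helper sketch. check_invoice_folder and split_invoice_files are hypothetical names of my own; the helper creates fapiao if it is missing (relative to the current working directory) and reports which files would be picked up and which would be skipped. The accepted extensions, .jpg, .jpeg and .png compared case-insensitively, are my assumption; adjust them to whatever your scanner produces.

```python
import os

# Extensions counted as invoice images: an assumption, adjust to what you scan
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def split_invoice_files(filenames):
    """Split file names into (images, skipped), matching extensions case-insensitively."""
    images, skipped = [], []
    for name in sorted(filenames):
        if name.lower().endswith(IMAGE_EXTENSIONS):
            images.append(name)
        else:
            skipped.append(name)
    return images, skipped


def check_invoice_folder(folder='fapiao'):
    """Create the folder if it is missing, then report what a run would pick up."""
    os.makedirs(folder, exist_ok=True)
    images, skipped = split_invoice_files(os.listdir(folder))
    print('%d image(s) found in %s' % (len(images), folder))
    for name in skipped:
        print('skipped (not jpg/jpeg/png):', name)
    return [os.path.join(folder, name) for name in images]
```

Calling check_invoice_folder() from the directory that contains fapiao prints how many images it found and names any files it would ignore, such as PDFs or text notes.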
import base64
import datetime
import os

import requests
import xlwt


class fapiao_OCR:
    '''
    A class for recognizing VAT invoices.
    Create a folder named fapiao in the same folder as this script and put the
    invoices to be recognized in it. You only need to provide your AK and SK.
    '''
    TOKEN_URL = 'https://aip.baidubce.com/oauth/2.0/token'
    OCR_URL = 'https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice'
    # Fields read from the OCR result; the keys are Baidu's own field names
    # (AmountInFiguers really is spelled that way in the API response)
    FIELDS = ['InvoiceDate', 'SellerRegisterNum', 'PurchaserName',
              'SellerName', 'AmountInFiguers']

    def __init__(self, AK, SK):
        self.AK = AK
        self.SK = SK
        self.access_token = None  # fetched once, then reused for every invoice

    def get_access_token(self):
        # Exchange AK/SK for an access_token; only the first call hits the network
        if self.access_token is None:
            params = {'grant_type': 'client_credentials',
                      'client_id': self.AK,
                      'client_secret': self.SK}
            result = requests.get(self.TOKEN_URL, params=params).json()
            if 'access_token' not in result:
                raise RuntimeError('Could not get access_token: %r' % (result,))
            self.access_token = result['access_token']
        return self.access_token

    def get_context(self, pic):
        # One invoice -> dict of FIELDS; a field that could not be read stays ''
        data = dict.fromkeys(self.FIELDS, '')
        try:
            with open(pic, 'rb') as f:
                img = base64.b64encode(f.read())
            headers = {'content-type': 'application/x-www-form-urlencoded'}
            response = requests.post(self.OCR_URL,
                                     params={'access_token': self.get_access_token()},
                                     data={'image': img}, headers=headers)
            result = response.json()
            if 'words_result' not in result:
                # On failure Baidu returns error_code / error_msg instead
                print(pic, result.get('error_msg', result))
                return data
            for key in self.FIELDS:
                data[key] = result['words_result'].get(key, '')
        except Exception as e:
            print(pic, e)
        return data

    def pics(self, path):
        print('Collecting image paths...')
        pics = []
        for filename in sorted(os.listdir(path)):
            # Case-insensitive, so .JPG / .jpeg scans are picked up too
            if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
                pics.append(os.path.join(path, filename))
        print('Found %d image(s)!' % len(pics))
        return pics

    def datas(self, pics):
        datas = []
        for p in pics:
            datas.append(self.get_context(p))
        return datas

    def save(self, datas):
        print('Writing data...')
        book = xlwt.Workbook(encoding='utf-8', style_compression=0)
        sheet = book.add_sheet('VAT invoice register', cell_overwrite_ok=True)
        # Shared cell style: centered text, 10 pt Calibri (height is in 1/20 pt)
        style = xlwt.XFStyle()
        alignment = xlwt.Alignment()
        alignment.horz = xlwt.Alignment.HORZ_CENTER
        font = xlwt.Font()
        font.name = 'Calibri'
        font.height = 200
        style.font = font
        style.alignment = alignment
        # (column header, key in the dict returned by get_context)
        columns = [('Invoice date', 'InvoiceDate'),
                   ('Seller taxpayer ID', 'SellerRegisterNum'),
                   ('Buyer name', 'PurchaserName'),
                   ('Seller name', 'SellerName'),
                   ('Total amount', 'AmountInFiguers')]
        for col, (title, _) in enumerate(columns):
            sheet.col(col).width = 7777  # width in 1/256 of a character
            sheet.write(0, col, title, style)
        # One row per invoice, starting under the header row
        for row, data in enumerate(datas, start=1):
            for col, (_, key) in enumerate(columns):
                sheet.write(row, col, data.get(key, ''), style)
        now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
        book.save(now + '_VAT_invoices.xls')
        print('Data written successfully!')

    def main(self):
        print('Starting!!!')
        # Look for fapiao next to this script, not in the current working directory
        path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'fapiao')
        self.get_access_token()  # fail fast if AK/SK are wrong
        Pics = self.pics(path)
        Datas = self.datas(Pics)
        self.save(Datas)
        print('Done!')


if __name__ == '__main__':
    AK = ''  # your API Key
    SK = ''  # your Secret Key
    fapiao = fapiao_OCR(AK, SK)
    fapiao.main()
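To use this code, all you need now is to know how to install third-party libraries (here requests and xlwt, e.g. pip install requests xlwt) and how to open a web page and register a Baidu AI Cloud account to get your AK and SK. Of course, you also need to know how to run Python code!!!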
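Both of those prerequisites can be checked in a few lines before calling main(). This is a sketch under my own assumptions: importlib.util.find_spec tells whether requests and xlwt are installed, and the keys are read from environment variables instead of being pasted into the script. missing_modules and read_keys are hypothetical helper names, and BAIDU_OCR_AK / BAIDU_OCR_SK are hypothetical variable names.

```python
import importlib.util
import os


def missing_modules(names=('requests', 'xlwt')):
    """Return the names in `names` that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def read_keys(env=None):
    """Read AK and SK from environment variables (hypothetical names, see above)."""
    env = os.environ if env is None else env
    ak = env.get('BAIDU_OCR_AK', '').strip()
    sk = env.get('BAIDU_OCR_SK', '').strip()
    if not ak or not sk:
        # SystemExit stops the script with a readable message instead of a traceback
        raise SystemExit('Please set BAIDU_OCR_AK and BAIDU_OCR_SK first.')
    return ak, sk
```

With that in place, the bottom of the script could check that missing_modules() returns an empty list (otherwise tell the user to pip install what is listed) and then run fapiao_OCR(*read_keys()).main().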
Thanks for reading, see you next time!!!