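1. Download the Tesseract OCR engine
Tesseract is an open-source optical character recognition (OCR) engine, available under the Apache 2.0 license.
- Get the binaries
- Download
- After downloading the .exe, click Next through the installer and choose any install path you like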
2. Create a project in PyCharm
- In the folder where Tesseract was just installed, copy the absolute path of tesseract.exe
- Install the required packages (pytesseract, opencv-python)
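Before wiring the copied path into pytesseract, it can help to confirm that it really points at an executable. The following is a minimal sketch using only the standard library; `resolve_tesseract_cmd` is a hypothetical helper name, not part of pytesseract, and the fallback to a `tesseract` found on PATH is an assumption about how you might want to handle a wrong path.

```python
import shutil
from pathlib import Path


def resolve_tesseract_cmd(candidate):
    """Return a usable path to the tesseract executable, or None if none is found.

    `candidate` is the absolute path copied in the step above.
    This helper is hypothetical and only for illustration.
    """
    path = Path(candidate)
    if path.is_file():
        return str(path)
    # Fall back to whatever "tesseract" is reachable through PATH, if any
    return shutil.which("tesseract")
```

The returned value, when it is not None, can then be assigned to `pytesseract.pytesseract.tesseract_cmd` as in the next step.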
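3. Detect characters
import cv2
import pytesseract

# Point pytesseract at the tesseract.exe installed in step 1
pytesseract.pytesseract.tesseract_cmd = 'E:\\nodeanddata\\Python\\OpenCV\\Text_Detection_OCR\\TesseractModel\\tesseract.exe'

img = cv2.imread('E:/nodeanddata/Python/OpenCV/Text_Detection_OCR/image/03.png')
# OpenCV loads images as BGR; convert to RGB before passing them to Tesseract
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
hImg, wImg, _ = img.shape

# Each output line is: char left bottom right top page (origin at the bottom-left corner)
boxes = pytesseract.image_to_boxes(img)
for CharProperties in boxes.splitlines():
    CharProperties = CharProperties.split(' ')
    x1, y1, x2, y2 = int(CharProperties[1]), int(CharProperties[2]), int(CharProperties[3]), int(CharProperties[4])
    # Flip the y axis, because OpenCV's origin is the top-left corner
    cv2.rectangle(img, (x1, hImg - y1), (x2, hImg - y2), (50, 50, 255), 1)
    cv2.putText(img, CharProperties[0], (x1, hImg - y1 + 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (50, 50, 255), 1)

# imshow expects BGR, so convert back for display only
cv2.imshow('result', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)  # keep the window open until a key is pressed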
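The coordinate handling in the loop above is the easiest part to get wrong, so here it is isolated as a small sketch that runs without Tesseract or OpenCV. `parse_char_boxes` is a hypothetical helper, and the sample string is made up to mimic the `char left bottom right top page` layout that `image_to_boxes` returns.

```python
def parse_char_boxes(box_text, image_height):
    """Convert image_to_boxes-style text into (char, top_left, bottom_right) tuples."""
    results = []
    for line in box_text.splitlines():
        parts = line.split(' ')
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Box coordinates use a bottom-left origin; flip y for OpenCV's top-left origin
        top_left = (left, image_height - top)
        bottom_right = (right, image_height - bottom)
        results.append((char, top_left, bottom_right))
    return results


# Made-up box output for illustration, for an image 100 pixels high
sample = "H 10 20 30 60 0\ni 35 20 45 55 0"
for char, top_left, bottom_right in parse_char_boxes(sample, image_height=100):
    print(char, top_left, bottom_right)
```

The two corner points can be passed straight to `cv2.rectangle`, which accepts any pair of opposite corners.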
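4. Detect words
As before, reading, converting and displaying the image work the same way as above. The main function is pytesseract.image_to_data(img). Its result is a string with one row per line; after splitting, a row holds the fields level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf and text.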
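hImg, wImg, _ = img.shape
boxes = pytesseract.image_to_data(img)
for a, b in enumerate(boxes.splitlines()):
    if a != 0:  # skip the header row
        b = b.split()
        if len(b) == 12:  # only rows that contain recognized text have all 12 fields
            # Fields 6 to 9 are left, top, width, height (top-left origin, so no y flip)
            x, y, w, h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
            cv2.rectangle(img, (x, y), (x + w, y + h), (50, 50, 255), 2)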
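The first line returned by boxes.splitlines() is the header row, which we do not want to use. So we track the index while iterating and ignore the row whose index is 0. This works like a flag variable placed in front of the for loop that skips the first row automatically. Python's enumerate() function does this for us by assigning the index to a on every iteration.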
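The header-skipping idea can be tried without running OCR at all. Below is a minimal sketch in which `parse_word_boxes` is a hypothetical helper and the tab-separated sample is made up to resemble `image_to_data` output; the row with an empty text field splits into only 11 fields and is dropped.

```python
def parse_word_boxes(data_text):
    """Return (text, x, y, w, h) for each row that holds a recognized word."""
    words = []
    for index, line in enumerate(data_text.splitlines()):
        if index == 0:
            continue  # the first row is the column header
        fields = line.split()
        if len(fields) != 12:
            continue  # rows without text have only 11 fields
        x, y, w, h = (int(v) for v in fields[6:10])
        words.append((fields[11], x, y, w, h))
    return words


# Made-up sample for illustration: header, one line-level row, one word-level row
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t10\t12\t200\t30\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t90\t30\t96\tHello\n"
)
print(parse_word_boxes(sample))
```

Slicing with `data_text.splitlines()[1:]` would skip the header as well; enumerate() is kept here because it matches the loop described above.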
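5. Detect digits only
hImg, wImg, _ = img.shape
# --oem selects the OCR engine mode, --psm the page segmentation mode;
# digits restricts recognition to digits
conf = r'--oem 3 --psm 6 outputbase digits'
boxes = pytesseract.image_to_boxes(img, config=conf)
for b in boxes.splitlines():
    b = b.split(' ')
    # Each line is: char left bottom right top page (origin at the bottom-left corner)
    x1, y1, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])
    cv2.rectangle(img, (x1, hImg - y1), (x2, hImg - y2), (50, 50, 255), 2)
    cv2.putText(img, b[0], (x1, hImg - y1 + 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (50, 50, 255), 2)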
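conf = r'--oem 3 --psm 6 outputbase digits': this config string sets the OCR engine mode (--oem) and the page segmentation mode (--psm), and so tells pytesseract what to detect; here, digits only.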
- OEM: OCR engine mode:

| Member name | Value | Description |
| --- | --- | --- |
| OEM_TESSERACT_ONLY | 0 | Run Tesseract only - fastest |
| OEM_CUBE_ONLY | 1 | Run Cube only - better accuracy, but slower |
| OEM_TESSERACT_CUBE_COMBINED | 2 | Run both and combine results - best accuracy |
| OEM_DEFAULT | 3 | Specify this mode when calling init_*(), to indicate that any of the above modes should be automatically inferred from the variables in the language-specific config, |
- PSM: page segmentation mode:

0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
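
With the two mode tables in hand, the option string can be assembled instead of typed by hand. This is a minimal sketch: `build_config` is a hypothetical helper, and for the digits-only case it swaps in Tesseract's `tessedit_char_whitelist` variable (set through `-c`) in place of the `outputbase digits` form used above. Whitelist behaviour may differ between Tesseract versions, so treat that part as an assumption to verify on your install.

```python
def build_config(oem=3, psm=6, digits_only=False):
    """Assemble a Tesseract option string such as '--oem 3 --psm 6'."""
    if oem not in range(0, 4):
        raise ValueError("oem must be between 0 and 3")
    if psm not in range(0, 14):
        raise ValueError("psm must be between 0 and 13")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if digits_only:
        # Assumption: a character whitelist stands in for the 'digits' config file
        parts.append("-c tessedit_char_whitelist=0123456789")
    return " ".join(parts)


print(build_config())                            # default engine, single uniform block
print(build_config(psm=7, digits_only=True))     # single text line, digits only
```

The resulting string is meant to be passed as the `config` argument, for example `pytesseract.image_to_boxes(img, config=build_config(psm=7))`.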
References:
- https://www.bilibili.com/video/BV18B4y1c7r4?p=1
- Several important commands of Python's tesseract library
- Python: a summary of how to use the enumerate() function