From Scratch: A Step-by-Step Guide to Building an Efficient OCR Document Recognition System

Author: @小白创作中心

Source: CSDN
https://blog.csdn.net/weixin_43413871/article/details/146135078

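This tutorial walks you through building an efficient OCR (optical character recognition) document recognition system from scratch. You will learn the basics of OCR, the implementation steps, and performance tuning tips, and work through complete Python code examples.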

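Introduction to OCR

What is OCR?
Definition: Optical character recognition (OCR) converts the text in an image into editable text.
Use cases: document digitization, license plate recognition, invoice and receipt processing, and similar tasks.
How OCR works:
Image preprocessing → text detection → text recognition → post-processing.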

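Common Tools and Libraries for OCR

Tesseract OCR:
An open-source, cross-platform OCR engine that supports many languages.
Installation and configuration are covered under Environment Setup below.
Pytesseract:
A Python wrapper around Tesseract that makes it easy to integrate into Python projects.
OpenCV:
Used for image preprocessing (grayscale conversion, binarization, denoising, and so on).
Other tools:
Commercial solutions such as Google Cloud Vision API and AWS Textract. Unlike Tesseract, these are hosted, paid services.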

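Environment Setup

Install dependencies:
Install the Tesseract OCR engine.
Install the Python libraries: pytesseract and opencv-python.
Verify the installation:
Run a simple OCR test to confirm that the environment is configured correctly.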
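The Python libraries are normally installed with `pip install pytesseract opencv-python`, while the Tesseract engine is a separate program installed through your operating system. As a quick sanity check before running any OCR, the sketch below reports which of the three pieces can be found. The function name `check_ocr_environment` is made up for this illustration, and the check assumes the `tesseract` executable is expected on `PATH`.

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which OCR dependencies can be found, without importing them."""
    return {
        # find_spec returns None when a Python package is not installed
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "cv2": importlib.util.find_spec("cv2") is not None,
        # The Tesseract engine is a separate executable looked up on PATH
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `tesseract` is reported as missing even though it is installed, it is probably not on `PATH`; in that case the explicit `tesseract_cmd` setting shown in Step 3 is the usual workaround.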

Implementing OCR Document Recognition Step by Step

Step 1: Load the Image

Read the image file with OpenCV.
Example code:

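import cv2
# Load the image; cv2.imread returns None instead of raising if the file cannot be read
image = cv2.imread('example.jpg')
if image is None:
    raise FileNotFoundError('example.jpg could not be read')
cv2.imshow('Original Image', image)
cv2.waitKey(0)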

Step 2: Image Preprocessing

Convert to grayscale, binarize, denoise, and so on.
Example code:

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Binarize with a fixed threshold of 128
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
# Show the processed image
cv2.imshow('Binary Image', binary)
cv2.waitKey(0)
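A fixed threshold of 128 only works when the lighting is even. Otsu's method instead picks the threshold from the image histogram, which is the idea behind the `cv2.THRESH_OTSU` flag used in the project code at the end of this article. The following is a minimal NumPy sketch of that idea for illustration, not OpenCV's own implementation; the function name `otsu_threshold` and the synthetic test image are assumptions.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t (0-255) that maximises between-class variance.

    Pixels <= t form one class and pixels > t the other. `gray` must be a
    uint8 array, such as the output of cv2.cvtColor(..., cv2.COLOR_BGR2GRAY).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)

    weight_low = np.cumsum(hist)             # pixel count with value <= t
    weight_high = hist.sum() - weight_low    # pixel count with value > t
    sum_low = np.cumsum(hist * levels)       # intensity sum with value <= t
    sum_high = sum_low[-1] - sum_low         # intensity sum with value > t

    # Class means; thresholds that leave one class empty are skipped
    valid = (weight_low > 0) & (weight_high > 0)
    mean_low = np.zeros(256)
    mean_high = np.zeros(256)
    mean_low[valid] = sum_low[valid] / weight_low[valid]
    mean_high[valid] = sum_high[valid] / weight_high[valid]

    between = weight_low * weight_high * (mean_low - mean_high) ** 2
    return int(np.argmax(between))


# Synthetic "page": dark text pixels (60) on a bright background (210)
page = np.full((20, 20), 210, dtype=np.uint8)
page[5:15, 5:15] = 60
t = otsu_threshold(page)
binary = np.where(page > t, 255, 0).astype(np.uint8)
```

On the synthetic page the chosen threshold falls between the two gray levels, so the dark block becomes 0 and the background becomes 255 without any hand-tuned constant.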

Step 3: Run Tesseract for Text Recognition

Use pytesseract to extract the text.
Example code:

import pytesseract
# Set the Tesseract path (only needed if tesseract is not on PATH; this is a Windows example)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Extract the text
text = pytesseract.image_to_string(binary)
print("Recognition result:", text)

Step 4: Post-processing

Remove extra whitespace, fix punctuation errors, and so on.
Example code:

# Collapse all runs of whitespace (including line breaks) into single spaces
cleaned_text = ' '.join(text.split())
print("Cleaned text:", cleaned_text)
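The one-liner above also removes line breaks, which can be a problem for receipts and forms where each line carries meaning. As a slightly more careful sketch, the hypothetical helper `clean_ocr_text` below keeps line breaks, collapses repeated spaces, and removes a stray space before common punctuation marks. The punctuation rule is an assumption about typical OCR noise, not something Tesseract guarantees.

```python
import re


def clean_ocr_text(text):
    """Normalise whitespace in OCR output while keeping line breaks."""
    cleaned_lines = []
    for line in text.splitlines():
        # Collapse runs of spaces and tabs, then trim the ends
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Drop a stray space before common punctuation marks
        line = re.sub(r" ([,.;:!?])", r"\1", line)
        if line:  # skip lines that are empty after cleaning
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


sample = "Total :  12.50 \n\n  Thank   you !"
print(clean_ocr_text(sample))
```

Rules like these are best kept small and specific to the documents being processed, since an aggressive cleanup can just as easily damage correct text.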

Performance Optimization and Common Issues

Performance optimization:
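Speed up Tesseract. Note that it runs mainly on the CPU and its OpenCL (GPU) support is experimental, so processing pages in parallel may be the more practical option.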
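Tune the image resolution and preprocessing parameters.
Common issues and fixes:
Poor image quality leads to low recognition accuracy.
Handling text that mixes several languages.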
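For mixed-language text, Tesseract accepts several language codes joined with `+`, and pytesseract passes them through its `lang` argument; the page segmentation mode is set with `--psm` in the `config` string. The helper below is a small sketch that builds both arguments. The name `tesseract_options` is made up for this example, and the sketch assumes the matching language data files (for example `chi_sim` and `eng`) are installed alongside Tesseract.

```python
def tesseract_options(languages, psm=3):
    """Build the lang and config arguments for pytesseract.image_to_string.

    languages: Tesseract language codes, e.g. ["chi_sim", "eng"].
    psm: page segmentation mode (3 is Tesseract's default, 6 assumes a
         single uniform block of text).
    """
    if not languages:
        raise ValueError("at least one language code is required")
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return {"lang": "+".join(languages), "config": f"--psm {psm}"}


options = tesseract_options(["chi_sim", "eng"], psm=6)
print(options)

# Usage with the binary image from Step 2 (requires pytesseract and Tesseract):
# text = pytesseract.image_to_string(binary, **options)
```

Listing only the languages that actually appear in the document tends to help, because every extra language gives the recognizer more ways to misread a character.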

Project Code

import numpy as np
import cv2

def order_points(pts):
    # Order the four corners as: 0 top-left, 1 top-right, 2 bottom-right, 3 bottom-left
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # top-left has the smallest x + y
    rect[2] = pts[np.argmax(s)]  # bottom-right has the largest x + y
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right has the smallest y - x
    rect[3] = pts[np.argmax(diff)]  # bottom-left has the largest y - x
    return rect

def four_point_transform(image, pts):
    # Order the input corners and apply a perspective transform
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Compute the width and height of the warped image
    width_a = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    width_b = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    max_width = max(int(width_a), int(width_b))
    height_a = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    height_b = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    max_height = max(int(height_a), int(height_b))
    # Define the four corners of the destination image
    dst = np.array([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]], dtype="float32")
    # Compute the transform matrix and apply the perspective warp
    m = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, m, (max_width, max_height))
    return warped

def resize_image(image, width=None, height=None, interpolation=cv2.INTER_AREA):
    # Resize to the given width or height while keeping the aspect ratio
    h, w = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=interpolation)
    return resized

def preprocess_image(image):
    # Preprocess: grayscale conversion, Gaussian blur, and Canny edge detection
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 75, 200)
    return gray, edged

def find_contours(edged):
    # Find contours and keep the five largest by area.
    # Index [-2] selects the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
    return contours

def get_screen_contour(contours):
    # Return the first (largest) contour that simplifies to four corners
    for contour in contours:
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
        if len(approx) == 4:
            return approx
    return None

def main(image_path):
    # Load the image and record the scale between the original and the resized copy
    original_image = cv2.imread(image_path)
    if original_image is None:
        raise FileNotFoundError("Could not read image: {}".format(image_path))
    ratio = original_image.shape[0] / 500.0
    resized_image = resize_image(original_image, height=500)
    # Preprocess the image
    gray_image, edged_image = preprocess_image(resized_image)
    print("STEP 1: Edge Detection")
    cv2.imshow("Image", resized_image)
    cv2.imshow("Edged", edged_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    # Find the document outline
    contours = find_contours(edged_image)
    screen_contour = get_screen_contour(contours)
    if screen_contour is None:
        # Stop before drawing or warping: drawContours would fail on None
        print("Could not find document edges.")
        return
    print("STEP 2: Find Contours")
    cv2.drawContours(resized_image, [screen_contour], -1, (0, 255, 0), 2)
    cv2.imshow("Outline", resized_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    # Apply the perspective transform, binarize, and save the result
    transformed_image = four_point_transform(original_image, screen_contour.reshape(4, 2) * ratio)
    binary_image = cv2.cvtColor(transformed_image, cv2.COLOR_BGR2GRAY)
    binary_ref = cv2.threshold(binary_image, 100, 255, cv2.THRESH_BINARY)[1]
    cv2.imwrite('data/scan.jpg', binary_ref)
    print("STEP 3: Perspective Transform")
    cv2.imshow("Original", resize_image(original_image, height=650))
    cv2.imshow("Scanned", resize_image(binary_ref, height=650))
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main("data/receipt.jpg")
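The corner ordering in order_points relies on a simple observation: x + y is smallest at the top-left corner and largest at the bottom-right, while y - x is smallest at the top-right and largest at the bottom-left. The worked example below applies the same sums and differences to four made-up corner coordinates so the result can be checked by hand. It is only an illustration of the heuristic, which can become ambiguous for a document rotated by roughly 45 degrees, where two corners share the same sum or difference.

```python
import numpy as np

# Four corners (x, y) of a tilted quadrilateral, deliberately out of order
pts = np.array([[310, 420], [40, 60], [20, 400], [330, 30]], dtype="float32")

sums = pts.sum(axis=1)        # x + y for each corner: 730, 100, 420, 360
diffs = np.diff(pts, axis=1)  # y - x for each corner: 110, 20, 380, -300

ordered = np.array([
    pts[np.argmin(sums)],   # top-left:     (40, 60)
    pts[np.argmin(diffs)],  # top-right:    (330, 30)
    pts[np.argmax(sums)],   # bottom-right: (310, 420)
    pts[np.argmax(diffs)],  # bottom-left:  (20, 400)
])
print(ordered)
```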

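The preprocessing above straightens and cleans up the skewed, noisy photo, which can noticeably improve the accuracy of the OCR step that follows.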

from PIL import Image
import pytesseract
import cv2
import os

# Preprocessing mode: 'blur' (median filter) or 'thresh' (Otsu binarization)
preprocess = 'blur'
image = cv2.imread('data/scan.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
if preprocess == "thresh":
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif preprocess == "blur":
    gray = cv2.medianBlur(gray, 3)
# Write the preprocessed image to a temporary file, run OCR on it, then delete the file
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename))
print(text)
os.remove(filename)
# Show the scanned input next to the preprocessed image
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)