YOLOv5 Preprocessing Explained: Official vs. Custom Implementation

Author: @小白创作中心

Source: https://www.cnblogs.com/wancy/p/18746134

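YOLOv5 is one of today's mainstream object detection models, and its preprocessing pipeline determines exactly what data the model receives as input. This article walks through the YOLOv5 preprocessing steps and presents both an official implementation and a custom implementation to help readers understand the process in depth.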

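1. YOLOv5 Preprocessing Pipeline

YOLOv5 preprocessing consists of the following steps:

Proportional resize and padding (letterbox): scale the input image to the target size (e.g. 640×640) while preserving its aspect ratio, then fill the leftover area with gray bars.
Color space conversion: convert the image from BGR to RGB (OpenCV reads images as BGR by default).
Normalization: scale pixel values from [0, 255] to [0, 1] by dividing by 255.0.
Channel order adjustment: reorder the dimensions from HWC (height, width, channels) to CHW (channels, height, width).
Add a batch dimension: expand the result into a 4-D tensor (B, C, H, W), where B is the batch size, usually 1.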
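
As a worked example of the letterbox arithmetic in the first step, the sketch below computes the scale ratio, the resized size, and the per-side padding without touching any pixels. The helper name `letterbox_geometry` is hypothetical, and the round-up-and-clamp rule is an assumption chosen to mirror the implementation in section 2; other implementations may round differently.

```python
import math


def letterbox_geometry(h0, w0, img_size=640):
    """Return (ratio, (new_h, new_w), (top, bottom, left, right)).

    Hypothetical helper for illustration; it only does the arithmetic.
    """
    r = img_size / max(h0, w0)  # scale so the longer side becomes img_size
    # Assumption: round up, clamped so float error can never exceed img_size
    new_h = min(img_size, math.ceil(h0 * r))
    new_w = min(img_size, math.ceil(w0 * r))
    dh = (img_size - new_h) / 2  # padding per side, may end in .5
    dw = (img_size - new_w) / 2
    # The -0.1 / +0.1 offsets split an odd total as (smaller, larger)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return r, (new_h, new_w), (top, bottom, left, right)


if __name__ == "__main__":
    r, new_hw, pads = letterbox_geometry(424, 359)
    print(round(r, 4))  # 1.5094
    print(new_hw)       # (640, 542)
    print(pads)         # (0, 0, 49, 49)
```

For the 424×359 example image, the height is the longer side, so it is stretched to 640 and all of the padding goes to the left and right.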

2. Official Preprocessing Implementation

import math
import cv2
import numpy as np

def preprocess_image(image, img_size=640):
    # image is a BGR image as read by cv2
    h0, w0 = image.shape[:2]  # original hw
    r = img_size / max(h0, w0)  # ratio that maps the longer side to img_size
    if r != 1:  # resize only if the longer side is not already img_size
        interp = cv2.INTER_LINEAR if r > 1 else cv2.INTER_AREA  # upsampling or downsampling
        # clamp to img_size so floating-point error cannot lead to negative padding
        new_w = min(img_size, math.ceil(w0 * r))
        new_h = min(img_size, math.ceil(h0 * r))
        image = cv2.resize(image, (new_w, new_h), interpolation=interp)  # aspect ratio is preserved
    h1, w1 = image.shape[:2]  # hw after resizing
    # padding
    dw = (img_size - w1) / 2  # half of the total width padding
    dh = (img_size - h1) / 2  # half of the total height padding
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    color = (114, 114, 114)  # padding color (gray)
    im = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    # HWC to CHW, BGR to RGB
    im = im.transpose((2, 0, 1))[::-1]
    im = np.ascontiguousarray(im)
    # normalize to [0, 1] and add the batch dimension
    input_image = np.expand_dims(im, axis=0).astype(np.float32) / 255.0
    return input_image, (h0, w0), im.shape[1:3]  # input tensor, original hw, padded hw (im is CHW here)

if __name__ == '__main__':
    image = cv2.imread(r'flower.png')
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError('flower.png')
    print(image.shape)  # (424, 359, 3)
    input_image, size, input_image_shape = preprocess_image(image, img_size=640)
    print(input_image.shape)  # (1, 3, 640, 640)
    print(size)  # (424, 359)
    print(input_image_shape)  # (640, 640)

The input is flower.png, with h=424 and w=359 (left image, shown here at reduced size). After preprocessing it becomes the image on the right.
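
The line `im.transpose((2, 0, 1))[::-1]` does two things at once, which is easy to miss. The sketch below replays it on a tiny synthetic array (the 2×2 size and the pixel values 10/20/30 are made up for illustration) and checks that it matches the more explicit "reverse channels, then transpose" form.

```python
import numpy as np

# Tiny synthetic "image": 2x2 pixels, 3 channels in BGR order (HWC layout)
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 10  # B
bgr[..., 1] = 20  # G
bgr[..., 2] = 30  # R

# One-liner from the function above: HWC -> CHW, then reverse the channel axis
chw_rgb = bgr.transpose((2, 0, 1))[::-1]

# Equivalent two-step form: reverse the last axis first, then transpose
alt = bgr[..., ::-1].transpose((2, 0, 1))

# The result is a strided view; copy it into contiguous memory before use
chw_rgb = np.ascontiguousarray(chw_rgb)

# Scale to [0, 1] and add the batch axis
batch = np.expand_dims(chw_rgb, axis=0).astype(np.float32) / 255.0

print(batch.shape)                   # (1, 3, 2, 2)
print(chw_rgb[:, 0, 0].tolist())     # [30, 20, 10]
print(np.array_equal(chw_rgb, alt))  # True
```

After the transpose, axis 0 is the channel axis, so `[::-1]` reverses B, G, R into R, G, B without touching the spatial axes.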

3. Custom Preprocessing Implementation

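import cv2
import numpy as np

def preprocess_image(image, img_size=640, padding_color_value=(114, 114, 114)):
    '''
    :param image: original image (BGR, as read by cv2)
    :param img_size: model input size, e.g. 640 for 640x640
    :param padding_color_value: fill color for the padded area
    :return: float32 array of shape (1, 3, img_size, img_size) with values in [0, 1]
    '''
    # get the original image size
    h, w = image.shape[:2]
    # compute the scale ratio
    scale = min(img_size / w, img_size / h)
    # compute the new size
    new_w = int(w * scale)
    new_h = int(h * scale)
    # resize
    img_resized = cv2.resize(image, (new_w, new_h))
    # create a canvas of the target size filled with the padding color (gray by default)
    img_padded = np.full((img_size, img_size, 3), padding_color_value, dtype=np.uint8)
    # paste the resized image into the center of the canvas
    img_padded[(img_size - new_h) // 2:(img_size + new_h) // 2,
               (img_size - new_w) // 2:(img_size + new_w) // 2] = img_resized
    # convert color channels BGR -> RGB
    img_padded = img_padded[..., ::-1]
    # normalize to [0, 1]
    img_padded = img_padded.astype(np.float32) / 255.0
    # add the batch dimension
    img_padded = np.expand_dims(img_padded, axis=0)
    # reorder dimensions from NHWC to NCHW
    img_padded = np.transpose(img_padded, (0, 3, 1, 2))
    # return a contiguous copy, since the transpose above only yields a strided view
    return np.ascontiguousarray(img_padded)

if __name__ == '__main__':
    image = cv2.imread(r'flower.png')
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError('flower.png')
    print(image.shape)  # (424, 359, 3)
    img_padded = preprocess_image(image, img_size=640)
    print(img_padded.shape)  # (1, 3, 640, 640)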

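The result is nearly identical to the image above. The two versions can differ by a pixel in the resized size and in the padding, because this one truncates with int() and uses the default cv2.resize interpolation, whereas the version in section 2 rounds up with math.ceil and selects the interpolation explicitly.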
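
To see how close the two versions are, the sketch below reproduces only their size and padding arithmetic and compares them on the 424×359 example. The function names `official_style` and `custom_style` are hypothetical, and the clamp to `size` in the first one is an added safety assumption; no image data is involved.

```python
import math


def official_style(h0, w0, size=640):
    """Geometry using ceil for the resize and rounded half-paddings."""
    r = size / max(h0, w0)
    new_h = min(size, math.ceil(h0 * r))  # clamp guards against float overshoot
    new_w = min(size, math.ceil(w0 * r))
    dh, dw = (size - new_h) / 2, (size - new_w) / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return (new_h, new_w), (top, bottom, left, right)


def custom_style(h0, w0, size=640):
    """Geometry using int() truncation and floor-divided slice bounds."""
    scale = min(size / w0, size / h0)
    new_h, new_w = int(h0 * scale), int(w0 * scale)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    bottom = size - (size + new_h) // 2  # rows left below the pasted image
    right = size - (size + new_w) // 2   # columns left to its right
    return (new_h, new_w), (top, bottom, left, right)


for name, fn in (("official", official_style), ("custom", custom_style)):
    print(name, fn(424, 359))
# official ((640, 542), (0, 0, 49, 49))
# custom ((640, 541), (0, 0, 49, 50))
```

For this image the resized width differs by one pixel (542 vs. 541), and the custom version places the extra padding column on the right.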