torchvision


Original link: torchvision

On the importance of data normalization in deep learning training - Oldpan's personal blog

[pytorch] Basic image operations - Zhihu

  1. Before training a network on the data, the data needs to be preprocessed:


transform = transforms.Compose([
        transforms.RandomResizedCrop(100),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
  2. Normalize must come after ToTensor, because Normalize operates on tensors.

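Normalization
Now to the key point. The normalization mentioned at the start of the article works like this: across a set of images, each pixel first has the mean over all images subtracted and is then divided by the standard deviation (the z-score). This gives all images a similar distribution, which makes training converge more easily, that is, faster and better. Note also that the mean and std depend on the pixel value range. Inputs are usually image data in [0, 1] or [0, 255]: PyTorch models take inputs in [0, 1], while Caffe models take inputs in [0, 255].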
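To make the z-score step concrete, here is a minimal plain-Python sketch (no torchvision involved). The helper names `channel_stats` and `z_score` and the sample pixel values are illustrative assumptions, not part of any library.

```python
import math


def channel_stats(images):
    """Return (mean, std) over every pixel of every image for one channel."""
    pixels = [p for img in images for p in img]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(var)


def z_score(img, mean, std):
    """Subtract the dataset mean and divide by the dataset std."""
    return [(p - mean) / std for p in img]


# Two hypothetical single-channel "images", already scaled to [0, 1].
images = [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]
mean, std = channel_stats(images)
normalized = [z_score(img, mean, std) for img in images]
print(mean, std)
print(normalized)
```

After this step the pooled pixels have mean 0 and standard deviation 1. The statistics are computed once over the whole dataset and reused for every image.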

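For convenient data handling, the PyTorch team provides the torchvision.transforms package. With transforms we can perform the following operations:
Conversion between PIL.Image/numpy.ndarray and Tensor;
Normalization;
Cropping, scaling, and similar operations on PIL.Image.

When using torchvision.transforms, we usually combine the individual transforms with transforms.Compose.

Conversion between PIL.Image/numpy.ndarray and Tensor
Converting PIL.Image/numpy.ndarray to Tensor is typically done when loading data for training, while converting Tensor back to PIL.Image/numpy.ndarray is typically done when outputting data during validation.
We can use transforms.ToTensor() to convert PIL.Image/numpy.ndarray data to a torch.FloatTensor scaled to [0, 1.0]:

A PIL.Image with values in [0, 255] is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0];
A numpy.ndarray of shape [H, W, C] (uint8) is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0].
transforms.ToPILImage, in turn, converts a Tensor to a PIL.Image. To convert a Tensor to numpy, simply call .numpy(), as follows: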
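import cv2
import numpy as np
from PIL import Image
from torchvision import transforms

img_path = "./data/img_37.jpg"

# transforms.ToTensor()
transform1 = transforms.Compose([
    transforms.ToTensor(),  # range [0, 255] -> [0.0, 1.0]
])

## numpy.ndarray
img = cv2.imread(img_path)  # read the image (H x W x C, uint8, BGR channel order)
img1 = transform1(img)  # scaled to [0.0, 1.0], shape C x H x W
print("img1 = ", img1)

# convert back to numpy.ndarray and display
img_1 = np.round(img1.numpy() * 255)  # round so float error does not truncate 254.999 to 254
img_1 = img_1.astype('uint8')
img_1 = np.transpose(img_1, (1, 2, 0))  # C x H x W -> H x W x C
cv2.imshow('img_1', img_1)
cv2.waitKey()

## PIL
img = Image.open(img_path).convert('RGB')  # read the image
img2 = transform1(img)  # scaled to [0.0, 1.0]
print("img2 = ", img2)
# convert to a PIL Image and display
img_2 = transforms.ToPILImage()(img2).convert('RGB')
print("img_2 = ", img_2)
img_2.show()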

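Normalization
Normalization is very important for training neural networks. So how do we normalize to [-1.0, 1.0]? Just change transform1 above to the following:

transform2 = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
(1) transforms.Compose combines the transforms;

(2) transforms.Normalize normalizes with the following formula:

channel = (channel - mean) / std

Since ToTensor() yields values in [0, 1], using mean 0.5 and std 0.5 maps every value into [-1, 1]: (0 - 0.5) / 0.5 = -1 and (1 - 0.5) / 0.5 = 1.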
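The per-channel formula can be checked by hand with a small plain-Python sketch. The function `normalize` below is a hypothetical stand-in that only mirrors the arithmetic of the formula; it is not the torchvision implementation, and the sample image is made up.

```python
def normalize(image, mean, std):
    """image: list of channels, each a flat list of values in [0, 1]."""
    return [
        [(v - m) / s for v in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# Hypothetical 3-channel image with three pixels per channel.
image = [[0.0, 0.5, 1.0], [0.25, 0.5, 0.75], [1.0, 1.0, 0.0]]
out = normalize(image, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
print(out)
```

Each channel is paired with its own mean and std, which is why Normalize takes one value per channel.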

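Scaling, cropping, and other operations on PIL.Image
In addition, transforms provides cropping, scaling, and similar operations for data augmentation. Below is a random-crop example that again uses Compose to combine the transforms:

# transforms.RandomCrop()
transform4 = transforms.Compose([
    transforms.ToTensor(),
    transforms.ToPILImage(),  # convert back, since RandomCrop here operates on a PIL Image
    transforms.RandomCrop((300, 300)),  # the image must be at least 300 x 300
])

img = Image.open(img_path).convert('RGB')
img3 = transform4(img)
img3.show()
The related code is available at: tfygg/pytorch-tutorials

Finally, a recommendation: the PyTorch Chinese documentation.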

First, the glue: Compose, which chains all the transforms together in the given order:

class Compose(object):
    """Composes several transforms together.
    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.
    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img
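Because Compose only calls each element in turn, the class above can be exercised with ordinary functions. The sketch below re-declares a minimal Compose so it runs without torchvision; `add_one` and `double` are made-up stand-ins for real transforms.

```python
class Compose:
    """Minimal re-declaration of the class shown above, for illustration."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def add_one(x):
    return x + 1


def double(x):
    return x * 2


a = Compose([add_one, double])(3)  # (3 + 1) * 2
b = Compose([double, add_one])(3)  # 3 * 2 + 1
print(a, b)
```

The two results differ, which is the reason the order of transforms in the list matters.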

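As shown, the transforms form a list, and Compose(object) applies them in list order. Among the many transforms, ToTensor(object) converts the data type and can be seen as the watershed: some transforms operate on torch.*Tensor, while the others operate on PIL Image. Here is that transform: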

class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'

The ToTensor() conversion does two things:

  1. It maps the pixel range from [0, 255] to [0, 1];
  2. It reorders the pixel layout from a numpy.ndarray of (H x W x C), or a PIL image, to (C x H x W).
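Both effects can be sketched on nested lists. The function `to_tensor_sketch` is a hypothetical illustration of the scaling and the axis reordering only; the real ToTensor returns a torch tensor and handles more input types.

```python
def to_tensor_sketch(hwc):
    """Nested list [H][W][C] of ints in [0, 255] -> nested list [C][H][W] of floats in [0, 1]."""
    height, width, channels = len(hwc), len(hwc[0]), len(hwc[0][0])
    return [
        [[hwc[y][x][c] / 255 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# Hypothetical 1 x 2 RGB image: one row, two pixels.
image = [[[255, 0, 0], [0, 128, 255]]]
chw = to_tensor_sketch(image)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
```

The channel index moves from the innermost position to the outermost, and every value is divided by 255.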

Correspondingly, there is the inverse conversion, ToPILImage:

class ToPILImage(object):
    """Convert a tensor or an ndarray to PIL Image.
    Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
    H x W x C to a PIL Image while preserving the value range.
    Args:
        mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
            If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
            1. If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
            2. If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
            3. If the input has 1 channel, the ``mode`` is determined by the data type (i.e.,
            ``int``, ``float``, ``short``).
    .. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
    """
    def __init__(self, mode=None):
        self.mode = mode

    def __call__(self, pic):
        """
        Args:
            pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.
        Returns:
            PIL Image: Image converted to PIL Image.
        """
        return F.to_pil_image(pic, self.mode)

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        if self.mode is not None:
            format_string += 'mode={0}'.format(self.mode)
        format_string += ')'
        return format_string

On normalization:

class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.
    Given mean: ``(M1,...,Mn)`` and std: ``(S1,..,Sn)`` for ``n`` channels, this transform
    will normalize each channel of the input ``torch.*Tensor`` i.e.
    ``input[channel] = (input[channel] - mean[channel]) / std[channel]``
    .. note::
        This transform acts in-place, i.e., it mutates the input tensor.
    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
    """

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Tensor image of size (C, H, W) to be normalized.
        Returns:
            Tensor: Normalized Tensor image.
        """
        return F.normalize(tensor, self.mean, self.std)

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)

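As shown, Normalize() takes a tensor, so the data must first be converted with ToTensor() and only then normalized with Normalize(). [The main point is getting the order right.]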

Here is the docstring of class Resize(object):

"""Resize the input PIL Image to the given size.
Args:

    size (sequence or int): Desired output size. If size is a sequence like
        (h, w), output size will be matched to this. If size is an int,
        smaller edge of the image will be matched to this number.
        i.e, if height > width, then image will be rescaled to
        (size * height / width, size)
    interpolation (int, optional): Desired interpolation. Default is
        PIL.Image.BILINEAR

"""
This transform resizes the image and operates on a PIL Image, so it must come before the ToTensor() transform.
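The rule for an int `size` in the docstring above can be worked through with a short sketch. `resize_output_size` is a hypothetical helper, and truncating with `int()` is an assumption about rounding; the docstring only states the ratio.

```python
def resize_output_size(height, width, size):
    """Return (new_height, new_width) for an int or an (h, w) size."""
    if isinstance(size, int):
        # Match the smaller edge to `size`, keep the aspect ratio.
        if height > width:
            return int(size * height / width), size
        return size, int(size * width / height)
    # A sequence is used as the output size directly.
    return tuple(size)


print(resize_output_size(400, 200, 100))       # portrait image
print(resize_output_size(200, 400, 100))       # landscape image
print(resize_output_size(300, 500, (64, 32)))  # explicit (h, w)
```

With an int, the aspect ratio is preserved; with an (h, w) pair it generally is not.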

Next there is a similar transform, class Scale(Resize), about which the documentation says:

Note: This transform is deprecated in favor of Resize.

Next, cropping:

class CenterCrop(object):
    """Crops the given PIL Image at the center.
    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
    """

    def __init__(self, size):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.
        Returns:
            PIL Image: Cropped image.
        """
        return F.center_crop(img, self.size)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0})'.format(self.size)
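The class above delegates the geometry to F.center_crop, whose body is not shown here. As a rough sketch of what a center crop has to compute, the hypothetical helper below derives the crop box; the rounding rule is an assumption.

```python
def center_crop_box(height, width, size):
    """Return (top, left, crop_h, crop_w) for a crop centered in the image."""
    if isinstance(size, int):
        size = (size, size)  # an int means a square crop, as in __init__ above
    crop_h, crop_w = size
    top = int(round((height - crop_h) / 2.0))
    left = int(round((width - crop_w) / 2.0))
    return top, left, crop_h, crop_w


print(center_crop_box(100, 200, 50))
print(center_crop_box(10, 10, (4, 6)))
```

The margins on opposite sides are equal, up to one pixel when the difference is odd.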

The CenterCrop(object) transform crops from the center and operates on a PIL Image. A similar transform, RandomCrop(object), crops at a random location:

class RandomCrop(object):
    """Crop the given PIL Image at a random location.
    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None, i.e no padding. If a sequence of length
            4 is provided, it is used to pad left, top, right, bottom borders
            respectively. If a sequence of length 2 is provided, it is used to
            pad left/right, top/bottom borders, respectively.
        pad_if_needed (boolean): It will pad the image if smaller than the
            desired size to avoid raising an exception.
        fill: Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant
        padding_mode: Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
             - constant: pads with a constant value, this value is specified with fill
             - edge: pads with the last value on the edge of the image
             - reflect: pads with reflection of image (without repeating the last value on the edge)
                padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
                will result in [3, 2, 1, 2, 3, 4, 3, 2]
             - symmetric: pads with reflection of image (repeating the last value on the edge)
                padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
                will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """

    def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant'):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size
        self.padding = padding
        self.pad_if_needed = pad_if_needed
        self.fill = fill
        self.padding_mode = padding_mode

    @staticmethod
    def get_params(img, output_size):
        """Get parameters for ``crop`` for a random crop.
        Args:
            img (PIL Image): Image to be cropped.
            output_size (tuple): Expected output size of the crop.
        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
        """
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.
        Returns:
            PIL Image: Cropped image.
        """
        if self.padding is not None:
            img = F.pad(img, self.padding, self.fill, self.padding_mode)

        # pad the width if needed
        if self.pad_if_needed and img.size[0] < self.size[1]:
            img = F.pad(img, (self.size[1] - img.size[0], 0), self.fill, self.padding_mode)
        # pad the height if needed
        if self.pad_if_needed and img.size[1] < self.size[0]:
            img = F.pad(img, (0, self.size[0] - img.size[1]), self.fill, self.padding_mode)

        i, j, h, w = self.get_params(img, self.size)

        return F.crop(img, i, j, h, w)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0}, padding={1})'.format(self.size, self.padding)
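The sampling logic in get_params can be tried on plain numbers. The sketch below copies that logic into a standalone function, `random_crop_params`, which takes height and width directly instead of a PIL Image; that signature is an assumption made for illustration.

```python
import random


def random_crop_params(height, width, output_size):
    """Return (i, j, h, w): top row, left column, crop height, crop width."""
    th, tw = output_size
    if width == tw and height == th:
        return 0, 0, height, width
    # randint is inclusive on both ends, so the crop can touch any border.
    i = random.randint(0, height - th)
    j = random.randint(0, width - tw)
    return i, j, th, tw


random.seed(0)  # fixed seed so repeated runs draw the same crops
params = [random_crop_params(480, 640, (300, 300)) for _ in range(100)]
print(params[:3])
```

If the image is smaller than the requested crop, `random.randint` receives an empty range and raises ValueError, which is the situation the pad_if_needed option is meant to avoid.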

Besides the two common crops above, the remaining cropping transforms are:

RandomResizedCrop(object):

Crop the given PIL Image to random size and aspect ratio.

RandomSizedCrop(RandomResizedCrop):

This transform is deprecated in favor of RandomResizedCrop.

FiveCrop(object):

Crop the given PIL Image into four corners and the central crop.

TenCrop(object):

Crop the given PIL Image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).
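As an illustration of what "four corners and the central crop" means geometrically, the hypothetical helper `five_crop_boxes` below lists the five boxes as (top, left, height, width). It is a sketch of the geometry only, with an assumed rounding rule for the center, not the library code.

```python
def five_crop_boxes(height, width, size):
    """Return five (top, left, crop_h, crop_w) boxes: four corners, then center."""
    crop_h, crop_w = size
    if crop_h > height or crop_w > width:
        raise ValueError("requested crop is larger than the image")
    corners = [
        (0, 0),                                # top left
        (0, width - crop_w),                   # top right
        (height - crop_h, 0),                  # bottom left
        (height - crop_h, width - crop_w),     # bottom right
    ]
    center = (int(round((height - crop_h) / 2.0)), int(round((width - crop_w) / 2.0)))
    return [(top, left, crop_h, crop_w) for top, left in corners + [center]]


boxes = five_crop_boxes(100, 200, (50, 50))
print(boxes)
```

TenCrop, as described above, would yield these five crops plus the same five taken from the flipped image, for ten in total.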

On the Pad(object) transform:

class Pad(object):
    """Pad the given PIL Image on all sides with the given "pad" value.
    Args:
        padding (int or tuple): Padding on each border. If a single int is provided this
            is used to pad all borders. If tuple of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a tuple of length 4 is provided
            this is the padding for the left, top, right and bottom borders
            respectively.
        fill (int or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant
        padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
            Default is constant.
            - constant: pads with a constant value, this value is specified with fill
            - edge: pads with the last value at the edge of the image
            - reflect: pads with reflection of image without repeating the last value on the edge
                For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
                will result in [3, 2, 1, 2, 3, 4, 3, 2]
            - symmetric: pads with reflection of image repeating the last value on the edge
                For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
                will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """

    def __init__(self, padding, fill=0, padding_mode='constant'):
        assert isinstance(padding, (numbers.Number, tuple))
        assert isinstance(fill, (numbers.Number, str, tuple))
        assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']
        # collections.Sequence was removed in Python 3.10; use collections.abc.Sequence
        if isinstance(padding, collections.abc.Sequence) and len(padding) not in [2, 4]:
            raise ValueError("Padding must be an int or a 2, or 4 element tuple, not a " +
                             "{} element tuple".format(len(padding)))

        self.padding = padding
        self.fill = fill
        self.padding_mode = padding_mode

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be padded.
        Returns:
            PIL Image: Padded image.
        """
        return F.pad(img, self.padding, self.fill, self.padding_mode)

    def __repr__(self):
        return self.__class__.__name__ + '(padding={0}, fill={1}, padding_mode={2})'.\
            format(self.padding, self.fill, self.padding_mode)
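The four padding modes are easiest to compare on a one-dimensional list, using the same [1, 2, 3, 4] example as the docstring. `pad_1d` is a hypothetical helper written for this illustration; real images are padded along two axes.

```python
def pad_1d(values, n, mode="constant", fill=0):
    """Pad a list with n elements on both sides; assumes 1 <= n < len(values)."""
    if not 1 <= n < len(values):
        raise ValueError("n must satisfy 1 <= n < len(values)")
    if mode == "constant":
        left, right = [fill] * n, [fill] * n
    elif mode == "edge":
        left, right = [values[0]] * n, [values[-1]] * n
    elif mode == "reflect":
        # Mirror around the edge value without repeating it.
        left = values[1:n + 1][::-1]
        right = values[-n - 1:-1][::-1]
    elif mode == "symmetric":
        # Mirror around the border, so the edge value is repeated.
        left = values[:n][::-1]
        right = values[-n:][::-1]
    else:
        raise ValueError("unknown padding mode: " + mode)
    return left + values + right


for mode in ("constant", "edge", "reflect", "symmetric"):
    print(mode, pad_1d([1, 2, 3, 4], 2, mode))
```

The reflect and symmetric results reproduce the two sequences given in the docstring.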

Besides the officially provided transforms, you can also define custom ones:

class Lambda(object):
    """Apply a user-defined lambda as a transform.
    Args:
        lambd (function): Lambda/function to be used for transform.
    """

    def __init__(self, lambd):
        assert isinstance(lambd, types.LambdaType)
        self.lambd = lambd

    def __call__(self, img):
        return self.lambd(img)

    def __repr__(self):
        return self.__class__.__name__ + '()'
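A short sketch shows how such a wrapper behaves. It re-declares a minimal Lambda so it runs without torchvision; `invert` and the list inputs are made-up stand-ins for image operations.

```python
import types


class Lambda:
    """Minimal re-declaration of the class shown above, for illustration."""

    def __init__(self, lambd):
        assert isinstance(lambd, types.LambdaType)
        self.lambd = lambd

    def __call__(self, img):
        return self.lambd(img)


def invert(values):
    return [255 - v for v in values]


halve = Lambda(lambda values: [v // 2 for v in values])
# A function defined with `def` also passes the check,
# because types.LambdaType is the same object as types.FunctionType.
inverted = Lambda(invert)

print(halve([10, 20]), inverted([0, 255]))
```

Built-in callables such as `len` are not instances of `types.FunctionType`, so with this assert they would need to be wrapped in a lambda first.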

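Those are the more commonly used transforms. To increase the diversity and randomness of the data, so that training is more thorough and the model generalizes better, the library also provides a further set of operations, most of them random: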

class RandomTransforms(object):

Base class for a list of transformations with randomness.

class RandomApply(RandomTransforms):

Apply randomly a list of transformations with a given probability.

class RandomOrder(RandomTransforms):

Apply a list of transformations in a random order.

class RandomChoice(RandomTransforms):

Apply single transformation randomly picked from a list.

class RandomHorizontalFlip(object):

Horizontally flip the given PIL Image randomly with a given probability.

class RandomVerticalFlip(object):

Vertically flip the given PIL Image randomly with a given probability.

class RandomResizedCrop(object):

Crop the given PIL Image to random size and aspect ratio.

class RandomSizedCrop(RandomResizedCrop):

This transform is deprecated in favor of RandomResizedCrop.

class LinearTransformation(object):

Transform a tensor image with a square transformation matrix computed offline.

class ColorJitter(object):

Randomly change the brightness, contrast and saturation of an image.

class RandomRotation(object):

Rotate the image by angle.

class RandomAffine(object):

Random affine transformation of the image keeping center invariant.

class Grayscale(object):

Convert image to grayscale.

class RandomGrayscale(object):

Randomly convert image to grayscale with a probability of p (default 0.1).
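The three wrapper classes at the top of this list (RandomApply, RandomOrder, RandomChoice) can be sketched with the standard `random` module. The classes below are simplified illustrations that follow the one-line descriptions above; they are assumptions about the behavior, not the library source, and `add_one` and `double` stand in for real transforms.

```python
import random


class RandomApply:
    """Apply the whole list of transforms with probability p."""

    def __init__(self, transforms, p=0.5):
        self.transforms = transforms
        self.p = p

    def __call__(self, x):
        if self.p < random.random():
            return x  # skipped this time
        for t in self.transforms:
            x = t(x)
        return x


class RandomChoice:
    """Apply a single transform picked at random from the list."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        return random.choice(self.transforms)(x)


class RandomOrder:
    """Apply every transform in the list, in a random order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        order = list(range(len(self.transforms)))
        random.shuffle(order)
        for index in order:
            x = self.transforms[index](x)
        return x


def add_one(x):
    return x + 1


def double(x):
    return x * 2


random.seed(0)  # fixed seed so repeated runs behave the same
print(RandomApply([add_one], p=1.0)(3))
print(RandomChoice([add_one, double])(3))
print(RandomOrder([add_one, double])(3))
```

Because a new random decision is made on every call, the same input image can yield a different augmented sample in each epoch.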

Reference:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/transforms.py

https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose

https://blog.csdn.net/u014380165/article/details/79167753

In addition, for an explanation of __init__, __call__, and __repr__, see:
http://funhacks.net/explore-python/Class/magic_method.html
https://www.cnblogs.com/shengulong/p/7456435.html
