torchvision

2019-09-19 · 4345 words · ~9 min reading time

On the Importance of Data Normalization in Deep Learning Training - Oldpan's personal blog

Before training a network on image data, the data needs to be preprocessed:

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomResizedCrop(100),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # ImageNet per-channel mean and std (RGB order)
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Normalize must come after ToTensor, because it operates on a tensor rather than on a PIL Image.

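Normalization
This is the key point. The normalization mentioned at the start of the article works as follows: across a set of images, each pixel first has the mean over all images subtracted and is then divided by the standard deviation (the z-score). This gives all images a similar distribution, which makes training converge more easily, so training is faster and the result is better. Note also that the mean and std depend on the pixel value range. Image data is usually fed in as either [0, 1] or [0, 255]: PyTorch models conventionally take inputs in [0, 1], while Caffe models conventionally take inputs in [0, 255].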
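To make the z-score step concrete, here is a minimal pure-Python sketch. The two tiny single-channel images and the helper name `zscore` are made up for illustration; a real pipeline would compute the statistics per channel over the whole training set.

```python
from statistics import mean, pstdev

# Hypothetical toy data: two 2x2 single-channel images with values in [0, 1].
images = [
    [[0.0, 0.2], [0.4, 0.6]],
    [[0.4, 0.6], [0.8, 1.0]],
]

# Pool every pixel of every image to get the dataset-wide statistics.
pixels = [p for img in images for row in img for p in row]
mu = mean(pixels)
sigma = pstdev(pixels)  # population standard deviation


def zscore(img, mu, sigma):
    """Subtract the dataset mean and divide by the dataset std, pixel by pixel."""
    return [[(p - mu) / sigma for p in row] for row in img]


normalized = [zscore(img, mu, sigma) for img in images]
flat = [p for img in normalized for row in img for p in row]

# After normalization the pooled pixels have mean ~0 and std ~1.
print(round(mean(flat), 6), round(pstdev(flat), 6))
```

After this step the pooled pixels have mean close to 0 and standard deviation close to 1, which is the "similar distribution" the paragraph refers to.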

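To make data handling easier, the PyTorch team provides the torchvision.transforms package, which supports the following operations:
Conversion between PIL.Image/numpy.ndarray and Tensor;
Normalization;
Cropping, resizing, and similar operations on PIL.Image.

When using torchvision.transforms, we usually chain the transforms together with transforms.Compose.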

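Converting between PIL.Image/numpy.ndarray and Tensor
Converting PIL.Image/numpy.ndarray to Tensor is commonly used when loading data for training, while converting Tensor back to PIL.Image/numpy.ndarray is used when outputting data during validation.
We can use transforms.ToTensor() to convert PIL.Image/numpy.ndarray data to a torch.FloatTensor scaled to [0, 1.0]:

A PIL.Image with values in [0, 255] is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0];
A numpy.ndarray of shape [H, W, C] with values in [0, 255] (dtype uint8) is converted to a torch.FloatTensor of shape [C, H, W] with values in [0, 1.0].
transforms.ToPILImage, in turn, converts a Tensor to a PIL.Image. To convert a Tensor to numpy, simply call .numpy(), as follows: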
import cv2
import numpy as np
from PIL import Image
from torchvision import transforms

img_path = "./data/img_37.jpg"

# transforms.ToTensor()
transform1 = transforms.Compose([
    transforms.ToTensor(),  # range [0, 255] -> [0.0, 1.0]
])

## numpy.ndarray
img = cv2.imread(img_path)  # read the image (OpenCV returns H x W x C in BGR order)
img1 = transform1(img)  # scale to [0.0, 1.0]; the channel order stays BGR
print("img1 = ", img1)

# convert to numpy.ndarray and display
img_1 = img1.numpy() * 255
img_1 = img_1.astype('uint8')
img_1 = np.transpose(img_1, (1, 2, 0))  # C x H x W -> H x W x C
cv2.imshow('img_1', img_1)
cv2.waitKey()

## PIL
img = Image.open(img_path).convert('RGB')  # read the image
img2 = transform1(img)  # scale to [0.0, 1.0]
print("img2 = ", img2)
# convert to PIL Image and display
img_2 = transforms.ToPILImage()(img2).convert('RGB')
print("img_2 = ", img_2)
img_2.show()

Normalization
Normalization is very important for neural network training. So how do we normalize to [-1.0, 1.0]? Just change transform1 above as follows:

transform2 = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
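(1) transforms.Compose chains the transforms together;

(2) transforms.Normalize normalizes each channel with the following formula:

channel = (channel - mean) / std

With mean = 0.5 and std = 0.5, every value in [0, 1] is mapped into [-1, 1].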
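A quick arithmetic check of the formula channel = (channel - mean) / std, written as plain Python so it runs without torch. The helpers `normalize_value` and `denormalize_value` are hypothetical names that only mirror the per-value formula and its inverse; they are not the torchvision implementation.

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std to a single value."""
    return (x - mean) / std


def denormalize_value(y, mean=0.5, std=0.5):
    """Invert the formula: y * std + mean."""
    return y * std + mean


# ToTensor output lies in [0, 1]; check the endpoints and the midpoint.
low = normalize_value(0.0)   # (0.0 - 0.5) / 0.5
mid = normalize_value(0.5)   # (0.5 - 0.5) / 0.5
high = normalize_value(1.0)  # (1.0 - 0.5) / 0.5
print(low, mid, high)  # -1.0 0.0 1.0

# Undoing the normalization recovers the original value.
print(denormalize_value(high))  # 1.0
```

The inverse is handy when a normalized tensor has to be turned back into a displayable image.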

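Resizing, cropping, and other operations on PIL.Image
In addition, transforms provides cropping, resizing, and other operations for data augmentation. Here is a random-crop example that again uses Compose to chain the transforms together: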

# transforms.RandomCrop()
transform4 = transforms.Compose([
    transforms.ToTensor(),
    transforms.ToPILImage(),  # back to a PIL Image, which RandomCrop operates on here
    transforms.RandomCrop((300, 300)),  # the image must be at least 300 x 300
])

img = Image.open(img_path).convert('RGB')
img3 = transform4(img)
img3.show()
The related code is available at: tfygg/pytorch-tutorials

Finally, a recommendation: the Chinese PyTorch documentation.

First, consider Compose, the glue that chains all transforms together in the given order:

class Compose(object):
    """Composes several transforms together.
    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.
    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img
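To see the chaining in action without any image library, the sketch below re-declares the same Compose logic and feeds it two toy callables. The functions `add_one` and `double` are invented for illustration; a real pipeline passes transform objects instead.

```python
class Compose(object):
    """Same logic as the class above: apply each transform in list order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# Order matters: (3 + 1) * 2 differs from 3 * 2 + 1.
first = Compose([add_one, double])(3)
second = Compose([double, add_one])(3)
print(first, second)  # 8 7
```

This order sensitivity is why the position of each transform in the list matters.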

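As the Compose definition shows, all the transforms form a list, and Compose(object) applies them in list order. Among the many transforms, ToTensor(object) converts the data type and can be seen as the watershed of a pipeline: some transforms operate on torch.*Tensor, while the others operate on PIL Image. Here is that transform: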

class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'

ToTensor() does two things:

It maps the pixel value range from [0, 255] to [0, 1];
It reorders the pixel layout from the (H x W x C) of a numpy.ndarray or PIL Image to (C x H x W).
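The two effects can be imitated on a nested Python list, which makes the index shuffle easy to follow. This is only a sketch: `to_chw_float` is a hypothetical helper, and the real ToTensor works on PIL Images and numpy arrays and returns a torch tensor.

```python
def to_chw_float(pic):
    """Mimic the two effects of ToTensor on a nested-list image.

    pic is H x W x C with integer values in [0, 255];
    the result is C x H x W with float values in [0.0, 1.0].
    """
    height, width, channels = len(pic), len(pic[0]), len(pic[0][0])
    return [
        [[pic[h][w][c] / 255.0 for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# Hypothetical 1 x 2 RGB image: one red pixel and one white pixel.
hwc = [[[255, 0, 0], [255, 255, 255]]]
chw = to_chw_float(hwc)
print(chw)  # [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
```

Each of the three output planes holds one color channel, so the red plane is all ones while the green and blue planes are zero at the red pixel.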

Correspondingly, there is the reverse conversion, ToPILImage:

class ToPILImage(object):
    """Convert a tensor or an ndarray to PIL Image.
    Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
    H x W x C to a PIL Image while preserving the value range.
    Args:
        mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
            If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
            1. If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
            2. If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
            3. If the input has 1 channel, the ``mode`` is determined by the data type (i,e,
            ``int``, ``float``, ``short``).
    .. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
    """
    def __init__(self, mode=None):
        self.mode = mode

    def __call__(self, pic):
        """
        Args:
            pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.
        Returns:
            PIL Image: Image converted to PIL Image.
        """
        return F.to_pil_image(pic, self.mode)

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        if self.mode is not None:
            format_string += 'mode={0}'.format(self.mode)
        format_string += ')'
        return format_string

As for normalization:

class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.
    Given mean: ``(M1,...,Mn)`` and std: ``(S1,..,Sn)`` for ``n`` channels, this transform
    will normalize each channel of the input ``torch.*Tensor`` i.e.
    ``input[channel] = (input[channel] - mean[channel]) / std[channel]``
    .. note::
        This transform acts in-place, i.e., it mutates the input tensor.
    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
    """

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Tensor image of size (C, H, W) to be normalized.
        Returns:
            Tensor: Normalized Tensor image.
        """
        return F.normalize(tensor, self.mean, self.std)

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)

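As the code shows, the data must first be converted with ToTensor() before Normalize() can be applied, since Normalize() expects a tensor image of size (C, H, W). (The main point is to get the order right.)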

Here is the docstring of class Resize(object):

"""Resize the input PIL Image to the given size.
Args:

    size (sequence or int): Desired output size. If size is a sequence like
        (h, w), output size will be matched to this. If size is an int,
        smaller edge of the image will be matched to this number.
        i.e, if height > width, then image will be rescaled to
        (size * height / width, size)
    interpolation (int, optional): Desired interpolation. Default is
        PIL.Image.BILINEAR

"""
This transform resizes an image and operates on a PIL Image, so it must come before ToTensor() in the pipeline.
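The rule for an int `size` in the docstring can be written out as a few lines of arithmetic. The helper `resize_output_size` is a hypothetical name, and truncating with `int()` simply follows the docstring formula; the exact rounding inside the library is not confirmed here and may differ between versions.

```python
def resize_output_size(height, width, size):
    """Return (new_h, new_w) when ``size`` is an int.

    The shorter edge becomes ``size`` and the longer edge is scaled
    by the same factor, so the aspect ratio is kept.
    """
    if height > width:
        return int(size * height / width), size
    return size, int(size * width / height)


print(resize_output_size(400, 200, 100))  # (200, 100)
print(resize_output_size(200, 400, 100))  # (100, 200)
print(resize_output_size(300, 300, 100))  # (100, 100)
```

Passing a (h, w) sequence instead skips this logic and forces the output to exactly that size, which can change the aspect ratio.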

Next is a similar transform, class Scale(Resize), which the documentation describes as follows:

Note: This transform is deprecated in favor of Resize.

Next come the cropping transforms:

class CenterCrop(object):
    """Crops the given PIL Image at the center.
    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
    """

    def __init__(self, size):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.
        Returns:
            PIL Image: Cropped image.
        """
        return F.center_crop(img, self.size)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0})'.format(self.size)

CenterCrop(object) crops from the center of a PIL Image. A similar transform, RandomCrop(object), crops at a random location:

class RandomCrop(object):
    """Crop the given PIL Image at a random location.
    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None, i.e no padding. If a sequence of length
            4 is provided, it is used to pad left, top, right, bottom borders
            respectively. If a sequence of length 2 is provided, it is used to
            pad left/right, top/bottom borders, respectively.
        pad_if_needed (boolean): It will pad the image if smaller than the
            desired size to avoid raising an exception.
        fill: Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant
        padding_mode: Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
             - constant: pads with a constant value, this value is specified with fill
             - edge: pads with the last value on the edge of the image
             - reflect: pads with reflection of image (without repeating the last value on the edge)
                padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
                will result in [3, 2, 1, 2, 3, 4, 3, 2]
             - symmetric: pads with reflection of image (repeating the last value on the edge)
                padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
                will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """

    def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant'):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size
        self.padding = padding
        self.pad_if_needed = pad_if_needed
        self.fill = fill
        self.padding_mode = padding_mode

    @staticmethod
    def get_params(img, output_size):
        """Get parameters for ``crop`` for a random crop.
        Args:
            img (PIL Image): Image to be cropped.
            output_size (tuple): Expected output size of the crop.
        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
        """
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.
        Returns:
            PIL Image: Cropped image.
        """
        if self.padding is not None:
            img = F.pad(img, self.padding, self.fill, self.padding_mode)

        # pad the width if needed
        if self.pad_if_needed and img.size[0] < self.size[1]:
            img = F.pad(img, (self.size[1] - img.size[0], 0), self.fill, self.padding_mode)
        # pad the height if needed
        if self.pad_if_needed and img.size[1] < self.size[0]:
            img = F.pad(img, (0, self.size[0] - img.size[1]), self.fill, self.padding_mode)

        i, j, h, w = self.get_params(img, self.size)

        return F.crop(img, i, j, h, w)

    def __repr__(self):
        return self.__class__.__name__ + '(size={0}, padding={1})'.format(self.size, self.padding)
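The core of RandomCrop is the choice of the top-left corner in get_params. The sketch below repeats that logic on plain integers so it runs without PIL; `random_crop_params` and its `rng` argument are hypothetical names added for illustration, and the seed is there only to make the run repeatable.

```python
import random


def random_crop_params(img_w, img_h, output_size, rng=random):
    """Pick the top-left corner (i, j) of a th x tw crop, as in get_params."""
    th, tw = output_size
    if img_w == tw and img_h == th:
        return 0, 0, img_h, img_w
    i = rng.randint(0, img_h - th)  # top row, bounds are inclusive
    j = rng.randint(0, img_w - tw)  # left column, bounds are inclusive
    return i, j, th, tw


rng = random.Random(0)  # seeded only so the sketch is repeatable
i, j, h, w = random_crop_params(640, 480, (300, 300), rng)

# Whatever corner is drawn, the crop stays inside the image.
print(0 <= i <= 480 - 300, 0 <= j <= 640 - 300, (h, w))  # True True (300, 300)
```

If the requested crop is larger than the image, `randint` receives a negative upper bound and raises an error, which is the case `pad_if_needed` is meant to avoid.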

Besides these two common crops, the remaining cropping transforms are:

RandomResizedCrop(object):

Crop the given PIL Image to random size and aspect ratio.

RandomSizedCrop(RandomResizedCrop):

This transform is deprecated in favor of RandomResizedCrop.

FiveCrop(object):

Crop the given PIL Image into four corners and the central crop.

TenCrop(object):

Crop the given PIL Image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).
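The geometry behind FiveCrop can be sketched as five boxes computed from the image and crop sizes. The helper `five_crop_boxes` is a hypothetical name, the (left, top, right, bottom) box convention is borrowed from PIL, and the rounding of the center offset is an assumption rather than a statement about the library's exact behavior.

```python
def five_crop_boxes(img_w, img_h, crop_h, crop_w):
    """Return (left, top, right, bottom) boxes: four corners, then the center."""
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop size is larger than the image")
    top_left = (0, 0, crop_w, crop_h)
    top_right = (img_w - crop_w, 0, img_w, crop_h)
    bottom_left = (0, img_h - crop_h, crop_w, img_h)
    bottom_right = (img_w - crop_w, img_h - crop_h, img_w, img_h)
    left = int(round((img_w - crop_w) / 2.0))
    top = int(round((img_h - crop_h) / 2.0))
    center = (left, top, left + crop_w, top + crop_h)
    return [top_left, top_right, bottom_left, bottom_right, center]


boxes = five_crop_boxes(100, 80, 40, 40)
for box in boxes:
    print(box)
# (0, 0, 40, 40)
# (60, 0, 100, 40)
# (0, 40, 40, 80)
# (60, 40, 100, 80)
# (30, 20, 70, 60)
```

TenCrop yields these five crops plus the same five taken from the flipped image, ten in total.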

The Pad(object) transform:

class Pad(object):
    """Pad the given PIL Image on all sides with the given "pad" value.
    Args:
        padding (int or tuple): Padding on each border. If a single int is provided this
            is used to pad all borders. If tuple of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a tuple of length 4 is provided
            this is the padding for the left, top, right and bottom borders
            respectively.
        fill (int or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant
        padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
            Default is constant.
            - constant: pads with a constant value, this value is specified with fill
            - edge: pads with the last value at the edge of the image
            - reflect: pads with reflection of image without repeating the last value on the edge
                For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
                will result in [3, 2, 1, 2, 3, 4, 3, 2]
            - symmetric: pads with reflection of image repeating the last value on the edge
                For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
                will result in [2, 1, 1, 2, 3, 4, 4, 3]
    """

    def __init__(self, padding, fill=0, padding_mode='constant'):
        assert isinstance(padding, (numbers.Number, tuple))
        assert isinstance(fill, (numbers.Number, str, tuple))
        assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']
        if isinstance(padding, collections.Sequence) and len(padding) not in [2, 4]:
            raise ValueError("Padding must be an int or a 2, or 4 element tuple, not a " +
                             "{} element tuple".format(len(padding)))

        self.padding = padding
        self.fill = fill
        self.padding_mode = padding_mode

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be padded.
        Returns:
            PIL Image: Padded image.
        """
        return F.pad(img, self.padding, self.fill, self.padding_mode)

    def __repr__(self):
        return self.__class__.__name__ + '(padding={0}, fill={1}, padding_mode={2})'.\
            format(self.padding, self.fill, self.padding_mode)
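The four padding modes are easiest to compare on a one-dimensional list, using the same [1, 2, 3, 4] example as the docstring. The helper `pad_1d` is a hypothetical name written for this illustration; it pads both sides by the same amount and assumes `n` is smaller than the list length.

```python
def pad_1d(values, n, mode, fill=0):
    """Pad a list with n elements on both sides using one of the four modes."""
    values = list(values)
    if n == 0:
        return values
    if mode == "constant":
        left, right = [fill] * n, [fill] * n
    elif mode == "edge":
        left, right = [values[0]] * n, [values[-1]] * n
    elif mode == "reflect":
        # Mirror around the border element without repeating it.
        left = values[1:n + 1][::-1]
        right = values[-n - 1:-1][::-1]
    elif mode == "symmetric":
        # Mirror around the border, repeating the border element.
        left = values[:n][::-1]
        right = values[-n:][::-1]
    else:
        raise ValueError("unknown padding mode: " + mode)
    return left + values + right


data = [1, 2, 3, 4]
print(pad_1d(data, 2, "constant"))   # [0, 0, 1, 2, 3, 4, 0, 0]
print(pad_1d(data, 2, "edge"))       # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_1d(data, 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d(data, 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
```

The reflect and symmetric results match the examples given in the docstring; the only difference between them is whether the border element itself is repeated.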

Besides the built-in transforms, you can also define your own:

class Lambda(object):
    """Apply a user-defined lambda as a transform.
    Args:
        lambd (function): Lambda/function to be used for transform.
    """

    def __init__(self, lambd):
        assert isinstance(lambd, types.LambdaType)
        self.lambd = lambd

    def __call__(self, img):
        return self.lambd(img)

    def __repr__(self):
        return self.__class__.__name__ + '()'
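A short usage sketch of the Lambda idea, with the class re-declared so the snippet runs on its own. The `invert` and `clamp` steps and the flat pixel lists are invented for illustration; in a real pipeline the wrapped function would receive a PIL Image or a tensor.

```python
import types


class Lambda(object):
    """Same logic as the class above: wrap a plain function as a transform."""

    def __init__(self, lambd):
        assert isinstance(lambd, types.LambdaType)
        self.lambd = lambd

    def __call__(self, img):
        return self.lambd(img)


# Hypothetical custom step: invert 8-bit pixel values stored in a flat list.
invert = Lambda(lambda pixels: [255 - p for p in pixels])
print(invert([0, 100, 255]))  # [255, 155, 0]


def clamp(pixels):
    """Clip every value into the 8-bit range [0, 255]."""
    return [min(max(p, 0), 255) for p in pixels]


# types.LambdaType is the same type as an ordinary function, so def works too.
print(Lambda(clamp)([-5, 300]))  # [0, 255]
```

Because the wrapper is just a callable, it can sit in a Compose list next to the built-in transforms.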

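The transforms above are the most commonly used ones. To make the data more diverse and random, so that training is more thorough and the model generalizes better, the library also provides a further set of transforms, most of them random: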

class RandomTransforms(object):

Base class for a list of transformations with randomness.

class RandomApply(RandomTransforms):

Apply randomly a list of transformations with a given probability.

class RandomOrder(RandomTransforms):

Apply a list of transformations in a random order.

class RandomChoice(RandomTransforms):

Apply single transformation randomly picked from a list.

class RandomHorizontalFlip(object):

Horizontally flip the given PIL Image randomly with a given probability.

class RandomVerticalFlip(object):

Vertically flip the given PIL Image randomly with a given probability.

class RandomResizedCrop(object):

Crop the given PIL Image to random size and aspect ratio.

class RandomSizedCrop(RandomResizedCrop):

This transform is deprecated in favor of RandomResizedCrop.

class LinearTransformation(object):

Transform a tensor image with a square transformation matrix computed offline.

class ColorJitter(object):

Randomly change the brightness, contrast and saturation of an image.

class RandomRotation(object):

Rotate the image by angle.

class RandomAffine(object):

Random affine transformation of the image keeping center invariant.

class Grayscale(object):

Convert image to grayscale.

class RandomGrayscale(object):

Randomly convert image to grayscale with a probability of p (default 0.1).
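The three wrappers RandomApply, RandomOrder, and RandomChoice differ only in how they pick from the list. The functions below are a minimal sketch of those three behaviors on plain callables, not the library classes; the names `random_apply`, `random_choice`, `random_order`, and the `rng` parameter are assumptions made for this illustration.

```python
import random


def random_apply(transforms, x, p=0.5, rng=random):
    """With probability p apply every transform in order; otherwise return x."""
    if rng.random() >= p:
        return x
    for t in transforms:
        x = t(x)
    return x


def random_choice(transforms, x, rng=random):
    """Apply exactly one transform picked at random."""
    return rng.choice(transforms)(x)


def random_order(transforms, x, rng=random):
    """Apply all transforms, in a shuffled order."""
    order = list(range(len(transforms)))
    rng.shuffle(order)
    for idx in order:
        x = transforms[idx](x)
    return x


def add_one(v):
    return v + 1


def double(v):
    return v * 2


steps = [add_one, double]
print(random_apply(steps, 3, p=1.0))      # always applied: 8
print(random_apply(steps, 3, p=0.0))      # never applied: 3
print(random_choice(steps, 3) in (4, 6))  # True
print(random_order(steps, 3) in (7, 8))   # True
```

With real image transforms in place of the toy functions, the same three patterns give a different augmentation on each pass over the data.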

References:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/transforms.py

https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose

https://blog.csdn.net/u014380165/article/details/79167753

For an explanation of __init__, __call__, and __repr__, see:
http://funhacks.net/explore-python/Class/magic_method.html

https://www.cnblogs.com/shengulong/p/7456435.html
