torchvision
浅谈深度学习训练中数据规范化(Normalization)的重要性 - Oldpan的个人博客
- 在使用网络训练数据之前,需要对数据进行预处理
transform = transforms.Compose([
transforms.RandomResizedCrop(100),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
- Normalize 一定是在 ToTensor 之后
归一化(Normalization)
说到重点了,我们在文章最开始说的归一化,其实在一组图中,每个图像的像素点首先减去所有图像均值的像素点,然后再除以标准差 z-score。这样可以保证所有的图像分布都相似,也就是在训练的时候更容易收敛,也就是训练的更快更好了。另外,不同图像像素点范围的mean和std是不一样的,一般我们输入的都是[0-1]或者[0-255]的图像数据,在pytorch的模型中,输入的是[0-1],而在caffe的模型中,我们输入的是[0-255]。
为了方便进行数据的操作,pytorch团队提供了一个torchvision.transforms包,我们可以用transforms进行以下操作:
PIL.Image/numpy.ndarray与Tensor的相互转化;
归一化;
对PIL.Image进行裁剪、缩放等操作。
通常,在使用torchvision.transforms,我们通常使用transforms.Compose将transforms组合在一起。
PIL.Image/numpy.ndarray与Tensor的相互转换
PIL.Image/numpy.ndarray转化为Tensor,常常用在训练模型阶段的数据读取,而Tensor转化为PIL.Image/numpy.ndarray则用在验证模型阶段的数据输出。
我们可以使用 transforms.ToTensor() 将 PIL.Image/numpy.ndarray 数据进转化为torch.FloadTensor,并归一化到[0, 1.0]:
取值范围为[0, 255]的PIL.Image,转换成形状为[C, H, W],取值范围是[0, 1.0]的torch.FloadTensor;
形状为[H, W, C]的numpy.ndarray,转换成形状为[C, H, W],取值范围是[0, 1.0]的torch.FloadTensor。
而transforms.ToPILImage则是将Tensor转化为PIL.Image。如果,我们要将Tensor转化为numpy,只需要使用 .numpy() 即可。如下:
img_path = "./data/img_37.jpg"
transforms.ToTensor()
transform1 = transforms.Compose([
transforms.ToTensor(), # range [0, 255] -> [0.0,1.0]
]
)
##numpy.ndarray
img = cv2.imread(img_path)# 读取图像
img1 = transform1(img) # 归一化到 [0.0,1.0]
print("img1 = ",img1)
转化为numpy.ndarray并显示
img_1 = img1.numpy()*255
img_1 = img_1.astype('uint8')
img_1 = np.transpose(img_1, (1,2,0))
cv2.imshow('img_1', img_1)
cv2.waitKey()
##PIL
img = Image.open(img_path).convert('RGB') # 读取图像
img2 = transform1(img) # 归一化到 [0.0,1.0]
print("img2 = ",img2)
#转化为PILImage并显示
img_2 = transforms.ToPILImage()(img2).convert('RGB')
print("img_2 = ",img_2)
img_2.show()
归一化
归一化对神经网络的训练是非常重要的,那么我们如何归一化到[-1.0, -1.0]呢?只需要将上面的transform1改为如下所示:
transform2 = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize(mean = (0.5, 0.5, 0.5), std = (0.5, 0.5, 0.5))
]
)
(1)transforms.Compose就是将transforms组合在一起;
(2)transforms.Normalize使用如下公式进行归一化:
channel=(channel-mean)/std
这样一来,我们的数据中的每个值就变成了[-1,1]的数了。
PIL.Image的缩放裁剪等操作
此外,transforms还提供了裁剪,缩放等操作,以便进行数据增强。下面就看一个随机裁剪的例子,这个例子中,仍然使用 Compose 将 transforms 组合在一起,如下:
transforms.RandomCrop()
transform4 = transforms.Compose([
transforms.ToTensor(),
transforms.ToPILImage(),
transforms.RandomCrop((300,300)),
]
)
img = Image.open(img_path).convert('RGB')
img3 = transform4(img)
img3.show()
相关代码可以查看:tfygg/pytorch-tutorials
最后,安利一下pytorch中文文档。
首先介绍一种粘合剂Compose,将所有的变换按照给定顺序粘合起来:
class Compose(object):
"""Composes several transforms together.
Args:
transforms (list of ``Transform`` objects): list of transforms to compose.
Example:
>>> transforms.Compose([
>>> transforms.CenterCrop(10),
>>> transforms.ToTensor(),
>>> ])
"""
def __init__(self, transforms):
self.transforms = transforms
def __call__(self, img):
for t in self.transforms:
img = t(img)
return img
可以看到所有的Transform构成一个list,Compose(object)是将所有的变换按照这个列表的顺序兑现。在上述繁杂的变换中,变换ToTensor(object)的作作用是转换数据类型,他可以当做是变换的分水岭,因为有一部分变换的操作对象是torch.*Tensor,另一部分变化的操作对象是PIL Image。下面看一下该变换:
class ToTensor(object):
"""Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
"""
def __call__(self, pic):
"""
Args:
pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
Returns:
Tensor: Converted image.
"""
return F.to_tensor(pic)
def __repr__(self):
return self.__class__.__name__ + '()'
转换ToTensor()实现了两个功能:
- 将图像的像素范围由[0,255]映射为[0,1];
- 将像素的组织顺序由numpy.ndarray的(H x W x C)或PIL格式的图像转换为 (C x H x W)。
对应地有另一种转换ToPILImage:
class ToPILImage(object):
"""Convert a tensor or an ndarray to PIL Image.
Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
H x W x C to a PIL Image while preserving the value range.
Args:
mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
1. If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
2. If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
3. If the input has 1 channel, the ``mode`` is determined by the data type (i,e,
``int``, ``float``, ``short``).
.. _PIL.Image mode: https://pillow.readthedocs.io/en/latest/handbook/concepts.html#concept-modes
"""
def __init__(self, mode=None):
self.mode = mode
def __call__(self, pic):
"""
Args:
pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.
Returns:
PIL Image: Image converted to PIL Image.
"""
return F.to_pil_image(pic, self.mode)
def __repr__(self):
format_string = self.__class__.__name__ + '('
if self.mode is not None:
format_string += 'mode={0}'.format(self.mode)
format_string += ')'
return format_string
关于归一化:
class Normalize(object):
"""Normalize a tensor image with mean and standard deviation.
Given mean: ``(M1,...,Mn)`` and std: ``(S1,..,Sn)`` for ``n`` channels, this transform
will normalize each channel of the input ``torch.*Tensor`` i.e.
``input[channel] = (input[channel] - mean[channel]) / std[channel]``
.. note::
This transform acts in-place, i.e., it mutates the input tensor.
Args:
mean (sequence): Sequence of means for each channel.
std (sequence): Sequence of standard deviations for each channel.
"""
def __init__(self, mean, std):
self.mean = mean
self.std = std
def __call__(self, tensor):
"""
Args:
tensor (Tensor): Tensor image of size (C, H, W) to be normalized.
Returns:
Tensor: Normalized Tensor image.
"""
return F.normalize(tensor, self.mean, self.std)
def __repr__(self):
return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)
可以看出需要先通过 ToTensor()进行规范,然后才能通过Normalize()实现归一化。【主要是调整顺序】
看一下class Resize(object):的说明:
"""Resize the input PIL Image to the given size.
Args:
size (sequence or int): Desired output size. If size is a sequence like
(h, w), output size will be matched to this. If size is an int,
smaller edge of the image will be matched to this number.
i.e, if height > width, then image will be rescaled to
(size * height / width, size)
interpolation (int, optional): Desired interpolation. Default is
PIL.Image.BILINEAR
"""
该变换用于调整图像的尺寸,调整对象为PIL Image。所以需要在变换ToTensor()之前。
接下来有个类似的变换class Scale(Resize),文档中的说法是:
Note: This transform is deprecated in favor of Resize.
然后是裁剪:
class CenterCrop(object):
"""Crops the given PIL Image at the center.
Args:
size (sequence or int): Desired output size of the crop. If size is an
int instead of sequence like (h, w), a square crop (size, size) is
made.
"""
def __init__(self, size):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
def __call__(self, img):
"""
Args:
img (PIL Image): Image to be cropped.
Returns:
PIL Image: Cropped image.
"""
return F.center_crop(img, self.size)
def __repr__(self):
return self.__class__.__name__ + '(size={0})'.format(self.size)
变换CenterCrop(object)是从中心位置裁剪,对象为PIL Image。还有一个类似的变换RandomCrop(object),实现从任意位置的裁剪:
class RandomCrop(object):
"""Crop the given PIL Image at a random location.
Args:
size (sequence or int): Desired output size of the crop. If size is an
int instead of sequence like (h, w), a square crop (size, size) is
made.
padding (int or sequence, optional): Optional padding on each border
of the image. Default is None, i.e no padding. If a sequence of length
4 is provided, it is used to pad left, top, right, bottom borders
respectively. If a sequence of length 2 is provided, it is used to
pad left/right, top/bottom borders, respectively.
pad_if_needed (boolean): It will pad the image if smaller than the
desired size to avoid raising an exception.
fill: Pixel fill value for constant fill. Default is 0. If a tuple of
length 3, it is used to fill R, G, B channels respectively.
This value is only used when the padding_mode is constant
padding_mode: Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value on the edge of the image
- reflect: pads with reflection of image (without repeating the last value on the edge)
padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image (repeating the last value on the edge)
padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
will result in [2, 1, 1, 2, 3, 4, 4, 3]
"""
def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant'):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
self.padding = padding
self.pad_if_needed = pad_if_needed
self.fill = fill
self.padding_mode = padding_mode
@staticmethod
def get_params(img, output_size):
"""Get parameters for ``crop`` for a random crop.
Args:
img (PIL Image): Image to be cropped.
output_size (tuple): Expected output size of the crop.
Returns:
tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
"""
w, h = img.size
th, tw = output_size
if w == tw and h == th:
return 0, 0, h, w
i = random.randint(0, h - th)
j = random.randint(0, w - tw)
return i, j, th, tw
def __call__(self, img):
"""
Args:
img (PIL Image): Image to be cropped.
Returns:
PIL Image: Cropped image.
"""
if self.padding is not None:
img = F.pad(img, self.padding, self.fill, self.padding_mode)
# pad the width if needed
if self.pad_if_needed and img.size[0] < self.size[1]:
img = F.pad(img, (self.size[1] - img.size[0], 0), self.fill, self.padding_mode)
# pad the height if needed
if self.pad_if_needed and img.size[1] < self.size[0]:
img = F.pad(img, (0, self.size[0] - img.size[1]), self.fill, self.padding_mode)
i, j, h, w = self.get_params(img, self.size)
return F.crop(img, i, j, h, w)
def __repr__(self):
return self.__class__.__name__ + '(size={0}, padding={1})'.format(self.size, self.padding)
除了上述两种常用的裁剪,剩下的裁剪方式有:
RandomResizedCrop(object):
Crop the given PIL Image to random size and aspect ratio.
RandomSizedCrop(RandomResizedCrop):
This transform is deprecated in favor of RandomResizedCrop.
FiveCrop(object):
Crop the given PIL Image into four corners and the central crop.
TenCrop(object):
Crop the given PIL Image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).
关于变换Pad(object):
class Pad(object):
"""Pad the given PIL Image on all sides with the given "pad" value.
Args:
padding (int or tuple): Padding on each border. If a single int is provided this
is used to pad all borders. If tuple of length 2 is provided this is the padding
on left/right and top/bottom respectively. If a tuple of length 4 is provided
this is the padding for the left, top, right and bottom borders
respectively.
fill (int or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
length 3, it is used to fill R, G, B channels respectively.
This value is only used when the padding_mode is constant
padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric.
Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value at the edge of the image
- reflect: pads with reflection of image without repeating the last value on the edge
For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image repeating the last value on the edge
For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
will result in [2, 1, 1, 2, 3, 4, 4, 3]
"""
def __init__(self, padding, fill=0, padding_mode='constant'):
assert isinstance(padding, (numbers.Number, tuple))
assert isinstance(fill, (numbers.Number, str, tuple))
assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']
if isinstance(padding, collections.Sequence) and len(padding) not in [2, 4]:
raise ValueError("Padding must be an int or a 2, or 4 element tuple, not a " +
"{} element tuple".format(len(padding)))
self.padding = padding
self.fill = fill
self.padding_mode = padding_mode
def __call__(self, img):
"""
Args:
img (PIL Image): Image to be padded.
Returns:
PIL Image: Padded image.
"""
return F.pad(img, self.padding, self.fill, self.padding_mode)
def __repr__(self):
return self.__class__.__name__ + '(padding={0}, fill={1}, padding_mode={2})'.\
format(self.padding, self.fill, self.padding_mode)
除了官方提供的变换操作,还可以自定义变化:
class Lambda(object):
"""Apply a user-defined lambda as a transform.
Args:
lambd (function): Lambda/function to be used for transform.
"""
def __init__(self, lambd):
assert isinstance(lambd, types.LambdaType)
self.lambd = lambd
def __call__(self, img):
return self.lambd(img)
def __repr__(self):
return self.__class__.__name__ + '()'
上述都是比较常用的变换。为了增加数据的多样性,随机性,使得训练更充分,具有更强的generalization,官方还提供了一系列的操作,主要是随机操作:
class RandomTransforms(object):
Base class for a list of transformations with randomness.
class RandomApply(RandomTransforms):
Apply randomly a list of transformations with a given probability.
class RandomOrder(RandomTransforms):
Apply a list of transformations in a random order.
class RandomChoice(RandomTransforms):
Apply single transformation randomly picked from a list.
class RandomHorizontalFlip(object):
Horizontally flip the given PIL Image randomly with a given probability.
class RandomVerticalFlip(object):
Vertically flip the given PIL Image randomly with a given probability.
class RandomResizedCrop(object):
Crop the given PIL Image to random size and aspect ratio.
class RandomSizedCrop(RandomResizedCrop):
This transform is deprecated in favor of RandomResizedCrop.
class LinearTransformation(object):
Transform a tensor image with a square transformation matrix computed offline.
class ColorJitter(object):
Randomly change the brightness, contrast and saturation of an image.
class RandomRotation(object):
Rotate the image by angle.
class RandomAffine(object):
Random affine transformation of the image keeping center invariant.
class Grayscale(object):
Convert image to grayscale.
class RandomGrayscale(object):
Randomly convert image to grayscale with a probability of p (default 0.1).
Reference:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/transforms.py
https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose
https://blog.csdn.net/u014380165/article/details/79167753
另外,关于init,call和repr的说明,可以参考:
http://funhacks.net/explore-python/Class/magic_method.html,https://www.cnblogs.com/shengulong/p/7456435.html。