
Data Augmentation

[Technical Survey] What data augmentation methods are used in deep learning?

[TOC]

1. Where in the pipeline to augment the data

Offline augmentation: perform all the transforms in advance, effectively enlarging the dataset; suitable for smaller datasets.

Online augmentation: perform the transforms on each mini-batch just before it is fed to the model (a minimal sketch follows).
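
As a minimal sketch of online augmentation with torchvision (FakeData stands in for a real dataset, and the particular transforms are illustrative only):

import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import FakeData

# Transforms run per sample every time a batch is drawn,
# so each epoch sees differently augmented copies.
transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2),
    T.ToTensor(),
])

dataset = FakeData(size=64, transform=transform)  # stand-in for a real dataset
loader = DataLoader(dataset, batch_size=8, shuffle=True)

for images, labels in loader:
    pass  # the training step would go here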

2. Common data augmentation techniques

(1) Geometric transforms

Geometric transforms apply geometric operations to the image, including flipping, rotation, cropping, deformation, scaling, and so on.

(2) Color transforms

The geometric transforms above do not change the content of the image itself; they may select a part of the image or redistribute its pixels. Augmentations that change the image content itself belong to the color-transform category, which commonly includes noise, blur, color changes, erasing, filling, and so on.

(3) Others

Generating datasets with GANs, stitching four images together with Mosaic, etc.

2.1 Flip


Images can be flipped horizontally or vertically. From left to right: the original image, the horizontally flipped image, and the vertically flipped image.

2.2 Rotation


From left to right, each image is rotated 90 degrees clockwise relative to the previous one.
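
A minimal sketch with torchvision ('example.jpg' is a hypothetical path; note that FT.rotate treats positive angles as counter-clockwise, so clockwise turns use negative angles):

import torchvision.transforms.functional as FT
from PIL import Image

image = Image.open('example.jpg')    # hypothetical input image
rot_cw_90 = FT.rotate(image, -90)    # 90 degrees clockwise
rot_cw_180 = FT.rotate(image, -180)
rot_cw_270 = FT.rotate(image, -270)  # equivalently, 90 degrees counter-clockwise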

2.3 Scale


An image can be scaled outward or inward. When scaling outward, the final image will be larger than the original; most frameworks then cut out a region equal in size to the original image. From left to right: the original image, scaled outward by 10%, and scaled outward by 20%.
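
A minimal sketch of outward scaling, assuming the common convention of enlarging and then center-cropping back to the original size ('example.jpg' is a hypothetical path):

import torchvision.transforms.functional as FT
from PIL import Image

def scale_outward(image, factor=1.1):
    """Zoom in by `factor`, then center-crop back to the original size."""
    w, h = image.size  # PIL gives (width, height)
    enlarged = FT.resize(image, (int(h * factor), int(w * factor)))  # FT.resize takes (h, w)
    return FT.center_crop(enlarged, (h, w))

image = Image.open('example.jpg')  # hypothetical input image
zoomed_10 = scale_outward(image, 1.1)
zoomed_20 = scale_outward(image, 1.2)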

2.4 Crop


Unlike scaling, here we simply sample a region of the original image at random and then resize that region to the original image size. This is commonly called random crop. From left to right: the original image, a crop from the top-left corner, and a crop from the bottom-right corner.

2.5 Translation


Translation simply shifts the image along the X or Y direction (or both). From left to right: the original image, the image shifted right, and the image shifted up.
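
A minimal sketch using FT.affine, which can express a pure shift ('example.jpg' is a hypothetical path; the shift amounts are illustrative):

import torchvision.transforms.functional as FT
from PIL import Image

image = Image.open('example.jpg')  # hypothetical input image
# translate=(tx, ty) is in pixels; the vacated area is filled with black by default
shift_right = FT.affine(image, angle=0, translate=(30, 0), scale=1.0, shear=0)
shift_up = FT.affine(image, angle=0, translate=(0, -30), scale=1.0, shear=0)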

2.6 Resize


Rescale the image so that the model gains some scale invariance.

2.7 Gaussian Noise


Overfitting often happens when a neural network tries to learn high-frequency features (patterns that occur in large amounts) that may be useless. Zero-mean Gaussian noise contains components at essentially all frequencies, which effectively distorts the high-frequency features, so adding a moderate amount of noise can improve learning. From left to right: the original image, the image with Gaussian noise, and the image with salt-and-pepper noise.
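
A minimal sketch of both noise types on a (C, H, W) float tensor in [0, 1]; the std and amount values are illustrative:

import torch

def add_gaussian_noise(image, std=0.05):
    """Add zero-mean Gaussian noise and clamp back to the valid range."""
    return (image + torch.randn_like(image) * std).clamp(0., 1.)

def add_salt_pepper_noise(image, amount=0.01):
    """Set a random fraction of values to 0 (pepper) or 1 (salt)."""
    noisy = image.clone()
    mask = torch.rand_like(image)
    noisy[mask < amount / 2] = 0.
    noisy[mask > 1 - amount / 2] = 1.
    return noisy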

2.8 RGB color distortion & adjusting brightness, contrast, saturation, and hue


2.9 Random Erase (CutOut)


Select a random rectangular region of the image and cover it with a specific value (a random value or the dataset mean).
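
A minimal hand-rolled sketch on a (C, H, W) float tensor (torchvision also ships this as transforms.RandomErasing); the area range and fill value are illustrative:

import random
import torch

def random_erase(image, area_range=(0.02, 0.2), value=0.):
    """Cover one random, roughly square rectangle with `value`
    (the dataset mean or random noise are common alternatives)."""
    c, h, w = image.shape
    area = random.uniform(*area_range) * h * w
    eh = min(int(area ** 0.5), h)
    ew = min(int(area ** 0.5), w)
    top = random.randint(0, h - eh)
    left = random.randint(0, w - ew)
    out = image.clone()
    out[:, top:top + eh, left:left + ew] = value
    return out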

2.10 Mixup


Linearly blend two images together with their corresponding labels.
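
A minimal sketch, assuming one-hot label tensors and the Beta-distributed mixing weight from the Mixup paper (the alpha value is illustrative):

import torch

def mixup(x1, y1, x2, y2, alpha=0.2):
    """x1, x2: image tensors of the same shape; y1, y2: one-hot label tensors.
    Returns the same convex combination of both images and both labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2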

2.11 CutMix


Building on Mixup and CutOut, remove a region of one image and fill it with a patch from another image.
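
A minimal sketch, again assuming one-hot label tensors; the cut box is sized from a Beta-distributed lambda and the labels are mixed by the actual pasted area:

import random
import torch

def cutmix(x1, y1, x2, y2, alpha=1.0):
    """Cut a random box out of x1 and fill it with the same region of x2;
    mix the labels in proportion to the pasted area."""
    _, h, w = x1.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    cut_h = int(h * (1 - lam) ** 0.5)
    cut_w = int(w * (1 - lam) ** 0.5)
    top = random.randint(0, h - cut_h)
    left = random.randint(0, w - cut_w)
    x = x1.clone()
    x[:, top:top + cut_h, left:left + cut_w] = x2[:, top:top + cut_h, left:left + cut_w]
    lam = 1 - (cut_h * cut_w) / (h * w)  # recompute lambda from the actual area
    return x, lam * y1 + (1 - lam) * y2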

2.12 GAN


Style transfer, or generating synthetic (fake) images.

2.13 Mosaic


Mosaic stitches four images into one, enriching the backgrounds of the objects to be detected; in addition, batch normalization computes statistics over the four images at once.

References:

【1】Data Augmentation | How to use Deep Learning when you have Limited Data

【2】Zhihu: What data augmentation methods are used in deep learning?

【3】CSDN: A summary of data augmentation methods for deep learning

3. Hands-on practice

Data augmentation based on SSD; the code is as follows:

import random

import torch
import torchvision.transforms.functional as FT
from PIL import Image

1.flip

def flip(image, boxes):
    """Horizontally flip a PIL image and its (xmin, ymin, xmax, ymax) boxes."""
    # Flip image
    new_image = FT.hflip(image)

    # Flip boxes; clone first so the caller's tensor is not modified in place
    new_boxes = boxes.clone()
    new_boxes[:, 0] = image.width - boxes[:, 0] - 1
    new_boxes[:, 2] = image.width - boxes[:, 2] - 1
    new_boxes = new_boxes[:, [2, 1, 0, 3]]  # swap columns to restore xmin < xmax

    return new_image, new_boxes

2.resize

def resize(image, boxes, dims=(300, 300), return_percent_coords=True):
    # Resize image
    new_image = FT.resize(image, dims)

    # Resize bounding boxes
    old_dims = torch.FloatTensor([image.width, image.height, image.width, image.height]).unsqueeze(0)
    new_boxes = boxes / old_dims  # percent coordinates

    if not return_percent_coords:
        new_dims = torch.FloatTensor([dims[1], dims[0], dims[1], dims[0]]).unsqueeze(0)
        new_boxes = new_boxes * new_dims

    return new_image, new_boxes

3.padding

def expand(image, boxes, filler):
    """Zoom out by placing the (3, H, W) image tensor on a larger canvas
    filled with `filler`; helps the model learn to detect smaller objects.
    """
    original_h = image.size(1)
    original_w = image.size(2)
    max_scale = 4
    scale = random.uniform(1, max_scale)
    new_h = int(scale * original_h)
    new_w = int(scale * original_w)

    # Create a larger canvas filled with the filler value (e.g. the per-channel means)
    filler = torch.FloatTensor(filler)  # (3,)
    new_image = torch.ones((3, new_h, new_w), dtype=torch.float) * filler.unsqueeze(1).unsqueeze(1)  # (3, new_h, new_w)

    # Place the original image at random coordinates in this new image (origin at top-left of image)
    left = random.randint(0, new_w - original_w)
    right = left + original_w
    top = random.randint(0, new_h - original_h)
    bottom = top + original_h
    new_image[:, top:bottom, left:right] = image

    # Adjust bounding boxes' coordinates accordingly
    new_boxes = boxes + torch.FloatTensor([left, top, left, top]).unsqueeze(0) 

    return new_image, new_boxes

4.Distort brightness, contrast, saturation, and hue

def photometric_distort(image):
    new_image = image
    distortions = [FT.adjust_brightness,
                   FT.adjust_contrast,
                   FT.adjust_saturation,
                   FT.adjust_hue]

    random.shuffle(distortions)

    for d in distortions:
        if random.random() < 0.5:
            if d.__name__ == 'adjust_hue':
                # adjust_hue expects a normalized factor, so keep the shift small
                adjust_factor = random.uniform(-18 / 255., 18 / 255.)
            else:
                adjust_factor = random.uniform(0.5, 1.5)

            # Apply this distortion
            new_image = d(new_image, adjust_factor)

    return new_image

5.random crop

def random_crop(image, boxes, labels, difficulties):
    original_h = image.size(1)
    original_w = image.size(2)
    # Keep choosing a minimum overlap until a successful crop is made
    while True:
        # Randomly draw the value for minimum overlap
        min_overlap = random.choice([0., .1, .3, .5, .7, .9, None])  # 'None' refers to no cropping
        # If not cropping
        if min_overlap is None:
            return image, boxes, labels, difficulties
        max_trials = 50
        for _ in range(max_trials):
            # Crop dimensions must be in [0.3, 1] of original dimensions
            min_scale = 0.3
            scale_h = random.uniform(min_scale, 1)
            scale_w = random.uniform(min_scale, 1)
            new_h = int(scale_h * original_h)
            new_w = int(scale_w * original_w)
            # Aspect ratio has to be in [0.5, 2]
            aspect_ratio = new_h / new_w
            if not 0.5 < aspect_ratio < 2:
                continue
            # Crop coordinates (origin at top-left of image)
            left = random.randint(0, original_w - new_w)
            right = left + new_w
            top = random.randint(0, original_h - new_h)
            bottom = top + new_h
            crop = torch.FloatTensor([left, top, right, bottom])  # (4)
            # Calculate Jaccard overlap (IoU) between the crop and the bounding
            # boxes; find_jaccard_overlap is a helper from the referenced tutorial
            overlap = find_jaccard_overlap(crop.unsqueeze(0), boxes)  # (1, n_objects)
            overlap = overlap.squeeze(0)  # (n_objects)
            if overlap.max().item() < min_overlap:
                continue
            # Crop image
            new_image = image[:, top:bottom, left:right]  # (3, new_h, new_w)
            # Find centers of original bounding boxes
            bb_centers = (boxes[:, :2] + boxes[:, 2:]) / 2.  # (n_objects, 2)
            # Find bounding boxes whose centers are in the crop
            centers_in_crop = (bb_centers[:, 0] > left) * (bb_centers[:, 0] < right) * (bb_centers[:, 1] > top) * (bb_centers[:, 1] < bottom)  
            # If not a single bounding box has its center in the crop, try again
            if not centers_in_crop.any():
                continue
            # Discard bounding boxes that don't meet this criterion
            new_boxes = boxes[centers_in_crop, :]
            new_labels = labels[centers_in_crop]
            new_difficulties = difficulties[centers_in_crop]
            # Calculate bounding boxes' new coordinates in the crop
            new_boxes[:, :2] = torch.max(new_boxes[:, :2], crop[:2]) 
            new_boxes[:, :2] -= crop[:2]
            new_boxes[:, 2:] = torch.min(new_boxes[:, 2:], crop[2:])
            new_boxes[:, 2:] -= crop[:2]

            return new_image, new_boxes, new_labels, new_difficulties

6.Mosaic

def get_data_with_Mosaic(self, index, image, boxes, labels, difficulties):
    """Loads four images in a mosaic. (A method of the dataset class: it relies
    on self.images, self.objects and self.num_samples.)
    """
    img_size = 300
    image_s4 = Image.new('RGB', (img_size * 2, img_size * 2), (255, 255, 255))
    boxes_s4 = torch.zeros((0, 4))  # empty (0, 4) so torch.cat below works
    labels_s4 = labels              # the first image is the current one, so its
    difficulties_s4 = difficulties  # labels/difficulties seed the accumulators
    # the current index plus 3 additional random image indices
    indices = [index] + [random.randint(0, self.num_samples - 1) for _ in range(3)]
    for i, index in enumerate(indices):
        image = Image.open(self.images[index], mode='r')
        image = image.convert('RGB')
        objects = self.objects[index]
        boxes = torch.FloatTensor(objects['boxes'])
        labels = torch.LongTensor(objects['labels'])
        difficulties = torch.ByteTensor(objects['difficulties'])
        new_image, new_boxes = resize(image, boxes, dims=(img_size, img_size), return_percent_coords=False)

        # Paste each image into one quadrant and shift its boxes by the same
        # (x, y) offset; PIL's paste() takes (x, y) with y growing downward,
        # so (0, img_size) is the bottom-left quadrant.
        if i == 0:  # bottom left
            image_s4.paste(new_image, (0, img_size))
            box = new_boxes + torch.FloatTensor([0, img_size, 0, img_size]).unsqueeze(0)
        elif i == 1:  # bottom right
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (img_size, img_size))
            box = new_boxes + torch.FloatTensor([img_size, img_size, img_size, img_size]).unsqueeze(0)
        elif i == 2:  # top left
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (0, 0))
            box = new_boxes
        elif i == 3:  # top right
            labels_s4 = torch.cat([labels_s4, labels])
            difficulties_s4 = torch.cat([difficulties_s4, difficulties])
            image_s4.paste(new_image, (img_size, 0))
            box = new_boxes + torch.FloatTensor([img_size, 0, img_size, 0]).unsqueeze(0)

        boxes_s4 = torch.cat([boxes_s4, box], dim=0)
    return image_s4, boxes_s4, labels_s4, difficulties_s4
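
For context, the functions above can be chained into a single transform. The sketch below loosely follows the order used in the a-PyTorch-Tutorial-to-Object-Detection reference (photometric distortion, expand, random crop, flip, resize); the 50% probabilities and the ImageNet channel means are assumptions, not necessarily the exact pipeline used in the experiments below.

def transform(image, boxes, labels, difficulties, mean=(0.485, 0.456, 0.406)):
    """Chain the augmentations above: PIL image in, (PIL image, boxes, ...) out."""
    image = photometric_distort(image)
    image = FT.to_tensor(image)  # PIL -> (3, H, W) float tensor for expand/crop
    if random.random() < 0.5:
        image, boxes = expand(image, boxes, filler=mean)
    image, boxes, labels, difficulties = random_crop(image, boxes, labels, difficulties)
    image = FT.to_pil_image(image)  # back to PIL for flip/resize
    if random.random() < 0.5:
        image, boxes = flip(image, boxes)
    image, boxes = resize(image, boxes, dims=(300, 300))
    return image, boxes, labels, difficulties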

7.Simple validation

A check within the SSD300 framework:

(PS: Too many people were connecting to the lab server remotely and I couldn't get on, so everything ran on my own machine's CPU. One epoch takes over four hours and evaluation another four to five, so I only did a simple check, mainly of the Mosaic method: one day for the baseline run and one day for the comparison run. In between I also, sadly, spent a whole day on an evaluation whose results looked wrong; it turned out the Mosaic box annotations were reversed, which I only caught after visualizing the mosaics. Those visualizations are noted below.)

epochs: 2, batch_size: 8, dataset: VOC2007

Baseline: with the usual flip, padding, random crop, etc. After 2 epochs the result is mAP: 0.064, which is very low because so few epochs were run.


Adding Mosaic on top of the baseline and again training 2 epochs gives mAP: 0.099, an improvement. Since SSD300's input is fixed at 300x300, Mosaic seems to act like the padding above: it shrinks the objects, which also helps small-object detection.


Mosaic visualizations (screenshots omitted).


References:

【1】GitHub: a-PyTorch-Tutorial-to-Object-Detection

【2】YOLOv4: Optimal Speed and Accuracy of Object Detection

4. Summary and reflections

1. I usually just call the few augmentation functions in PyTorch's torchvision.transforms, and in small competitions I hardly used augmentation at all, so I had not paid much attention to this area. It turns out to be quite inspiring: the idea behind Mixup, for instance, is roughly the same as the augmentation network for underwater data in our paper under review at TIP, where we additionally used a GAN to generate better data. I came up with that after being inspired by salient object detection, and YOLOv4 only came out this April.

2. As an undergraduate I mostly read rather than built. Although the core code of all my course projects and competitions was my own, limits on time (most of it went to coursework grades and scholarships) and on compute (the lab left few resources for undergraduates) meant I rarely reproduced or implemented CV tasks hands-on; mostly I just ran experiments. That is also why I really want to do a PhD: to give myself a long stretch of time to improve.

3. There are so many mature frameworks now, each the long accumulation of an entire team, plus tools like MMDetection. It makes me wonder whether the wheels I reinvent or reproduce on my own are a bit inferior.
