Introduction to Computer Vision

The Convolutional Image Classifier

A convolutional neural network usually consists of two parts: a base and a head. The base extracts features from the image, and the head classifies those features (just like a classifier in ordinary machine learning).
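
As a rough sketch of this split (the layer sizes below are made-up placeholders, not the model used later; the actual model is assembled step by step in the sections that follow):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # Base: convolutional blocks that extract features from the image
    layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=[128, 128, 3]),
    layers.MaxPool2D(),
    layers.Conv2D(64, kernel_size=3, activation='relu'),
    layers.MaxPool2D(),
    # Head: dense layers that classify the extracted features
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])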

Load the Data

# Imports
import os, warnings
import matplotlib.pyplot as plt
from matplotlib import gridspec

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

# Reproducibility
def set_seed(seed=31415):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
set_seed(31415)

# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells


# Load training and validation sets
ds_train_ = image_dataset_from_directory(
    '../input/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
    '../input/car-or-truck/valid',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

Load the Pretrained Base

pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
# Reusing a base that was already trained on a large dataset, and freezing it so it is
# not updated on our small dataset, is what is called transfer learning.
pretrained_base.trainable = False

Attach the Head and Train

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    pretrained_base,
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=30,
    verbose=0,
)

import pandas as pd

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot()

Filter, Detect, Condense

In the base, the input image is first filtered with convolutional filters, the features are then detected with a relu activation, and finally condensed with max pooling.
Applying relu first and max pooling after has an intensifying effect on the detected features.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3),  # activation is None
    layers.MaxPool2D(pool_size=2),
    # More layers follow
])
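
To make the three steps explicit, here is a small sketch that applies each one separately with low-level TensorFlow ops instead of Keras layers; the 3x3 kernel and the random input image are made up purely for illustration:

import tensorflow as tf

# A made-up 3x3 edge-detection kernel, reshaped to [height, width, in_channels, out_channels]
kernel = tf.constant([[-1., -1., -1.],
                      [-1.,  8., -1.],
                      [-1., -1., -1.]])
kernel = tf.reshape(kernel, [3, 3, 1, 1])

# A random single-channel "image", shaped [batch, height, width, channels]
image = tf.random.uniform([1, 128, 128, 1])

# 1. Filter: convolve the image with the kernel
image_filter = tf.nn.conv2d(image, kernel, strides=1, padding='SAME')

# 2. Detect: zero out negative responses with relu
image_detect = tf.nn.relu(image_filter)

# 3. Condense: keep only the strongest response in each 2x2 window
image_condense = tf.nn.pool(image_detect, window_shape=(2, 2),
                            pooling_type='MAX', strides=(2, 2), padding='SAME')

print(image_condense.shape)  # (1, 64, 64, 1)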

Max Pooling Has Translation Invariance

The principle is that each pooling window is replaced by the maximum value inside it, so the exact position of that maximum within the window does not matter; the output therefore becomes insensitive to small shifts in position.
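
A small sketch can demonstrate this (the two 4x4 feature maps are made up): the same feature shifted by one pixel inside a pooling window produces an identical pooled output.

import tensorflow as tf

# Two made-up 4x4 feature maps: the same single "feature", shifted by one pixel
a = tf.constant([[0., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 0., 0.],
                 [0., 0., 0., 0.]])
b = tf.constant([[0., 1., 0., 0.],
                 [0., 0., 0., 0.],
                 [0., 0., 0., 0.],
                 [0., 0., 0., 0.]])

pool = tf.keras.layers.MaxPool2D(pool_size=2)

# MaxPool2D expects [batch, height, width, channels]
pa = pool(tf.reshape(a, [1, 4, 4, 1]))
pb = pool(tf.reshape(b, [1, 4, 4, 1]))

# Both pooled maps are identical: the shift stayed inside one 2x2 window
print(tf.reduce_all(pa == pb).numpy())  # True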

About the Sliding Window

Feature extraction has three steps: 1. filter with a convolution layer; 2. detect with a relu activation; 3. condense with maximum pooling.
Both the convolution and the max pooling operations are carried out over a sliding window.
The window size is given by kernel_size (pool_size for pooling), the distance of each slide by strides, and the way the image borders are handled by padding.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  activation='relu'),
    layers.MaxPool2D(pool_size=2,
                     strides=1,
                     padding='same'),
    # More layers follow
])

To keep more features available for classification, the strides of a convolution are usually (1, 1), while the strides of maximum pooling are typically (2, 2) or (3, 3), never larger than the window itself.

For border handling, padding='valid' keeps the convolution window entirely inside the image, so the output shrinks. padding='same' pads the input image with rings of zeros, so the output keeps the same size as the input (when strides is 1).
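
A quick sketch of the size difference (the input shape here is just an illustrative assumption):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform([1, 128, 128, 3])  # a made-up [batch, height, width, channels] input

valid = layers.Conv2D(filters=8, kernel_size=3, strides=1, padding='valid')(x)
same = layers.Conv2D(filters=8, kernel_size=3, strides=1, padding='same')(x)

print(valid.shape)  # (1, 126, 126, 8): the 3x3 window cannot be centered on border pixels
print(same.shape)   # (1, 128, 128, 8): zero padding keeps the output the same size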

Receptive Field

The receptive field of a unit is the region of the input image that the unit depends on after several layers of convolution and maximum pooling.
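
As an illustration, assuming a made-up stack of two blocks of a 3x3 convolution (stride 1) followed by 2x2 max pooling (stride 2), the receptive field can be computed with the standard recursion over kernel sizes and strides:

# A minimal sketch of the receptive-field recursion.
# The layer list (kernel_size, strides) is a made-up example:
# conv 3x3 s1 -> pool 2x2 s2 -> conv 3x3 s1 -> pool 2x2 s2
layers_config = [(3, 1), (2, 2), (3, 1), (2, 2)]

receptive_field = 1   # one input pixel per output unit, before any layers
jump = 1              # distance in input pixels between adjacent units

for kernel_size, strides in layers_config:
    receptive_field += (kernel_size - 1) * jump
    jump *= strides

print(receptive_field)  # 10: each final unit sees a 10x10 patch of the input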

Data Augmentation

from tensorflow import keras
from tensorflow.keras import layers
# these are a new feature in TF 2.2
from tensorflow.keras.layers.experimental import preprocessing


pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
pretrained_base.trainable = False

model = keras.Sequential([
    # Preprocessing
    preprocessing.RandomFlip('horizontal'),  # flip left-to-right
    preprocessing.RandomContrast(0.5),  # contrast change by up to 50%
    # Base
    pretrained_base,
    # Head
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
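
These preprocessing layers are only active during training; at inference time Keras runs them as pass-throughs, so the model sees a differently transformed copy of each image every epoch without changing prediction. As a rough sketch (reusing keras, preprocessing, and ds_train from the code above), you can preview what the augmentation produces for one training image:

import tensorflow as tf
import matplotlib.pyplot as plt

# A small stand-alone pipeline with the same augmentation layers, just for previewing
augment = keras.Sequential([
    preprocessing.RandomFlip('horizontal'),
    preprocessing.RandomContrast(0.5),
])

images, _ = next(iter(ds_train))   # first batch from the pipeline above
example = images[0]

plt.figure(figsize=(10, 5))
for i in range(8):
    # training=True forces the random transforms; at inference they are pass-through
    augmented = augment(tf.expand_dims(example, 0), training=True)
    plt.subplot(2, 4, i + 1)
    plt.imshow(tf.clip_by_value(tf.squeeze(augmented), 0.0, 1.0))
    plt.axis('off')
plt.show()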