2024-04-18

Introduction to Deep Learning

The Simplest Neuron Model

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
red_wine = pd.read_csv('../input/dl-course-data/red-wine.csv')
print(red_wine.shape)   # (1599,12)

model = keras.Sequential([
    layers.Dense(units=1, input_shape=[11]),
])

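# the model expects 11 features per sample, so feed it 100 dummy rows of width 11
x = tf.random.uniform([100, 11], minval=-1.0, maxval=1.0)
y = model.predict(x)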

w, b = model.weights
print("weights\n{}\nBias\n{}".format(w,b))

Deep Neural Network Model

Building a Sequential Model

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([  # all layers go in a single list
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer 
    layers.Dense(units=1),
])

Writing the Activation as a Separate Layer

model = keras.Sequential([
    layers.Dense(32, input_shape=[8]),
    layers.Activation('relu'),
    layers.Dense(32),
    layers.Activation('relu'),
    layers.Dense(1),
])

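Loss Function

A loss function measures the gap between the model's predictions and the true values. For regression tasks, common examples are MAE (mean absolute error) and MSE (mean squared error).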
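To make the two regression losses concrete, here is a minimal sketch in plain Python. The sample values are made up for illustration, and the helper names are my own, not Keras functions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2 + 3) / 4 = 1.5
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4 + 9) / 4 = 3.5
print(mae, mse)
```

Note how the single error of 3 contributes 9 to the MSE sum: MSE penalizes large errors more heavily than MAE does.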

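Optimizer

An optimizer is an algorithm that adjusts the model weights to reduce the loss as quickly as possible.
One example is stochastic gradient descent (SGD).

Sample some training data at random (a minibatch, or simply a batch) and run it through the model to get predictions.
Compute the loss.
Adjust the weights in the direction that reduces the loss (against the gradient).

One full pass of all the training data through the model is called an epoch.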
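The three steps and the notion of an epoch can be sketched in plain Python for a one-weight linear model. This is only an illustration under assumed settings: the toy data, the learning rate of 0.1 and the batch size of 10 are made up, and Keras does all of this internally when you call fit.

```python
import random

random.seed(0)

# Toy data following y = 2x + 1 (made up for illustration)
xs = [i / 50 - 1.0 for i in range(100)]      # 100 points in [-1, 1)
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0          # the weights to learn
learning_rate = 0.1      # assumed value
batch_size = 10          # assumed value


def mse(w, b):
    """Mean squared error of the model w*x + b over the whole dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)

for epoch in range(50):
    indices = list(range(len(xs)))
    random.shuffle(indices)                  # visit the data in random order
    # one epoch = every sample seen once, one minibatch at a time
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # 1. predictions (and their errors) on the minibatch
        errors = [w * xs[i] + b - ys[i] for i in batch]
        # 2. gradient of the minibatch MSE with respect to w and b
        grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
        grad_b = 2 * sum(errors) / len(batch)
        # 3. step against the gradient, the direction that lowers the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)
```

After training, w and b should sit close to the true values 2 and 1.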

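Learning Rate and Batch Size

The learning rate and batch size determine how large a step SGD takes and how quickly training proceeds; good values are usually not obvious.
Adam is a variant of SGD with an adaptive learning rate, so it usually works well with little or no hyperparameter tuning.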
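Why the step size matters can be seen on a toy problem. The sketch below runs plain gradient descent (not Adam) on the made-up function f(w) = (w - 3)^2 with three assumed learning rates; the function and the rates are illustrative choices only.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)          # derivative of (w - 3)^2
        w -= learning_rate * grad
    return w


small = gradient_descent(0.01)   # steps too small: still far from 3 after 20 steps
good = gradient_descent(0.1)     # approaches the minimum at w = 3
large = gradient_descent(1.1)    # overshoots further on every step and diverges
print(small, good, large)
```

An adaptive optimizer such as Adam adjusts the effective step size during training, which is why it is less sensitive to this choice.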

Adding a Loss and an Optimizer to the Model

model.compile(
    optimizer='adam',
    loss='mae',
)

Training with fit

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=10,
)

Inspecting the Loss

import pandas as pd

# convert the training history to a dataframe
history_df = pd.DataFrame(history.history)
# use Pandas native plot method
history_df['loss'].plot();

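Overfitting and Underfitting

The information in the training data consists of useful signal and noise. An ideal model would learn all of the signal and none of the noise, but in practice no such model exists.

Underfitting means the model has not learned enough signal. The remedy is to increase the model's capacity by making it wider (more units per layer) or deeper (more layers).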

model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])

wider = keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])

deeper = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
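One rough way to compare the capacity of the three models is to count their trainable parameters. The sketch below does this by hand; the input width of 11 is an assumption (the models above do not state an input shape), and count_dense_params is a hypothetical helper, not a Keras function.

```python
def count_dense_params(n_inputs, layer_units):
    """Total weights plus biases of a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units   # weight matrix plus bias vector
        n_inputs = units                    # this layer feeds the next one
    return total


n_features = 11   # assumed input width, not stated for these models

base_params = count_dense_params(n_features, [16, 1])        # 192 + 17 = 209
wider_params = count_dense_params(n_features, [32, 1])       # 384 + 33 = 417
deeper_params = count_dense_params(n_features, [16, 16, 1])  # 192 + 272 + 17 = 481
print(base_params, wider_params, deeper_params)
```

Parameter count is only a proxy: wider and deeper networks with similar counts can still learn different kinds of relationships.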

Overfitting means the model has learned too much noise.

Stop training early once the validation loss starts to rise.

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001, # minimum amount of change (in validation loss) to count as an improvement
    patience=20, # how many epochs to wait before stopping
    restore_best_weights=True,
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=500,
    callbacks=[early_stopping], # put your callbacks in a list; they are invoked every epoch
    verbose=0,  # turn off training log
)

history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot();
print("Minimum validation loss: {}".format(history_df['val_loss'].min()))
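How min_delta and patience interact can be sketched in plain Python. This is a simplified illustration of the idea, not Keras's actual implementation; the function name and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    An epoch counts as an improvement only if it beats the best loss so far
    by more than min_delta. Training stops after `patience` epochs in a row
    without an improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch   # ran out of epochs first


# Made-up validation curve: improves, then drifts upward
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.70, 0.72]
stop_epoch, best_epoch = early_stopping_epoch(losses, min_delta=0.001, patience=3)
print(stop_epoch, best_epoch)   # 6 3
```

With restore_best_weights=True, the model keeps the weights from the best epoch (epoch 3 in this toy curve) rather than those from the epoch where training stopped.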

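Using Dropout to Reduce Overfitting

Dropout randomly drops a fraction of a layer's input units during training. Add the Dropout layer directly before the layer you want it to apply to.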

keras.Sequential([
    # ...
    layers.Dropout(rate=0.3), # apply 30% dropout to the next layer
    layers.Dense(16),
    # ...
])
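What a dropout layer does to its inputs can be sketched in plain Python. The helper below is a hypothetical illustration, not the Keras implementation: during training it zeroes each value with probability rate and scales the survivors by 1 / (1 - rate), so the expected sum stays the same. At inference time it passes values through unchanged.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(values)          # dropout is inactive at inference time
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


random.seed(0)
activations = [1.0] * 10000          # made-up activations
dropped = dropout(activations, rate=0.3)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to 0.3
mean_after = sum(dropped) / len(dropped)            # close to the original mean of 1.0
print(zero_fraction, mean_after)
```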

Batch Normalization

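Batch normalization normalizes each batch using the batch's own mean and standard deviation, then rescales the result with two trainable parameters.
When using it, increase the number of units in the network as appropriate.

# add it directly after a layer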
layers.Dense(16, activation='relu'),
layers.BatchNormalization(),

# or add it before the activation function
layers.Dense(16),
layers.BatchNormalization(),
layers.Activation('relu'),

# added as the first layer, it acts as adaptive preprocessing, similar to sklearn's StandardScaler
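The normalization step itself can be sketched in plain Python for a single feature. This covers only the training-time computation on one batch; the function name and the fixed gamma and beta values are assumptions for illustration (in a real layer they are learned, and running statistics are used at inference time).

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then rescale with gamma and beta."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # gamma and beta are the two trainable parameters; fixed here for illustration
    return [gamma * x + beta for x in normalized]


batch = [10.0, 20.0, 30.0, 40.0]        # made-up values for one feature
out = batch_norm(batch)

out_mean = sum(out) / len(out)          # approximately 0
out_std = math.sqrt(sum((x - out_mean) ** 2 for x in out) / len(out))  # approximately 1
print(out, out_mean, out_std)
```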

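Binary Classification

Accuracy is the result we ultimately care about, but it cannot serve as the loss because it changes in jumps rather than smoothly. Cross-entropy is used instead: it measures the distance between probability distributions, so the higher the predicted probability of the correct class, the lower the loss.
To turn the network output into a probability between 0 and 1, use the sigmoid activation function.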

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'], # for binary classification, use binary_accuracy
)
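How sigmoid and cross-entropy work together can be sketched in plain Python for single samples. The raw outputs below are made up, and the helpers are my own illustrations rather than Keras functions (a real implementation also guards against log(0)).

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p):
    """Loss for one sample: y_true is 0 or 1, p is the predicted P(class 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Raw network outputs (made up) for three samples whose true label is 1
confident_right = binary_crossentropy(1, sigmoid(3.0))    # p is about 0.95: small loss
unsure = binary_crossentropy(1, sigmoid(0.0))             # p = 0.5: loss = ln 2
confident_wrong = binary_crossentropy(1, sigmoid(-3.0))   # p is about 0.05: large loss
print(confident_right, unsure, confident_wrong)
```

Unlike accuracy, this loss changes smoothly as the predicted probability moves, which gives the optimizer a usable gradient.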

黄河水澄's Tech Column