2024-05-04

C++ Basics

public protected private

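public members can be accessed from anywhere. protected and private members cannot be accessed through an instance from outside the class, but friend functions can access them.

public inheritance: the base class access levels are unchanged. Member functions of the derived class can access the base class's public and protected members, but not its private members.

protected inheritance: the base class's public members become protected in the derived class. Member functions of the derived class can access the base class's public and protected members, but not its private members.

private inheritance: the base class's public and protected members become private in the derived class. Member functions of the derived class can still access the base class's public and protected members, but not its private members.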

Multithreading

Using std::thread

#include <iostream>
#include <thread>

void print_sum(int a, int b) {
    std::cout << "The sum is: " << a + b << std::endl;
}

int main() {
    std::thread t(print_sum, 3, 5);
    t.join();
    return 0;
}

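When threads compete for a shared resource, protect it with a mutex. Here std::cout is shared, and the lock keeps each thread's line of output from interleaving with the other's.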

#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;

void print_block(int n, char c) {
    {
        std::unique_lock<std::mutex> locker(mtx);
        for (int i = 0; i < n; ++i) {
            std::cout << c;
        }
        std::cout << std::endl;
    }
}

int main() {
    std::thread t1(print_block, 50, '*');
    std::thread t2(print_block, 50, '$');

    t1.join();
    t2.join();

    return 0;
}

2024-04-24

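Object Detection Dataset Formats

Ultralytics YOLO format

Each image has one txt file with the same name as the image.
In the txt file, each line describes one object.
Each line contains:

The class index, i.e. the class expressed as a number (e.g. 0 is person, 1 is car, and so on)
The boundary points of the mask region, as normalized coordinates

<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>
Lines can differ in length. For a segmentation task, each line needs at least three xy points.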

YAML format

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8-seg  # dataset root dir
train: images/train  # train images (relative to 'path') 4 images
val: images/val  # val images (relative to 'path') 4 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  # ...
  77: teddy bear
  78: hair drier
  79: toothbrush

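Here train and val are the locations of the training and validation images.
names is a dictionary of class names; its keys match the class indices used in the YOLO-format label files.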
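
As a rough illustration of the label format above, the sketch below parses one label line into a class index and a list of pixel coordinates. The function name parse_seg_label and the image size are hypothetical and chosen only for this example; this is not part of the Ultralytics API.

```python
def parse_seg_label(line, img_w, img_h):
    """Parse one '<class-index> <x1> <y1> ... <xn> <yn>' line into (class, pixel points)."""
    fields = line.split()
    cls = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least three (x, y) pairs")
    # Coordinates are normalized to [0, 1]; scale them back to pixels.
    points = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return cls, points


# A triangle of class 0 in a 640x480 image (made-up values)
cls, points = parse_seg_label("0 0.25 0.25 0.75 0.25 0.5 0.75", img_w=640, img_h=480)
print(cls, points)
```

The class index returned here is the key you would look up in the names dictionary of the YAML file.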

Usage (Python)

from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n-seg.pt')  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data='coco8-seg.yaml', epochs=100, imgsz=640)

Usage (CLI)

# Start training from a pretrained *.pt model
yolo segment train data=coco8-seg.yaml model=yolov8n-seg.pt epochs=100 imgsz=640

Auto-annotation

Auto-annotate with the SAM model

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model='sam_b.pt')

Argument | Description | Default
data | Data (images) to annotate | (required)
det_model | Pretrained detection model | 'yolov8x.pt'
sam_model | Pretrained SAM model | 'sam_b.pt'
device | Device to run on; an empty string selects CPU or GPU automatically, if available | ''
output_dir | Directory where the annotation results are saved; defaults to a 'labels' folder in the same directory as 'data' | None

2024-04-21

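Common Computer Vision Concepts

IoU (Intersection over Union)

Given pred_bbox and gt_bbox, compute the ratio of the area of their intersection to the area of their union.
A bbox is represented as [x1, y1, x2, y2].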

import numpy as np

def get_IoU(pred_bbox, gt_bbox):
    """
    :param pred_bbox: predicted bbox coordinate
    :param gt_bbox: ground truth bbox coordinate
    :return: iou score
    """
    ix1 = max(pred_bbox[0], gt_bbox[0])
    iy1 = max(pred_bbox[1], gt_bbox[1])
    ix2 = min(pred_bbox[2], gt_bbox[2])
    iy2 = min(pred_bbox[3], gt_bbox[3])
    iw = np.maximum(ix2 - ix1 + 1, 0)
    ih = np.maximum(iy2 - iy1 + 1, 0)

    inter = iw * ih

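    # Use the same inclusive-pixel (+1) convention for the box areas as for the intersection
    area_pred = (pred_bbox[2] - pred_bbox[0] + 1) * (pred_bbox[3] - pred_bbox[1] + 1)
    area_gt = (gt_bbox[2] - gt_bbox[0] + 1) * (gt_bbox[3] - gt_bbox[1] + 1)
    union = area_pred + area_gt - inter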

    return inter / union

def get_max_IoU(pred_bboxes, gt_bbox):
    """
    Given one gt bbox and several pred bboxes, compute the IoU of each pred bbox with the gt bbox.
    :param pred_bboxes: predicted bbox coordinates, shape (N, 4)
    :param gt_bbox: ground truth bbox coordinate
    :return: (IoU of every pred bbox, max IoU, index of the max IoU)
    """

    if pred_bboxes.shape[0] == 0:
        # No predictions: nothing overlaps the gt bbox
        return np.array([]), 0.0, -1

    # -----0---- coordinates of the intersections, one per predicted bbox
    ix1 = np.maximum(pred_bboxes[:, 0], gt_bbox[0])
    iy1 = np.maximum(pred_bboxes[:, 1], gt_bbox[1])
    ix2 = np.minimum(pred_bboxes[:, 2], gt_bbox[2])
    iy2 = np.minimum(pred_bboxes[:, 3], gt_bbox[3])
    # max() clamps non-overlapping boxes to 0; the +1 treats coordinates as inclusive
    # pixel indices, so boxes that only touch along an edge still get a non-zero size
    iw = np.maximum(ix2 - ix1 + 1., 0.)
    ih = np.maximum(iy2 - iy1 + 1., 0.)

    # -----1----- intersection
    inters = iw * ih

    # -----2----- union, uni = S1 + S2 - inters
    uni = ((gt_bbox[2] - gt_bbox[0] + 1.) * (gt_bbox[3] - gt_bbox[1] + 1.) +
           (pred_bboxes[:, 2] - pred_bboxes[:, 0] + 1.) * (pred_bboxes[:, 3] - pred_bboxes[:, 1] + 1.) -
           inters)

    # -----3----- iou, get max score and max iou index
    overlaps = inters / uni
    ovmax = np.max(overlaps)
    jmax = np.argmax(overlaps)

    return overlaps, ovmax, jmax

if __name__ == "__main__":

    # test1
    pred_bbox = np.array([50, 50, 90, 100])   # top-left: <50, 50>, bottom-right: <90, 100>, <x-axis, y-axis>
    gt_bbox = np.array([110, 110, 150, 150])
    print (get_IoU(pred_bbox, gt_bbox))
    
    # test2
    pred_bboxes = np.array([[15, 18, 47, 60],
                          [50, 50, 90, 100],
                          [70, 80, 120, 145],
                          [130, 160, 250, 280],
                          [25.6, 66.1, 113.3, 147.8]])
    gt_bbox = np.array([70, 80, 120, 150])
    print (get_max_IoU(pred_bboxes, gt_bbox))

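NMS (Non-Maximum Suppression)

The predictions may look like this:
x1, y1, x2, y2, score, class;
x1, y1, x2, y2, score, class;

Sort by score in descending order.
Extract the bboxes, i.e. the matrix made of x1, y1, x2, y2.
Compute the IoU.

Create an empty result list. Move the highest-scoring box into it, then discard every remaining box of the same class whose IoU with that box exceeds the threshold. Repeat on the boxes that are left until none remain.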

import numpy as np
import cv2


def nms(bboxes, scores, iou_thresh):
    """
    :param bboxes: array of detection boxes, shape (N, 4)
    :param scores: array of confidence scores, shape (N,)
    :param iou_thresh: IoU threshold
    :return: the boxes and scores kept after suppression
    """

    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    # +1 matches the inclusive-pixel convention used for the intersection below
    areas = (y2 - y1 + 1) * (x2 - x1 + 1)

    # Indices of the boxes to keep
    result = []
    index = scores.argsort()[::-1]  # indices sorted by confidence, highest first
    # For safety, everything below operates on indices
    while index.size > 0:
        # Loop while candidate boxes remain
        i = index[0]
        result.append(i)  # keep the box with the highest confidence

        # IoU between that box and every other remaining box
        x11 = np.maximum(x1[i], x1[index[1:]])
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])
        w = np.maximum(0, x22 - x11 + 1)
        h = np.maximum(0, y22 - y11 + 1)
        overlaps = w * h
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
        # Keep only the boxes whose IoU does not exceed the threshold
        idx = np.where(ious <= iou_thresh)[0]
        index = index[idx + 1]  # continue with the remaining boxes
    bboxes, scores = bboxes[result], scores[result]
    return bboxes, scores


if __name__ == '__main__':
    raw_img = cv2.imread('test.png')
    # For convenience, the detection results are hard-coded as variables
    bboxes = [[183, 625, 269, 865], [197, 603, 296, 853], [190, 579, 295, 864], [537, 507, 618, 713], [535, 523, 606, 687]]
    confidences = [0.7, 0.9, 0.95, 0.9, 0.6]

    # Apply NMS
    bboxes, scores = nms(np.array(bboxes), np.array(confidences), 0.5)

mAP

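mean average precision (mAP) measures how well a model performs at object detection.
precision = TP / (TP + FP). For example, if the model marks 100 samples as positive and 80 of them are correct, precision is 80 / (80 + 20). The denominator is the number of positive predictions.
recall = TP / (TP + FN). For example, if 80 of those predictions are correct and there are 200 actual positives in total, recall is 80 / (80 + 120). The denominator is the number of actual positive samples.

average precision (AP) uses one IoU threshold to decide whether a detection counts as a positive. With detections sorted by confidence, adding them one at a time gives a precision and a recall after each step, which traces out a precision-recall curve. The area under that curve is the AP, which ranges from 0 to 1.
AP (COCO) considers several IoU thresholds. It is often written AP[.50:.05:.95], meaning that the interpolated AP is computed at IoU = 0.5, 0.55, ..., 0.9 and 0.95, and these APs are then averaged to give the final AP.
mAP is the mean of the per-class APs.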
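
To make the area-under-the-curve idea concrete, here is a minimal sketch of AP for one class at one IoU threshold. It assumes each detection has already been marked as a true or false positive, and it uses all-point interpolation, which is one common convention rather than the only one. The function name and the sample values are made up for illustration.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one class at one IoU threshold."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # highest confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinels, then make precision non-increasing from right to left (interpolation).
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


# Four detections for a class with 4 ground-truth objects; the second one is a false positive
ap = average_precision(scores=[0.9, 0.8, 0.7, 0.6], is_tp=[1, 0, 1, 1], num_gt=4)
print(ap)   # 0.625
```

Running the same function once per class and averaging the results gives mAP; averaging additionally over the IoU thresholds 0.50 to 0.95 gives the COCO-style AP.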

2024-04-20

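Object Recognition

Deep Residual Learning for Image Recognition

Published by Kaiming He et al. in 2015

Abstract

Deeper models are harder to train. The residual learning framework makes networks that are much deeper than earlier ones (152 layers) easier to optimize.

Introduction

Early deep learning work assumed that more layers were always better. It then turned out that beyond a certain depth, adding layers causes degradation: accuracy saturates and then drops. The paper attributes this neither to overfitting nor to vanishing/exploding gradients, which normalized initialization and normalization layers largely address.
The proposed approach is F(x) + x, which in the network amounts to adding shortcut connections that skip one or more layers. The key point is that identity shortcuts add neither parameters nor computational complexity.
Experimental conclusion: in the plain (control) networks, error rises as depth increases, while ResNet readily benefits from the added depth.

Experimental results

A single ResNet-152 model reaches a top-5 error of 4.49%, and an ensemble of 6 models of different depths reaches 3.57%.
The 18-layer plain network is more accurate than the 34-layer plain network, and ResNet-18 converges faster than the plain 18-layer network.
The 1202-layer ResNet is worse than the 110-layer ResNet, possibly because the model has too many parameters relative to the size of the dataset.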

PVANet : Lightweight Deep Neural Networks for Real-time Object Detection

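Published by Sanghoon Hong et al. in 2016

Abstract

Improves accuracy on multi-category object detection while reducing computation, following the principle of "less channels with more layers". It reaches 83.8% mAP on VOC2007 and 82.5% on VOC2012 (2nd place), at 12.3% of the computational cost of ResNet-101.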

Rich feature hierarchies for accurate object detection and semantic segmentation (RCNN)

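Published by Ross Girshick et al. in 2014

Abstract

The proposed method improves mAP on VOC2012 by about 30% relative to the previous best result, which came from ensemble models, reaching 53.3%.

Introduction

The paper focuses on how to localize objects with a deep network, and on training a high-capacity model with only a small amount of annotated detection data.
For localization, sliding windows had been in use for nearly 20 years. The best previous model, OverFeat, reached 24.3% mAP on ILSVRC2013, while R-CNN reaches 31.4%.
Supervised pre-training is followed by domain-specific fine-tuning.

Input an image
Extract about 2000 category-independent proposals
Use a large CNN to compute features for each proposal
Classify the proposals with class-specific SVMs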

bbox regression

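Learn a mapping from a proposal bbox P to the gt bbox G:
G_x = P_w * d_x(P) + P_x
G_y = P_h * d_y(P) + P_y
G_w = P_w * exp(d_w(P))
G_h = P_h * exp(d_h(P))

Each d_#(P) is a linear function of the pool5 features of proposal P:
d_#(P) = w_#^T * phi_5(P), where # stands for x, y, w, or h

w_# is learned with ridge regression, with the targets
t_x = (G_x - P_x)/P_w
t_y = (G_y - P_y)/P_h
t_w = log(G_w/P_w)
t_h = log(G_h/P_h)

Only proposals P whose IoU with a gt box is above 0.6 are used for learning; the rest are discarded.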
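
The target and mapping equations above are inverses of each other, which the following sketch checks numerically. It assumes boxes are given as (center x, center y, width, height); the function names encode and decode and the sample boxes are made up for illustration.

```python
import numpy as np

def encode(P, G):
    """Regression targets (t_x, t_y, t_w, t_h) for proposal P and ground truth G.
    Boxes are (center_x, center_y, width, height)."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return np.array([(Gx - Px) / Pw, (Gy - Py) / Ph, np.log(Gw / Pw), np.log(Gh / Ph)])

def decode(P, d):
    """Apply offsets d = (d_x, d_y, d_w, d_h) to proposal P to get a predicted box."""
    Px, Py, Pw, Ph = P
    dx, dy, dw, dh = d
    return np.array([Pw * dx + Px, Ph * dy + Py, Pw * np.exp(dw), Ph * np.exp(dh)])


P = np.array([100.0, 100.0, 50.0, 80.0])   # proposal
G = np.array([110.0, 96.0, 60.0, 72.0])    # ground truth
t = encode(P, G)
print(t)              # what the regressor is trained to output for this proposal
print(decode(P, t))   # recovers G up to floating-point error
```

The center offsets are divided by the proposal size and the size ratios are taken in log space, so the targets do not depend on the absolute scale of the box.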

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

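Published by Kaiming He et al. in 2014
SPP-net produces an output of fixed size regardless of the size of the input image. The method computes the feature map only once, pools arbitrary regions of it into fixed-length representations, and then trains the detector on them.
It is 24-102 times faster than R-CNN with slightly better results.
The feature map is pooled as a whole into one 256-dimensional vector, then in 4 blocks into four 256-dimensional vectors, then in 16 blocks into sixteen 256-dimensional vectors. Concatenating them gives a fixed-length vector of 21*256 dimensions, which feeds the fully connected layers.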
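
The 1 + 4 + 16 pooling scheme described above can be sketched in NumPy as follows. This is only an illustration of the idea, assuming max pooling and a (channels, height, width) layout; the bin-edge rule (floor for the start, ceil for the end) is one reasonable choice and not necessarily the one used in the paper's implementation.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.
    Level n splits the map into an n x n grid of bins."""
    C, H, W = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # floor for the start edge, ceil for the end edge, so bins cover the whole map
            r0, r1 = (i * H) // n, -((-(i + 1) * H) // n)
            for j in range(n):
                c0, c1 = (j * W) // n, -((-(j + 1) * W) // n)
                pooled.append(fmap[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


# Two feature maps of different spatial size give vectors of the same length
for h, w in [(13, 13), (10, 17)]:
    vec = spatial_pyramid_pool(np.random.rand(256, h, w))
    print((h, w), vec.shape)   # (5376,) in both cases, i.e. 21 * 256
```

Because the number of bins is fixed and only their size adapts to the input, the fully connected layers that follow always see the same input length.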

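Fast R-CNN (Ross Girshick)

Trains VGG 9 times faster than R-CNN and is 213 times faster at test time, with somewhat higher accuracy. Compared with SPPnet, training is 3 times faster and testing 10 times faster, again with somewhat higher accuracy.
A single model covers both localization and recognition.
R-CNN is slow because it recomputes features for every proposal, while SPPnet is fast because it computes the feature map once and shares it. Both of these methods still have to be trained in multiple stages.

An image and multiple RoIs are fed into a fully convolutional network; each input RoI is projected onto the feature map.
Each feature-map RoI is pooled to a fixed size and then mapped to a feature vector by fully connected layers (FCs).
Each RoI feature ends in two outputs: softmax probabilities and per-class bbox regression offsets.
The model is trained end-to-end with a multi-task loss.

Truncated SVD reduces inference time by about 30%.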

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

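Shaoqing Ren et al., 2015
Proposes the Region Proposal Network (RPN), which shares full-image convolutional features with the detection network and simultaneously predicts object bounds and objectness scores at each position, so that proposals are nearly cost-free. It took first place in ILSVRC and COCO 2015.

A small n x n window slides over the last shared feature map (with n = 3 the effective receptive field on the input is about 171 pixels for ZF and 228 for VGG). The n x n convolution feeds two branches, cls and reg. Each window position has a maximum number of proposals k, so cls outputs 2k scores (object or not object for each proposal) and reg outputs 4k coordinates.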

You Only Look Once: Unified, Real-Time Object Detection

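Joseph Redmon et al., 2015
Frames object detection as a regression problem to spatially separated bounding boxes and their class probabilities. Extremely fast. Its localization accuracy is slightly behind other methods, but it is less likely to mistake background for an object.

Resize the input to 448x448
Run the CNN
Threshold the detections by the model's confidence

YOLO sees the whole image, and makes less than half as many background errors as Fast R-CNN.

The input image is divided into an S x S grid. Each grid cell predicts B bboxes and C class probabilities, so the predictions are encoded as an S * S * (B * 5 + C) tensor.
Each bbox consists of 4 coordinates plus a confidence that it contains an object, hence B * 5.
C is the number of classes.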
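
As a quick check of the S * S * (B * 5 + C) formula, the sketch below builds a placeholder tensor with the settings the YOLO paper uses for PASCAL VOC (S = 7, B = 2, C = 20) and splits one cell's vector. The ordering of boxes before class probabilities inside a cell is an assumption made for illustration; real implementations may lay the values out differently.

```python
import numpy as np

S, B, C = 7, 2, 20                      # values used in the YOLO paper for PASCAL VOC
pred = np.zeros((S, S, B * 5 + C))      # stand-in for a network output
print(pred.shape)                       # (7, 7, 30)

# Split one cell's vector; this [boxes..., classes...] layout is an assumption
cell = pred[3, 4]
boxes = cell[:B * 5].reshape(B, 5)      # each row: x, y, w, h, confidence
class_probs = cell[B * 5:]              # C class probabilities shared by the cell
print(boxes.shape, class_probs.shape)   # (2, 5) (20,)
```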

YOLO generalizes well: trained only on VOC2007 and then tested on the Picasso dataset and the People-Art dataset, it outperforms R-CNN (even though R-CNN has a high AP on VOC2007).

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

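Sean Bell et al., 2015
Reaches 76.4% mAP on PASCAL VOC 2012 and 33.1% mAP on the MS COCO dataset.
Brings in contextual information from outside the RoI with RNNs; the rest resembles SPPnet-style pooling plus R-CNN-style detection.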

R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai

Reaches 83.6% mAP on VOC2007 and is 2.5-20 times faster than Faster R-CNN.

SSD: Single Shot MultiBox Detector

Wei Liu

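Reaches 72.1% mAP on VOC2007 and is faster than Faster R-CNN.
Uses feature-map grids such as 8x8 and 4x4. Each location has several default boxes with different aspect ratios; default boxes that overlap a gt box are positives and the rest are negatives. Each box predicts 4 coordinate offsets and c class confidences.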

YOLO9000 (YOLO V2)

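Applies batch normalization (BN), raises the input resolution, no longer fixes the input image size (multi-scale training), and more: better, faster, stronger.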

YOLOv3

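Uses Darknet-53, a backbone that borrows from ResNet, and improves small-object detection. Adds multi-scale feature fusion, i.e. fusing downsampled and upsampled features. Replaces softmax with logistic classifiers to support multi-label classification.
The original YOLO author later stepped away from computer vision research over concerns about military applications and privacy.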

YOLOv4

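Alexey Bochkovskiy et al., 2020. Adds a number of features: Weighted Residual Connections (WRC), Cross-Stage Partial connections (CSP), Cross mini-Batch Normalization (CmBN), Self-Adversarial Training (SAT), Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss.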

YOLOv5

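Ultralytics. Anchor-free split Ultralytics head (in the YOLOv5u variant; the original YOLOv5 is anchor-based), an optimized accuracy-speed trade-off, and a variety of pre-trained models.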

YOLOv6

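Meituan
Bi-directional Concatenation (BiC) module: YOLOv6 introduces a BiC module in the neck of the detector, which enhances localization signals and improves performance with negligible speed degradation.
Anchor-Aided Training (AAT) strategy: the proposed AAT enjoys the benefits of both the anchor-based and anchor-free paradigms without compromising inference efficiency.
Enhanced backbone and neck design: by deepening YOLOv6 with another stage in the backbone and neck, the model achieves state-of-the-art performance on the COCO dataset at high-resolution input.
Self-distillation strategy: to boost the smaller YOLOv6 models, a new self-distillation strategy enhances the auxiliary regression branch during training and removes it at inference to avoid a marked speed decline.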

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

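Chien-Yao Wang et al., 2022
Model re-parameterization: YOLOv7 proposes a planned re-parameterized model, a strategy applicable to layers in different networks that is based on the concept of gradient propagation paths.

Dynamic label assignment: training a model with multiple output layers raises a new question: "How to assign dynamic targets to the outputs of different branches?" To solve it, YOLOv7 introduces a new label assignment method called coarse-to-fine lead guided label assignment.

Extended and compound scaling: YOLOv7 proposes "extend" and "compound scaling" methods for real-time object detectors that make effective use of parameters and computation.

Efficiency: the proposed method can reduce about 40% of the parameters and 50% of the computation of state-of-the-art real-time object detectors, with faster inference and higher detection accuracy.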

YOLOv8

Ultralytics
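Advanced backbone and neck architectures: YOLOv8 uses state-of-the-art backbone and neck architectures, which improves feature extraction and object detection performance.
Anchor-free split Ultralytics head: YOLOv8 adopts an anchor-free split Ultralytics head, which improves the accuracy and efficiency of detection compared with anchor-based approaches.
Optimized accuracy-speed trade-off: YOLOv8 focuses on keeping the best balance between accuracy and speed, which suits real-time object detection in many application areas.
Variety of pre-trained models: YOLOv8 offers a range of pre-trained models for different tasks and performance requirements, which makes it easier to find the right model for a specific use case.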

YOLOv9

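YOLOv9 incorporates reversible functions into its architecture to reduce the risk of information degradation.
PGI (Programmable Gradient Information) is a new concept that YOLOv9 introduces to address the information bottleneck problem, so that essential data is preserved across deep network layers. This yields reliable gradients, which supports accurate model updates and improves overall detection performance.
GELAN (Generalized Efficient Layer Aggregation Network) is an architectural advance that lets YOLOv9 achieve better parameter utilization and computational efficiency.
The YOLOv9c model in particular highlights the effect of the architectural optimizations. Compared with YOLOv7 AF it uses 42% fewer parameters and 21% less computation at comparable accuracy. The YOLOv9e model sets a new standard for large models, with 15% fewer parameters and 25% less computation than YOLOv8x, along with a 1.7% improvement in AP.
These results show that YOLOv9 improves efficiency without giving up the accuracy that real-time object detection requires.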

Segment Anything

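Alexander Kirillov et al., 2023
Large language models trained on web-scale data can generalize to unseen data, a capability that is often used together with prompt engineering. Computer vision currently lacks data at a comparable scale.
Proposes the SA-1B segmentation dataset, more than 400 times larger than any existing one.

Auto-annotation is an important feature of SAM. It lets users generate a segmentation dataset with a pre-trained detection model, so that large numbers of images can be annotated quickly and accurately without time-consuming manual labeling.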

RT-DETR

Baidu, 2023
An extremely fast detection model with good accuracy.

2024-04-20

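Intro to Computer Vision

Convolutional image classifier

A convolutional neural network usually consists of two parts, a base and a head. The base extracts features, and the head classifies them (just like a classifier in classical machine learning).

Loading the data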

# Imports
import os, warnings
import matplotlib.pyplot as plt
from matplotlib import gridspec

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

# Reproducibility
def set_seed(seed=31415):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
set_seed(31415)

# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells


# Load training and validation sets
ds_train_ = image_dataset_from_directory(
    '../input/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
    '../input/car-or-truck/valid',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

Load the pretrained base

pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
pretrained_base.trainable = False # reusing a base trained on a large dataset is transfer learning; freeze it so the small dataset does not change it

Attach a head and train

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    pretrained_base,
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=30,
    verbose=0,
)

import pandas as pd

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot()

filter detect condense

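In the base, filters are applied to the input image to filter it, ReLU then acts as the detect step, and max pooling condenses the result.
Applying ReLU first and max pooling second has an intensifying effect on the features.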

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3), # activation is None
    layers.MaxPool2D(pool_size=2),
    # More layers follow
])

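Max pooling gives translation invariance

It replaces each pooling window with the maximum value inside it. It does not care where in the window that maximum sits, so the output is insensitive to small shifts in position.

About sliding windows

Feature extraction consists of: 1. filter with a conv layer; 2. detect with a ReLU layer; 3. condense with maximum pooling.
Both the conv and the maximum pooling operations are implemented with a sliding window.
The window size is given by kernel_size (pool_size for pooling layers), the distance moved at each step by strides, and the kind of border handling by padding.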

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  activation='relu'),
    layers.MaxPool2D(pool_size=2,
                     strides=1,
                     padding='same')
    # More layers follow
])

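To keep as many features as possible for classification, conv layers usually use strides of (1, 1). Maximum pooling layers usually use strides of (2, 2) or (3, 3), but never larger than the window itself.

With padding='valid' the conv window stays entirely inside the image, so the output is smaller than the input. With padding='same' the input is padded with zeros around its border, so that with strides of 1 the output has the same size as the input.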
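
The effect of kernel_size, strides, and padding on the output size can be worked out with a small helper. The sketch below follows the Keras/TensorFlow convention for 'valid' and 'same' along one spatial dimension; the function name conv_output_size is made up for this example.

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension for a sliding window."""
    if padding == "valid":
        # The window must fit entirely inside the input
        return (n - kernel_size) // stride + 1
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(128, 3, 1, "valid"))  # 126
print(conv_output_size(128, 3, 1, "same"))   # 128
print(conv_output_size(128, 2, 2, "same"))   # 64
```

For a 128x128 input, a 3x3 conv with 'valid' padding therefore loses one pixel on each side, while a 2x2 pooling window with strides of 2 halves each dimension.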

Receptive field

The region of the input that one unit corresponds to after several conv and maximum pooling layers.

Data Augmentation


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# these are a new feature in TF 2.2
from tensorflow.keras.layers.experimental import preprocessing


pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
pretrained_base.trainable = False

model = keras.Sequential([
    # Preprocessing
    preprocessing.RandomFlip('horizontal'), # flip left-to-right
    preprocessing.RandomContrast(0.5), # contrast change by up to 50%
    # Base
    pretrained_base,
    # Head
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

2024-04-18

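Intro to Deep Learning

The simplest neuron model

import tensorflow as tf
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

red_wine = pd.read_csv('../input/dl-course-data/red-wine.csv')
print(red_wine.shape)   # (1599,12)

model = keras.Sequential([
    layers.Dense(units=1, input_shape=[11]),
])

# The model expects 11 features per sample, so feed it 100 dummy samples of that shape
x = tf.random.uniform([100, 11], minval=-1.0, maxval=1.0)
y = model.predict(x)

w, b = model.weights
print("weights\n{}\nBias\n{}".format(w,b))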

Deep neural network model

Building a Sequential model

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([  # all layers go in one list
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer 
    layers.Dense(units=1),
])

Writing the activation as a separate layer

model = keras.Sequential([
    layers.Dense(32, input_shape=[8]),
    layers.Activation('relu'),
    layers.Dense(32),
    layers.Activation('relu'),
    layers.Dense(1),
])

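Loss Function

A measure of the gap between predicted and true values. Examples for regression tasks are MAE and MSE.

Optimizer

An algorithm that adjusts the model weights to reduce the loss as quickly as possible,
such as Stochastic Gradient Descent (SGD):

Randomly sample some training data (a minibatch, or batch) and run it through the model to get predictions
Compute the loss
Adjust the weights in the direction that reduces the loss (against the gradient)

One full pass of all the training data through the model is called an epoch.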
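
The three steps and the notion of an epoch can be seen in a minimal NumPy sketch of minibatch SGD. It fits a linear model to synthetic data with an MSE loss; the data, learning rate, batch size, and epoch count are all made up for illustration and are not tuned.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
true_w, true_b = np.array([2.0, -3.0]), 0.5
y = X @ true_w + true_b                        # noiseless linear target

w, b = np.zeros(2), 0.0
lr, batch_size, epochs = 0.1, 32, 20

for epoch in range(epochs):                    # one epoch = one pass over all the data
    order = rng.permutation(len(X))            # new random order every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # step 1: sample a minibatch and predict
        err = X[idx] @ w + b - y[idx]          # step 2: the loss is mean(err ** 2)
        grad_w = 2 * X[idx].T @ err / len(idx) # gradient of the loss w.r.t. w
        grad_b = 2 * err.mean()                # gradient of the loss w.r.t. b
        w -= lr * grad_w                       # step 3: move against the gradient
        b -= lr * grad_b

print(w, b)   # close to [2, -3] and 0.5
```

In Keras the same loop is hidden behind model.fit, where batch_size and epochs play the roles shown here.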

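Learning Rate and Batch Size

The learning rate and batch size determine how large each SGD step is and how fast training proceeds. The best values are usually not obvious.
For that reason Adam, an SGD variant with an adaptive learning rate, is a good default that usually works without much tuning.

Adding a loss and an optimizer to the model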

model.compile(
    optimizer='adam',
    loss='mae',
)

Training with fit

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=10,
)

Viewing the loss

import pandas as pd

# convert the training history to a dataframe
history_df = pd.DataFrame(history.history)
# use Pandas native plot method
history_df['loss'].plot();

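Overfitting and Underfitting

The information in the training data consists of useful signal and noise. An ideal model would learn all of the signal and none of the noise, but in practice no such model exists.

Learning too little signal is underfitting. Increase the model's capacity by making it wider or deeper.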

model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])

wider = keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])

deeper = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])

Learning too much noise is overfitting

Stop early once the validation loss starts to rise (early stopping)

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001, # minimum amount of change (in validation loss) to count as an improvement
    patience=20, # how many epochs to wait before stopping
    restore_best_weights=True,
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=500,
    callbacks=[early_stopping], # put your callbacks in a list; each callback is invoked once per epoch
    verbose=0,  # turn off training log
)

history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot();
print("Minimum validation loss: {}".format(history_df['val_loss'].min()))

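Using Dropout to reduce overfitting

Dropout randomly drops a fraction of a layer's input units during training. Add the Dropout layer directly before the layer you want it to apply to.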

keras.Sequential([
    # ...
    layers.Dropout(rate=0.3), # apply 30% dropout to the next layer
    layers.Dense(16),
    # ...
])

Batch Normalization

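Normalizes each batch with the batch's own mean and standard deviation, then rescales the data with two trainable parameters.
When using it, consider increasing the number of units in the network somewhat.

# Add it directly after a layer
layers.Dense(16, activation='relu'),
layers.BatchNormalization(),

# Add it before the activation function
layers.Dense(16),
layers.BatchNormalization(),
layers.Activation('relu'),

# Added as the first layer, it acts as adaptive preprocessing, similar to sklearn's StandardScaler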

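Binary Classification

Accuracy is the result we care about, but it cannot serve as the loss because it changes in jumps. Cross-entropy is used instead: it measures the distance between probability distributions, and the higher the probability assigned to the correct class, the smaller the loss.
To turn the network output into a probability between 0 and 1, use the sigmoid activation function.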

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'], # use binary_accuracy for binary classification
)
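
To see why cross-entropy works as a loss where accuracy does not, the sketch below computes sigmoid and binary cross-entropy by hand in NumPy. It is a simplified illustration with made-up logits, not the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    """Squash raw network outputs (logits) into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean negative log-probability assigned to the correct class."""
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


y_true = np.array([1.0, 0.0, 1.0, 0.0])
barely_right = sigmoid(np.array([0.1, -0.1, 0.1, -0.1]))
confident_right = sigmoid(np.array([4.0, -4.0, 4.0, -4.0]))

# Both sets of predictions have 100% accuracy, but the loss still tells them apart
print(binary_crossentropy(y_true, barely_right))
print(binary_crossentropy(y_true, confident_right))
```

The loss keeps decreasing smoothly as the predicted probabilities move toward the correct labels, which gives the optimizer a gradient to follow even when accuracy stays flat.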

2024-04-18

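Intermediate Machine Learning

Handling missing values

Drop columns with missing values

The simplest and most direct option, but it throws away a lot of data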

# Get names of columns with missing values
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]

# Drop columns in training and validation data
reduced_X_train = X_train.drop(cols_with_missing, axis=1)
reduced_X_valid = X_valid.drop(cols_with_missing, axis=1)

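Imputation

Fill the gaps with a value such as the column mean. Whether this is appropriate depends on what the missing feature actually represents.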

from sklearn.impute import SimpleImputer

# Imputation
my_imputer = SimpleImputer()
imputed_X_train = pd.DataFrame(my_imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(my_imputer.transform(X_valid))

# Imputation removed column names; put them back
imputed_X_train.columns = X_train.columns
imputed_X_valid.columns = X_valid.columns

An extension to imputation

Impute the missing values, and add a boolean column that marks which entries were imputed

# Make copy to avoid changing original data (when imputing)
X_train_plus = X_train.copy()
X_valid_plus = X_valid.copy()

# Make new columns indicating what will be imputed
for col in cols_with_missing:
    X_train_plus[col + '_was_missing'] = X_train_plus[col].isnull()
    X_valid_plus[col + '_was_missing'] = X_valid_plus[col].isnull()

# Imputation
my_imputer = SimpleImputer()
imputed_X_train_plus = pd.DataFrame(my_imputer.fit_transform(X_train_plus))
imputed_X_valid_plus = pd.DataFrame(my_imputer.transform(X_valid_plus))

# Imputation removed column names; put them back
imputed_X_train_plus.columns = X_train_plus.columns
imputed_X_valid_plus.columns = X_valid_plus.columns

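Categorical data

A categorical variable takes one of a limited set of values, e.g. when asked which brand of car you own, the answer is "Volkswagen", "Toyota", "Mercedes-Benz", and so on.

Three ways to handle categorical data

If the column carries no important information, drop the variable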

# Get list of categorical variables
s = (X_train.dtypes == 'object')
object_cols = list(s[s].index)

print("Categorical variables:")
print(object_cols)

drop_X_train = X_train.select_dtypes(exclude=['object'])
drop_X_valid = X_valid.select_dtypes(exclude=['object'])

print("MAE from Approach 1 (Drop categorical variables):")
print(score_dataset(drop_X_train, drop_X_valid, y_train, y_valid))

Ordinal encoding: assign a number to each category. Suitable for ordered levels such as "strong", "medium", "weak".

from sklearn.preprocessing import OrdinalEncoder

# Make copy to avoid changing original data 
label_X_train = X_train.copy()
label_X_valid = X_valid.copy()

# Apply ordinal encoder to each column with categorical data
ordinal_encoder = OrdinalEncoder()
label_X_train[object_cols] = ordinal_encoder.fit_transform(X_train[object_cols])
label_X_valid[object_cols] = ordinal_encoder.transform(X_valid[object_cols])

print("MAE from Approach 2 (Ordinal Encoding):") 
print(score_dataset(label_X_train, label_X_valid, y_train, y_valid))

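Ordinal encoding fails when the validation data contains categories that do not appear in the training data, so such columns need to be identified and handled separately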

# Categorical columns in the training data
object_cols = [col for col in X_train.columns if X_train[col].dtype == "object"]

# Columns that can be safely ordinal encoded
good_label_cols = [col for col in object_cols if 
                   set(X_valid[col]).issubset(set(X_train[col]))]
        
# Problematic columns that will be dropped from the dataset
bad_label_cols = list(set(object_cols)-set(good_label_cols))
        
print('Categorical columns that will be ordinal encoded:', good_label_cols)
print('\nCategorical columns that will be dropped from the dataset:', bad_label_cols)

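One-hot encoding: create one new column per category. Each row has a single 1 in the column of its category and 0 elsewhere. Suitable for unordered categories, i.e. nominal variables.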

from sklearn.preprocessing import OneHotEncoder

# Apply one-hot encoder to each column with categorical data
# sparse_output replaces the older sparse argument (use sparse=False on scikit-learn < 1.2)
OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(X_train[object_cols]))
OH_cols_valid = pd.DataFrame(OH_encoder.transform(X_valid[object_cols]))

# One-hot encoding removed index; put it back
OH_cols_train.index = X_train.index
OH_cols_valid.index = X_valid.index

# Remove categorical columns (will replace with one-hot encoding)
num_X_train = X_train.drop(object_cols, axis=1)
num_X_valid = X_valid.drop(object_cols, axis=1)

# Add one-hot encoded columns to numerical features
OH_X_train = pd.concat([num_X_train, OH_cols_train], axis=1)
OH_X_valid = pd.concat([num_X_valid, OH_cols_valid], axis=1)

# Ensure all columns have string type
OH_X_train.columns = OH_X_train.columns.astype(str)
OH_X_valid.columns = OH_X_valid.columns.astype(str)

print("MAE from Approach 3 (One-Hot Encoding):") 
print(score_dataset(OH_X_train, OH_X_valid, y_train, y_valid))

Pipelines

import pandas as pd
from sklearn.model_selection import train_test_split

# Read the data
X_full = pd.read_csv('../input/train.csv', index_col='Id')
X_test_full = pd.read_csv('../input/test.csv', index_col='Id')

# Remove rows with missing target, separate target from predictors
X_full.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = X_full.SalePrice
X_full.drop(['SalePrice'], axis=1, inplace=True)

# Break off validation set from training data
X_train_full, X_valid_full, y_train, y_valid = train_test_split(X_full, y, 
                                                                train_size=0.8, test_size=0.2,
                                                                random_state=0)

# "Cardinality" means the number of unique values in a column
# Select categorical columns with relatively low cardinality (convenient but arbitrary)
categorical_cols = [cname for cname in X_train_full.columns if
                    X_train_full[cname].nunique() < 10 and 
                    X_train_full[cname].dtype == "object"]

# Select numerical columns
numerical_cols = [cname for cname in X_train_full.columns if 
                X_train_full[cname].dtype in ['int64', 'float64']]

# Keep selected columns only
my_cols = categorical_cols + numerical_cols
X_train = X_train_full[my_cols].copy()
X_valid = X_valid_full[my_cols].copy()
X_test = X_test_full[my_cols].copy()

Next, define the preprocessing steps and bundle them with the model in a pipeline

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Preprocessing for numerical data
numerical_transformer = SimpleImputer(strategy='constant')

# Preprocessing for categorical data
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Bundle preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])

# Define model
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Bundle preprocessing and modeling code in a pipeline
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('model', model)
                     ])

# Preprocessing of training data, fit model 
clf.fit(X_train, y_train)

# Preprocessing of validation data, get predictions
preds = clf.predict(X_valid)

print('MAE:', mean_absolute_error(y_valid, preds))

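Data leakage

Data leakage happens when the training data contains information about the target that will not be available at prediction time. The model then performs well on the training and validation sets but poorly on the test set or in real use.
Target leakage: drop any variable whose value is updated or created as a consequence of the target. For example, when predicting whether someone has a cold, drop a variable such as "took cold medicine".
Train-test contamination: the validation or test data influences preprocessing, for example when someone tunes the preprocessing based on test results.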

2024-04-17

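Intro to Pandas

Importing the library

import pandas as pd

Creating data

Pandas has two core data types: DataFrame and Series

Creating a DataFrame

A DataFrame is a table. It contains an array of individual entries, each with a column of data below it. Each row is called a record.

pd.DataFrame({'Yes':[50,21], 'No':[131,2]})
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

The row labels are called the Index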

pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

Creating a Series

pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

Reading data

wine_reviews = pd.read_csv("filename.csv")
wine_reviews = pd.read_csv("filename.csv", index_col=0) # use the index column already present in the file

Inspecting data

Check the shape, the head, and the tail

wine_reviews.shape
wine_reviews.head()
wine_reviews.tail()

Saving data

wine_reviews.to_csv('savefilename.csv')

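Indexing

Basic indexing

reviews.country
reviews['country']

pandas has the loc and iloc accessors. Both are row-first, column-second, the opposite of the basic approach.

Index-based selection

reviews.iloc[:, 0] # get the first column
reviews.iloc[-5:] # the last 5 rows

Label-based selection

reviews.loc[0, 'country'] # get the element in the first row, country column

Indexing 0:10 with loc selects 0, ..., 10
Indexing 0:10 with iloc selects 0, ..., 9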
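
The difference between the two slicing rules is easy to check on a small frame. The DataFrame below is made up for illustration and uses the default integer index, so labels and positions happen to coincide.

```python
import pandas as pd

df = pd.DataFrame({"points": range(80, 92)})   # default integer index 0..11

by_label = df.loc[0:10]       # label-based: the end label 10 is included
by_position = df.iloc[0:10]   # position-based: the end position 10 is excluded

print(len(by_label), len(by_position))   # 11 10
```

The inclusive end matters most when the index is not numeric: with string labels, loc['a':'c'] returns the rows for 'a', 'b', and 'c'.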

Conditional selection

italian_wines = reviews[reviews.country == 'Italy']
top_oceania_wines = reviews.loc[(reviews.country.isin(['Australia','New Zealand'])) & (reviews.points>=95)]

Unique values

reviews.country.unique()

Counting values

reviews.country.value_counts()

Using map()

review_points_mean = reviews.points.mean()
reviews.points.map(lambda p: p - review_points_mean)

The function passed to map() receives one value from the Series at a time, and map() returns a new Series containing all the transformed values

The same operation can be done with apply()

def remean_points(row):
    row.points = row.points - review_points_mean
    return row

reviews.apply(remean_points, axis='columns') # with axis='index' the function is applied to each column instead

idxmax()

bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx,'title']

Grouping and sorting

reviews.groupby('points').points.count()
reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

The agg() function runs several functions at once

reviews.groupby(['country']).price.agg([len, min, max])

Multi-index

countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])

Resetting to a regular index

countries_reviewed.reset_index()

Sorting

countries_reviewed = countries_reviewed.reset_index()
countries_reviewed.sort_values(by='len')  # ascending=True by default
# sort by index
countries_reviewed.sort_index()
# sort by two columns at once
countries_reviewed.sort_values(by=['country', 'len'])

Counting

reviews_written = reviews.groupby('taster_twitter_handle').size()

reviews_written = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()

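Data types and missing values

Data types

reviews.price.dtype
# convert the type
reviews.points.astype('float64')

Missing values are NaN, which has type float64

reviews[pd.isnull(reviews.country)]

Replacing missing values is a common operation, e.g. replacing them with Unknown

reviews.region_2.fillna('Unknown')

Replacing non-missing values

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")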

Renaming

# rename a column
reviews.rename(columns={'points': 'score'})

# rename index entries
reviews.rename(index={0:'FirstEntry',1:'SecondEntry'})

The row index and the column index can each have their own name attribute

reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns')

Three ways to combine data

concat(), join(), and merge(); the last is similar to join

# concat() combines data that have the same columns
canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")

pd.concat([canadian_youtube, british_youtube])

# join() combines data that have the same index
left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])

left.join(right, lsuffix='_CAN', rsuffix='_UK') # the two suffix arguments are required here because both frames share column names

2024-04-16

Intro to Machine Learning

Opening csv data with Pandas

In Pandas the data is held in a DataFrame, which stores data much like a sheet in Excel or a table in SQL

import pandas as pd
# save filepath to variable for easier access
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
# read the data and store data in DataFrame titled melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path)

Understanding the data

# print a summary of the data in Melbourne data
melbourne_data.describe()

# show all column labels
print(melbourne_data.columns)

Setting the target

y = melbourne_data['SalePrice']

Setting the input

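Choose the columns to use as input features from the column labels. The example below uses column names from the Iowa housing data (as does the SalePrice target), while melb_data.csv names its target column Price: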

feature_names = ['LotArea','YearBuilt','1stFlrSF','2ndFlrSF','FullBath','BedroomAbvGr','TotRmsAbvGrd']
X = melbourne_data[feature_names]
# inspect the input data
print(X.describe())

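Splitting the data into train_X, val_X, train_y, val_y

from sklearn.model_selection import train_test_split
# train_test_split returns the splits in the order X_train, X_val, y_train, y_val
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

Configuring and fitting the model

Using DecisionTreeRegressor as an example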

from sklearn.tree import DecisionTreeRegressor
melbourne_model = DecisionTreeRegressor(random_state=1)
melbourne_model.fit(train_X, train_y)

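Making predictions

For example, with val_X as the input

predictions = melbourne_model.predict(val_X)

Model validation

Using Mean Absolute Error (MAE) as an example

from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y, melbourne_model.predict(val_X))

How to tune parameters

Using the search for the best maximum number of leaf nodes as an example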

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return(mae)

# try different maximum numbers of leaf nodes
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
maes = [get_mae(mln,train_X, val_X, train_y, val_y) for mln in candidate_max_leaf_nodes]
best_tree_size = candidate_max_leaf_nodes[maes.index(min(maes))]
print(best_tree_size)

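Fitting the model on all the data

# build the model with the best parameter found above, e.g. a maximum of 100 leaf nodes
final_model = DecisionTreeRegressor(max_leaf_nodes=100)
# train on all of the data
final_model.fit(X,y)

Using other algorithms

Random Forest tends to give good results more easily than a single decision tree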

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)

# fit your model
rf_model.fit(train_X, train_y)

# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_mae = mean_absolute_error(val_y, rf_model.predict(val_X))

print("Validation MAE for Random Forest Model: {}".format(rf_val_mae))

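Competing on Kaggle

First join the competition (Join Competition)
Then run the notebook so that submission.csv is generated in kaggle/working
Click Submit

Keep experimenting

Experimentation is the best tool; try selecting different features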

2024-04-12

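Auto-liking in WeChat Sports

Approach

Mirror the phone screen on the computer and control it there via USB debugging
Capture the screen image with pyautogui
Process the image with opencv and use template matching to find the regions to click
Click with pyautogui and scroll to the next page

Method

Android phone

Enable developer mode
In the developer options, check "Allow USB debugging"
Connect the phone to the computer with a USB cable

Computer

Open scrcpy.exe; normally you can now see and control the phone screen. See the official scrcpy site for the software
Move the window to a fixed position, then use a screenshot tool to capture the heart icon to be clicked and save it as heart.jpg
In the same folder, create and run the following Python script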

import pyautogui
import time
import numpy as np
import cv2 as cv

# Pause between pyautogui actions, in seconds
pyautogui.PAUSE = 0.02

def match_click(image, templ):
    assert image is not None, "file could not be read, check with os.path.exists()"
    # Binarize with a threshold of 240; if nothing matches at all, try other values between 0 and 255
    ret,img = cv.threshold(image,240,255,cv.THRESH_BINARY)
    # The template image saved earlier
    template = cv.imread(templ,cv.IMREAD_GRAYSCALE)
    assert template is not None, "file could not be read, check with os.path.exists()"
    h, w = template.shape[:2]
    # Match the template
    res = cv.matchTemplate(img, template, cv.TM_CCOEFF_NORMED)
    # Keep only high-confidence matches
    threshold = 0.8
    loc = np.where( res >= threshold)
    for pt in zip(*loc[::-1]):
        # pt is the top-left corner of a match; click its center
        pyautogui.click(int(pt[0] + w // 2), int(pt[1] + h // 2), button='left')
    # Scroll to the next page
    for i in range(12):
        pyautogui.scroll(-100)
        time.sleep(0.2)

i = 0
time.sleep(1)
while True:
    try:
        # Capture a screenshot
        img = pyautogui.screenshot()

        # Convert from PIL to a grayscale OpenCV image
        img = cv.cvtColor(np.asarray(img),cv.COLOR_RGB2GRAY)

        match_click(img, 'heart.jpg')

        i +=1
        print('Page {}'.format(i))

    except Exception:
        print("Target not found")
        break

Known bug

It cannot avoid liking your own entry, which opens your own WeChat Sports profile page and interrupts the run.