2024-04-01

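OpenCV Feature Matching

1. Brute-force matching

Brute-force matching uses a distance metric to score how well two sets of feature descriptors match: every descriptor in the first set is compared against every descriptor in the second set, and the closest one is returned.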

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
 
img1 = cv.imread('images/box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Using ORB
# Initiate ORB detector
orb = cv.ORB_create()
 
# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)

# create BFMatcher object
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
 
# Match descriptors.
matches = bf.match(des1,des2)
 
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
 
# Draw the first 15 matches.
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:15],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
 
plt.imshow(img3),plt.title('ORB'),plt.show()

# Using SIFT
# Initiate SIFT detector
sift = cv.SIFT_create()
 
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
 
# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1,des2,k=2)
 
# Apply ratio test
good = []
for m,n in matches:
 if m.distance < 0.35*n.distance:
    good.append([m])
 
# cv.drawMatchesKnn expects list of lists as matches.
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
 
plt.imshow(img3),plt.title('SIFT'),plt.show()
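
To make the idea behind the ORB example more concrete, here is a minimal pure-Python sketch of brute-force matching with Hamming distance and a cross-check. It is not how cv.BFMatcher is implemented; the function names (hamming, brute_force_match) and the one-byte toy descriptors are made up for illustration.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def brute_force_match(query, train, cross_check=True):
    """Return (query_idx, train_idx, distance) tuples sorted by distance."""
    def nearest(desc, pool):
        # Compare against every descriptor in the pool and keep the closest.
        return min(range(len(pool)), key=lambda j: hamming(desc, pool[j]))

    matches = []
    for qi, q in enumerate(query):
        ti = nearest(q, train)
        # Cross-check: keep the pair only if it is also the best match
        # when searching in the opposite direction.
        if cross_check and nearest(train[ti], query) != qi:
            continue
        matches.append((qi, ti, hamming(q, train[ti])))
    return sorted(matches, key=lambda m: m[2])


# Toy 8-bit descriptors (real ORB descriptors are 32 bytes long).
query = [bytes([0b11110000]), bytes([0b00001111]), bytes([0b11111111])]
train = [bytes([0b00001110]), bytes([0b11110001])]
matches = brute_force_match(query, train)
print(matches)
```

The third query descriptor is dropped because its nearest train descriptor prefers a different query descriptor, which is the kind of one-sided match that crossCheck=True is meant to filter out.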

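2. FLANN-based matching

FLANN (Fast Library for Approximate Nearest Neighbors) is a collection of algorithms optimized for fast nearest-neighbor search over large, high-dimensional feature sets. It works better than BFMatcher on large datasets.
To use FLANN, pass an IndexParams dictionary that describes the algorithm and its parameters. For SIFT, SURF and similar descriptors, pass the following:

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)

For ORB, use the following. The parameters need tuning by hand; the values suggested in the documentation (shown in the comments) do not always suit real use best.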

FLANN_INDEX_LSH = 6
index_params= dict(algorithm = FLANN_INDEX_LSH,
    table_number = 6, # 12
    key_size = 12, # 20
    multi_probe_level = 1) #2

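The second dictionary is SearchParams. It specifies how many times the trees in the index should be recursively traversed; higher values give better precision but also take more time.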

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
 
img1 = cv.imread('images/box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
 
# Initiate SIFT detector
sift = cv.SIFT_create()
 
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
 
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
 
flann = cv.FlannBasedMatcher(index_params,search_params)
 
matches = flann.knnMatch(des1,des2,k=2)
 
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]
 
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
 if m.distance < 0.7*n.distance:
    matchesMask[i]=[1,0]
 
draw_params = dict(matchColor = (0,255,0),
 singlePointColor = (255,0,0),
 matchesMask = matchesMask,
 flags = cv.DrawMatchesFlags_DEFAULT)
 
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
 
plt.imshow(img3,),plt.show()
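
The ratio test used in the knnMatch examples can be isolated from OpenCV entirely. The sketch below works on plain (best, second_best) distance pairs; the function name ratio_test and the sample distances are hypothetical.

```python
def ratio_test(knn_distances, ratio=0.7):
    """Return indices whose best distance is clearly smaller than the second best.

    knn_distances: list of (d1, d2) pairs, d1 being the distance to the nearest
    neighbour and d2 the distance to the second nearest.
    """
    return [i for i, (d1, d2) in enumerate(knn_distances) if d1 < ratio * d2]


# Made-up distances: entries 1 and 3 are ambiguous (both neighbours are close).
knn = [(10.0, 50.0), (30.0, 35.0), (5.0, 8.0), (12.0, 12.5)]
kept = ratio_test(knn)
print(kept)
```

A smaller ratio, such as the 0.35 used in the SIFT brute-force example, keeps fewer but more distinctive matches.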

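3. Feature matching + homography to find objects

The matching above finds parts of a target object in another image. Passing the matched points from both images to cv.findHomography() yields the perspective transformation of the object between the two images, and cv.perspectiveTransform() can then be used to locate the object. At least 4 correct point pairs are needed to find this transformation matrix.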

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
MIN_MATCH_COUNT = 10
 
img1 = cv.imread('images/box.png', cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png', cv.IMREAD_GRAYSCALE) # trainImage
 
# Initiate SIFT detector
sift = cv.SIFT_create()
 
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
 
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)
 
flann = cv.FlannBasedMatcher(index_params, search_params)
 
matches = flann.knnMatch(des1,des2,k=2)
 
# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
 if m.distance < 0.7*n.distance:
    good.append(m)

if len(good)>MIN_MATCH_COUNT:
 src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
 dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
 
 M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
 matchesMask = mask.ravel().tolist()
 
 h,w = img1.shape
 pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
 dst = cv.perspectiveTransform(pts,M)
 
 img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)
 
else:
 print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
 matchesMask = None

draw_params = dict(matchColor = (0,255,0), # draw matches in green color
 singlePointColor = None,
 matchesMask = matchesMask, # draw only inliers
 flags = 2)
 
img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
 
plt.imshow(img3, 'gray'),plt.show()
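
What cv.perspectiveTransform() does with the 3x3 matrix M can be sketched in a few lines: each point is lifted to homogeneous coordinates, multiplied by the matrix, and divided by the resulting third component. The helper name apply_homography and the example matrices are assumptions for illustration only.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography given as nested lists."""
    out = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by w to return from homogeneous to image coordinates.
        out.append((xh / w, yh / w))
    return out


# A simple scale-and-shift homography applied to the corners of a 200x100 image.
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 20.0],
     [0.0, 0.0, 1.0]]
corners = [(0, 0), (0, 99), (199, 99), (199, 0)]
projected = apply_homography(H, corners)
print(projected)
```

When the last row is not (0, 0, 1), the division by w is what produces the perspective effect that an affine transformation cannot express.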

2024-03-28

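OpenCV Camera Distortion Calibration

Theory

Cameras introduce radial distortion and tangential distortion.
Radial distortion makes lines that are straight in the real world appear curved in the image; the effect grows stronger the farther a point is from the image center.
Radial distortion is expressed as

x_distorted = x(1 + k₁r² + k₂r⁴ + k₃r⁶)
y_distorted = y(1 + k₁r² + k₂r⁴ + k₃r⁶)

Tangential distortion comes from the lens not being perfectly parallel to the sensor, which makes some areas of the image look nearer than expected. Tangential distortion is expressed as

x_distorted = x + [2p₁xy + p₂(r² + 2x²)]
y_distorted = y + [p₁(r² + 2y²) + 2p₂xy]

So the following distortion coefficients need to be found

distortion coefficients = (k₁, k₂, p₁, p₂, k₃)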

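In addition, we need the camera's intrinsic and extrinsic parameters. The intrinsic parameters include the focal length (f_x, f_y) and the optical center (c_x, c_y), which together form the camera matrix. The camera matrix is also needed to remove distortion. It is a property of the camera itself, so once computed it can be reused for every image taken with the same camera.

The extrinsic parameters are the rotation and translation vectors that transform the coordinates of a 3D point into a coordinate system.

Correcting lens distortion is usually a required step in stereo vision applications. The principle: provide some sample images of a well-defined pattern (for example a chessboard or a circle grid). Since the true relative coordinates of the feature points on the pattern are known, and so are the coordinates of the corresponding points in the image, the distortion coefficients can be computed. Provide at least 10 sample images for good results.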
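
As a rough illustration of what the distortion coefficients do, the sketch below applies the usual combined radial and tangential model (k1, k2, k3, p1, p2, as described in OpenCV's calibration documentation) to a normalized image point. The function name distort and the sample coefficient values are assumptions, not output from a real calibration.

```python
def distort(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to a
    normalized image point (x, y), i.e. coordinates relative to the optical
    center and divided by the focal length."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d


# With a negative k1, points are pulled toward the center,
# and the pull is stronger far from the center.
near = distort(0.1, 0.0, k1=-0.2, k2=0.0, p1=0.0, p2=0.0)
far = distort(0.5, 0.0, k1=-0.2, k2=0.0, p1=0.0, p2=0.0)
print(near, far)
```

Undistortion is the inverse of this mapping, which is why calibration has to estimate the coefficients first.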

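Code

Camera calibration takes as input a set of real-world 3D point coordinates and the corresponding 2D image coordinates. Finding the 2D coordinates in the image is straightforward, but the real-world 3D coordinates are harder. To simplify, assume the chessboard stays fixed on the XY plane, so Z is always 0.
In the code, the 3D points are called object points and the 2D points are called image points.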

import numpy as np
import cv2 as cv
import glob

criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

objp = np.zeros((6*9, 3), np.float32)
objp[:,:2] = np.mgrid[0:9, 0:6].T.reshape(-1,2)

objpoints = []  # 3D points in real-world space
imgpoints = []  # 2D points in the image plane

images = glob.glob('chessboard/*.jpg')

for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Find the chessboard corners
    ret, corners = cv.findChessboardCorners(gray, (9,6),None)

    # If found, add the object points and image points
    if ret == True:
        objpoints.append(objp)
        corners2 = cv.cornerSubPix(gray, corners, (11,11),(-1,-1),criteria)
        imgpoints.append(corners2)

        cv.drawChessboardCorners(img,(9,6),corners2,ret)
        cv.imshow('img', img)
        cv.waitKey(500)

# Calibrate. Returns: ret (RMS reprojection error), camera matrix, distortion coefficients, rotation vectors, translation vectors
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# Save to an .npz file for easy reuse
np.savez('cameracalib',mtx=mtx, dist=dist, rvecs=rvecs, tvecs=tvecs)
calib_file = np.load('cameracalib.npz')
print(calib_file['mtx'])

img = cv.imread('chessboard/left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv.getOptimalNewCameraMatrix(calib_file['mtx'], calib_file['dist'], (w,h), 1, (w,h))

# Undistort
dst = cv.undistort(img, mtx, dist, None, newcameramtx)
x, y, roi_w, roi_h = roi  # keep w, h as the full image size for the remap below
dst1 = dst[y:y+roi_h, x:x+roi_w]
cv.imshow('ds1',dst1)
cv.imwrite('calibresult.png', dst1)

# Alternative: undistort via remapping
mapx, mapy = cv.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
dst2 = cv.remap(img, mapx, mapy, cv.INTER_LINEAR)
dst2 = dst2[y:y+roi_h, x:x+roi_w]
cv.imshow('ds2',dst2)

cv.waitKey(0)
cv.destroyAllWindows()

2024-03-21

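OpenCV Feature Extraction and Description

1. What is a feature

A feature is a small region of the image that changes the most when shifted slightly in any direction. The process of finding such regions is called feature detection.
For example, in an image of a rectangle on a white background, small patches at the four corners are feature points, patches on the edges are weaker features, and patches in the uniform areas are not features at all.

2. Harris corner detection

Look at the intensity change for a displacement (u,v) in all directions, write it as a function, apply a Taylor expansion, and reduce it to a matrix M. Then define a score R that depends on the relative sizes of the two eigenvalues of M and decides whether a region is flat, an edge, or a corner: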

R = det(M) - k(trace(M))²

where

det(M) = λ₁λ₂
trace(M) = λ₁ + λ₂
λ₁ and λ₂ are the eigenvalues of M
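
The score can be tried out directly on pairs of eigenvalues. In the sketch below, the function names and the flat_eps cutoff are hypothetical; k = 0.04 is a commonly used value, not one taken from these notes.

```python
def harris_response(l1, l2, k=0.04):
    """R = det(M) - k * trace(M)^2, written in terms of the eigenvalues of M."""
    det = l1 * l2
    trace = l1 + l2
    return det - k * trace ** 2


def classify(l1, l2, k=0.04, flat_eps=1e-3):
    """Label a region from the sign and size of R (flat_eps is an arbitrary cutoff)."""
    r = harris_response(l1, l2, k)
    if abs(r) < flat_eps:
        return "flat"      # both eigenvalues small
    return "corner" if r > 0 else "edge"   # R < 0: one eigenvalue dominates


print(classify(10.0, 10.0))   # both eigenvalues large
print(classify(10.0, 0.01))   # one large, one small
print(classify(0.01, 0.01))   # both small
```

In practice cv.cornerHarris() returns R per pixel and the result is thresholded, as in the code that follows.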

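In code, use cv.cornerHarris()

img - input image, grayscale, float32
blockSize - size of the neighborhood considered for corner detection
ksize - aperture parameter of the Sobel derivative
k - free parameter in the Harris detector equation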

import numpy as np
import cv2 as cv
 
filename = 'calibresult.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
 
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.1)
 
#result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)
 
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
 
cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
 cv.destroyAllWindows()

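For corners with sub-pixel accuracy, use cv.cornerSubPix()

# threshold the Harris response and convert to uint8 for connected components
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)

# find centroids
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)
 
# define the criteria to stop and refine the corners
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)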

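3. Shi-Tomasi corner detection (Good Features to Track)

The score is defined as

R = min(λ₁, λ₂)

If R is greater than a threshold, the region is considered a corner.

Use cv.goodFeaturesToTrack(), which takes:

the input image
the number of corners to find
a quality level between 0 and 1
the minimum Euclidean distance between corners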

corners = cv.goodFeaturesToTrack(gray,250,0.01,20)
corners = np.intp(corners)  # np.int0 was removed in NumPy 2.0
 
for i in corners:
    x,y = i.ravel()
    cv.circle(img,(int(x),int(y)),3,255,-1)

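4. SIFT (Scale-Invariant Feature Transform)

A corner seen in a small window looks smooth once the image is scaled up and viewed through a window of the same size, so a fixed-window corner detector is not scale invariant. SIFT proceeds in stages: scale-space extrema detection, keypoint localization, orientation assignment, keypoint description, and keypoint matching. The patent on this algorithm expired in 2020, so it can be used freely.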

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
filename = 'images/home.jpg'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

sift = cv.SIFT_create()
kp = sift.detect(gray,None)
 
img=cv.drawKeypoints(gray,kp,img)
 
cv.imwrite('sift_keypoints.jpg',img)

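sift.detect() accepts a mask to restrict the search region.
cv.drawKeypoints() draws a small circle at each keypoint. With flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS it draws a circle the size of the keypoint and shows its orientation.

img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)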

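Computing descriptors

If the keypoints kp are already known, use sift.compute(): kp, des = sift.compute(gray, kp)
To do both in one step, use sift.detectAndCompute():

sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)

kp is a list of keypoints, and des is a NumPy array of shape (number of keypoints) × 128.

Once keypoints and descriptors are available, keypoints in different images can be matched in later steps.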

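5. SURF (Speeded-Up Robust Features)

SURF adds many improvements at every step and reaches comparable results about 3 times faster than SIFT. It handles blurred and rotated images well, but is weak under viewpoint change and illumination change.

In OpenCV, SURF is still treated as patent-protected. To use it, uninstall the current newer version and install opencv-contrib-python==3.4.2.17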

# Find SURF keypoints and descriptors and draw them
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/fly.png', 0)

# Create the SURF object; parameters can be given at creation or set later
# Here the Hessian threshold is set to 400
sift = cv2.xfeatures2d.SIFT_create()
print('sift: ', sift)
surf = cv2.xfeatures2d.SURF_create(400)
print('surf: ', surf,
      ' \ndefaultParameter\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Find SURF keypoints and descriptors
# kp: list of keypoints, des: NumPy array
kp, des = surf.detectAndCompute(img, None)
# Draw the keypoints on the image
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)
plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('more keypoints'), plt.show()

print('keypoints: ', len(kp))

# Check the current Hessian threshold
# print(surf.getHessianThreshold())

# Change the Hessian threshold; 50000 is used here, though 300~500 usually works best
surf.setHessianThreshold(50000)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Compute keypoints and descriptors again
kp, des = surf.detectAndCompute(img, None)

print('keypoints: ', len(kp))

# Draw the keypoints on the image
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('less than 50 keypoints'), plt.show()

# U-SURF does not compute orientation
# print(surf.getUpright())
surf.setUpright(True)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Recompute the keypoints and draw them
kp = surf.detect(img, None)
print('keypoints: ', len(kp))
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('U-SURF'), plt.show()

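# All orientations are shown in the same direction, and it is much faster than before. If orientation is not an issue (as in panorama stitching), U-SURF is the better choice.
# Find the size of the descriptor
# print(surf.descriptorSize())
# extended is False, so the default is 64-dim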
# print(surf.getExtended())

# Set the descriptor to 128-dim
surf.setExtended(True)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

kp, des = surf.detectAndCompute(img, None)
print('keypoints: ',len(kp))
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('128D res'), plt.show()

The code above comes from an external source.

6. FAST corner detector

Several times faster than the previous methods, but not robust to high noise levels. It has a threshold parameter.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('images/blox.jpg', cv.IMREAD_GRAYSCALE) # `<opencv_root>/samples/data/blox.jpg`
 
# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create()
 
# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
 
# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )
 
cv.imwrite('fast_true.png', img2)
 
# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img, None)
 
print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
 
img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
 
cv.imwrite('fast_false.png', img3)

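7. BRIEF (Binary Robust Independent Elementary Features)

A faster way to compute and match feature descriptors. It is only a descriptor, so keypoints must be found with another detector; it works well with CenSurE (STAR).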

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('images/aero1.jpg', cv.IMREAD_GRAYSCALE)
 
# Initiate STAR detector
star = cv.xfeatures2d.StarDetector_create()
 
# Initiate BRIEF extractor
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()
 
# find the keypoints with STAR
kp = star.detect(img,None)

 
# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)
 
print( brief.descriptorSize() )
print( des.shape )

8. ORB (Oriented FAST and Rotated BRIEF)

Patent-free and safe to use; fast, with good results.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('images/blox.jpg', cv.IMREAD_GRAYSCALE)
 
# Initiate ORB detector
orb = cv.ORB_create()
 
# find the keypoints with ORB
kp = orb.detect(img,None)
 
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
 
# draw only keypoints location,not size and orientation
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
plt.imshow(img2), plt.show()

For detailed usage, see the official tutorial.

2024-03-20

OpenCV Image Processing

1. Changing the color space

Use cv.cvtColor(). Inputs: the image and the conversion code (for example cv.COLOR_BGR2HSV)

hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)

2. Geometric transformations

Resizing an image

import numpy as np
import cv2 as cv
 
img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
 
res = cv.resize(img,None,fx=2, fy=2, interpolation = cv.INTER_CUBIC)
 
#OR
 
height, width = img.shape[:2]
res = cv.resize(img,(2*width, 2*height), interpolation = cv.INTER_CUBIC)

Use cv.warpAffine() to translate an image;
use cv.getRotationMatrix2D() to get a 2x3 rotation matrix, for example for a 90-degree rotation

import numpy as np
import cv2 as cv
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols = img.shape
 
#M = np.float32([[1,0,100],[0,1,50]]) # translation matrix
M = cv.getRotationMatrix2D(((cols-1)/2.0,(rows-1)/2.0),90,1) # rotation matrix
dst = cv.warpAffine(img,M,(cols,rows)) # inputs: image, transformation matrix, output size (width, height)
 
cv.imshow('img',dst)
cv.waitKey(0)
cv.destroyAllWindows()
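
For intuition about the 2x3 matrix, the sketch below rebuilds a rotation-about-a-center matrix from the formula given in the OpenCV documentation for cv.getRotationMatrix2D() and applies it to single points. The function names are hypothetical, and the result was not compared against OpenCV here.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 matrix rotating by angle_deg (counter-clockwise in image
    coordinates, where y points down) around center, with optional scaling."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]


def apply_affine(M, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


M = rotation_matrix_2d((1.0, 1.0), 90)
center_out = apply_affine(M, (1.0, 1.0))   # the center stays where it is
right_out = apply_affine(M, (2.0, 1.0))    # the point right of center moves above it
print(center_out, right_out)
```

The pure translation matrix in the commented-out line is the special case with no rotation: the left 2x2 block is the identity and the last column holds the shift.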

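3. Affine transformation

In an affine transformation, lines that are parallel in the original image remain parallel afterwards.
Use cv.getAffineTransform() to get the 2x3 transformation matrix and pass it to cv.warpAffine().
Building the matrix requires 3 pairs of corresponding points.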

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
 
img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols,ch = img.shape
 
pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])
 
M = cv.getAffineTransform(pts1,pts2)
 
dst = cv.warpAffine(img,M,(cols,rows))
 
plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()

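4. Perspective transformation

Straight lines remain straight after a perspective transformation. The transformation matrix is 3x3 and requires 4 pairs of known corresponding points, no 3 of which may be collinear. Use cv.getPerspectiveTransform() to get the matrix, then pass it to cv.warpPerspective().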

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
 
M = cv.getPerspectiveTransform(pts1,pts2)
 
dst = cv.warpPerspective(img,M,(300,300))

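5. Thresholding

Use cv.threshold(). Inputs: the image (grayscale), the threshold, the maximum value (assigned to pixels above the threshold), and the thresholding type
Returns: the threshold used and the thresholded image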

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
ret,thresh1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
ret,thresh2 = cv.threshold(img,127,255,cv.THRESH_BINARY_INV)
ret,thresh3 = cv.threshold(img,127,255,cv.THRESH_TRUNC)
ret,thresh4 = cv.threshold(img,127,255,cv.THRESH_TOZERO)
ret,thresh5 = cv.threshold(img,127,255,cv.THRESH_TOZERO_INV)
 
titles = ['Original Image','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]
 
for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray',vmin=0,vmax=255)
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
 
plt.show()

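Adaptive thresholding
cv.ADAPTIVE_THRESH_MEAN_C: threshold = mean of the neighborhood - C
cv.ADAPTIVE_THRESH_GAUSSIAN_C: threshold = Gaussian-weighted sum of the neighborhood - C
blockSize sets the neighborhood size; C is a constant subtracted from the mean or weighted mean of the neighborhood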

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)
 
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
 cv.THRESH_BINARY,11,2)
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
 cv.THRESH_BINARY,11,2)
 
titles = ['Original Image', 'Global Thresholding (v = 127)',
 'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]
 
for i in range(4):
 plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
 plt.title(titles[i])
 plt.xticks([]),plt.yticks([])
plt.show()

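Otsu's binarization
No threshold value has to be chosen by hand: the globally optimal threshold is determined from the image histogram by minimizing the weighted within-class variance

# Otsu's thresholding
ret2,th2 = cv.threshold(img,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)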
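
The criterion behind Otsu's method can be written out as a brute-force search over all candidate thresholds. This is a minimal pure-Python sketch for 8-bit values; the function name otsu_threshold is made up, and its threshold convention (class 0 is everything below t) may differ by one from what cv.threshold() reports.

```python
def otsu_threshold(pixels):
    """Return the t in 1..255 minimizing the weighted within-class variance,
    where class 0 holds values < t and class 1 holds values >= t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)

    best_t, best_score = 0, float("inf")
    for t in range(1, 256):
        lo, hi = hist[:t], hist[t:]
        n0, n1 = sum(lo), sum(hi)
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, not a valid split
        m0 = sum(i * c for i, c in enumerate(lo)) / n0
        m1 = sum((i + t) * c for i, c in enumerate(hi)) / n1
        v0 = sum(c * (i - m0) ** 2 for i, c in enumerate(lo)) / n0
        v1 = sum(c * (i + t - m1) ** 2 for i, c in enumerate(hi)) / n1
        score = (n0 * v0 + n1 * v1) / total  # weighted within-class variance
        if score < best_score:
            best_t, best_score = t, score
    return best_t


# A bimodal toy "image": a dark cluster around 10 and a bright one around 200.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [202] * 50
t = otsu_threshold(pixels)
print(t)
```

Any threshold between the two clusters gives the same minimal score on this toy data, so the sketch simply returns the first one it finds.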

6. Image smoothing (convolution)

cv.filter2D() performs the convolution

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
 
kernel = np.ones((5,5),np.float32)/25   # averaging kernel
dst = cv.filter2D(img,-1,kernel)

blr = cv.blur(img,(5,5))   # the same box filter via cv.blur
 
plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.show()

Some commonly used smoothing and blurring functions

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
 
kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)

blr = cv.blur(img,(5,5))

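g_blr = cv.GaussianBlur(img, (5,5),0)   # Gaussian blur: the last argument is sigmaX (0 = derived from the kernel size); sigmaY defaults to sigmaX

m_blr = cv.medianBlur(img, 5)   # effective against salt-and-pepper noise

bi_blr = cv.bilateralFilter(img,9, 75, 75)  # combines a spatial Gaussian with an intensity-difference Gaussian; smooths texture while preserving edges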
 
plt.subplot(321),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(322),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.subplot(323),plt.imshow(blr),plt.title('Blurring')
plt.xticks([]), plt.yticks([])
plt.subplot(324),plt.imshow(g_blr),plt.title('gaussian')
plt.xticks([]), plt.yticks([])
plt.subplot(325),plt.imshow(m_blr),plt.title('medianBlur')
plt.xticks([]), plt.yticks([])
plt.subplot(326),plt.imshow(bi_blr),plt.title('Bilateral filtering')
plt.xticks([]), plt.yticks([])

plt.show()

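7. Morphological operations

Erosion: a kernel slides across the binary image; a pixel of the original image stays 1 only if all pixels under the kernel are 1, otherwise it is eroded to 0.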

import cv2 as cv
import numpy as np
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
kernel = np.ones((5,5),np.uint8)
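erosion = cv.erode(img,kernel,iterations = 1) # erosion
dilation = cv.dilate(img,kernel,iterations = 1) # dilation: the center pixel becomes 1 if at least one pixel under the kernel is 1

# Opening: erosion followed by dilation; useful for removing noise
opening = cv.morphologyEx(img, cv.MORPH_OPEN, kernel)

# Closing: dilation followed by erosion; closes small holes inside shapes
closing = cv.morphologyEx(img, cv.MORPH_CLOSE, kernel)

# Gradient: difference between dilation and erosion; looks like the outline of the shape
gradient = cv.morphologyEx(img, cv.MORPH_GRADIENT, kernel)

# Top Hat: difference between the input image and its opening
tophat = cv.morphologyEx(img, cv.MORPH_TOPHAT, kernel)

# Black Hat: difference between the closing of the input image and the input image
blackhat = cv.morphologyEx(img, cv.MORPH_BLACKHAT, kernel)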
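
The definitions of erosion, dilation and opening can be checked on a tiny binary grid without OpenCV. The sketch below uses a square k×k kernel and ignores pixels that fall outside the image; the function names and the 7×7 test grid are made up for illustration.

```python
def _window(img, y, x, r):
    """Values under a (2r+1)x(2r+1) kernel centered at (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def erode(img, k=3):
    """A pixel stays 1 only if every pixel under the kernel is 1."""
    r = k // 2
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img, k=3):
    """A pixel becomes 1 if at least one pixel under the kernel is 1."""
    r = k // 2
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


# A 3x3 block plus one isolated "noise" pixel in the top-right corner.
clean = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
noisy = [row[:] for row in clean]
noisy[0][6] = 1

opened = dilate(erode(noisy))   # opening = erosion followed by dilation
for row in opened:
    print(row)
```

The noise pixel disappears during erosion and never comes back, while the block shrinks to its center and is then restored by the dilation.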

Use cv.getStructuringElement() to generate a structuring element; pass the shape and a size tuple

kernel = cv.getStructuringElement(cv.MORPH_RECT,(5,5))  # rectangular
print(kernel)
cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))    # elliptical
cv.getStructuringElement(cv.MORPH_CROSS,(5,5))  # cross-shaped

8. Image gradients

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
 
laplacian = cv.Laplacian(img,cv.CV_64F)
sobelx = cv.Sobel(img,cv.CV_64F,1,0,ksize=5)
sobely = cv.Sobel(img,cv.CV_64F,0,1,ksize=5)
 
plt.subplot(2,2,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(laplacian,cmap = 'gray')
plt.title('Laplacian'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,3),plt.imshow(sobelx,cmap = 'gray')
plt.title('Sobel X'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely,cmap = 'gray')
plt.title('Sobel Y'), plt.xticks([]), plt.yticks([])
 
plt.show()

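Important: use a higher-precision output type such as cv.CV_64F, take the absolute value, and then convert to cv.CV_8U. With cv.CV_8U output, negative slopes (white-to-black transitions) are clipped to zero and those edges are lost.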

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('opencv.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
 
# Output dtype = cv.CV_8U
#sobelx8u = cv.Sobel(img,cv.CV_8U,1,0,ksize=3)
sobelx8u = cv.Laplacian(img,cv.CV_8U)
 
# Output dtype = cv.CV_64F. Then take its absolute and convert to cv.CV_8U
#sobelx64f = cv.Sobel(img,cv.CV_64F,1,0,ksize=3)
sobelx64f = cv.Laplacian(img,cv.CV_64F)
abs_sobel64f = np.absolute(sobelx64f)
sobel_8u = np.uint8(abs_sobel64f)
 
plt.subplot(1,3,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,2),plt.imshow(sobelx8u,cmap = 'gray')
plt.title('Sobel CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,3),plt.imshow(sobel_8u,cmap = 'gray')
plt.title('Sobel abs(CV_64F)'), plt.xticks([]), plt.yticks([])
 
plt.show()

9. Image pyramids

lower_reso = cv.pyrDown(higher_reso)    # downsample

higher_reso2 = cv.pyrUp(lower_reso)     # upsample

An example that blends two images using pyramids

import cv2 as cv
import numpy as np,sys
 
A = cv.imread('opencv.png')
B = cv.imread('opencv_white.png')
assert A is not None, "file could not be read, check with os.path.exists()"
assert B is not None, "file could not be read, check with os.path.exists()"
 
# generate Gaussian pyramid for A
G = A.copy()
gpA = [G]
for i in range(6):
 G = cv.pyrDown(G)
 gpA.append(G)
 
# generate Gaussian pyramid for B
G = B.copy()
gpB = [G]
for i in range(6):
 G = cv.pyrDown(G)
 gpB.append(G)
 
# generate Laplacian Pyramid for A
lpA = [gpA[5]]
for i in range(5,0,-1):
 GE = cv.pyrUp(gpA[i])
 L = cv.subtract(gpA[i-1],GE)  # saturating subtract; plain uint8 "-" wraps around
 lpA.append(L)
 
# generate Laplacian Pyramid for B
lpB = [gpB[5]]
for i in range(5,0,-1):
 GE = cv.pyrUp(gpB[i])
 L = cv.subtract(gpB[i-1],GE)  # saturating subtract; plain uint8 "-" wraps around
 lpB.append(L)
 
# Now add left and right halves of images in each level
LS = []
for la,lb in zip(lpA,lpB):
 rows,cols,dpt = la.shape
 ls = np.hstack((la[:,0:cols//2], lb[:,cols//2:]))
 LS.append(ls)
 
# now reconstruct
ls_ = LS[0]
for i in range(1,6):
 ls_ = cv.pyrUp(ls_)
 ls_ = cv.add(ls_, LS[i])
 
# image with direct connecting each half
real = np.hstack((A[:,:cols//2],B[:,cols//2:]))
 
cv.imshow('Pyramid_blending2.jpg',ls_)
cv.imshow('Direct_blending.jpg',real)

cv.waitKey(0)

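10. Canny edge detection

First compute the gradient; an edge runs perpendicular to the gradient direction. Each pixel is checked for being a local maximum along the gradient direction: if so it is kept, otherwise it is set to zero (non-maximum suppression). Next comes hysteresis thresholding: pixels with a gradient below the minimum value are rejected, pixels above the maximum value are sure edges, and pixels in between are kept only if they are connected to a sure edge.
Use cv.Canny(). Inputs: the image, the minimum value, the maximum value, the Sobel kernel size used for the gradient (default 3), and L2gradient (True uses the more accurate full formula; False, the default, uses the simplified one)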

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
edges = cv.Canny(img,100,180)
 
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
 
plt.show()
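
The last stage of Canny, hysteresis thresholding, is easy to sketch on its own. The code below takes a small grid of gradient magnitudes and grows edges outward from the "sure" pixels; it is an illustration of the rule, not OpenCV's implementation, and the function name and sample values are hypothetical.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep pixels above `high`, plus pixels >= `low` that are 8-connected
    to an already accepted edge pixel."""
    h, w = len(mag), len(mag[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()

    # Seed with the sure edges.
    for y in range(h):
        for x in range(w):
            if mag[y][x] > high:
                edges[y][x] = 1
                queue.append((y, x))

    # Grow into weak pixels that touch an accepted edge.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx] and mag[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


# One strong pixel, two weak pixels attached to it, and one isolated weak pixel.
mag = [[200, 120, 120, 0, 120]]
result = hysteresis(mag, low=100, high=180)
print(result)
```

The isolated weak pixel at the end is discarded even though its magnitude equals that of the accepted weak pixels, which is the point of the two-threshold scheme.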

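11. Contour detection

Use cv.findContours() to get contours. Inputs: source image, contour retrieval mode, approximation method
The source should be a binary image, with the object in white and the background in black
With cv.CHAIN_APPROX_NONE all contour points are stored, while cv.CHAIN_APPROX_SIMPLE describes the contour with fewer points, e.g. a rectangle with just its 4 corner points.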

contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)

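# Draw the contours; also usable to draw any shape given its boundary points
cv.drawContours(im, contours, -1, (0,255,0), 2)
# Inputs: source image, contours, contour index (-1 for all), color, thickness

cnt = contours[2]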

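M = cv.moments(cnt) # contour moments

# contour centroid
c_x = M['m10'] / M['m00']
c_y = M['m01'] / M['m00']

area = cv.contourArea(cnt)  # contour area, equal to M['m00']

# contour perimeter
perimeter = cv.arcLength(cnt,True)  # inputs: contour, whether the contour is closed (True = closed)

# Contour approximation: approximate the contour with fewer vertices at the given precision.
epsilon = 0.1*cv.arcLength(cnt,True)    # precision: maximum allowed distance from the contour to the approximated contour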
approx = cv.approxPolyDP(cnt,epsilon,True)
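
For a simple polygon, area, centroid and perimeter can be computed by hand with the shoelace (Green's theorem) formulas, which is a useful sanity check on the values above. This sketch works on a plain list of (x, y) vertices; the function name is made up, and it was not run against cv.moments() here.

```python
import math


def polygon_stats(pts):
    """Area, centroid and perimeter of a simple closed polygon (shoelace formulas)."""
    signed_area2 = cx = cy = perimeter = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]        # wrap around: the polygon is closed
        cross = x0 * y1 - x1 * y0
        signed_area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = signed_area2 / 2.0            # sign depends on vertex orientation
    return {
        "area": abs(area),
        "centroid": (cx / (6.0 * area), cy / (6.0 * area)),
        "perimeter": perimeter,
    }


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
stats = polygon_stats(square)
print(stats)
```

The area corresponds to M['m00'], and the centroid to (M['m10']/M['m00'], M['m01']/M['m00']) in the moment notation used above.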

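Convex hull
A convex curve always bulges outward, or is at least flat. cv.convexHull() checks a curve for convexity defects (places where it bulges inward) and corrects them

hull = cv.convexHull(points[, hull[, clockwise[, returnPoints]]])

points: the contour passed in
hull: the output, usually omitted
clockwise: orientation flag, True for clockwise
returnPoints: True (default) returns the coordinates of the hull points, False returns the indices of the contour points corresponding to the hull points
In practice hull = cv.convexHull(cnt) is enough

print(cv.isContourConvex(cnt)) # check whether the contour is convex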

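Straight bounding rectangle

x, y, w, h = cv.boundingRect(cnt)
cv.rectangle(im, (x,y),(x+w, y+h),(0,0,255),1)

Rotated bounding rectangle
Use cv.minAreaRect() to get the minimum-area bounding rectangle of the contour. It returns (center (x,y), (width, height), angle of rotation). Pass the result to cv.boxPoints() to get the 4 corners needed to draw the rectangle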

rect = cv.minAreaRect(cnt)
box = cv.boxPoints(rect)
box = np.intp(box)  # np.int0 was removed in NumPy 2.0
cv.drawContours(im,[box],0,(0,0,255),2)

Minimum enclosing circle
The smallest circle that completely covers the contour

(x,y),radius = cv.minEnclosingCircle(cnt)
center = (int(x),int(y))
radius = int(radius)
cv.circle(im,center,radius,(0,255,0),2)

Fitting an ellipse

ellipse = cv.fitEllipse(cnt)
cv.ellipse(im,ellipse,(0,255,0),2)

Fitting a line

rows,cols = img.shape[:2]
vx,vy,x,y = cv.fitLine(cnt, cv.DIST_L2,0,0.01,0.01).ravel()  # flatten the (4,1) result into 4 scalars
lefty = int((-x*vy/vx) + y)
righty = int(((cols-x)*vy/vx)+y)
cv.line(img,(cols-1,righty),(0,lefty),(0,255,0),2)

12. Contour properties

# Aspect ratio
x,y,w,h = cv.boundingRect(cnt)
aspect_ratio = float(w)/h

# Extent: ratio of contour area to bounding rectangle area
area = cv.contourArea(cnt)
x,y,w,h = cv.boundingRect(cnt)
rect_area = w*h
extent = float(area)/rect_area

# Solidity: ratio of contour area to convex hull area
area = cv.contourArea(cnt)
hull = cv.convexHull(cnt)
hull_area = cv.contourArea(hull)
solidity = float(area)/hull_area

# Equivalent diameter: diameter of the circle with the same area as the contour
area = cv.contourArea(cnt)
equi_diameter = np.sqrt(4*area/np.pi)

# Orientation
(x,y),(MA,ma),angle = cv.fitEllipse(cnt)

# Mask
mask = np.zeros(imgray.shape,np.uint8)
cv.drawContours(mask,[cnt],0,255,-1)
pixelpoints = np.transpose(np.nonzero(mask))
#pixelpoints = cv.findNonZero(mask)

# Maximum and minimum values and their locations
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(imgray,mask = mask)

# Mean color or mean intensity
mean_val = cv.mean(im,mask = mask)

# Extreme points
leftmost = tuple(cnt[cnt[:,:,0].argmin()][0])
rightmost = tuple(cnt[cnt[:,:,0].argmax()][0])
topmost = tuple(cnt[cnt[:,:,1].argmin()][0])
bottommost = tuple(cnt[cnt[:,:,1].argmax()][0])

# Point polygon test: returns the shortest distance from the point to the contour; positive inside, 0 on the contour, negative outside. With the third argument True it returns the distance, with False only +1, 0 or -1 (2-3x faster)
dist = cv.pointPolygonTest(cnt,(50,50),True)

# Shape matching, usable for OCR
ret, thresh = cv.threshold(img1, 127, 255,0)
ret, thresh2 = cv.threshold(img2, 127, 255,0)
contours,hierarchy = cv.findContours(thresh,2,1)
cnt1 = contours[0]
contours,hierarchy = cv.findContours(thresh2,2,1)
cnt2 = contours[0]
 
ret = cv.matchShapes(cnt1,cnt2,1,0.0)
print( ret )

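13. Contour hierarchy

cv.findContours() also returns hierarchy, which is how OpenCV represents the parent-child relationship where one contour lies inside another. Each contour has an entry of the form
[Next, Previous, First_Child, Parent]
Next: the next contour at the same hierarchy level
Previous: the previous contour at the same level
First_Child: the first child contour
Parent: the parent contour

Note: if there is no such contour (for example no parent or no child), the field is set to -1

Contour retrieval modes
RETR_LIST: retrieves all contours without any parent-child relationship
RETR_EXTERNAL: retrieves only the outermost contours
RETR_CCOMP: arranges contours in 2 levels; outer boundaries are level 1 and hole boundaries are level 2
RETR_TREE: retrieves the full hierarchy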
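
One way to read the [Next, Previous, First_Child, Parent] rows is to follow the Parent links and compute how deeply each contour is nested. The hierarchy below is hand-written for illustration; with real data, cv.findContours() returns an array of shape (1, N, 4), so hierarchy[0] would be passed in. The function name nesting_depths is hypothetical.

```python
def nesting_depths(hierarchy):
    """For each contour, count how many ancestors it has (0 = outermost)."""
    PARENT = 3  # column index of the Parent field
    depths = []
    for row in hierarchy:
        depth, parent = 0, row[PARENT]
        while parent != -1:
            depth += 1
            parent = hierarchy[parent][PARENT]
        depths.append(depth)
    return depths


# [Next, Previous, First_Child, Parent]
hierarchy = [
    [1, -1, -1, -1],   # contour 0: outermost, no children
    [-1, 0, 2, -1],    # contour 1: outermost, first child is contour 2
    [-1, -1, 3, 1],    # contour 2: inside contour 1
    [-1, -1, -1, 2],   # contour 3: inside contour 2
]
depths = nesting_depths(hierarchy)
print(depths)
```

With RETR_TREE the depth can be arbitrarily large, while RETR_CCOMP would limit it to the two levels described above.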

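14. Histograms

The x-axis of a histogram is intensity, from 0 to 255 (for 8-bit images; the range can be changed), and the height is the number of pixels at each intensity. The basic analysis here uses grayscale images
Common histogram terms:
BINS: the number of intervals on the x-axis, e.g. 256 or 16
DIMS: the number of parameters for which data is collected, e.g. 1
RANGE: the range of values to measure, usually [0,256]

Computing a histogram with OpenCV
cv.calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]])

images: the source image of type uint8 or float32, given in square brackets, i.e. "[img]".
channels: index of the channel to compute, given in square brackets. Use [0] for grayscale; for color images use [0], [1] or [2] for the blue, green or red channel.
mask: the mask image. Use "None" for the histogram of the full image, otherwise create a mask and pass it here.
histSize: the BIN count, given in square brackets, e.g. [256].
ranges: the range, usually [0,256].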

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
hist = cv.calcHist([img],[0],None,[16],[0,256])
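
What BINS and RANGE mean in practice can be seen in a small pure-Python histogram. The sketch treats the upper bound of the range as exclusive, which matches the usual [0,256] convention; the function name calc_hist and the sample values are made up.

```python
def calc_hist(pixels, bins=16, value_range=(0, 256)):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = value_range
    width = (hi - lo) / bins
    hist = [0] * bins
    for p in pixels:
        if lo <= p < hi:                 # upper bound is exclusive
            hist[int((p - lo) / width)] += 1
    return hist


pixels = [0, 15, 16, 128, 255]
hist = calc_hist(pixels, bins=16)
print(hist)
```

With 16 bins over [0,256), each bin covers 16 intensity levels, so 0 and 15 share the first bin while 16 starts the second.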

Computing a histogram with NumPy; the OpenCV function is about 40x faster

hist,bins = np.histogram(img.ravel(),256,[0,256])

Plotting histograms

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('bottle.png')
assert img is not None, "file could not be read, check with os.path.exists()"
color = ('b','g','r')
for i,col in enumerate(color):
    histr = cv.calcHist([img],[i],None,[256],[0,256])
    plt.plot(histr,color = col)
    plt.xlim([0,256])
plt.show()

Histogram within a mask

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('bottle.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
 
# create a mask
mask = np.zeros(img.shape[:2], np.uint8)
mask[400:800, 400:1000] = 255
masked_img = cv.bitwise_and(img,img,mask = mask)
 
# Calculate histogram with mask and without mask
# Check third argument for mask
hist_full = cv.calcHist([img],[0],None,[256],[0,256])
hist_mask = cv.calcHist([img],[0],mask,[256],[0,256])
 
plt.subplot(221), plt.imshow(img, 'gray')
plt.subplot(222), plt.imshow(mask,'gray')
plt.subplot(223), plt.imshow(masked_img, 'gray')
plt.subplot(224), plt.plot(hist_full), plt.plot(hist_mask)
plt.xlim([0,256])
 
plt.show()

Histogram equalization improves image contrast and evens out lighting conditions

img = cv.imread('wiki.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
equ = cv.equalizeHist(img)
res = np.hstack((img,equ)) #stacking images side-by-side
cv.imwrite('res.png',res)
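
The idea behind histogram equalization is to remap intensities through the cumulative distribution so that the used range is stretched to 0..255. The sketch below uses the textbook CDF formula on a flat list of 8-bit values; the exact rounding in cv.equalizeHist() may differ, and the function name equalize is an assumption.

```python
def equalize(pixels):
    """Histogram-equalize a flat list of 8-bit values using the CDF."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels with value <= v.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                     # constant image, nothing to stretch
        return list(pixels)

    lut = [round((c - cdf_min) / (n - cdf_min) * 255) if c >= cdf_min else 0
           for c in cdf]
    return [lut[p] for p in pixels]


low_contrast = [100, 110, 120, 130]
stretched = equalize(low_contrast)
print(stretched)
```

The four values that originally spanned only 30 gray levels end up spread over the full range.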

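CLAHE (Contrast Limited Adaptive Histogram Equalization)
The image is divided into small tiles, and each tile is histogram-equalized as usual. If any histogram bin exceeds the given contrast limit (40 by default), those pixels are clipped and distributed uniformly to the other bins before equalization. Afterwards bilinear interpolation is applied to remove artifacts at tile borders.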

# create a CLAHE object (Arguments are optional).
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
cv.imwrite('clahe_2.jpg',cl1)

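2D histograms
Convert the image from BGR to HSV and build the histogram over Hue and Saturation, again with cv.calcHist()

channels = [0,1] because both the H and S planes are used
bins = [180,256], 180 for the H plane and 256 for the S plane
range = [0,180,0,256], Hue ranges from 0 to 180 and Saturation from 0 to 256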

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
 
img = cv.imread('view.png')
assert img is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(img,cv.COLOR_BGR2HSV)
hist = cv.calcHist( [hsv], [0, 1], None, [180, 256], [0, 180, 0, 256] )
 
plt.imshow(hist,interpolation = 'nearest')
plt.show()

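Histogram backprojection
Used for image segmentation or for finding objects in an image. It creates a single-channel image with the same width and height as the input, in which each pixel holds the probability that the corresponding input pixel belongs to the object; the brighter the area, the more likely it contains the target object.
Method: compute the histogram of an image that contains the target object, which should fill that image as much as possible. A color histogram works better than a grayscale one. Then back-project this histogram onto the image to be searched, i.e. compute for every pixel of that image the probability of belonging to the object, and display the result. With a suitable threshold, the target can be segmented out.

cv.calcBackProject( images, channels, hist, ranges, scale[, dst] ) -> dst

Usage is similar to cv.calcHist()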

import numpy as np
import cv2 as cv
 
roi = cv.imread('trees.png')
assert roi is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(roi,cv.COLOR_BGR2HSV)
 
target = cv.imread('view.jpg')
assert target is not None, "file could not be read, check with os.path.exists()"
hsvt = cv.cvtColor(target,cv.COLOR_BGR2HSV)
 
# calculating object histogram
roihist = cv.calcHist([hsv],[0, 1], None, [180, 256], [0, 180, 0, 256] )
 
# normalize histogram and apply backprojection
cv.normalize(roihist,roihist,0,255,cv.NORM_MINMAX)
dst = cv.calcBackProject([hsvt],[0,1],roihist,[0,180,0,256],1)
 
# Now convolute with circular disc
disc = cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
cv.filter2D(dst,-1,disc,dst)
 
# threshold and binary AND
ret,thresh = cv.threshold(dst,50,255,0)
thresh = cv.merge((thresh,thresh,thresh))
res = cv.bitwise_and(target,thresh)
 
res = np.vstack((target,thresh,res))
cv.imwrite('res.jpg',res)

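15. Fourier transform of images

For a sinusoidal signal, amplitude that varies rapidly means high frequency and amplitude that varies slowly means low frequency. Images are similar: intensity changes sharply at edges and noise, so these count as high-frequency content.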

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('view.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
f = np.fft.fft2(img)    # forward transform
fshift = np.fft.fftshift(f) # shift the low-frequency components to the center
magnitude_spectrum = 20*np.log(np.abs(fshift))
 
plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()

rows, cols = img.shape
crow, ccol = rows//2, cols//2
fshift[crow-10:crow+11, ccol-10:ccol+11] = 0    # remove the low frequencies (high-pass filter)
f_ishift = np.fft.ifftshift(fshift) # undo the shift
img_back = np.fft.ifft2(f_ishift)   # inverse transform
img_back = np.real(img_back)
 
plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Image after HPF'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(img_back)
plt.title('Result in JET'), plt.xticks([]), plt.yticks([])
 
plt.show()

OpenCV's implementation (cv.dft() and cv.idft()) is faster, but less intuitive than NumPy's.

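16. Template matching

Use cv.matchTemplate(). If the input image has size (W, H) and the template has size (w, h), the output has size (W-w+1, H-h+1).
Use cv.minMaxLoc() to find the location of the extremum and take it as the top-left corner of a rectangle; together with (w, h), draw the rectangle that encloses the target.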

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
 
img = cv.imread('view.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img2 = img.copy()
template = cv.imread('build.jpg', cv.IMREAD_GRAYSCALE)
assert template is not None, "file could not be read, check with os.path.exists()"
w, h = template.shape[::-1]
 
# All the 6 methods for comparison in a list
methods = ['cv.TM_CCOEFF', 'cv.TM_CCOEFF_NORMED', 'cv.TM_CCORR',
 'cv.TM_CCORR_NORMED', 'cv.TM_SQDIFF', 'cv.TM_SQDIFF_NORMED']
 
for meth in methods:
    img = img2.copy()
    method = eval(meth)

    # Apply template Matching
    res = cv.matchTemplate(img,template,method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)

    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    # compute the opposite corner for every method, not only in the else branch
    bottom_right = (top_left[0] + w, top_left[1] + h)

    cv.rectangle(img,top_left, bottom_right, 255, 2)

    plt.subplot(121),plt.imshow(res,cmap = 'gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(img,cmap = 'gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)

    plt.show()
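
Why the result map has size (W-w+1, H-h+1) can be seen from a brute-force sketch of the squared-difference idea in NumPy: the template is slid over every position where it fits entirely inside the image. The arrays below are synthetic and the loop is only for illustration; it is far slower than cv.matchTemplate().

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(20, 30)).astype(np.float64)  # H=20, W=30
top, left = 7, 12                                                # made-up location
template = image[top:top + 5, left:left + 8].copy()             # h=5, w=8

H, W = image.shape
h, w = template.shape
result = np.empty((H - h + 1, W - w + 1))  # NumPy shape is (rows, cols)

# Sum of squared differences at every valid top-left position
for y in range(H - h + 1):
    for x in range(W - w + 1):
        patch = image[y:y + h, x:x + w]
        result[y, x] = np.sum((patch - template) ** 2)

# For squared differences the best match is the minimum
best_y, best_x = np.unravel_index(np.argmin(result), result.shape)
print(result.shape, (best_x, best_y))
```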

17. Hough line transform

import cv2 as cv
import numpy as np
 
img = cv.imread('opencv.png')
assert img is not None, "img loading wrong"
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
edges = cv.Canny(gray,50,150,apertureSize = 3)

# Inputs: binary edge image, rho resolution, theta resolution, accumulator threshold
lines = cv.HoughLines(edges,1,np.pi/180,100)
assert lines is not None, "no lines found, try lowering the threshold"
for line in lines:
    rho,theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))

    cv.line(img,(x1,y1),(x2,y2),(0,0,255),2)
 
cv.imwrite('houghlines3.jpg',img)
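
The endpoint arithmetic in the loop follows from the line equation x·cos θ + y·sin θ = ρ: (x0, y0) is the point on the line closest to the origin, and (-sin θ, cos θ) is the direction along the line. A small sketch in plain Python (the helper name line_endpoints is made up for illustration):

```python
import math

def line_endpoints(rho, theta, length=1000):
    """Return two points on the line x*cos(theta) + y*sin(theta) = rho."""
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho            # closest point to the origin
    p1 = (x0 + length * (-b), y0 + length * a)
    p2 = (x0 - length * (-b), y0 - length * a)
    return p1, p2

# A line at 45 degrees whose closest point to the origin is 100 pixels away
theta = math.pi / 4
p1, p2 = line_endpoints(100, theta)
for x, y in (p1, p2):
    # both points satisfy the line equation, so this recovers rho
    print(round(x * math.cos(theta) + y * math.sin(theta), 6))
```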

Hough circle transform

import numpy as np
import cv2 as cv
 
img = cv.imread('opencv_white.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)
cimg = cv.cvtColor(img,cv.COLOR_GRAY2BGR)
 
circles = cv.HoughCircles(img,cv.HOUGH_GRADIENT,1,20,
                          param1=50,param2=30,minRadius=0,maxRadius=0)
assert circles is not None, "no circles found, try adjusting param1/param2"

circles = np.uint16(np.around(circles))
for i in circles[0,:]:
    # draw the outer circle
    cv.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
    # draw the center of the circle
    cv.circle(cimg,(i[0],i[1]),2,(0,0,255),3)
 
cv.imshow('detected circles',cimg)
cv.waitKey(0)
cv.destroyAllWindows()

2024-03-20

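OpenCV quick start

1. Installing OpenCV

On Windows, open a cmd window and enter the following command:

pip install opencv-contrib-python -i https://pypi.tuna.tsinghua.edu.cn/simple

The contrib build is more complete. The -i option and the URL after it select the Tsinghua mirror for the download, which is much faster.

Check that the installation succeeded

import cv2 as cv
print(cv.__version__) # prints the version number if all is well; in my case '4.9.0'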

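2. Basic drawing functions

cv.line(), cv.circle(), cv.rectangle(), cv.ellipse() and cv.putText() draw a line, a circle, a rectangle, an ellipse and text on an image, respectively. Their parameters are very similar, and all of them include the following:

img: the image to draw on
color: the color of the shape
thickness: the line thickness
lineType: the line type, e.g. cv.LINE_8 or the anti-aliased cv.LINE_AA (I did not notice any difference)

An example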

import numpy as np
import cv2 as cv
 
# Create a black image
img = np.zeros((512,512,3), np.uint8)

# Draw a blue diagonal line; the channel order is BGR, so (255,0,0) is blue
cv.line(img,(0,0),(500,500),(255,0,0),2)

# Rectangle
cv.rectangle(img, (200, 200), (280, 300), (255, 255, 0), 1)

# Circle: center (300,100), radius 100, color, thickness 2
cv.circle(img, (300, 100), 100, (0, 255, 255), 2)

# Ellipse: image, center, (half major axis, half minor axis), rotation angle, arc start angle, arc end angle, (B, G, R), thickness
cv.ellipse(img, (100, 300), (50, 30), 60, 0, 360, (255, 0, 255), 4)


pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1, 1, 2))

# Draw a polyline; if the third argument isClosed is True, the polyline is closed into a polygon
cv.polylines(img, [pts], False, (255, 255, 255))

# Add text
font = cv.FONT_HERSHEY_COMPLEX
cv.putText(img, 'hello world', (40, 460),font, 1, (0x11,0xaa,0x11),2)


cv.imshow('draw', img)

cv.waitKey(3000)

cv.destroyAllWindows()

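3. Drawing with the mouse

Click the left mouse button to draw a circle. cv.setMouseCallback registers a callback function for the window; the callback draw_circle receives the event and the x, y coordinates at which it occurred, checks the event type inside the function and handles it.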

import numpy as np
import cv2 as cv
 
# mouse callback function
def draw_circle(event,x,y,flags,param):
    if event == cv.EVENT_LBUTTONDOWN:
        cv.circle(img,(x,y),100,(255,0,0),-1)
 
# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
 
while(1):
    cv.imshow('image',img)
    if cv.waitKey(20) & 0xFF == 27:
        break
cv.destroyAllWindows()

A more advanced example: press the m key to switch modes, then click and drag to draw rectangles or circles.

import numpy as np
import cv2 as cv
import math
 
drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to circle
ix,iy = -1,-1
 
# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode

    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing == True:
            if mode == True:
                cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
            else:
                cv.circle(img,(ix,iy),int(abs(math.sqrt((x-ix)**2+(y-iy)**2))),(0,0,255),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False
        if mode == True:
            pass
            #cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
        else:
            pass
            #cv.circle(img,(x,y),5,(0,0,255),-1)

img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
 
while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break
 
cv.destroyAllWindows()

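4. Using trackbars

Use cv.createTrackbar() to create a trackbar; its arguments are the trackbar name, the window name, the initial value, the maximum value and a callback function.
Use cv.getTrackbarPos() to read the current position of a trackbar; its arguments are the trackbar name and the window name.
OpenCV has no button widget, so a trackbar with a maximum value of 1 can serve as a switch.

An example: a color palette with an on/off switch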

import numpy as np
import cv2 as cv
 
def nothing(x):
    pass
 
# Create a black image, a window
img = np.zeros((300,512,3), np.uint8)
cv.namedWindow('image')
 
# create trackbars for color change
cv.createTrackbar('R','image',128,255,nothing)
 
cv.createTrackbar('G','image',0,255,nothing)
cv.createTrackbar('B','image',0,255,nothing)
 
# create switch for ON/OFF functionality
switch = '0 : OFF \n1 : ON'
cv.createTrackbar(switch, 'image',0,1,nothing)
 
while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == 27:
        break

    # get current positions of four trackbars
    r = cv.getTrackbarPos('R','image')
    g = cv.getTrackbarPos('G','image')
    b = cv.getTrackbarPos('B','image')
    s = cv.getTrackbarPos(switch,'image')

    if s == 0:
        img[:] = 0
    else:
        img[:] = [b,g,r]
 
cv.destroyAllWindows()

A slightly more complex example: use trackbars to change the drawing color and the brush size.

import numpy as np
import cv2 as cv
import math
 
drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to circle
ix,iy = -1,-1
 
# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode
    r = cv.getTrackbarPos('r','image')
    g = cv.getTrackbarPos('g','image')
    b = cv.getTrackbarPos('b','image')
    b_size = cv.getTrackbarPos('brush_size','image')

    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y
        if mode == True:
            pass
        else:
            cv.circle(img,(x,y),b_size,(b,g,r),-1)

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing == True:
            if mode == True:
                cv.rectangle(img,(ix,iy),(x,y),(b,g,r),-1)
            else:
                cv.circle(img,(ix,iy),int(abs(math.sqrt((x-ix)**2+(y-iy)**2))),(b,g,r),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False
        if mode == True:
            pass
            #cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
        else:
            pass
            #cv.circle(img,(x,y),5,(0,0,255),-1)


def nothing(x):
    pass

img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

cv.createTrackbar('r', 'image', 0, 255, nothing)
cv.createTrackbar('g', 'image', 0, 255, nothing)
cv.createTrackbar('b', 'image', 0, 255, nothing)
cv.createTrackbar('brush_size', 'image', 0, 100, nothing)
 
while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF

    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break
 
cv.destroyAllWindows()

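5. Pixel, channel and border operations

For single-pixel access, prefer array.item() and array.itemset() (itemset() was removed in NumPy 2.0; use plain index assignment there)
For channel operations, select with NumPy slicing directly
An ROI is a view, not a copy, so modifying the ROI changes the data of the original image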
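
The view behaviour is easy to check on a small NumPy array. A minimal sketch (the array here is synthetic, not a real image):

```python
import numpy as np

img = np.zeros((4, 4, 3), np.uint8)

roi = img[0:2, 0:2]          # slicing returns a view onto img's data
roi[:] = 255                 # so this also changes img
print(img[0, 0])             # [255 255 255]

safe = img[2:4, 2:4].copy()  # .copy() makes an independent array
safe[:] = 7                  # img is left untouched
print(img[3, 3])             # [0 0 0]

print(np.shares_memory(roi, img), np.shares_memory(safe, img))  # True False
```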

import numpy as np
import cv2 as cv


img = cv.imread('bottle.png')

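assert img is not None, "file could not be read, check with os.path.exists()"

# Index a single pixel
px = img[100,100]
print(px)

# Index the blue value of a pixel
px_blue = img[100,100,0]
print(px_blue)

# Modify the value of a pixel
img[100,100] = [255,255,255]

# NumPy's array.item() is better suited to reading single values;
# array.itemset() was removed in NumPy 2.0, so plain index assignment is used for writing
print(img.item(50,50,0))
img[50,50,2] = 255

# Get the image shape; the presence of a channel count tells whether it is a color image. Returns a tuple (height, width, channels)
print(img.shape)

# Get the image size, i.e. the product of the dimensions above
print(img.size)

# The image data type; many errors are caused by mismatched data types
print(img.dtype)

# ROI selection: copy one region onto another
obj = img[280:340, 330:390]
obj[:,:,:] = 0 # obj is a NumPy view, not a copy, so modifying it also changes img
img[273:333, 100:160] = obj

# Split a color image into channels and merge them again
b, g, r = cv.split(img)
img = cv.merge((b,g,r))

# Note that cv.split is costly; NumPy slicing is the better choice
img[:,:,2] = 255

# Add a border to the image, as is often done for convolution, with cv.copyMakeBorder(). Arguments: source image, top, bottom, left and right widths, border type, color (for the constant type)
# The border types are:
# cv.BORDER_CONSTANT constant fill
# cv.BORDER_REFLECT mirrored fill, e.g. fedcba|abcdefgh|hgfedcb
# cv.BORDER_REFLECT_101 mirrored fill without repeating the edge, e.g. fedcb|abcdefgh|gfedcb
# cv.BORDER_REPLICATE repeats the last element, e.g. aaaaaa|abcdefgh|hhhhhhh
# cv.BORDER_WRAP wraps around, e.g. cdefgh|abcdefgh|abcdefg
img = cv.copyMakeBorder(img,5,5,5,5,cv.BORDER_WRAP)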


cv.namedWindow('image')
cv.imshow('image',img)

cv.waitKey(2000)

cv.destroyAllWindows()
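
The border patterns listed in the comments above can be reproduced on a 1D array of letters with np.pad(). The pairing of OpenCV border types with np.pad modes ('symmetric', 'reflect', 'edge', 'wrap') is my own mapping for illustration, not something OpenCV itself uses.

```python
import numpy as np

row = np.array(list("abcdefgh"))

def padded(mode, width=3):
    """Pad `row` on both sides and show the result as left|middle|right."""
    out = "".join(np.pad(row, width, mode=mode))
    return out[:width] + "|" + out[width:-width] + "|" + out[-width:]

print(padded("symmetric"))  # like cv.BORDER_REFLECT
print(padded("reflect"))    # like cv.BORDER_REFLECT_101
print(padded("edge"))       # like cv.BORDER_REPLICATE
print(padded("wrap"))       # like cv.BORDER_WRAP
```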

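6. Image addition and blending

Unlike plain NumPy addition, cv.add() saturates: a result that exceeds the range of the data type is set to the maximum value.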

import numpy as np
import cv2 as cv

x = np.array(250, dtype=np.uint8)
y = np.array(10, dtype=np.uint8)

print(cv.add(x,y)) # x + y = 260 > 255 (uint8 maximum), so the result saturates: [[255]]
print(x+y)  # 260 % 256 = 4, so the result wraps around: 4
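
The difference between the two additions can be reproduced in NumPy alone. A minimal sketch, assuming 1D uint8 arrays; the helper saturating_add is made up for illustration and only mimics the saturating behaviour described above:

```python
import numpy as np

def saturating_add(a, b):
    """Add two uint8 arrays in a wider type, then clip to [0, 255]."""
    wide = a.astype(np.int32) + b.astype(np.int32)
    return np.clip(wide, 0, 255).astype(np.uint8)

x = np.array([250, 100], dtype=np.uint8)
y = np.array([10, 20], dtype=np.uint8)

print(saturating_add(x, y))  # [255 120]  clipped at the uint8 maximum
print(x + y)                 # [  4 120]  wraps around modulo 256
```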

Use cv.addWeighted() to blend two images:

import numpy as np
import cv2 as cv

img1 = cv.imread('ml.png')
img2 = cv.imread('opencv.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

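# Arguments: image 1, weight 1, image 2, weight 2, gamma (both images must have the same size and channels)
# here weight 1 + weight 2 = 1
# gamma is a scalar added to every pixel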
dst = cv.addWeighted(img1,0.6,img2,0.4,0)
 
cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
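
Per pixel, cv.addWeighted() computes dst = saturate(α·img1 + β·img2 + γ). A NumPy sketch of that formula on synthetic arrays (the helper blend is made up for illustration):

```python
import numpy as np

def blend(img1, alpha, img2, beta, gamma=0.0):
    """Weighted sum of two same-sized uint8 images, rounded and clipped to [0, 255]."""
    mixed = alpha * img1.astype(np.float64) + beta * img2.astype(np.float64) + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

a = np.full((2, 2), 100, np.uint8)
b = np.full((2, 2), 200, np.uint8)

print(blend(a, 0.6, b, 0.4))        # 0.6*100 + 0.4*200 = 140 everywhere
print(blend(a, 0.6, b, 0.4, 200))   # 140 + 200 exceeds 255, so it saturates
```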

Bitwise operations
The example below cuts a logo out of one image and pastes it onto another.

import cv2 as cv
import numpy as np

# Read the two images
img1 = cv.imread('bottle.png')
img2 = cv.imread('opencv.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

# The logo should appear at the top-left of the image, so create an ROI there
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]

# Make a mask of the logo and a mask of the non-logo area
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask) # bitwise NOT gives the mask of the non-logo area

# Black out the logo area in the ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)
cv.imshow('img1_bg',img1_bg)

# Take only the logo region from the logo image
img2_fg = cv.bitwise_and(img2,img2,mask = mask)
cv.imshow('img2_fg',img2_fg)

# Add the extracted logo to the background whose logo area was blacked out
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst
 
cv.imshow('res',img1)
cv.waitKey(0)
cv.destroyAllWindows()

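7. Performance measurement and optimization

Use cv.getTickCount() to read the tick counter; call it once before and once after the code to be timed and take the difference.
Use cv.getTickFrequency() to get the tick frequency; elapsed time (seconds) = number of ticks / frequency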

import cv2 as cv

e1 = cv.getTickCount()  # the time module works just as well

print('do something')

e2 = cv.getTickCount()
t = (e2 - e1) / cv.getTickFrequency()

print(t)
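
As the comment in the code notes, Python's time module can be used the same way. A minimal sketch with time.perf_counter(), which already returns seconds, so no division by a frequency is needed (the workload being timed is arbitrary):

```python
import time

start = time.perf_counter()

total = sum(i * i for i in range(100_000))  # arbitrary work to time

elapsed = time.perf_counter() - start       # elapsed time in seconds
print(f"{elapsed:.6f} s")
```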

Performance optimization

import cv2 as cv
#cv.setUseOptimized(False)  # optimization is on by default; it can be switched on and off manually
img = cv.imread('ml.png')
e1 = cv.getTickCount()

for i in range(5, 49, 2):
    img1 = cv.medianBlur(img, i)
e2 = cv.getTickCount()
t = (e2 - e1) / cv.getTickFrequency()
print(t)
print(cv.useOptimized())

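In IPython, the %timeit magic command makes it very convenient to time a single line of code.
When creating arrays or operating on only one or two elements, plain Python operations and OpenCV functions are both faster than NumPy.

The general approach to optimization: first implement the algorithm in a simple way; once it works, profile it, find the bottlenecks and optimize them.

Avoid loops in Python as far as possible
Vectorize the algorithm as much as possible, because both NumPy and OpenCV are optimized for vector operations
Do not copy an array unless necessary; use views instead
If the code is still slow, consider Cython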
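
As an illustration of the first two points, the sketch below inverts a synthetic image once with nested Python loops and once with a single vectorized NumPy expression. Both give the same result; the vectorized form is usually much faster, though the exact ratio depends on the machine.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

# Loop version: one Python-level operation per pixel
inverted_loop = np.empty_like(img)
for y in range(img.shape[0]):
    for x in range(img.shape[1]):
        inverted_loop[y, x] = 255 - img[y, x]

# Vectorized version: one expression over the whole array
inverted_vec = 255 - img

print(np.array_equal(inverted_loop, inverted_vec))  # True
```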


2024-03-19

OpenCV VideoWriter

Environment

Windows 11
python 3.12.2
opencv-contrib-python 4.9.0.80

Problem when saving video

When saving a video with VideoWriter, the resulting file is only 1 KB and is reported as corrupted.

Solution

import cv2 as cv

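# Create the capture object
cap = cv.VideoCapture('video.mp4') # pass -1 for the default camera, a camera index 0/1/2, or a video file name

# cap.get() returns floats, while VideoWriter needs an integer (width, height) tuple
frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))    # property id 3
frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))  # property id 4
size = (frame_width, frame_height)  # the VideoWriter output must have the same size as the input frames

out = cv.VideoWriter('out.mov', cv.VideoWriter_fourcc(*'divx'), 25.5, size, isColor=True)
# Output formats .avi, .mov and .mp4 all worked in my tests
# Both divx and mp4v work as the fourcc argument
# The frame rate may be a float
# The size must be a tuple matching the source video, in the order (width, height)
# If the frames are converted to grayscale, isColor must be changed to False

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        print('can\'t receive frame, exiting...')
        break

    #frame = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    out.write(frame)

    cv.imshow('frame', frame)

    if cv.waitKey(1) == ord('q'):
        break

cap.release()
out.release()
cv.destroyAllWindows()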

2024-03-14

pyautogui

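pyautogui can control mouse movement and clicks, so it can be used to automate all sorts of operations, which is a lot of fun.

First, a word of caution

To avoid losing control, quickly fling the mouse into a corner of the screen; this triggers pyautogui's fail-safe and forces the program to quit.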

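I. Controlling the mouse

1. Installing pyautogui

In a Windows cmd window, enter the command

pip install pyautogui

2. Checking the installation

Use pyautogui.size() to get the width and height of the current screen; it returns a named tuple with width and height fields.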

import pyautogui
wh = pyautogui.size()
print(wh.width, wh.height)

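3. Moving the mouse to an absolute position

pyautogui.moveTo() moves the cursor to any position on the screen. It takes three arguments: the first two are the x and y coordinates, and the third, duration=, specifies how long the move takes.
First, a look at the pixel coordinate system of a computer display: the top-left corner of the screen is the origin, the x axis points right and the y axis points toward the bottom of the screen.

Figure 1: Screen coordinate system at 1920x1080 resolution^[1]

An example: make the mouse trace a rectangle automatically.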

import pyautogui
for i in range(10):
    pyautogui.moveTo(100, 100, duration=0.25)
    pyautogui.moveTo(100, 400, duration=0.25)
    pyautogui.moveTo(400, 400, duration=0.25)
    pyautogui.moveTo(400, 100, duration=0.25)

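4. Moving the mouse to a relative position

pyautogui.move() also takes three arguments; the difference is that the coordinates are offsets relative to the current position.
To find out where the mouse currently is, use pyautogui.position(); it takes no arguments and returns an object holding the two coordinates.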

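5. Clicking the mouse

To perform a single left click at coordinates (100, 100), use pyautogui.click(); the third argument, button=, selects the left, middle or right button.

import pyautogui
pyautogui.click(100, 100, button='left')    # left click
pyautogui.click(100, 100, button='right')   # right click

A click consists of two actions: press and release. pyautogui.click() uses a default interval between them. To control how long the button is held down, press it with pyautogui.mouseDown() and release it with pyautogui.mouseUp(); both take the same arguments.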

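6. Dragging the mouse

To press, hold and then move the mouse, use pyautogui.dragTo() or pyautogui.drag(); the former takes an absolute position, the latter a relative one.
The following code is an example of drawing with the mouse

import pyautogui
import time

time.sleep(5)   # wait while you switch to the drawing application's window
pyautogui.click(800, 600)   # click a point inside the canvas as the starting point

distance = 500
change = 20
while distance > 0:
    pyautogui.drag(distance, 0, duration=0.2)   # Move right.
    distance = distance - change
    pyautogui.drag(0, distance, duration=0.2)   # Move down.
    pyautogui.drag(-distance, 0, duration=0.2)  # Move left.
    distance = distance - change
    pyautogui.drag(0, -distance, duration=0.2)  # Move up.

Open a drawing application beforehand (Windows Paint is used here), select a brush, then return to the IDE and start the script. The program contains a 5 s delay, so within 5 s of starting it, maximize the drawing window you prepared.

Figure 2: Result of drawing a shape by dragging the mouse with pyautogui.drag()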
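
Before handing the mouse over to the script, the path can be previewed without moving anything. The sketch below only computes the relative offsets that the loop above passes to pyautogui.drag(); the helper name spiral_offsets is made up for illustration.

```python
def spiral_offsets(distance=500, change=20):
    """Return the (dx, dy) offsets of the square spiral, in drawing order."""
    offsets = []
    while distance > 0:
        offsets.append((distance, 0))    # right
        distance -= change
        offsets.append((0, distance))    # down
        offsets.append((-distance, 0))   # left
        distance -= change
        offsets.append((0, -distance))   # up
    return offsets

path = spiral_offsets()
print(path[:4])   # [(500, 0), (0, 480), (-480, 0), (0, -460)]
print(len(path))  # 52
```

Each tuple could then be passed on as pyautogui.drag(dx, dy, duration=0.2).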

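7. Controlling the scroll wheel

pyautogui.scroll() takes an integer number of scroll units.

8. Getting the pixel coordinates of a point on the screen

Use the Snipaste application: keep it running in the background and press F1 when needed to enter the screenshot preview mode, in which the coordinates of the mouse position are displayed.
Alternatively, use pyautogui.displayMousePosition() to display the position and color information in real time.

9. Getting screen image information

Use pyautogui.screenshot() to take a full-screen screenshot. Use pyautogui.pixel(x, y) to get the color of a pixel; it takes the two coordinates.
To check whether a point matches a given color, use pyautogui.pixelMatchesColor(x, y, (R, G, B)), which takes the two coordinates and a tuple of RGB values.
To find where a given image appears on the screen, use pyautogui.locateOnScreen('img.jpg'); the argument is the image path and the return value is an (x, y, w, h) tuple. There may be several matches on the screen (locateOnScreen() returns the first; pyautogui.locateAllOnScreen() yields all of them). To click the center of that region, pass the tuple to pyautogui.click((x, y, w, h)). You can even call pyautogui.click('img.jpg') to locate and click in one step, but this may fail, so it should be wrapped in try and except.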

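10. Getting window information

import pyautogui
fw = pyautogui.getActiveWindow() # get the active window
print(str(fw))  # <Win32Window left="-12", top="-12", width="2584", height="1540", title="test.py - Visual Studio Code">

pyautogui.getAllWindows() # returns a list of all windows

pyautogui.getWindowsAt(x, y) # takes x, y coordinates; returns all windows containing that point

pyautogui.getWindowsWithTitle(title) # takes a title; returns the matching windows

pyautogui.getAllTitles() # returns all window titles as a list of strings

Some operations can be performed on a window obtained this way

fw.width = 1000 # resize the window to a width of 1000 pixels

fw.topleft = (200, 200) # move the top-left corner of the window to the given position

print(fw.isMaximized) # check whether the window is maximized

fw.maximize() # maximize the window

fw.minimize() # minimize the window

fw.restore()  # undo a maximize/minimize

For complete usage, see the official documentation at https://pyautogui.readthedocs.io/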

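II. Controlling the keyboard

1. Typing text

First click to select the text input area, then type with pyautogui.write().

import pyautogui
pyautogui.click(1080,1400)
pyautogui.write('hello world!') # pyautogui automatically holds shift when typing '!'

# When a list is passed, any key on the keyboard can be given by name, e.g. 'left' for the left arrow key
pyautogui.write(['a', 'b', 'left', 'left', 'X', 'Y']) # presses a, b, left, left, shift+x, shift+y in order; result: XYab

The key names are listed in the table below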

Key name string	Meaning
'a', 'b', 'c', 'A', 'B', 'C', '1', '2', '3', '!', '@', '#', etc.	Single-character keys
'enter' (or 'return' or '\n')	ENTER key
'esc'	ESC key
'shiftleft', 'shiftright'	Left and right SHIFT keys
'altleft', 'altright'	Left and right ALT keys
'ctrlleft', 'ctrlright'	Left and right CTRL keys
'tab' (or '\t')	TAB key
'backspace', 'delete'	BACKSPACE and DELETE keys
'pageup', 'pagedown'	PAGE UP and PAGE DOWN keys
'home', 'end'	HOME and END keys
'up', 'down', 'left', 'right'	Up, down, left and right arrow keys
'f1', 'f2', 'f3', etc.	F1 to F12 keys
'volumemute', 'volumedown', 'volumeup'	Mute, volume down and volume up keys (some keyboards lack these keys, but your computer can still receive these commands)
'pause'	PAUSE key
'capslock', 'numlock', 'scrolllock'	CAPS LOCK, NUM LOCK and SCROLL LOCK keys
'insert'	INS or INSERT key
'printscreen'	PRTSC or PRINT SCREEN key
'winleft', 'winright'	Left and right WIN keys (Windows)
'command'	Command key (macOS)
'option'	OPTION key (macOS)

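2. Hotkey combinations

As with the mouse, pyautogui.keyDown() and pyautogui.keyUp() control the press and release of a key separately.
For example, a copy (Ctrl+C) on Windows can be done as follows

pyautogui.keyDown('ctrl')
pyautogui.keyDown('c')
pyautogui.keyUp('c')
pyautogui.keyUp('ctrl')

That is rather cumbersome, though; the same operation can be written as

pyautogui.hotkey('ctrl', 'c')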

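3. Usage tips

Keep the screen resolution unchanged
Maximize the application window, because only then do the buttons always stay in the same place
Allow generous delays to match the software's response time; you do not want to start clicking before the previous action has finished
Use locateOnScreen() to find buttons and rely on x, y coordinates as little as possible; stopping when the target is not found is better than clicking blindly
Use getWindowsWithTitle() to make sure the application window you want to control exists, and activate() to bring it to the foreground
Add as many checks as possible, for example for pop-ups or a dropped network connection
Watch the first run from start to finish to make sure it behaves correctly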
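
The advice to allow delays and to stop when the target is not found can be combined in a small polling helper. This is a sketch under assumptions: wait_for and its parameters are made-up names, and the locator is passed in as a plain callable so the helper itself does not depend on pyautogui.

```python
import time

def wait_for(locate, timeout=5.0, interval=0.5):
    """Call locate() until it returns a result or `timeout` seconds pass.

    `locate` is any callable that returns None (or raises) while the
    target is missing. Returns the result, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            found = locate()
        except Exception:
            found = None          # treat "not found" errors as a miss
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)

# Stand-in locator for demonstration: succeeds on the third call
calls = {"n": 0}
def fake_locate():
    calls["n"] += 1
    return (10, 20, 30, 40) if calls["n"] >= 3 else None

print(wait_for(fake_locate, timeout=2.0, interval=0.01))  # (10, 20, 30, 40)
```

With pyautogui this could be called as wait_for(lambda: pyautogui.locateOnScreen('img.jpg')), ending the script when the result is None instead of clicking blindly.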

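Two ways to add a delay

import time
time.sleep(3)   # wait 3 s

import pyautogui
pyautogui.countdown(10) # wait 10 s, printing a countdown 10, 9, 8... on the command line

You can also use pyautogui.alert('text') or pyautogui.confirm('text') to pop up a dialog for confirmation. To use the contents of the clipboard, import pyperclip and call pyperclip.paste().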

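On computing ethics
Do good and avoid harm: the purpose of learning this technique should be to raise your own productivity; do not use it to harm others.

References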

[1] Automate The Boring Stuff, Al Sweigart

2024-03-11

Hello, world!

Welcome to 黄河水澄's tech column!
Here I share technical notes from my own learning, both to keep myself studying and to pass on useful knowledge.