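OpenCV Feature Matching

1. Brute-Force Matching

Brute-force matching compares every descriptor in one set against every descriptor in the other set using a distance measure (for example the Hamming distance for ORB or the L2 norm for SIFT) and keeps the closest one.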

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img1 = cv.imread('images/box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Using ORB
# Initiate ORB detector
orb = cv.ORB_create()

# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)

# create BFMatcher object
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)

# Match descriptors.
matches = bf.match(des1,des2)

# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)

# Draw the 15 best matches.
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:15],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

plt.imshow(img3),plt.title('ORB'),plt.show()

# Using SIFT
# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1,des2,k=2)

# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.35*n.distance:
        good.append([m])

# cv.drawMatchesKnn expects list of lists as matches.
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

plt.imshow(img3),plt.title('SIFT'),plt.show()
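
To make the k=2 search and the ratio test concrete, here is a minimal NumPy sketch of what happens to the descriptors. The function name ratio_test_matches and the toy 2-D descriptors are made up for illustration; real SIFT descriptors are 128-dimensional, and this is not how cv.BFMatcher is implemented internally.

```python
import numpy as np

def ratio_test_matches(des1, des2, ratio=0.7):
    """Return (query_idx, train_idx) pairs that pass Lowe's ratio test."""
    # Pairwise Euclidean distances, shape (len(des1), len(des2)).
    diff = des1[:, None, :] - des2[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    good = []
    for qi, row in enumerate(dists):
        # Indices of the two nearest train descriptors (the k=2 neighbours).
        best, second = np.argsort(row)[:2]
        # Keep the match only if it is clearly better than the runner-up.
        if row[best] < ratio * row[second]:
            good.append((qi, int(best)))
    return good

# Toy descriptors: query 0 has one clear neighbour, query 1 has two equally close ones.
des1 = np.array([[0.0, 0.0], [5.0, 5.0]], dtype=np.float32)
des2 = np.array([[0.1, 0.0], [5.0, 4.0], [5.0, 6.0]], dtype=np.float32)
matches_demo = ratio_test_matches(des1, des2)
print(matches_demo)
```

The ambiguous query is rejected because its best and second-best distances are the same, which is exactly the situation the ratio test is meant to filter out.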

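2. FLANN-Based Matching

FLANN (Fast Library for Approximate Nearest Neighbors) is a collection of algorithms optimized for fast nearest-neighbor search in large, high-dimensional feature sets. On large datasets it performs better than BFMatcher.
To use FLANN, pass an IndexParams dictionary that names the algorithm and its parameters. For SIFT, SURF and similar descriptors, pass the following: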

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)

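For ORB, use the following instead. Tune the parameters yourself; the values suggested in the documentation (shown in the comments) are not necessarily the best fit in practice.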

FLANN_INDEX_LSH = 6
index_params = dict(algorithm = FLANN_INDEX_LSH,
                    table_number = 6,       # 12
                    key_size = 12,          # 20
                    multi_probe_level = 1)  # 2

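The second dictionary is SearchParams. It specifies how many times the trees in the index should be recursively traversed (checks). Higher values give better precision but take more time.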

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img1 = cv.imread('images/box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary

flann = cv.FlannBasedMatcher(index_params,search_params)

matches = flann.knnMatch(des1,des2,k=2)

# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]

# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.7*n.distance:
        matchesMask[i]=[1,0]

draw_params = dict(matchColor = (0,255,0),
                   singlePointColor = (255,0,0),
                   matchesMask = matchesMask,
                   flags = cv.DrawMatchesFlags_DEFAULT)

img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)

plt.imshow(img3,),plt.show()

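3. Finding Objects with Feature Matching and Homography

The matching above finds parts of a target object in another image. Passing the matched points from both images to cv.findHomography() gives the perspective transformation of the object between the two images, and cv.perspectiveTransform() can then locate the object in the scene. At least 4 correct point pairs are needed to estimate this transformation matrix.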

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

MIN_MATCH_COUNT = 10

img1 = cv.imread('images/box.png', cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('images/box_in_scene.png', cv.IMREAD_GRAYSCALE) # trainImage

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)

flann = cv.FlannBasedMatcher(index_params, search_params)

matches = flann.knnMatch(des1,des2,k=2)

# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
    if m.distance < 0.7*n.distance:
        good.append(m)

if len(good)>MIN_MATCH_COUNT:
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
    matchesMask = mask.ravel().tolist()

    h,w = img1.shape
    pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
    dst = cv.perspectiveTransform(pts,M)

    img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)

else:
    print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
    matchesMask = None

draw_params = dict(matchColor = (0,255,0), # draw matches in green color
                   singlePointColor = None,
                   matchesMask = matchesMask, # draw only inliers
                   flags = 2)

img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)

plt.imshow(img3, 'gray'),plt.show()
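
The step that maps the four corners of the query image through the homography can be written out by hand. The sketch below uses plain NumPy; the helper name apply_homography and the matrix H are made up for illustration (a scale by 2 plus a shift), not a matrix estimated from real matches.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T   # homogeneous results, shape (N, 3)
    return homog[:, :2] / homog[:, 2:3]    # divide x and y by the third coordinate

# Hypothetical homography: scale by 2, then translate by (10, 20).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])

# Corners of a 200x100 image in the same order as in the code above.
corners = [[0, 0], [0, 99], [199, 99], [199, 0]]
mapped = apply_homography(H, corners)
print(mapped)
```

For a general homography the third coordinate is not 1, so the final division is what turns the linear map into a perspective one.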

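OpenCV Camera Distortion Calibration

Theory

Cameras introduce radial distortion and tangential distortion.
Radial distortion makes straight lines in the real world appear curved in the image, and the effect grows stronger farther from the image center.
Radial distortion is modeled as a polynomial in the distance r from the image center, with coefficients k1, k2 and k3.

Tangential distortion comes from the lens not being perfectly parallel to the sensor, so some regions of the image look closer than they really are. It is modeled with two coefficients, p1 and p2.
So five distortion coefficients need to be found: (k1, k2, p1, p2, k3).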
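
For reference, the distortion model given in OpenCV's camera calibration documentation, assumed here to be the one this section refers to, can be written as follows, where (x, y) are undistorted normalized image coordinates and r^2 = x^2 + y^2:

```latex
% Radial distortion
\begin{aligned}
x_{\text{distorted}} &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{\text{distorted}} &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
\end{aligned}

% Tangential distortion
\begin{aligned}
x_{\text{distorted}} &= x + \left[\,2 p_1 x y + p_2 (r^2 + 2x^2)\,\right] \\
y_{\text{distorted}} &= y + \left[\,p_1 (r^2 + 2y^2) + 2 p_2 x y\,\right]
\end{aligned}

% Distortion coefficients, in the order OpenCV returns them
\text{dist} = (k_1,\; k_2,\; p_1,\; p_2,\; k_3)
```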

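In addition, we need the camera's intrinsic and extrinsic parameters. The intrinsics include the focal length (f_x, f_y) and the optical center (c_x, c_y), which together form the camera matrix. The camera matrix is also needed to remove the distortion. It is a property of the camera itself, so once computed it can be reused for every image taken with the same camera.

The extrinsic parameters are the rotation and translation vectors that transform the coordinates of a 3D point into a coordinate system.

In stereo vision applications, correcting lens distortion is usually a must. The principle is to supply sample images of a well-defined pattern (for example a chessboard or a circle grid). Because the real-world relative coordinates of the feature points on the pattern are known, and the coordinates of the corresponding points in the image are known too, the distortion coefficients can be computed. Provide at least 10 sample images for good results.

Code

Camera calibration takes as input a set of real-world 3D points and the corresponding 2D image points. Finding the 2D points in the image is straightforward; the real-world 3D coordinates are harder. To simplify, assume the chessboard lies fixed in the XY plane, so Z is always 0.
In the code, the 3D points are called object points and the 2D points are called image points.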

import numpy as np
import cv2 as cv
import glob

criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

objp = np.zeros((6*9, 3), np.float32)
objp[:,:2] = np.mgrid[0:9, 0:6].T.reshape(-1,2)

objpoints = [] # 3D points in real-world space
imgpoints = [] # 2D points in the image plane

images = glob.glob('chessboard/*.jpg')

for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Find the chessboard corners
    ret, corners = cv.findChessboardCorners(gray, (9,6),None)

    # If found, add the object points and the refined image points
    if ret == True:
        objpoints.append(objp)
        corners2 = cv.cornerSubPix(gray, corners, (11,11),(-1,-1),criteria)
        imgpoints.append(corners2)

        cv.drawChessboardCorners(img,(9,6),corners2,ret)
        cv.imshow('img', img)
        cv.waitKey(500)

# Calibrate. Returns: RMS re-projection error, camera matrix, distortion coefficients, rotation vectors, translation vectors
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# Save to an .npz file so the result is easy to load and reuse
np.savez('cameracalib',mtx=mtx, dist=dist, rvecs=rvecs, tvecs=tvecs)
calib_file = np.load('cameracalib.npz')
print(calib_file['mtx'])

img = cv.imread('chessboard/left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv.getOptimalNewCameraMatrix(calib_file['mtx'], calib_file['dist'], (w,h), 1, (w,h))

# Remove the distortion
dst = cv.undistort(img,mtx, dist, None, newcameramtx)
x,y,roi_w,roi_h = roi # separate names so w,h still hold the full image size below
dst1 = dst[y:y+roi_h, x:x+roi_w]
cv.imshow('ds1',dst1)
cv.imwrite('calibresult.png', dst1)

# Alternative: remove the distortion with a precomputed remapping
mapx, mapy = cv.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
dst2 = cv.remap(img, mapx, mapy, cv.INTER_LINEAR)
dst2 = dst2[y:y+roi_h, x:x+roi_w]
cv.imshow('ds2',dst2)

cv.waitKey(0)
cv.destroyAllWindows()


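OpenCV Feature Extraction and Description

1. What Is a Feature

A feature is a small region of the image that changes the most when it is shifted slightly in any direction. The process of finding such regions is called feature detection.
For example, in an image of a rectangle on a white background, small patches at the four corners are good features, patches along the edges are weaker features, and patches in the flat-colored areas are not features at all.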

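2. Harris Corner Detection

Consider the intensity change for a displacement (u,v) in all directions, write it as a function, and apply a Taylor expansion to reduce it to a matrix M. A score R, which depends on the relative magnitudes of the two eigenvalues of M, then classifies the region as flat, edge, or corner.

R = det(M) - k(trace(M))^2

where

  • det(M) = λ1λ2
  • trace(M) = λ1 + λ2
  • λ1 and λ2 are the eigenvalues of M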
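
As a small worked example of the score, the sketch below evaluates R for three hand-picked symmetric 2x2 matrices. The function name harris_response and the numbers are illustrative only; k = 0.04 is a commonly used value, not one taken from this article. The usual reading is that a small |R| means a flat region, a negative R means an edge, and a large positive R means a corner.

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """Harris score R for the symmetric matrix M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy      # equals lambda1 * lambda2
    trace = sxx + syy                # equals lambda1 + lambda2
    return det - k * trace ** 2

flat = harris_response(0.01, 0.0, 0.01)       # both eigenvalues small
edge = harris_response(100.0, 0.0, 0.01)      # one eigenvalue much larger than the other
corner = harris_response(100.0, 0.0, 100.0)   # both eigenvalues large

print(flat, edge, corner)
```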

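In code, use cv.cornerHarris() with these arguments:

  • img - input image, grayscale and float32
  • blockSize - size of the neighborhood considered for corner detection
  • ksize - aperture parameter of the Sobel derivative
  • k - free parameter in the Harris detector equation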
import numpy as np
import cv2 as cv

filename = 'calibresult.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.1)

#result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)

# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]

cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
    cv.destroyAllWindows()

To get corners with sub-pixel accuracy, use cv.cornerSubPix().

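# threshold the dilated Harris response and convert it to uint8,
# because cv.connectedComponentsWithStats() needs an 8-bit image
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)

# find centroids
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)

# define the criteria to stop and refine the corners
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)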

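3. Shi-Tomasi Corner Detection (Good Features to Track)

Here the score is defined as

R = min(λ1, λ2)

If R is greater than a threshold, the point is considered a corner.

Use cv.goodFeaturesToTrack() with these arguments:

  • the input image
  • the number of corners to find
  • a quality level between 0 and 1
  • the minimum Euclidean distance between corners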
    corners = cv.goodFeaturesToTrack(gray,250,0.01,20)
    corners = np.intp(corners)  # np.int0 was removed in NumPy 2.0

    for i in corners:
        x,y = i.ravel()
        cv.circle(img,(int(x),int(y)),3,255,-1)

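4. SIFT (Scale-Invariant Feature Transform)

A corner seen in a small window looks flat once the image is enlarged and viewed through a window of the same size, so corner detectors such as Harris are not scale-invariant. SIFT works in stages: scale-space extrema detection, keypoint localization, orientation assignment, keypoint description, and keypoint matching. The patent on the algorithm expired in 2020, so it can be used freely.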

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

filename = 'images/home.jpg'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

sift = cv.SIFT_create()
kp = sift.detect(gray,None)

img=cv.drawKeypoints(gray,kp,img)

cv.imwrite('sift_keypoints.jpg',img)

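sift.detect() accepts a mask to restrict the search region.
cv.drawKeypoints() draws a small circle at each keypoint. With flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS it draws a circle the size of the keypoint and also shows its orientation.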

img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

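Computing descriptors

  1. If the keypoints kp are already known, use sift.compute(): kp, des = sift.compute(gray, kp)
  2. To do both in one step, use sift.detectAndCompute():
    sift = cv.SIFT_create()
    kp, des = sift.detectAndCompute(gray,None)
    kp is a list of keypoints and des is a NumPy array of shape (number of keypoints) x 128.

Once keypoints and descriptors are available, keypoints in different images can be matched in later steps.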

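5. SURF (Speeded-Up Robust Features)

SURF adds many improvements at every step and is about 3 times faster than SIFT for comparable results. It handles blurred and rotated images well, but not viewpoint changes or illumination changes.

SURF is still under patent protection in OpenCV. To use it, uninstall the current newer version and install opencv-contrib-python==3.4.2.17 instead.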

# Find SURF keypoints and descriptors and draw them
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/fly.png', 0)

sift = cv2.xfeatures2d.SIFT_create()
print('sift: ', sift)

# Create the SURF object. Parameters can be given at creation time or set later.
# Here the Hessian threshold is set to 400.
surf = cv2.xfeatures2d.SURF_create(400)
print('surf: ', surf,
      ' \ndefaultParameter\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Find SURF keypoints and descriptors
# kp: list of keypoints, des: NumPy array of descriptors
kp, des = surf.detectAndCompute(img, None)
# Draw the keypoints on the image
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)
plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('more keypoints'), plt.show()

print('keypoints: ', len(kp))

# Check the current Hessian threshold
# print(surf.getHessianThreshold())

# Raise the Hessian threshold to 50000 for display purposes; 300~500 is usually the best setting
surf.setHessianThreshold(50000)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Compute keypoints and descriptors again
kp, des = surf.detectAndCompute(img, None)

print('keypoints: ', len(kp))

# Draw the keypoints on the image
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('less than 50 keypoints'), plt.show()

# U-SURF does not compute the orientation
# print(surf.getUpright())
surf.setUpright(True)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

# Recompute the keypoints and draw them
kp = surf.detect(img, None)
print('keypoints: ', len(kp))
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('U-SURF'), plt.show()

# All orientations are now shown in the same direction, and it is much faster than before.
# When orientation is not an issue (as in panorama stitching), U-SURF is the better choice.
# Check the descriptor size
# print(surf.descriptorSize())
# extended is False, so the default is 64-D
# print(surf.getExtended())

# Switch to 128-D descriptors
surf.setExtended(True)
print(' parameters\thessianThreshold: ', surf.getHessianThreshold(),
      ' upright: ', surf.getUpright(),
      ' extended: ', surf.getExtended(),
      ' descriptors: ',surf.descriptorSize())

kp, des = surf.detectAndCompute(img, None)
print('keypoints: ',len(kp))
img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

plt.imshow(img2), plt.xticks([]), plt.yticks([]), plt.title('128D res'), plt.show()

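The code comes from here

6. FAST Corner Detection

Several times faster than the methods above, but not robust to high noise levels. It has a threshold parameter.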

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('images/blox.jpg', cv.IMREAD_GRAYSCALE) # `<opencv_root>/samples/data/blox.jpg`

# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create()

# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))

# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )

cv.imwrite('fast_true.png', img2)

# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img, None)

print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )

img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))

cv.imwrite('fast_false.png', img3)

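7. BRIEF (Binary Robust Independent Elementary Features)

A faster method for describing and matching features. It is a descriptor only, so the keypoints must be found with another detector; it works well with CenSurE (STAR).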

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('images/aero1.jpg', cv.IMREAD_GRAYSCALE)

# Initiate STAR detector
star = cv.xfeatures2d.StarDetector_create()

# Initiate BRIEF extractor
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()

# find the keypoints with STAR
kp = star.detect(img,None)


# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)

print( brief.descriptorSize() )
print( des.shape )

8. ORB (Oriented FAST and Rotated BRIEF)

Not patented, so it is safe to use, and it is a faster alternative to SIFT and SURF.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('images/blox.jpg', cv.IMREAD_GRAYSCALE)

# Initiate ORB detector
orb = cv.ORB_create()

# find the keypoints with ORB
kp = orb.detect(img,None)

# compute the descriptors with ORB
kp, des = orb.compute(img, kp)

# draw only keypoints location,not size and orientation
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
plt.imshow(img2), plt.show()

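For detailed usage, see the official tutorial.

OpenCV Image Processing

1. Changing the Color Space

Use cv.cvtColor(). Inputs: the image and the conversion code (for example cv.COLOR_BGR2HSV).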

hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)

2. Geometric Transformations

Resizing an image

import numpy as np
import cv2 as cv

img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"

res = cv.resize(img,None,fx=2, fy=2, interpolation = cv.INTER_CUBIC)

#OR

height, width = img.shape[:2]
res = cv.resize(img,(2*width, 2*height), interpolation = cv.INTER_CUBIC)

Use cv.warpAffine() to translate an image.
Use cv.getRotationMatrix2D() to get a 2x3 rotation matrix, for example for a 90-degree rotation:

import numpy as np
import cv2 as cv

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols = img.shape

#M = np.float32([[1,0,100],[0,1,50]]) # translation matrix
M = cv.getRotationMatrix2D(((cols-1)/2.0,(rows-1)/2.0),90,1) # rotation matrix
dst = cv.warpAffine(img,M,(cols,rows)) # inputs: image, transformation matrix, output size (width, height)

cv.imshow('img',dst)
cv.waitKey(0)
cv.destroyAllWindows()
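
3. Affine Transformation

In an affine transformation, lines that are parallel in the original image remain parallel in the output.
Use cv.getAffineTransform() to get the 2x3 transformation matrix and pass it to cv.warpAffine().
Three pairs of corresponding points are needed to build the matrix.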

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols,ch = img.shape

pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])

M = cv.getAffineTransform(pts1,pts2)

dst = cv.warpAffine(img,M,(cols,rows))

plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
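
To see why three point pairs are enough, the 2x3 matrix can be solved for directly as a linear system. The NumPy sketch below uses the same points as the example above; affine_from_points is a made-up helper name, and it illustrates the math rather than the internals of cv.getAffineTransform().

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve the 2x3 matrix M with M @ [x, y, 1] = [x', y'] for 3 point pairs."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])   # 3x3 system matrix, one row per point
    # Solve A @ M.T = dst for the 3x2 unknown, then transpose to get the 2x3 matrix.
    return np.linalg.solve(A, dst).T

pts1 = [[50, 50], [200, 50], [50, 200]]
pts2 = [[10, 100], [200, 50], [100, 250]]
M_affine = affine_from_points(pts1, pts2)

# Check: mapping pts1 through the matrix reproduces pts2.
homog = np.hstack([np.array(pts1, dtype=np.float64), np.ones((3, 1))])
mapped_pts = (M_affine @ homog.T).T
print(M_affine)
```

Six unknowns and six equations (two per point pair) give a unique solution as long as the three source points are not collinear.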
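
4. Perspective Transformation

Straight lines remain straight after a perspective transformation. The transformation matrix is 3x3 and requires 4 pairs of corresponding points, no 3 of which may be collinear. cv.getPerspectiveTransform() computes the matrix, which is then passed to cv.warpPerspective().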

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])

M = cv.getPerspectiveTransform(pts1,pts2)

dst = cv.warpPerspective(img,M,(300,300))
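
5. Thresholding

Use cv.threshold(). Inputs: the image (grayscale), the threshold, the maximum value (assigned to pixels above the threshold), and the thresholding type.
Returns: the threshold that was used, and the thresholded image.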

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
ret,thresh1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
ret,thresh2 = cv.threshold(img,127,255,cv.THRESH_BINARY_INV)
ret,thresh3 = cv.threshold(img,127,255,cv.THRESH_TRUNC)
ret,thresh4 = cv.threshold(img,127,255,cv.THRESH_TOZERO)
ret,thresh5 = cv.threshold(img,127,255,cv.THRESH_TOZERO_INV)

titles = ['Original Image','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]

for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray',vmin=0,vmax=255)
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])

plt.show()
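
The five thresholding types differ only in what they write for pixels above and below the threshold. The NumPy sketch below spells out those rules on a single row of pixels; the function threshold_demo and its string mode names are made up for illustration and stand in for the cv.THRESH_* constants.

```python
import numpy as np

def threshold_demo(img, thresh, maxval, mode):
    """Apply one of the five basic thresholding rules to a uint8 array."""
    img = np.asarray(img, dtype=np.uint8)
    above = img > thresh   # strict comparison: a pixel equal to thresh counts as "below"
    if mode == "BINARY":
        out = np.where(above, maxval, 0)
    elif mode == "BINARY_INV":
        out = np.where(above, 0, maxval)
    elif mode == "TRUNC":
        out = np.where(above, thresh, img)
    elif mode == "TOZERO":
        out = np.where(above, img, 0)
    elif mode == "TOZERO_INV":
        out = np.where(above, 0, img)
    else:
        raise ValueError(mode)
    return out.astype(np.uint8)

row = np.array([0, 100, 127, 128, 255], dtype=np.uint8)
results = {m: threshold_demo(row, 127, 255, m).tolist()
           for m in ("BINARY", "BINARY_INV", "TRUNC", "TOZERO", "TOZERO_INV")}
for mode, values in results.items():
    print(mode, values)
```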

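Adaptive thresholding
cv.ADAPTIVE_THRESH_MEAN_C: threshold = mean of the neighborhood - C
cv.ADAPTIVE_THRESH_GAUSSIAN_C: threshold = Gaussian-weighted sum of the neighborhood - C
blockSize sets the size of the neighborhood, and C is a constant subtracted from the mean or weighted mean.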

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)

ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
            cv.THRESH_BINARY,11,2)
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
            cv.THRESH_BINARY,11,2)

titles = ['Original Image', 'Global Thresholding (v = 127)',
            'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]

for i in range(4):
    plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()

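Otsu's Binarization
No threshold value has to be chosen by hand: a globally optimal threshold is determined from the image histogram by minimizing the weighted within-class variance.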

# Otsu's thresholding
ret2,th2 = cv.threshold(img,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
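
The search that Otsu's method performs can be sketched in a few lines of NumPy: try every threshold and keep the one with the smallest weighted within-class variance. The function otsu_threshold and the tiny bimodal test image are made up for illustration; this is a plain reference version, not OpenCV's optimized implementation.

```python
import numpy as np

def otsu_threshold(img):
    """Return the threshold t that minimizes the weighted within-class variance."""
    values = np.asarray(img, dtype=np.uint8).ravel()
    prob = np.bincount(values, minlength=256).astype(np.float64)
    prob /= prob.sum()
    levels = np.arange(256)
    best_t, best_var = 0, np.inf
    for t in range(255):
        p0, p1 = prob[:t + 1], prob[t + 1:]      # class 0: <= t, class 1: > t
        w0, w1 = p0.sum(), p1.sum()
        if w0 == 0 or w1 == 0:
            continue                             # one class is empty, skip
        mu0 = (levels[:t + 1] * p0).sum() / w0
        mu1 = (levels[t + 1:] * p1).sum() / w1
        var0 = (((levels[:t + 1] - mu0) ** 2) * p0).sum() / w0
        var1 = (((levels[t + 1:] - mu1) ** 2) * p1).sum() / w1
        within = w0 * var0 + w1 * var1
        if within < best_var:
            best_t, best_var = t, within
    return best_t

# Bimodal toy image: a dark cluster around 50 and a bright cluster around 200.
toy = np.array([40, 50, 60, 190, 200, 210] * 10, dtype=np.uint8).reshape(6, 10)
t = otsu_threshold(toy)
binary = toy > t
print(t, int(binary.sum()))
```

Any threshold between the two clusters separates them equally well, so the exact value returned for this toy image depends on tie-breaking; what matters is that it falls in the gap.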

6. Image Smoothing (Convolution)

cv.filter2D() convolves an image with a kernel.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"

kernel = np.ones((5,5),np.float32)/25 # averaging kernel
dst = cv.filter2D(img,-1,kernel)

blr = cv.blur(img,(5,5)) # box filter with the same 5x5 averaging window

plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.show()

Commonly used filtering and blurring functions

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"

kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)

blr = cv.blur(img,(5,5))

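g_blr = cv.GaussianBlur(img, (5,5),0) # Gaussian blur: the last argument is sigmaX (sigmaY defaults to the same value)

m_blr = cv.medianBlur(img, 5) # effective against salt-and-pepper noise

bi_blr = cv.bilateralFilter(img,9, 75, 75) # combines a spatial Gaussian with a pixel-difference term: removes texture while keeping edges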

plt.subplot(321),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(322),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.subplot(323),plt.imshow(blr),plt.title('Bluring')
plt.xticks([]), plt.yticks([])
plt.subplot(324),plt.imshow(g_blr),plt.title('gaussian')
plt.xticks([]), plt.yticks([])
plt.subplot(325),plt.imshow(m_blr),plt.title('medianBlur')
plt.xticks([]), plt.yticks([])
plt.subplot(326),plt.imshow(bi_blr),plt.title('Bilaterial filtering')
plt.xticks([]), plt.yticks([])

plt.show()
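
7. Morphological Operations

Erosion: a kernel slides over every position of a binary image. The pixel under the kernel center keeps its value of 1 only if all pixels under the kernel are 1; otherwise it is eroded to 0.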

import cv2 as cv
import numpy as np

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
kernel = np.ones((5,5),np.uint8)
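erosion = cv.erode(img,kernel,iterations = 1) # erosion
dilation = cv.dilate(img,kernel,iterations = 1) # dilation: the center pixel becomes 1 if any pixel under the kernel is 1

# Opening: erosion followed by dilation, useful for removing noise
opening = cv.morphologyEx(img, cv.MORPH_OPEN, kernel)

# Closing: dilation followed by erosion, useful for closing small holes inside objects
closing = cv.morphologyEx(img, cv.MORPH_CLOSE, kernel)

# Gradient: difference between dilation and erosion, looks like the outline of the object
gradient = cv.morphologyEx(img, cv.MORPH_GRADIENT, kernel)

# Top Hat: difference between the input image and its opening
tophat = cv.morphologyEx(img, cv.MORPH_TOPHAT, kernel)

# Black Hat: difference between the closing of the input image and the input image
blackhat = cv.morphologyEx(img, cv.MORPH_BLACKHAT, kernel)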

cv.getStructuringElement() generates a structuring-element kernel from a shape and a size tuple.

kernel = cv.getStructuringElement(cv.MORPH_RECT,(5,5))  # rectangular
print(kernel)
cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5)) # elliptical
cv.getStructuringElement(cv.MORPH_CROSS,(5,5)) # cross-shaped
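
The erosion and dilation rules described in this section can be reproduced on a small binary array with NumPy. In the sketch below the helper names (_morph, erode_demo, dilate_demo) are made up, the kernel is a square of ones, and the padding values are chosen so that pixels outside the image never change the result, which is an assumption meant to mirror OpenCV's default border handling.

```python
import numpy as np

def _morph(binary, ksize, reduce_fn, pad_value):
    """Slide a ksize x ksize all-ones kernel over a 0/1 image."""
    pad = ksize // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=pad_value)
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            out[r, c] = reduce_fn(padded[r:r + ksize, c:c + ksize])
    return out

def erode_demo(binary, ksize=3):
    # The center stays 1 only if every pixel under the kernel is 1.
    return _morph(binary, ksize, np.min, 1)

def dilate_demo(binary, ksize=3):
    # The center becomes 1 if any pixel under the kernel is 1.
    return _morph(binary, ksize, np.max, 0)

# A 5x5 square plus one isolated noise pixel in the corner.
demo = np.zeros((9, 9), dtype=np.uint8)
demo[2:7, 2:7] = 1
demo[0, 0] = 1

eroded = erode_demo(demo)          # square shrinks to 3x3, noise pixel disappears
opened = dilate_demo(eroded)       # opening = erosion followed by dilation
print(int(eroded.sum()), int(opened.sum()))
```

The opening restores the square to its original size while the noise pixel stays removed, which is why opening is used for denoising.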

8. Image Gradients

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

laplacian = cv.Laplacian(img,cv.CV_64F)
sobelx = cv.Sobel(img,cv.CV_64F,1,0,ksize=5)
sobely = cv.Sobel(img,cv.CV_64F,0,1,ksize=5)

plt.subplot(2,2,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(laplacian,cmap = 'gray')
plt.title('Laplacian'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,3),plt.imshow(sobelx,cmap = 'gray')
plt.title('Sobel X'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely,cmap = 'gray')
plt.title('Sobel Y'), plt.xticks([]), plt.yticks([])

plt.show()

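Important: set the output data type to something with more range, such as cv.CV_64F, take the absolute value, and only then convert to cv.CV_8U. With an 8-bit output, negative slopes (white-to-black transitions) are clipped to 0 and that information is lost.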

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('opencv.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# Output dtype = cv.CV_8U
#sobelx8u = cv.Sobel(img,cv.CV_8U,1,0,ksize=3)
sobelx8u = cv.Laplacian(img,cv.CV_8U)

# Output dtype = cv.CV_64F. Then take its absolute and convert to cv.CV_8U
#sobelx64f = cv.Sobel(img,cv.CV_64F,1,0,ksize=3)
sobelx64f = cv.Laplacian(img,cv.CV_64F)
abs_sobel64f = np.absolute(sobelx64f)
sobel_8u = np.uint8(abs_sobel64f)

plt.subplot(1,3,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,2),plt.imshow(sobelx8u,cmap = 'gray')
plt.title('Sobel CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,3),plt.imshow(sobel_8u,cmap = 'gray')
plt.title('Sobel abs(CV_64F)'), plt.xticks([]), plt.yticks([])

plt.show()
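
The information loss is easy to reproduce on a single row of pixels. This NumPy sketch uses a simple horizontal difference as a stand-in for the Sobel or Laplacian filter (an illustrative simplification) and compares clipping directly to 8 bits with taking the absolute value first.

```python
import numpy as np

# One image row with a black-to-white edge followed by a white-to-black edge.
row = np.array([0, 0, 255, 255, 0, 0], dtype=np.uint8)

# Signed horizontal difference computed in float64 keeps both slopes.
grad64 = np.diff(row.astype(np.float64))

# Saturating straight to 8 bits clips the negative slope to 0.
grad8 = np.clip(grad64, 0, 255).astype(np.uint8)

# Taking the absolute value before converting keeps the second edge.
grad_abs8 = np.uint8(np.absolute(grad64))

print(grad64.tolist(), grad8.tolist(), grad_abs8.tolist())
```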

9. Image Pyramids

lower_reso = cv.pyrDown(higher_reso)    # downsample

higher_reso2 = cv.pyrUp(lower_reso) # upsample

Example: blending two images with pyramids

import cv2 as cv
import numpy as np,sys

A = cv.imread('opencv.png')
B = cv.imread('opencv_white.png')
assert A is not None, "file could not be read, check with os.path.exists()"
assert B is not None, "file could not be read, check with os.path.exists()"

# generate Gaussian pyramid for A
G = A.copy()
gpA = [G]
for i in range(6):
    G = cv.pyrDown(G)
    gpA.append(G)

# generate Gaussian pyramid for B
G = B.copy()
gpB = [G]
for i in range(6):
    G = cv.pyrDown(G)
    gpB.append(G)

# generate Laplacian Pyramid for A
lpA = [gpA[5]]
for i in range(5,0,-1):
    GE = cv.pyrUp(gpA[i])
    L = gpA[i-1] - GE
    lpA.append(L)

# generate Laplacian Pyramid for B
lpB = [gpB[5]]
for i in range(5,0,-1):
    GE = cv.pyrUp(gpB[i])
    L = gpB[i-1] - GE
    lpB.append(L)

# Now add left and right halves of images in each level
LS = []
for la,lb in zip(lpA,lpB):
    rows,cols,dpt = la.shape
    ls = np.hstack((la[:,0:cols//2], lb[:,cols//2:]))
    LS.append(ls)

# now reconstruct
ls_ = LS[0]
for i in range(1,6):
    ls_ = cv.pyrUp(ls_)
    ls_ = cv.add(ls_, LS[i])

# image with direct connecting each half
real = np.hstack((A[:,:cols//2],B[:,cols//2:]))

cv.imshow('Pyramid_blending2.jpg',ls_)
cv.imshow('Direct_blending.jpg',real)

cv.waitKey(0)
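
10. Canny Edge Detection

Compute the gradient; an edge is perpendicular to the gradient direction. For each pixel, check whether it is a local maximum along the gradient direction: if so keep it, otherwise set it to zero (non-maximum suppression). Next, pixels with a gradient below the minimum value are not edges, pixels above the maximum value are sure edges, and pixels in between are kept only if they are connected to a sure edge; otherwise they are discarded.
Use cv.Canny(). Inputs: the image, the minimum value, the maximum value, the Sobel kernel size used to compute the gradient (default 3), and L2gradient (True uses the more accurate full formula; False, the default, uses a simplified one).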

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
edges = cv.Canny(img,100,180)

plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])

plt.show()
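
11. Contour Detection

Use cv.findContours() to get the contours. Inputs: source image, contour retrieval mode, approximation method.
The source image must be binary, with the object in white and the background in black.
With cv.CHAIN_APPROX_NONE every point of the contour is stored, whereas cv.CHAIN_APPROX_SIMPLE describes the contour with fewer points, for example only the 4 corner points of a rectangle.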

contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)

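# Draw the contours (or any shape, given its boundary points)
cv.drawContours(im, contours, -1, (0,255,0), 2)
# Inputs: source image, contours, contour index (-1 for all), color, thickness

cnt = contours[2]

M = cv.moments(cnt) # contour moments

# Contour centroid
c_x = M['m10'] / M['m00']
c_y = M['m01'] / M['m00']

area = cv.contourArea(cnt) # contour area, the same as M['m00']

# Contour perimeter
perimeter = cv.arcLength(cnt,True) # inputs: contour, whether the contour is closed (True = closed)

# Contour approximation: approximate the contour with fewer vertices, to the specified precision.
epsilon = 0.1*cv.arcLength(cnt,True) # precision: maximum allowed distance from the contour to the approximated contour
approx = cv.approxPolyDP(cnt,epsilon,True)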
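
For a polygonal contour, area and centroid can also be computed by hand with the shoelace (Green's theorem) formulas, which is a useful sanity check on M['m00'], M['m10']/M['m00'] and M['m01']/M['m00']. The pure-Python sketch below is a textbook version with a made-up function name; OpenCV's documentation describes contour moments as computed with Green's formula too, but this is not its implementation.

```python
def polygon_area_centroid(points):
    """Area and centroid of a simple closed polygon given as [(x, y), ...]."""
    a2 = 0.0          # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = a2 / 2.0
    return abs(area), (cx / (6.0 * area), cy / (6.0 * area))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
sq_area, (sq_cx, sq_cy) = polygon_area_centroid(square)
print(sq_area, sq_cx, sq_cy)
```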

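Convex Hull
A convex curve always bulges outward, or is at least flat. cv.convexHull() checks a curve for convexity defects (inward bulges) and corrects them.

hull = cv.convexHull(points[, hull[, clockwise[, returnPoints]]])
  • points: the contour passed in
  • hull: the output, usually omitted
  • clockwise: orientation flag; True means clockwise
  • returnPoints: True (the default) returns the coordinates of the hull points; False returns the indices of the contour points that correspond to the hull points
    In practice hull = cv.convexHull(cnt) is enough.
print(cv.isContourConvex(cnt))  # check whether the contour is convex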

Straight bounding rectangle of a contour

x, y, w, h = cv.boundingRect(cnt)
cv.rectangle(im, (x,y),(x+w, y+h),(0,0,255),1)

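Rotated bounding rectangle of a contour
cv.minAreaRect() finds the minimum-area bounding rectangle of the contour and returns (center (x,y), (width, height), angle of rotation). Pass the result to cv.boxPoints() to get the four corners needed to draw the rectangle.
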
rect = cv.minAreaRect(cnt)
box = cv.boxPoints(rect)
box = np.intp(box)  # np.int0 was removed in NumPy 2.0
cv.drawContours(im,[box],0,(0,0,255),2)

Minimum enclosing circle
The smallest circle that completely contains the contour

(x,y),radius = cv.minEnclosingCircle(cnt)
center = (int(x),int(y))
radius = int(radius)
cv.circle(im,center,radius,(0,255,0),2)

Fitting an ellipse

ellipse = cv.fitEllipse(cnt)
cv.ellipse(im,ellipse,(0,255,0),2)

Fitting a line

rows,cols = img.shape[:2]
vx,vy,x,y = cv.fitLine(cnt, cv.DIST_L2,0,0.01,0.01).flatten()  # flatten so each value is a scalar
lefty = int((-x*vy/vx) + y)
righty = int(((cols-x)*vy/vx)+y)
cv.line(img,(cols-1,righty),(0,lefty),(0,255,0),2)

12. Contour Properties

# Aspect ratio
x,y,w,h = cv.boundingRect(cnt)
aspect_ratio = float(w)/h

# Extent: ratio of contour area to bounding rectangle area
area = cv.contourArea(cnt)
x,y,w,h = cv.boundingRect(cnt)
rect_area = w*h
extent = float(area)/rect_area

# Solidity: ratio of contour area to convex hull area
area = cv.contourArea(cnt)
hull = cv.convexHull(cnt)
hull_area = cv.contourArea(hull)
solidity = float(area)/hull_area

# Equivalent diameter: diameter of the circle whose area equals the contour area
area = cv.contourArea(cnt)
equi_diameter = np.sqrt(4*area/np.pi)

# Orientation
(x,y),(MA,ma),angle = cv.fitEllipse(cnt)

# Mask
mask = np.zeros(imgray.shape,np.uint8)
cv.drawContours(mask,[cnt],0,255,-1)
pixelpoints = np.transpose(np.nonzero(mask))
#pixelpoints = cv.findNonZero(mask)

# Maximum and minimum values and their locations
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(imgray,mask = mask)

# Mean color or mean intensity
mean_val = cv.mean(im,mask = mask)

# Extreme points
leftmost = tuple(cnt[cnt[:,:,0].argmin()][0])
rightmost = tuple(cnt[cnt[:,:,0].argmax()][0])
topmost = tuple(cnt[cnt[:,:,1].argmin()][0])
bottommost = tuple(cnt[cnt[:,:,1].argmax()][0])

# Point polygon test: returns the shortest distance from the point to the contour.
# Positive = inside, 0 = on the contour, negative = outside.
# Third argument: True returns the distance, False returns only +1, -1 or 0 (2-3x faster)
dist = cv.pointPolygonTest(cnt,(50,50),True)

# Shape matching (can be used for OCR)
ret, thresh = cv.threshold(img1, 127, 255,0)
ret, thresh2 = cv.threshold(img2, 127, 255,0)
contours,hierarchy = cv.findContours(thresh,2,1)
cnt1 = contours[0]
contours,hierarchy = cv.findContours(thresh2,2,1)
cnt2 = contours[0]

ret = cv.matchShapes(cnt1,cnt2,1,0.0)
print( ret )
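
13. Contour Hierarchy

cv.findContours() also returns hierarchy, which is how OpenCV represents the parent-child relationship that arises when one contour lies inside another. Each contour has an entry of the form
[Next, Previous, First_Child, Parent]
Next: the next contour at the same hierarchy level
Previous: the previous contour at the same hierarchy level
First_Child: the first child contour
Parent: the parent contour

Note: if the corresponding contour does not exist (for example no child or no parent), that field is set to -1.

Contour retrieval modes
RETR_LIST retrieves all contours without any parent-child relationship
RETR_EXTERNAL retrieves only the outermost contours
RETR_CCOMP arranges the contours into 2 levels: outer boundaries at level 1, hole boundaries at level 2
RETR_TREE retrieves the full hierarchy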
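
As a sketch of how the four fields can be used, the pure-Python function below follows the Parent links to compute the nesting depth of every contour. The function name and the hierarchy values are hypothetical; with real data, cv.findContours() returns an array of shape (1, N, 4), so hierarchy[0] would be passed in.

```python
def contour_depths(hierarchy):
    """Nesting depth of each contour from rows [Next, Previous, First_Child, Parent]."""
    depths = []
    for row in hierarchy:
        depth, parent = 0, row[3]
        while parent != -1:              # walk up until a top-level contour is reached
            depth += 1
            parent = hierarchy[parent][3]
        depths.append(depth)
    return depths

# Hypothetical RETR_TREE result: contour 1 lies inside contour 0, contour 2 lies
# inside contour 1, and contour 3 is a second top-level contour.
demo_hierarchy = [
    [3, -1, 1, -1],
    [-1, -1, 2, 0],
    [-1, -1, -1, 1],
    [-1, 0, -1, -1],
]
depths = contour_depths(demo_hierarchy)
print(depths)
```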

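14. Histograms

The x-axis of a histogram is intensity, from 0 to 255 (for an 8-bit image; the range can be changed), and the height is the number of pixels with each intensity. Histogram analysis is done on grayscale images.
Common histogram terms:
BINS: the number of intervals on the x-axis, for example 256 or 16
DIMS: the number of dimensions of the collected data, for example 1
RANGE: the range of values to collect, usually [0,256]

Computing a histogram with OpenCV
cv.calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]])

  1. images : source image of type uint8 or float32, given in square brackets, as in "[img]".
  2. channels : index of the channel to compute, given in square brackets. Use [0] for a grayscale image; for a color image use [0], [1] or [2] for the histogram of the blue, green or red channel.
  3. mask : mask image. Pass "None" for the histogram of the full image; otherwise create a mask and pass it here.
  4. histSize : the BINS count in square brackets, for example [256].
  5. ranges : the range, usually [0,256].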
    img = cv.imread('ml.png', cv.IMREAD_GRAYSCALE)
    assert img is not None, "file could not be read, check with os.path.exists()"
    hist = cv.calcHist([img],[0],None,[16],[0,256])

A histogram can also be computed with NumPy, although the OpenCV function is about 40x faster.

hist,bins = np.histogram(img.ravel(),256,[0,256])

Plotting histograms

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('bottle.png')
assert img is not None, "file could not be read, check with os.path.exists()"
color = ('b','g','r')
for i,col in enumerate(color):
    histr = cv.calcHist([img],[i],None,[256],[0,256])
    plt.plot(histr,color = col)
    plt.xlim([0,256])
plt.show()

Histogram of the region inside a mask

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('bottle.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# create a mask
mask = np.zeros(img.shape[:2], np.uint8)
mask[400:800, 400:1000] = 255
masked_img = cv.bitwise_and(img,img,mask = mask)

# Calculate histogram with mask and without mask
# Check third argument for mask
hist_full = cv.calcHist([img],[0],None,[256],[0,256])
hist_mask = cv.calcHist([img],[0],mask,[256],[0,256])

plt.subplot(221), plt.imshow(img, 'gray')
plt.subplot(222), plt.imshow(mask,'gray')
plt.subplot(223), plt.imshow(masked_img, 'gray')
plt.subplot(224), plt.plot(hist_full), plt.plot(hist_mask)
plt.xlim([0,256])

plt.show()

Histogram equalization improves image contrast and evens out lighting conditions.

img = cv.imread('wiki.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
equ = cv.equalizeHist(img)
res = np.hstack((img,equ)) #stacking images side-by-side
cv.imwrite('res.png',res)
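
What equalization does to the pixel values can be sketched with the cumulative distribution function (CDF) in NumPy. The function equalize_demo and the tiny low-contrast image are made up for illustration; this is the textbook CDF remapping and its rounding may differ slightly from cv.equalizeHist().

```python
import numpy as np

def equalize_demo(img):
    """Remap intensities so that the cumulative histogram becomes roughly linear."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()      # CDF value at the darkest intensity present
    total = cdf[-1]
    if total == cdf_min:              # single-intensity image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = lut.clip(0, 255).astype(np.uint8)
    return lut[img]                   # apply the lookup table to every pixel

# Low-contrast image that only uses intensities 100..120.
low = np.array([[100, 100, 110], [110, 120, 120]], dtype=np.uint8)
equ_demo = equalize_demo(low)
print(equ_demo)
```

After the remapping the darkest pixels become 0 and the brightest become 255, so the narrow input range is stretched across the whole scale.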

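CLAHE (Contrast Limited Adaptive Histogram Equalization)
The image is divided into small tiles and each tile is equalized as usual. If any histogram bin exceeds the given contrast limit (40 by default), those pixels are clipped and distributed uniformly to the other bins before equalization. Afterwards, bilinear interpolation removes the artifacts at the tile borders.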

# create a CLAHE object (Arguments are optional).
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
cv.imwrite('clahe_2.jpg',cl1)
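
The clipping step described above can be sketched for a single tile histogram as follows. This is a simplified illustration, not OpenCV's code: the helper name clip_histogram is hypothetical, and the limit here is an absolute bin count, whereas the real clipLimit is rescaled internally.

```python
import numpy as np

def clip_histogram(hist, clip_limit):
    """Clip every bin at clip_limit and spread the excess evenly over all bins."""
    hist = np.asarray(hist, dtype=np.float64)
    excess = np.sum(np.maximum(hist - clip_limit, 0.0))  # total count above the limit
    clipped = np.minimum(hist, clip_limit)               # cut the peaks
    return clipped + excess / hist.size                  # redistribute uniformly

hist = [10, 50, 0, 40]
clipped = clip_histogram(hist, clip_limit=30)
print(clipped, clipped.sum())
```

The total pixel count is preserved, while the peaks that would otherwise over-amplify contrast (and noise) in that tile are flattened.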

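2D histogram
Convert the image from BGR to HSV and build a histogram over Hue and Saturation, again with cv.calcHist():

  • channels = [0,1], because we use the H and S planes
  • bins = [180,256], 180 for the H plane and 256 for the S plane
  • range = [0,180,0,256], Hue ranges from 0 to 180 and Saturation from 0 to 256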
    import numpy as np
    import cv2 as cv
    from matplotlib import pyplot as plt

    img = cv.imread('view.png')
    assert img is not None, "file could not be read, check with os.path.exists()"
    hsv = cv.cvtColor(img,cv.COLOR_BGR2HSV)
    hist = cv.calcHist( [hsv], [0, 1], None, [180, 256], [0, 180, 0, 256] )

    plt.imshow(hist,interpolation = 'nearest')
    plt.show()

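Histogram back projection
Back projection is used to segment an image or to find an object in it. It creates a single-channel image with the same width and height as the input, in which each pixel holds the probability that the corresponding input pixel belongs to the object, so brighter areas are more likely to contain the target.
Method: compute the histogram of an image that contains the target object; this image should consist of the target as completely as possible. A color histogram works better than a grayscale one. Then back-project this histogram onto the image to be searched, that is, compute for each of its pixels the probability of belonging to the target, and display the result. With a suitable threshold the target can be segmented out.

cv.calcBackProject(images, channels, hist, ranges, scale[, dst]) -> dst

Its usage is similar to cv.calcHist().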

import numpy as np
import cv2 as cv

roi = cv.imread('trees.png')
assert roi is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(roi,cv.COLOR_BGR2HSV)

target = cv.imread('view.jpg')
assert target is not None, "file could not be read, check with os.path.exists()"
hsvt = cv.cvtColor(target,cv.COLOR_BGR2HSV)

# calculating object histogram
roihist = cv.calcHist([hsv],[0, 1], None, [180, 256], [0, 180, 0, 256] )

# normalize histogram and apply backprojection
cv.normalize(roihist,roihist,0,255,cv.NORM_MINMAX)
dst = cv.calcBackProject([hsvt],[0,1],roihist,[0,180,0,256],1)

# Now convolute with circular disc
disc = cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
cv.filter2D(dst,-1,disc,dst)

# threshold and binary AND
ret,thresh = cv.threshold(dst,50,255,0)
thresh = cv.merge((thresh,thresh,thresh))
res = cv.bitwise_and(target,thresh)

res = np.vstack((target,thresh,res))
cv.imwrite('res.jpg',res)
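
What back projection computes per pixel can be sketched in plain NumPy: build the Hue/Saturation histogram of the object, scale it so the largest bin is 255, then look up each target pixel's (H, S) pair in it. The function names below are made up for this illustration, and the code assumes HSV arrays in OpenCV's 8-bit layout (H in 0..179, S in 0..255).

```python
import numpy as np

def hs_histogram(hsv):
    """2D Hue/Saturation histogram with 180 x 256 bins (one bin per value)."""
    hist = np.zeros((180, 256), dtype=np.float32)
    h = hsv[..., 0].ravel().astype(np.intp)
    s = hsv[..., 1].ravel().astype(np.intp)
    np.add.at(hist, (h, s), 1)            # count every (H, S) pair
    return hist

def back_project(hsv_target, hist):
    """Replace each target pixel by the scaled histogram value of its (H, S) pair."""
    peak = hist.max()
    scaled = hist * (255.0 / peak) if peak > 0 else hist
    h = hsv_target[..., 0].astype(np.intp)
    s = hsv_target[..., 1].astype(np.intp)
    return scaled[h, s].astype(np.uint8)

# Object patch: every pixel has H=60, S=200
roi = np.zeros((2, 2, 3), dtype=np.uint8)
roi[..., 0], roi[..., 1] = 60, 200

# Target: pixels 0 and 2 share the object's hue and saturation, pixel 1 does not
target = np.zeros((1, 3, 3), dtype=np.uint8)
target[0, 0] = (60, 200, 10)
target[0, 1] = (10, 50, 10)
target[0, 2] = (60, 200, 250)

prob = back_project(target, hs_histogram(roi))
print(prob)
```

The Value channel is ignored on purpose, which is why pixels 0 and 2 score the same even though their brightness differs.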
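
15. Fourier transform of images

For a sinusoidal signal, rapid changes in amplitude mean high frequency and slow changes mean low frequency. Images behave the same way: brightness changes sharply at edges and in noise, so these count as high-frequency content.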

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('view.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
f = np.fft.fft2(img) # forward transform
fshift = np.fft.fftshift(f) # move the low frequencies to the center
magnitude_spectrum = 20*np.log(np.abs(fshift))

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()

rows, cols = img.shape
crow, ccol = rows//2, cols//2
fshift[crow-10:crow+11, ccol-10:ccol+11] = 0 # remove the low frequencies
f_ishift = np.fft.ifftshift(fshift) # undo the shift
img_back = np.fft.ifft2(f_ishift) # inverse transform
img_back = np.real(img_back)

plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Image after HPF'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(img_back)
plt.title('Result in JET'), plt.xticks([]), plt.yticks([])

plt.show()

OpenCV's implementation (cv.dft() and cv.idft()) is somewhat faster, but it is less intuitive than the NumPy version.

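16. Template matching

Use cv.matchTemplate(). If the input image has size (W,H) and the template has size (w,h), the output has size (W-w+1,H-h+1).
Use cv.minMaxLoc() to find the location of the extremum and take it as the top-left corner of a rectangle; together with (w,h) this gives the rectangle that contains the target.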

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('view.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img2 = img.copy()
template = cv.imread('build.jpg', cv.IMREAD_GRAYSCALE)
assert template is not None, "file could not be read, check with os.path.exists()"
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['TM_CCOEFF', 'TM_CCOEFF_NORMED', 'TM_CCORR',
           'TM_CCORR_NORMED', 'TM_SQDIFF', 'TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = getattr(cv, meth)  # look the constant up by name instead of using eval()

    # Apply template Matching
    res = cv.matchTemplate(img,template,method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)

    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)

    cv.rectangle(img,top_left, bottom_right, 255, 2)

    plt.subplot(121),plt.imshow(res,cmap = 'gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(img,cmap = 'gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle('cv.' + meth)

    plt.show()
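
The output size and the "take the minimum for squared difference" rule can be checked with a brute-force NumPy sketch of squared-difference matching. This is only an illustration of the sliding-window idea; the function name match_sqdiff is made up here, and cv.matchTemplate() is far faster in practice.

```python
import numpy as np

def match_sqdiff(image, template):
    """Slide the template over the image and record the sum of squared differences."""
    H, W = image.shape
    h, w = template.shape
    res = np.empty((H - h + 1, W - w + 1), dtype=np.float64)  # one score per valid offset
    t = template.astype(np.float64)
    for y in range(res.shape[0]):
        for x in range(res.shape[1]):
            patch = image[y:y + h, x:x + w].astype(np.float64)
            res[y, x] = np.sum((patch - t) ** 2)
    return res

# 8 x 6 image (W=8, H=6) with a distinctive 3 x 2 block at x=3, y=2
image = np.zeros((6, 8), dtype=np.uint8)
image[2:4, 3:6] = [[10, 20, 30], [40, 50, 60]]
template = image[2:4, 3:6].copy()

res = match_sqdiff(image, template)
y, x = np.unravel_index(np.argmin(res), res.shape)   # best match = smallest difference
top_left = (int(x), int(y))                           # (x, y) order, as cv.minMaxLoc returns it
print(res.shape, top_left)
```

NumPy reports shapes as (rows, columns), so an 8 x 6 image with a 3 x 2 template gives a result of shape (5, 6), i.e. (H-h+1, W-w+1).
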

17. Hough line transform

import cv2 as cv
import numpy as np

img = cv.imread('opencv.png')
assert img is not None, "img loading wrong"
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
edges = cv.Canny(gray,50,150,apertureSize = 3)

# Inputs: binary image, ρ resolution, θ resolution, accumulator threshold
lines = cv.HoughLines(edges,1,np.pi/180,100)
assert lines is not None, "no lines found, try a lower threshold"
for line in lines:
    rho,theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    # (x0, y0) is the point on the line closest to the origin
    x0 = a*rho
    y0 = b*rho
    # Step 1000 px along the line in both directions to get two end points
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))

    cv.line(img,(x1,y1),(x2,y2),(0,0,255),2)

cv.imwrite('houghlines3.jpg',img)
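
The conversion from a (ρ, θ) pair to two drawable end points, used in the loop above, can be isolated into a small standard-library helper. The name polar_to_segment and the use of round() instead of int() are choices made for this sketch only.

```python
import math

def polar_to_segment(rho, theta, half_length=1000):
    """Return two points on the line x*cos(theta) + y*sin(theta) = rho."""
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho                       # foot of the perpendicular from the origin
    # (-b, a) is the direction of the line; walk half_length pixels both ways
    p1 = (round(x0 - half_length * b), round(y0 + half_length * a))
    p2 = (round(x0 + half_length * b), round(y0 - half_length * a))
    return p1, p2

vertical = polar_to_segment(50, 0.0)             # theta = 0 gives the vertical line x = 50
horizontal = polar_to_segment(20, math.pi / 2)   # theta = 90 degrees gives the horizontal line y = 20
print(vertical, horizontal)
```

Rounding avoids an off-by-one pixel when floating-point error leaves a coordinate a hair below an integer.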

Hough circle transform

import numpy as np
import cv2 as cv

img = cv.imread('opencv_white.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)
cimg = cv.cvtColor(img,cv.COLOR_GRAY2BGR)

circles = cv.HoughCircles(img,cv.HOUGH_GRADIENT,1,20,
                          param1=50,param2=30,minRadius=0,maxRadius=0)
assert circles is not None, "no circles found, try other parameters"

circles = np.uint16(np.around(circles))
for i in circles[0,:]:
    # draw the outer circle
    cv.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
    # draw the center of the circle
    cv.circle(cimg,(i[0],i[1]),2,(0,0,255),3)

cv.imshow('detected circles',cimg)
cv.waitKey(0)
cv.destroyAllWindows()

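OpenCV Quick Start

1. Installing OpenCV

On Windows, open a cmd window and run:

pip install opencv-contrib-python -i https://pypi.tuna.tsinghua.edu.cn/simple

The contrib build is more complete. The -i option and the URL after it select the Tsinghua mirror, which downloads much faster.

Check that the installation succeeded:

import cv2 as cv
print(cv.__version__) # prints the version number if all is well, '4.9.0' in my case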
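
2. Basic drawing functions

cv.line(), cv.circle(), cv.rectangle(), cv.ellipse() and cv.putText() draw lines, circles, rectangles, ellipses and text on an image. Their parameters are very similar and all include the following:

  • img: the image to draw on
  • color: the color of the shape
  • thickness: the line thickness
  • lineType: the line type (I did not notice any difference)

An example: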

import numpy as np
import cv2 as cv

# Create a black image
img = np.zeros((512,512,3), np.uint8)

# Draw a blue diagonal line; the color order is BGR, so (255,0,0) is blue
cv.line(img,(0,0),(500,500),(255,0,0),2)

# Rectangle: top-left corner, bottom-right corner, color, thickness
cv.rectangle(img, (200, 200), (280, 300), (255, 255, 0), 1)

# Circle: center (300,100), radius 100, color, thickness 2
cv.circle(img, (300, 100), 100, (0, 255, 255), 2)

# Ellipse: image, center, (half major axis, half minor axis), rotation angle, arc start angle, arc end angle, (B, G, R), thickness
cv.ellipse(img, (100, 300), (50, 30), 60, 0, 360, (255, 0, 255), 4)


pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1, 1, 2))

# Draw a polyline; if the third argument isClosed is True, the polyline is closed into a polygon
cv.polylines(img, [pts], False, (255, 255, 255))

# Add text
font = cv.FONT_HERSHEY_COMPLEX
cv.putText(img, 'hello world', (40, 460),font, 1, (0x11,0xaa,0x11),2)


cv.imshow('draw', img)

cv.waitKey(3000)

cv.destroyAllWindows()
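
3. Drawing with the mouse

A left click draws a circle. cv.setMouseCallback registers a callback function for the window; the callback draw_circle receives the event and the x, y coordinates at which it occurred, checks the event type, and handles it.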

import numpy as np
import cv2 as cv

# mouse callback function
def draw_circle(event,x,y,flags,param):
    if event == cv.EVENT_LBUTTONDOWN:
        cv.circle(img,(x,y),100,(255,0,0),-1)

# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while True:
    cv.imshow('image',img)
    if cv.waitKey(20) & 0xFF == 27:  # Esc quits
        break
cv.destroyAllWindows()

A more advanced example: press m to switch modes, then click and drag to draw rectangles or circles.

import numpy as np
import cv2 as cv
import math

drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to circle
ix,iy = -1,-1

# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode

    if event == cv.EVENT_LBUTTONDOWN:
        # remember where the drag started
        drawing = True
        ix,iy = x,y

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing:
            if mode:
                cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
            else:
                # radius = distance from the start point to the cursor
                cv.circle(img,(ix,iy),int(math.hypot(x-ix, y-iy)),(0,0,255),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False

img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while True:
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break

cv.destroyAllWindows()
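
4. Using trackbars

cv.createTrackbar() creates a trackbar. Its arguments are: trackbar name, window name, initial value, maximum value, callback function.
cv.getTrackbarPos() returns the current trackbar position. Its arguments are: trackbar name, window name.
OpenCV has no buttons, so a trackbar with a maximum value of 1 can serve as a switch.

An example, a color palette with a switch: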

import numpy as np
import cv2 as cv

def nothing(x):
    pass

# Create a black image, a window
img = np.zeros((300,512,3), np.uint8)
cv.namedWindow('image')

# create trackbars for color change
cv.createTrackbar('R','image',128,255,nothing)

cv.createTrackbar('G','image',0,255,nothing)
cv.createTrackbar('B','image',0,255,nothing)

# create switch for ON/OFF functionality
switch = '0 : OFF \n1 : ON'
cv.createTrackbar(switch, 'image',0,1,nothing)

while True:
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == 27:
        break

    # get current positions of four trackbars
    r = cv.getTrackbarPos('R','image')
    g = cv.getTrackbarPos('G','image')
    b = cv.getTrackbarPos('B','image')
    s = cv.getTrackbarPos(switch,'image')

    if s == 0:
        img[:] = 0
    else:
        img[:] = [b,g,r]

cv.destroyAllWindows()

A slightly more complex example: trackbars change the drawing color and the brush size.

import numpy as np
import cv2 as cv
import math

drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to circle
ix,iy = -1,-1

# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode
    # read the current color and brush size from the trackbars
    r = cv.getTrackbarPos('r','image')
    g = cv.getTrackbarPos('g','image')
    b = cv.getTrackbarPos('b','image')
    b_size = cv.getTrackbarPos('brush_size','image')

    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y
        if not mode:
            # in circle mode a click stamps a dot of the brush size
            cv.circle(img,(x,y),b_size,(b,g,r),-1)

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing:
            if mode:
                cv.rectangle(img,(ix,iy),(x,y),(b,g,r),-1)
            else:
                cv.circle(img,(ix,iy),int(math.hypot(x-ix, y-iy)),(b,g,r),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False


def nothing(x):
    pass

img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')

# create the trackbars before the callback that reads them is registered
cv.createTrackbar('r', 'image', 0, 255, nothing)
cv.createTrackbar('g', 'image', 0, 255, nothing)
cv.createTrackbar('b', 'image', 0, 255, nothing)
cv.createTrackbar('brush_size', 'image', 0, 100, nothing)
cv.setMouseCallback('image',draw_circle)

while True:
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF

    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break

cv.destroyAllWindows()

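5. Pixel, channel and border operations
  • For reading a single pixel value prefer array.item(); array.itemset() was removed in NumPy 2.0, so write a single value with plain indexing instead
  • For channel operations, select directly with NumPy slicing
  • An ROI is a view, not a copy, so modifying the ROI changes the original image data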
import numpy as np
import cv2 as cv


img = cv.imread('bottle.png')

assert img is not None, "file could not be read, check with os.path.exists()"

# Index one pixel
px = img[100,100]
print(px)

# Index the blue value of one pixel
px_blue = img[100,100,0]
print(px_blue)

# Modify the value of one pixel
img[100,100] = [255,255,255]

# array.item() reads a single value as a Python scalar;
# array.itemset() was removed in NumPy 2.0, so assign by index instead
print(img.item(50,50,0))
img[50,50,2] = 255

# Image shape, a tuple (height, width, channels); a missing channel count means grayscale
print(img.shape)

# Total number of elements, i.e. the product of the shape values
print(img.size)

# Image data type; many errors are caused by mismatched data types
print(img.dtype)

# ROI selection: copy one region onto another
obj = img[280:340, 330:390]
obj[:,:,:] = 0 # obj is a NumPy view, not a copy, so this also blacks out the region in img
img[273:333, 100:160] = obj

# Split a color image into channels and merge them again
b, g, r = cv.split(img)
img = cv.merge((b,g,r))

# cv.split is costly; NumPy slicing is the better choice
img[:,:,2] = 255

# Add a border, as is common in convolution models, with cv.copyMakeBorder()
# Inputs: source image, top, bottom, left, right widths, border type, color (only for the constant type)
# Border types:
# cv.BORDER_CONSTANT     constant fill
# cv.BORDER_REFLECT      mirror fill, e.g. fedcba|abcdefgh|hgfedcb
# cv.BORDER_REFLECT_101  mirror fill, e.g. fedcb|abcdefgh|gfedcb
# cv.BORDER_REPLICATE    repeat the last element, e.g. aaaaaa|abcdefgh|hhhhhhh
# cv.BORDER_WRAP         wrap around, e.g. cdefgh|abcdefgh|abcdefg
img = cv.copyMakeBorder(img,5,5,5,5,cv.BORDER_WRAP)


cv.namedWindow('image')
cv.imshow('image',img)

cv.waitKey(2000)

cv.destroyAllWindows()
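
The view-versus-copy behavior of an ROI is easy to verify on a small array. The sketch below uses only NumPy, with a 5 x 5 array standing in for an image.

```python
import numpy as np

img = np.arange(25, dtype=np.uint8).reshape(5, 5)   # stand-in for a grayscale image

roi_copy = img[1:3, 1:3].copy()   # independent data
roi_copy[:] = 0
unchanged = int(img[1, 1])        # the original still holds its old value

roi_view = img[1:3, 1:3]          # plain slicing returns a view
roi_view[:] = 0
changed = int(img[1, 1])          # the original was modified through the view

shares = np.shares_memory(img, roi_view)
print(unchanged, changed, shares)
```

Call .copy() on the slice whenever the ROI must be edited without touching the source image.
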
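
6. Adding and blending images

Unlike ordinary addition, cv.add() saturates: a result larger than the range of the data type is set to the maximum value.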

import numpy as np
import cv2 as cv

x = np.array(250, dtype=np.uint8)
y = np.array(10, dtype=np.uint8)

print(cv.add(x,y)) # saturating: 250 + 10 = 260 > 255 (uint8 maximum), so the result is 255
print(x+y) # modular: 260 % 256 = 4
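
The two behaviors can be reproduced with NumPy alone, which makes the difference easy to test without OpenCV. This is a sketch of the arithmetic, not of how cv.add() is implemented.

```python
import numpy as np

x = np.array([250, 50], dtype=np.uint8)
y = np.array([10, 10], dtype=np.uint8)

# Plain uint8 addition wraps around modulo 256
wrapped = x + y

# Saturating addition: compute in a wider type, then clip back into 0..255
saturated = np.clip(x.astype(np.int16) + y.astype(np.int16), 0, 255).astype(np.uint8)

print(wrapped, saturated)
```

For images, saturation is almost always what is wanted: an overexposed pixel should stay white instead of wrapping around to near black.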

cv.addWeighted() blends two images:

import numpy as np
import cv2 as cv

img1 = cv.imread('ml.png')
img2 = cv.imread('opencv.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

# Inputs: image 1, weight 1, image 2, weight 2, γ
# weight 1 + weight 2 = 1
# γ is a scalar added to every pixel
# Both images must have the same size and number of channels
dst = cv.addWeighted(img1,0.6,img2,0.4,0)

cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
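
The blend is the weighted sum dst = α·img1 + β·img2 + γ, saturated to the range of the data type. A NumPy-only sketch of that formula for 8-bit images follows; the function name blend is made up for this example.

```python
import numpy as np

def blend(img1, alpha, img2, beta, gamma=0.0):
    """Weighted sum of two same-sized uint8 images, rounded and clipped to 0..255."""
    mixed = alpha * img1.astype(np.float64) + beta * img2.astype(np.float64) + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

a = np.full((2, 2), 100, dtype=np.uint8)
b = np.full((2, 2), 200, dtype=np.uint8)

dst = blend(a, 0.6, b, 0.4)      # 0.6*100 + 0.4*200 = 140
bright = blend(a, 1.0, b, 1.0)   # 300 is clipped to 255
print(dst, bright)
```

Keeping the weights summing to 1 preserves overall brightness; larger sums simply push more pixels into saturation.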

Bitwise operations
The following example cuts a logo out of one image and pastes it onto another.

import cv2 as cv
import numpy as np

# Read the two images
img1 = cv.imread('bottle.png')
img2 = cv.imread('opencv.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

# The logo should appear in the top-left corner, so create an ROI there
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]

# Build a mask of the logo and a mask of the non-logo area
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask) # bitwise NOT gives the non-logo mask

# Black out the logo area in the ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)
cv.imshow('img1_bg',img1_bg)

# Take only the logo region from the logo image
img2_fg = cv.bitwise_and(img2,img2,mask = mask)
cv.imshow('img2_fg',img2_fg)

# Add the extracted logo to the background with the logo area removed
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst

cv.imshow('res',img1)
cv.waitKey(0)
cv.destroyAllWindows()

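7. Measuring and improving performance

cv.getTickCount() returns the number of clock ticks; call it before and after the code to be timed and take the difference.
cv.getTickFrequency() returns the tick frequency, so elapsed time (seconds) = number of ticks / frequency.

import cv2 as cv

e1 = cv.getTickCount() # the time module works just as well

print('do something')

e2 = cv.getTickCount()
t = (e2 - e1) / cv.getTickFrequency()

print(t)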
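
The same before/after pattern can be wrapped in a small helper built on the standard library's time.perf_counter(). The helper name timed is made up for this sketch.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

total, seconds = timed(sum, range(1_000_000))
print(f"sum={total}, took {seconds:.6f} s")
```

A single measurement is noisy, so for comparisons it helps to repeat the call several times and look at the best or median time.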

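Performance optimization

import cv2 as cv
#cv.setUseOptimized(False) # optimization is on by default and can be toggled manually
img = cv.imread('ml.png')
assert img is not None, "file could not be read, check with os.path.exists()"
e1 = cv.getTickCount()

# time 22 median blurs with odd kernel sizes 5, 7, ..., 47
for i in range(5, 49, 2):
    img1 = cv.medianBlur(img, i)
e2 = cv.getTickCount()
t = (e2 - e1) / cv.getTickFrequency()
print(t)
print(cv.useOptimized())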

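In IPython, the %timeit magic command is a convenient way to time a single line of code.
For creating arrays or for operations on only one or two elements, plain Python scalar operations are faster than NumPy, and OpenCV functions are normally faster than their NumPy equivalents.

A general approach to optimization: first implement the algorithm in a simple way; once it works, profile it, find the bottlenecks, and optimize them.

  1. Avoid loops in Python as far as possible
  2. Vectorize the algorithm as much as possible, because both NumPy and OpenCV are optimized for vector operations
  3. Do not copy arrays unless necessary; use views instead
    If the code is still slow, consider Cython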
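
As an illustration of the first two points, here is the same binary threshold written once with Python loops and once vectorized in NumPy. Both functions are made up for this sketch; on realistic image sizes the vectorized version is typically orders of magnitude faster.

```python
import numpy as np

def threshold_loop(img, t):
    """Pixel-by-pixel threshold with explicit Python loops (slow)."""
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            if img[y, x] > t:
                out[y, x] = 255
    return out

def threshold_vectorized(img, t):
    """The same threshold as one whole-array expression (fast)."""
    return np.where(img > t, 255, 0).astype(img.dtype)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

same = np.array_equal(threshold_loop(img, 127), threshold_vectorized(img, 127))
print(same)
```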


OpenCV VideoWriter

Environment

Windows 11
python 3.12.2
opencv-contrib-python 4.9.0.80

Problem when saving video

A video saved with VideoWriter was only 1 KB and was reported as a corrupted file.

Solution
import cv2 as cv

# Create the capture object
cap = cv.VideoCapture('video.mp4') # pass -1 for the default camera, a camera index 0/1/2, or a video file name

frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))   # property id 3
frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT)) # property id 4
size = (frame_width, frame_height) # the VideoWriter output must have the same size as the input

writer = cv.VideoWriter('out.mov', cv.VideoWriter_fourcc(*'divx'), 25.5, size, isColor=True)
# Output formats .avi, .mov and .mp4 all worked in my tests
# Both divx and mp4v work as the fourcc argument
# The frame rate may be a float
# The size must be a tuple of integers equal to the source video size, in (width, height) order
# If the frames are converted to grayscale, isColor must be set to False

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        print("can't receive frame, exiting...")
        break

    #frame = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    writer.write(frame)

    cv.imshow('frame', frame)

    if cv.waitKey(1) == ord('q'):
        break

cap.release()
writer.release()
cv.destroyAllWindows()

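pyautogui

pyautogui can move the mouse and click, which makes it possible to automate some operations and is a lot of fun.

Read this warning first

To avoid losing control, quickly fling the mouse into a corner of the screen to force the program to stop.

I. Controlling the mouse

1. Installing pyautogui

In a Windows cmd window, run:

pip install pyautogui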

2. Checking the installation

pyautogui.size() returns the width and height of the current screen as a named tuple with width and height fields.

import pyautogui
wh = pyautogui.size()
print(wh.width, wh.height)
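
3. Moving the mouse to an absolute position

pyautogui.moveTo() moves the cursor to any position on the screen. It takes three arguments: the first two are the x and y coordinates, and the third, duration=, sets how long the movement takes.
First, a note on the pixel coordinate system of a display: the origin is the top-left corner of the screen, the x axis points to the right, and the y axis points to the bottom of the screen.

Figure 1: Coordinate system of a screen with 1920x1080 resolution [1]

An example that makes the mouse trace a rectangle automatically:

import pyautogui
for i in range(10):
    pyautogui.moveTo(100, 100, duration=0.25)
    pyautogui.moveTo(100, 400, duration=0.25)
    pyautogui.moveTo(400, 400, duration=0.25)
    pyautogui.moveTo(400, 100, duration=0.25)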
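
4. Moving the mouse to a relative position

pyautogui.move() takes the same three arguments, except that the coordinates are offsets relative to the current position.
To find out where the mouse currently is, use pyautogui.position(); it takes no arguments and returns an object containing the two coordinates.

5. Clicking the mouse

pyautogui.click() performs one click, for example a left click at (100,100); its third argument, button=, selects the left, middle or right button.

import pyautogui
pyautogui.click(100, 100, button='left') # left click
pyautogui.click(100, 100, button='right') # right click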

A click consists of two actions, press and release, and pyautogui.click() uses a default interval between them. To control how long the button is held, press with pyautogui.mouseDown() and release with pyautogui.mouseUp(); both take the same arguments.

6. Dragging the mouse

To press, hold, and then move the mouse, use pyautogui.dragTo() or pyautogui.drag(); the former takes an absolute position, the latter a relative one.
The following code draws a figure with the mouse:

import pyautogui
import time

time.sleep(5) # wait while you switch to the drawing window
pyautogui.click(800, 600) # click a point inside the canvas as the starting point

distance = 500
change = 20
while distance > 0:
    pyautogui.drag(distance, 0, duration=0.2) # Move right.
    distance = distance - change
    pyautogui.drag(0, distance, duration=0.2) # Move down.
    pyautogui.drag(-distance, 0, duration=0.2) # Move left.
    distance = distance - change
    pyautogui.drag(0, -distance, duration=0.2) # Move up.

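Open a drawing program beforehand (Windows Paint is used here), select a brush, then return to the IDE and start the script. The program waits 5 s, so within 5 s of starting it, maximize the drawing window you prepared.

Figure 2: Result of drawing with pyautogui.drag()

7. Scrolling the wheel

pyautogui.scroll() takes an integer number of scroll units; positive values scroll up and negative values scroll down.

8. Getting the pixel coordinates of a point on the screen

Use the Snipaste tool: leave it running in the background and press F1 when needed to enter the screenshot preview mode, in which the coordinates of the mouse position are displayed.
Alternatively, pyautogui.displayMousePosition() shows the position and color information in real time.

9. Getting image information from the screen

pyautogui.screenshot() takes a full-screen screenshot. pyautogui.pixel(x, y) takes two coordinates and returns the color of that pixel.
To check whether a point matches a given color, use pyautogui.pixelMatchesColor(x, y, (R, G, B)), which takes two coordinates and a tuple of RGB values.
To find where a given image appears on the screen, use pyautogui.locateOnScreen('img.jpg'); the argument is the image path and the return value is a (left, top, width, height) tuple for the first match (pyautogui.locateAllOnScreen() yields every match when there are several). To click the center of that area, pass the tuple to pyautogui.click((x, y, w, h)). You can even call pyautogui.click('img.jpg') to locate and click in one step, but the lookup may fail, so wrap it in try and except.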

10. Getting window information

import pyautogui
fw = pyautogui.getActiveWindow() # information about the active window
print(str(fw)) # <Win32Window left="-12", top="-12", width="2584", height="1540", title="test.py - Visual Studio Code">

pyautogui.getAllWindows() # returns a list with information about all windows

pyautogui.getWindowsAt(x, y) # takes x, y coordinates, returns all windows containing that point

pyautogui.getWindowsWithTitle(title) # takes a title, returns the matching windows

pyautogui.getAllTitles() # returns all window titles as a list of strings

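A window obtained this way supports several operations:

fw.width = 1000 # resize the window to a width of 1000 pixels

fw.topleft = (200, 200) # move the top-left corner of the window to the given position

print(fw.isMaximized) # check whether the window is maximized

fw.maximize() # maximize the window

fw.minimize() # minimize the window

fw.restore() # undo the maximize/minimize

For the complete usage, see the official documentation at https://pyautogui.readthedocs.io/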

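II. Controlling the keyboard

1. Typing text

First click to select the text input area, then type with pyautogui.write().

import pyautogui
pyautogui.click(1080,1400)
pyautogui.write('hello world!') # pyautogui presses shift automatically when typing '!'

# When a list is passed, every key on the keyboard can be given by name, e.g. 'left' is the ← key
pyautogui.write(['a', 'b', 'left', 'left', 'X', 'Y']) # presses a, b, ←, ←, shift+x, shift+y in order; the result is XYab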

The key names are listed in the table below.

Key string | Meaning
'a', 'b', 'c', 'A', 'B', 'C', '1', '2', '3', '!', '@', '#', etc. | Single-character keys
'enter' (or 'return' or '\n') | Enter key
'esc' | ESC key
'shiftleft', 'shiftright' | Left and right SHIFT keys
'altleft', 'altright' | Left and right ALT keys
'ctrlleft', 'ctrlright' | Left and right CTRL keys
'tab' (or '\t') | TAB key
'backspace', 'delete' | BACKSPACE and DELETE keys
'pageup', 'pagedown' | PAGE UP and PAGE DOWN keys
'home', 'end' | HOME and END keys
'up', 'down', 'left', 'right' | Up, down, left and right arrow keys
'f1', 'f2', 'f3', etc. | F1 to F12 keys
'volumemute', 'volumedown', 'volumeup' | Mute, volume down and volume up keys (some keyboards lack these keys, but the computer still accepts the commands)
'pause' | PAUSE key
'capslock', 'numlock', 'scrolllock' | CAPS LOCK, NUM LOCK and SCROLL LOCK keys
'insert' | INS or INSERT key
'printscreen' | PRTSC or PRINT SCREEN key
'winleft', 'winright' | Left and right WIN keys (Windows)
'command' | Command key (macOS)
'option' | OPTION key (macOS)
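
2. Hotkey combinations

As with the mouse, pyautogui.keyDown() and pyautogui.keyUp() control the press and release of a key separately.
For example, a copy (Ctrl+C) on Windows can be performed like this:

pyautogui.keyDown('ctrl')
pyautogui.keyDown('c')
pyautogui.keyUp('c')
pyautogui.keyUp('ctrl')

That is rather verbose, though; the same operation can be written as:

pyautogui.hotkey('ctrl', 'c')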
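
3. Usage tips
  • Keep the screen resolution unchanged
  • Maximize the application window, because only then do the buttons stay in the same place
  • Allow generous delays according to how fast the software responds; you do not want to click before the previous action has finished
  • Use locateOnScreen() to find buttons and rely on x, y coordinates as little as possible; stopping when the target is not found is better than clicking blindly
  • Use getWindowsWithTitle() to make sure the application window you want to control exists, and activate() to bring it to the foreground
  • Add as many checks as possible, for example for pop-ups or a lost network connection
  • Watch the whole run carefully the first time to confirm that it behaves correctly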
Two ways to add a delay:

import time
time.sleep(3) # wait 3 s

import pyautogui
pyautogui.countdown(10) # wait 10 s, counting down 10, 9, 8... on the command line

In addition, pyautogui.alert('text') and pyautogui.confirm('text') show a pop-up for confirmation. To use the contents of the clipboard, import pyperclip and call pyperclip.paste().

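On computer ethics
Do good and avoid harm: the purpose of learning this technique should be to raise your own productivity, so do not use it to hurt others.

References

[1] Automate The Boring Stuff, Al Sweigart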

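Hello, world!

Welcome to 黄河水澄's technical column!
Here I share technical notes from my own learning, both to keep myself studying and to pass on useful knowledge.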