OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library with a broad range of functionality, from basic image manipulation to object detection and tracking.
We will just scratch the surface of what OpenCV can do, but you can find plenty of tutorials online if you want to learn more.
OpenCV is a C++ library, but it has bindings for many languages, including Python. We will write our code in Python, but the underlying OpenCV routines are compiled C++, so they run very fast.
We will acquire images and videos from the Raspberry Pi camera and process them with OpenCV on the Raspberry Pi itself. We can use this to track moving objects such as the laser spot of the Cavendish experiment or the falling spheres in the glycerine experiment.
Installing OpenCV in Python 3 is trivial. Just open a terminal and type:
pip3 install opencv-python
import cv2
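You can check that the installation worked by printing the library version (the exact number will vary with your install):
print(cv2.__version__)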
Initialise the video capture from default camera 0
cap = cv2.VideoCapture(0)
Acquire a frame and release the capture stream
ret, frame = cap.read()
cap.release() # always remember to release the capture!
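As an aside, cap.read() also returns a boolean flag (ret above) that tells you whether a frame was actually grabbed. A minimal sketch of a more defensive capture, assuming the same default camera 0:
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()
if not ret:
    # ret is False when no frame could be read (e.g. camera busy or disconnected)
    raise RuntimeError('Could not grab a frame from the camera')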
Show the image using matplotlib
import matplotlib.pyplot as plt
plt.imshow(frame);
Let's fix the colour. OpenCV captures the image in BGR (Blue, Green, Red) format, but plt.imshow wants it in RGB format. We can use a cv2 function to fix that.
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
plt.imshow(frame_rgb);
Let's define a simple function for that
def plotOpenCVImage(image):
    # convert from OpenCV's BGR channel order to RGB before plotting
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    ax_image = plt.imshow(image_rgb)  # plt.imshow returns an AxesImage
    return ax_image
plotOpenCVImage(frame);
The NumPy array representing the image has three dimensions: the first two are the height and width of the image, the third is the colour channel.
In our case, we have three BGR channels.
print(f'The shape of the image is {frame.shape}')
The shape of the image is (720, 1280, 3)
For an image in grayscale, we have only one channel.
frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # convert to greyscale
print(f'The shape of the grayscale image is {frame_gray.shape}')
plt.imshow(frame_gray, cmap='gray');
The shape of the grayscale image is (720, 1280)
Note: the pixels are indexed as image[y, x, channel], not as image[x, y, channel]. Also, the y index starts from the top of the image, not from the bottom.
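For example, to read the pixel 10 rows below the top edge and 20 columns from the left edge:
b, g, r = frame[10, 20]  # first index is the row (y), second the column (x)
print(f'Blue={b}, Green={g}, Red={r}')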
plotOpenCVImage(frame);
What corner of the image do I select with the following?
plotOpenCVImage(frame[0:100, 1000:1280]);
Save the image to disk using cv2.imwrite
# save the image to file
cv2.imwrite('test_image.png', frame)
True
Load the image from disk using cv2.imread
# load the image from file
loaded_image = cv2.imread('test_image.png')
plotOpenCVImage(loaded_image);
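One caveat: cv2.imread does not raise an error if the file is missing; it silently returns None. A minimal guard looks like this:
loaded_image = cv2.imread('test_image.png')
if loaded_image is None:
    # imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError('Could not read test_image.png')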
Acquiring a video is as simple as reading many frames from cv2.VideoCapture. However, writing the output to a file requires us to correctly initialise a cv2.VideoWriter object.
def RecordVideo(outfilename, record_duration_s = 3):
cap = cv2.VideoCapture(0)
# query the resolution reported by the camera
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = 30 # getting it from cap.get(cv2.CAP_PROP_FPS) does not work for my webcam
# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter(outfilename, fourcc, fps, (frame_width, frame_height))
print(f'Recording video for {record_duration_s} seconds with resolution {frame_width}x{frame_height}')
for i in range(0, fps * record_duration_s):
ret, frame = cap.read()
out.write(frame)
cap.release()
out.release()
print(f'Video written to {outfilename}')
RecordVideo('test_video.avi', record_duration_s=3)
Recording video for 3 seconds with resolution 1280x720
Video written to test_video.avi
You can check the output video file created by opening it with ffplay or vlc.
Reading in a video file frame by frame is very similar to reading from a camera. We just need to initialise the cv2.VideoCapture object with the path to the video file instead of the camera number.
# Open the video capture from the AVI video file
cap = cv2.VideoCapture('test_video.avi')
# Get the number of frames in the video
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f'The video contains {n_frames} frames')
# Read frames one by one
for frame_index in range(n_frames):
ret, frame = cap.read()
# Do something with the frame before moving on to the next one...
print(frame_index, frame[0, 0])  # print the BGR value of the top-left pixel of each frame
cap.release()
The video contains 90 frames
0 [3 8 8]
1 [25 32 38]
2 [36 46 50]
...
89 [75 88 96]
If you want to grab a specific frame you don't need to loop. You can use the cap.set method to set the frame number. Note that the frame number is 0-based, so the first frame is frame 0.
# get the 10th frame in the video
cap = cv2.VideoCapture('test_video.avi')
cap.set(cv2.CAP_PROP_POS_FRAMES, 9)
ret, frame = cap.read()
cap.release()
plotOpenCVImage(frame);
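Depending on the codec, you can also seek by timestamp rather than by frame number, using cv2.CAP_PROP_POS_MSEC. A sketch (seeking accuracy varies with the container and codec):
cap = cv2.VideoCapture('test_video.avi')
cap.set(cv2.CAP_PROP_POS_MSEC, 1000)  # jump to t = 1 s into the video
ret, frame = cap.read()
cap.release()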
Let's use a simple picture of a plane with coloured dots to learn how to detect objects in an image.
image = cv2.imread('test_dots.png')
plotOpenCVImage(image);
Apply a mask to remove the parts of the picture that may confuse the dot-identification algorithm. The mask is simply a NumPy array whose pixels are 255 where the image should be kept and 0 where it should be discarded.
# first create mask as a null matrix with the same shape as the image
import numpy as np
mask = np.zeros(image.shape[:2], dtype=np.uint8)
# then fill the quadrilateral spanned by these four corners with 255 (everything outside stays masked)
mask_corners = np.array([[180, 70], [860, 30], [840, 620], [180, 500]])
cv2.drawContours(mask, [mask_corners], -1, 255, -1)
plt.imshow(mask, cmap='gray');
We can now apply the mask to our image using cv2.bitwise_and, which performs a logical AND on each pixel and turns off, in the masked image, every pixel that is black in the mask.
masked_image = cv2.bitwise_and(image, image, mask=mask)
plotOpenCVImage(masked_image);
Colour selection is easier in the HSV (Hue, Saturation, Value) colour space, where a single channel, the hue, encodes the colour, so we first convert the masked image.
hsv_image = cv2.cvtColor(masked_image, cv2.COLOR_BGR2HSV)
Then we use cv2.inRange to select the pixels that have a hue value between the minimum and maximum values we want to select. Green has a hue value around $80$. The range we allow is $\pm 10$, but it can be tweaked to be more or less inclusive.
lower_green = np.array([70, 40, 40]) # Lower bound for green hue (adjust as needed)
upper_green = np.array([90, 255, 255]) # Upper bound for green hue (adjust as needed)
green_regions = cv2.inRange(hsv_image, lower_green, upper_green)
plt.imshow(green_regions, cmap='gray');
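A side note in case you want to select red instead: the hue axis wraps around, and red sits at both ends of OpenCV's $0$-$179$ hue range, so it needs two ranges combined with cv2.bitwise_or. A sketch, with bounds that would need tuning for a real image:
lower_red1, upper_red1 = np.array([0, 40, 40]), np.array([10, 255, 255])
lower_red2, upper_red2 = np.array([170, 40, 40]), np.array([179, 255, 255])
red_regions = cv2.bitwise_or(cv2.inRange(hsv_image, lower_red1, upper_red1),
                             cv2.inRange(hsv_image, lower_red2, upper_red2))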
We then use cv2.findContours to find the contours of the green objects in the image. We can then loop over the contours and find the centroid (mean pixel position) of each object using cv2.moments.
plotOpenCVImage(masked_image);
contours, _ = cv2.findContours(green_regions, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
green_dots_positions = []
for contour in contours:
if cv2.contourArea(contour) > 50: # Minimum contour area threshold
M = cv2.moments(contour)
center_x = M["m10"] / M["m00"]
center_y = M["m01"] / M["m00"]
print(f"Centroid of green dot: (x, y) = ({center_x:.1f}, {center_y:.1f})")
plt.scatter(center_x, center_y, color='green', marker='o', facecolor='none', s=100);
green_dots_positions.append([center_x, center_y])
Centroid of green dot: (x, y) = (570.6, 448.1)
Centroid of green dot: (x, y) = (575.2, 310.9)
Centroid of green dot: (x, y) = (576.6, 171.1)
We want to measure the positions of the identified dots in a given plane, expressed in a chosen set of coordinates. To calibrate and correct for perspective we need 4 points in the image and their corresponding coordinates in the plane.
calibration_pixels = np.array([[199, 488], [200, 98], [812, 570], [834, 62]], dtype=np.float32)
for x, y in calibration_pixels:
plt.scatter(x, y, color='black', marker='o', facecolor='none', s=100)
# the corresponding coordinates in the plane are:
calibration_coords = np.array([[0, 0], [0, 2], [3, 0], [3, 2]], dtype=np.float32)
plotOpenCVImage(masked_image);
We can then use OpenCV to map the pixel positions of points in the image to their positions in the plane. We use a perspective transformation because the camera sensor is not parallel to the plane.
transform_matrix = cv2.getPerspectiveTransform(calibration_pixels, calibration_coords)
def get_coordinates(x_pixel, y_pixel, transform_matrix=transform_matrix):
point_pixels = np.array([[x_pixel], [y_pixel], [1]], dtype=np.float32)
point_coords = transform_matrix.dot(point_pixels)
point_coords = point_coords / point_coords[2]
return point_coords[0][0], point_coords[1][0]
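Equivalently, OpenCV can apply the homography for us with cv2.perspectiveTransform, which expects an array of points of shape (N, 1, 2). Using the first green-dot centroid found above:
points_pixels = np.array([[[570.6, 448.1]]], dtype=np.float32)  # one point, shape (1, 1, 2)
points_coords = cv2.perspectiveTransform(points_pixels, transform_matrix)
x_coord, y_coord = points_coords[0, 0]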
Now let's try our calibration on the green dots we identified before.
Can we get their coordinates in the plane correctly?
for x, y in green_dots_positions:
x_coord, y_coord = get_coordinates(x, y)
print(f"Green dot at (x, y) = ({x_coord:.1f}, {y_coord:.1f})")
plotOpenCVImage(masked_image);
Green dot at (x, y) = (2.0, 0.4)
Green dot at (x, y) = (2.0, 1.0)
Green dot at (x, y) = (2.0, 1.6)
We have seen how to use OpenCV to:
- acquire images and videos from a camera, and read and write them to disk
- mask regions of an image and select pixels by colour in HSV space
- find the contours and centroids of objects in an image
- calibrate a perspective transformation to convert pixel positions into coordinates in a plane