Software advances now make it possible to run computer vision applications on single-board computers like the Raspberry Pi. In this post, we explore how computer vision works by building a deep-learning-based face detection project with OpenCV. Please note that I am a machine learning beginner, and this post only aims to spark interest in computer vision. It does not explain machine learning concepts; instead, it is a step-by-step tutorial on the easiest way to create a face detector using a Raspberry Pi.

To get started, let us define a few terms.

Computer Vision

Computer vision is a field of computer science that deals with how computers understand digital images and videos. As we are all aware, computers only understand 1s and 0s. Computer vision uses mathematical techniques to interpret the numbers that make up an image or a video frame and to build image processing applications on top of them. Object detection, face detection, style transfer, and deepfakes are just some of its popular applications.
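
To make the "images are just numbers" idea concrete, here is a minimal sketch using NumPy. The tiny 2x3 image and its pixel values are made up purely for illustration.

```python
import numpy as np

# A made-up 2x3 colour image: rows x columns x 3 channels.
# OpenCV stores channels in blue, green, red (BGR) order, one byte each.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)   # top-left pixel: pure blue
image[1, 2] = (0, 0, 255)   # bottom-right pixel: pure red

height, width, channels = image.shape
print(height, width, channels)

# "Image processing" is arithmetic on these numbers, for example
# averaging the three channels to get one brightness value per pixel.
gray = image.mean(axis=2)
print(gray[0, 0])
```

Everything OpenCV does later in this post, from resizing to face detection, ultimately operates on arrays like this one.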

OpenCV

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. It is arguably the most popular tool in computer vision. You can also use OpenCV on top of other machine learning tools like TensorFlow, but that’s outside the scope of our tutorial today.

How to Install OpenCV on your Raspberry Pi

To get started, let us install OpenCV on our Raspberry Pi. Building OpenCV on Linux is notoriously tedious, so go slowly and read each line of the procedure to avoid annoying errors later.

Install the Dependencies

1. As with any other installation, update your Raspberry Pi first.

sudo apt update
sudo apt upgrade

2. Next, we will install OpenCV’s dependencies. There are a lot of them, so we will install them in chunks so we can monitor what gets installed and what throws an error. The first chunk contains the packages we need to compile OpenCV.

sudo apt install cmake build-essential pkg-config git

3. Next are packages that will add support for different image and video formats to OpenCV.

sudo apt install libjpeg-dev libtiff-dev libjasper-dev libpng-dev libwebp-dev libopenexr-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libxvidcore-dev libx264-dev libdc1394-22-dev libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev

4. Then, the packages that the OpenCV interface needs.

sudo apt install libgtk-3-dev libqtgui4 libqtwebkit4 libqt4-test python3-pyqt5

5. The next packages are crucial for OpenCV to run at a decent speed on the Raspberry Pi.

sudo apt install libatlas-base-dev liblapacke-dev gfortran

6. Then, packages related to the Hierarchical Data Format (HDF5) that OpenCV uses to manage data.

sudo apt install libhdf5-dev libhdf5-103

7. Lastly, the Python-related packages.

sudo apt install python3-dev python3-pip python3-numpy

Upsizing the Swap Space

1. Swap space is a portion of storage (on the Raspberry Pi, the SD card) that the operating system uses when the device runs out of physical RAM. It is a lot slower than RAM, but we need it to get through compiling OpenCV on the Raspberry Pi. Enter the following command to open the swap configuration file, which sets the size of the swap space.

sudo nano /etc/dphys-swapfile

2. Now replace the line

CONF_SWAPSIZE=100

with

CONF_SWAPSIZE=2048

Save and exit.

3. Lastly, restart the service on your Raspberry Pi to implement the changes without rebooting the system.

sudo systemctl restart dphys-swapfile

Congratulations! You have now changed your swap space from 100MB to 2GB.
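
If you want to double-check the edit without reopening the editor, a short Python sketch like the one below can read the setting back. The helper name read_swap_size is my own invention, and the sample text stands in for the real file contents.

```python
import re

def read_swap_size(config_text):
    """Return the CONF_SWAPSIZE value in MB, or None if it is not set."""
    for line in config_text.splitlines():
        # re.match anchors at the start of the line, so commented-out
        # settings such as "#CONF_SWAPSIZE=100" are skipped.
        match = re.match(r"\s*CONF_SWAPSIZE\s*=\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None

sample = "# example contents\nCONF_SWAPSIZE=2048\n"
print(read_swap_size(sample))
```

On the Pi itself, you would pass in the text of /etc/dphys-swapfile instead of the sample string.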

Cloning the OpenCV repositories

At last, we are ready to get the actual OpenCV files. Enter the following commands to clone the OpenCV repositories from GitHub to your Raspberry Pi. Ideally, your present working directory is your home directory. The later build commands use paths under the home directory (~/opencv and ~/opencv_contrib), so cloning there keeps things simple. These repositories are also large, so depending on your internet connection, this may take a while.

git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git

Compiling OpenCV

1. After the clones finish, you can see both repositories in your home directory. Now, let’s create a subdirectory called “build” inside the opencv folder to hold our compilation.

mkdir ~/opencv/build
cd ~/opencv/build

2. Next, generate the makefile to prepare for compilation. Enter the following command:

cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D ENABLE_NEON=ON \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D CMAKE_SHARED_LINKER_FLAGS=-latomic \
    -D BUILD_EXAMPLES=OFF ..

3. Once the makefile is generated, we can proceed to compile. The argument -j$(nproc) tells make to run as many parallel jobs as there are available processors. So, for example, if nproc returns 4, the argument becomes -j4. This step takes the longest of all. For reference, a newly set up Raspberry Pi 4 with 8GB RAM takes more than an hour to execute this command.

make -j$(nproc)
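
If you are curious what $(nproc) expands to, this small Python sketch builds the same command string. It uses os.cpu_count(), which reports the number of logical CPUs and is only an approximation of what nproc prints.

```python
import os

# os.cpu_count() can return None on unusual systems,
# so fall back to a single job in that case.
jobs = os.cpu_count() or 1
command = f"make -j{jobs}"
print(command)
```

On a Raspberry Pi 4, this should print make -j4, which matches the example above.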

4. Hopefully, the compilation will not take more than two hours of your life. We now proceed to install the compiled OpenCV software.

sudo make install

5. Finally, we regenerate the Pi’s library link cache. The Raspberry Pi won’t be able to find OpenCV without this step.

sudo ldconfig

Reverting the Swap Space

1. Now that we are done installing OpenCV, we don’t need such a large swap space anymore. Reopen the swap configuration file with the following command.

sudo nano /etc/dphys-swapfile

2. Change the line

CONF_SWAPSIZE=2048

to

CONF_SWAPSIZE=100

Save and exit.

3. Lastly, restart the service to implement the changes.

sudo systemctl restart dphys-swapfile

Testing OpenCV on your Raspberry Pi

1. The easiest way to confirm the installation is to open an interactive Python shell and try to import cv2 (OpenCV’s module name). Open Python 3.

python3

2. Because we installed the Python 3 development packages (python3-dev and python3-numpy) before compiling, the steps above set up OpenCV for Python 3, so use python3 rather than whatever version the plain python command starts. Try to import OpenCV using the line:

import cv2

3. If the import line returns nothing, then congratulations! You can also verify the OpenCV version on your computer with the following command:

cv2.__version__
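
If you prefer a script over the interactive shell, here is a minimal sketch that reports whether the modules used later in this post can be imported. The helper name module_version is hypothetical; it relies only on Python's standard importlib.

```python
import importlib

def module_version(name):
    """Return a module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")

for name in ("cv2", "numpy"):
    version = module_version(name)
    print(name, version if version else "not installed")
```

A line reading "cv2 not installed" means the import failed, which usually points back to the install or ldconfig steps.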

Now that we have OpenCV on our Raspberry Pi, let us move on to the face detection project.

Face Detection with OpenCV and Deep Learning

There are many ways to start learning computer vision, and among them is learning by doing. That is what I always do, even with a theory-heavy topic like computer vision. I find it easier to just do the thing and then research what I don’t understand afterward, rather than read everything from a white paper filled with jargon. Fortunately, I found a great website for computer vision with Python. If you want to go deeper into computer vision, check that site out.

Our face detection project aims to detect faces in a video stream and light up a green LED when it finds one; a red LED stays on otherwise. We will use the Raspberry Pi as the host computer and the Raspberry Pi camera module as the primary sensor. For the software, we will use OpenCV, particularly its Caffe-based face detector from the original GitHub source. The Caffe model needs two files to work. The first is the prototxt file, which defines the deep learning model's architecture: the layers of processing your input image goes through to produce the desired output, in our case the probability that a portion of an image is a face. The other is the Caffe model file, which contains the actual weights for the layers defined in the prototxt file. These weights are just predetermined numbers that are optimized for face detection.

Thankfully, PyImageSearch has these files on its site, so I did not have to create them from scratch. That would take me forever, as I am a beginner at machine learning.

We will not go in depth into the framework behind OpenCV’s face detector. That is well beyond the scope of a getting-started article. I read one resource from PyImageSearch, and it is not for the faint of heart. However, if you have some background in machine learning, it is worth checking out.

The Code

from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2
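import RPi.GPIO as GPIO

# Set up the LED pins. BCM numbering is assumed here, since physical pin 4
# is a 5V power pin and cannot be used as an output.
GPIO.setmode(GPIO.BCM)
GPIO.setup(3, GPIO.OUT)  # green LED
GPIO.setup(4, GPIO.OUT)  # red LED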

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

print("[INFO] starting video stream...")
vs = VideoStream(usePiCamera=True).start() 
time.sleep(2.0)

while True:
	frame = vs.read()
	frame = imutils.resize(frame, width=400)
	# Default state for each frame: green LED off, red LED on
	GPIO.output(3, GPIO.LOW)
	GPIO.output(4, GPIO.HIGH)

	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))
 
	net.setInput(blob)
	detections = net.forward()

	for i in range(0, detections.shape[2]):
		confidence = detections[0, 0, i, 2]
		if confidence < args["confidence"]:
			continue

		# A face passed the threshold: green LED on, red LED off
		GPIO.output(3, GPIO.HIGH)
		GPIO.output(4, GPIO.LOW)

		box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
		(startX, startY, endX, endY) = box.astype("int")
 
		text = "{:.2f}%".format(confidence * 100)
		y = startY - 10 if startY - 10 > 10 else startY + 10
		cv2.rectangle(frame, (startX, startY), (endX, endY),
			(0, 255, 0), 2)
		cv2.putText(frame, text, (startX, y),
			cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF
	if key == ord("q"):
		break

cv2.destroyAllWindows()
vs.stop()
GPIO.cleanup()

First, we import the required packages. In this project, we need OpenCV, imutils, numpy, argparse, time, and RPi.GPIO. argparse and time are part of Python's standard library, and RPi.GPIO normally ships with Raspberry Pi OS. Since we already installed OpenCV in the earlier section, we just need imutils and numpy. To install both for Python 3, enter the following:

pip3 install imutils numpy

The argument parser lets you pass arguments when executing the code from the terminal. The code defines three arguments, two of which are required:

  • --prototxt : The prototxt file path (required).
  • --model : The Caffe model path (required).
  • --confidence : The minimum probability a detection needs to count as a face.

The confidence argument is optional and will be set to a default of 0.5 if not specified in the execution command.
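
To see how the parser behaves without touching the camera, here is a small sketch that feeds it a pretend command line. The two file names are placeholders I made up, not the real names of the model files.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True)
ap.add_argument("-m", "--model", required=True)
ap.add_argument("-c", "--confidence", type=float, default=0.5)

# Passing a list to parse_args() simulates typing those words in the terminal.
args = vars(ap.parse_args(["--prototxt", "deploy.prototxt",
                           "--model", "detector.caffemodel"]))
print(args["confidence"])   # no -c given, so the default is used

# The short flags work too, and -c overrides the default threshold.
args = vars(ap.parse_args(["-p", "deploy.prototxt",
                           "-m", "detector.caffemodel", "-c", "0.7"]))
print(args["confidence"])
```

Leaving out either of the two required paths makes argparse print a usage message and exit, which is why the script refuses to start without them.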

Next, after importing the libraries and setting up the argument parser, we load the model with cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"]) and start the video stream with VideoStream(usePiCamera=True).start(). Note that you can change the input to your computer’s webcam or to a video file: replace usePiCamera=True with src=0 for the webcam, or replace VideoStream() with FileVideoStream(PathtoVideoFile) for a file. After starting the video stream, we pause for two seconds to give the camera time to warm up and then enter the main loop.

The main loop starts by reading the current video frame and resizing it to a width of 400 pixels. The next two lines turn off the green LED and turn on the red LED. We then grab the frame's dimensions and create a blob. A blob is a frame that has been resized and has had the mean color subtracted (and, optionally, been scaled) so that it matches the input the network expects. These steps are heavily used in image classification pre-processing, and OpenCV’s API makes them so simple that a single call to cv2.dnn.blobFromImage() handles the whole thing.
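
For the curious, the sketch below approximates in plain NumPy what happens to a frame that is already 300x300. It is my own rough illustration, not OpenCV's actual implementation, and the function name make_blob is hypothetical.

```python
import numpy as np

def make_blob(frame_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Roughly mimic the blob layout for an already-resized BGR frame."""
    # Subtract the per-channel mean, then apply the scale factor.
    mean_array = np.array(mean, dtype=np.float32)
    pixels = (frame_bgr.astype(np.float32) - mean_array) * scale
    # Reorder height x width x channels into channels x height x width,
    # then add a leading batch axis: 1 x 3 x H x W.
    return pixels.transpose(2, 0, 1)[np.newaxis, ...]

# Synthetic 300x300 frame where every pixel equals the mean colour.
frame = np.full((300, 300, 3), (104, 177, 123), dtype=np.uint8)
blob = make_blob(frame)
print(blob.shape)
print(float(blob.max()))
```

Because every pixel in the synthetic frame equals the mean, the whole blob comes out as zeros, which shows that mean subtraction simply re-centres the pixel values.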

Now, remember the pre-trained Caffe model we loaded earlier? We pass the blob into it to detect faces in the frame. The network returns a list of candidate detections, each with a confidence value. Since we use the 0.5 default, the program skips any detection whose confidence is below 0.5 and only proceeds for those at or above it. That means it recognizes a face in the frame! We then turn the green LED on and the red LED off, indicating that we have seen a face.
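
Here is a minimal sketch of that filtering step run on made-up data, so you can try it without a camera or the model files. It assumes the same array layout the script indexes into: one row per detection, with the confidence in column 2 and the box corners in columns 3 to 6 as fractions of the frame size. The function name confident_boxes is hypothetical.

```python
import numpy as np

def confident_boxes(detections, width, height, threshold=0.5):
    """Return (confidence, (startX, startY, endX, endY)) for strong detections."""
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < threshold:
            continue
        # Scale the fractional corners up to pixel coordinates.
        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
        results.append((confidence, tuple(int(v) for v in box)))
    return results

# Two made-up detections: one strong, one weak.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.75, 0.75]
fake[0, 0, 1] = [0, 1, 0.2, 0.0, 0.0, 0.1, 0.1]

boxes = confident_boxes(fake, width=400, height=300)
face_found = len(boxes) > 0   # this is what decides green versus red
print(boxes, face_found)
```

Only the first detection survives the 0.5 threshold, so face_found is True and the green LED would light.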

Then, we draw a rectangle around the detected face and put text above it showing its probability. We do this by scaling the detection's box coordinates to the frame size and drawing the shape and the text with cv2.rectangle() and cv2.putText(). These functions let you specify the color and line thickness of the shape, as well as the font and font size of the text.
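
One small detail in the drawing code is the line that picks the label's y position. This sketch isolates that rule; the function name label_y is my own.

```python
def label_y(start_y, offset=10):
    """Put the label above the box, or just inside it near the top edge."""
    return start_y - offset if start_y - offset > offset else start_y + offset

print(label_y(75))   # box well below the top edge: label goes above it
print(label_y(5))    # box near the top edge: label goes inside the box
```

Without this check, the text for a face near the top of the frame would be drawn off-screen.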

But wait, we haven’t even shown the display yet! The window is drawn on every pass through the loop with cv2.imshow("Frame", frame), whether a face is detected or not. Additionally, a key check stops the stream when a certain key is pressed (Q in this code). This breaks the loop, after which the program closes the window, stops the stream, and exits.
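
If you wonder what the "& 0xFF" in the key check is for, this sketch shows the idea with plain integers. As far as I understand, cv2.waitKey() returns -1 when no key is pressed, and on some platforms it may set extra high bits; the raw values below are made up to illustrate both cases.

```python
def is_quit_key(raw_key, quit_char="q"):
    # Keep only the lowest 8 bits, as the script does with "& 0xFF".
    return (raw_key & 0xFF) == ord(quit_char)

print(is_quit_key(ord("q")))             # plain key code for q
print(is_quit_key(0x100000 | ord("q")))  # same key with an extra high bit set
print(is_quit_key(-1))                   # no key pressed
```

The mask makes the comparison work in both of the first two cases, while the "no key" value never matches.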