Optimization of Facial Landmark Detection (Making it Fast)

Posted in Technology   LAST UPDATED: SEPTEMBER 4, 2021

    In this tutorial we will learn how to implement fast facial landmark detection in Python using OpenCV and Dlib's pre-trained model. The code used in this tutorial is available in the GitHub repository.

    The GitHub repository above contains only the code. The Dlib 68-point facial landmark model can be downloaded from the link below:

    https://github.com/davisking/dlib-models/blob/master/shape_predictor_68_face_landmarks.dat.bz2

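    Download the code and the model into the same folder, decompress the model so that it is named shape_predictor_68_face_landmarks.dat (the file name the script loads), and then run the command shown below: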

    python fastFaceLandmarkDetection.py

    Dlib is a good facial landmark detector, but it is often found to be slow in practice. The implementation itself is not bad and it gives accurate landmarks; there are simply a few techniques that make the whole pipeline faster. In this blog we discuss only those techniques, which speed up facial landmark detection without noticeably hurting the result.

    Dlib's facial landmark detector is based on the research paper http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf, which describes a method that can find the landmarks on a face in about 1 millisecond, that is, roughly 1000 faces per second. This does not mean that we will get a very high fps for the whole application; it only means that the landmark step itself is fast, thanks to the techniques described in the paper.

    Getting a fast landmark pipeline can be tedious, and people often try changing parameter values while training Dlib's landmark detection model. But even if tuning the training parameters makes the landmark detector faster, the end result is much the same as before. Even if further parameter optimization brings landmark detection down from 1 millisecond to 0.5 milliseconds, the application as a whole will still be slow.

    So how can we improve the speed?

    We will now find the main bottleneck and address it without changing the model parameters.

    Landmark detection is a two-step process.

    1. Face detection

    2. Landmark detection

    1. Face Detection:

    Landmark detection is done after face detection. To get the best results, we generally use the same face detector that was used when training the landmark model. The output of face detection is a rectangle (x, y, w, h) that contains the face. According to the Dlib paper, landmark detection can then run in about 1 millisecond. Face detection, however, depends on the size of the image: it usually takes 15 to 60 milliseconds, and on a large image it can take more than 60 milliseconds.
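
    The text describes the face rectangle as (x, y, w, h), while the script later in this post reads it as corner values through face.left(), face.top(), face.right() and face.bottom(). As a rough sketch of how the two forms relate, the helpers below convert between them. The function names are hypothetical, and the sketch assumes width and height are plain differences of the corner values; a library may count the edge pixels differently.

```python
def corners_to_xywh(left, top, right, bottom):
    """Convert corner coordinates to (x, y, w, h).

    Assumption for illustration: width and height are simple differences
    of the corner values.
    """
    return (left, top, right - left, bottom - top)


def xywh_to_corners(x, y, w, h):
    """Inverse of corners_to_xywh under the same assumption."""
    return (x, y, x + w, y + h)


box = corners_to_xywh(120, 90, 320, 290)
print(box)                    # (120, 90, 200, 200)
print(xywh_to_corners(*box))  # (120, 90, 320, 290)
```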

    2. Landmark detection:

    The landmark detector finds the landmarks inside the face rectangle. There are two common kinds of face detectors: those based on Haar or LBP cascades, and those based on Histogram of Oriented Gradients (HOG) features with a Support Vector Machine (SVM).

    Speed Up Face Detection:

    As mentioned, landmark detection is a two-step process. First, the faces are detected in an image, and then the landmark detector runs inside each face bounding box. The landmark detector runs in about 1 millisecond. The face detector, depending on the size of the image, can take anywhere from 15 to 60 milliseconds or even more. Face detection is therefore the biggest bottleneck and the one that needs to be addressed.
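
    To see why face detection dominates, the timings quoted above can be turned into a per-frame budget. This is only back-of-the-envelope arithmetic on the numbers in the text (15 to 60 milliseconds for detection, about 1 millisecond for landmarks). The function name is hypothetical, and the cost of reading and displaying frames is ignored.

```python
def frame_budget(detect_ms, landmark_ms=1.0, faces=1):
    """Estimate the per-frame cost when detection runs on every frame."""
    total_ms = detect_ms + landmark_ms * faces
    return {
        "total_ms": total_ms,
        "detect_share": detect_ms / total_ms,  # fraction of time spent detecting
        "max_fps": 1000.0 / total_ms,          # upper bound on frames per second
    }


for detect_ms in (15.0, 60.0):
    budget = frame_budget(detect_ms)
    print("detect {:.0f} ms -> {:.1f}% of frame time, at most {:.1f} fps".format(
        detect_ms, 100 * budget["detect_share"], budget["max_fps"]))
```

    Even in the best case, more than 90% of the time per frame goes into face detection, so speeding up the landmark model alone changes very little.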

    We can use the following steps to speed up face detection. They may cost a little accuracy, but the effect on the result is small.

    1 Resize Frame:

    One way to speed up face detection is to resize the frames coming from the webcam. The face landmarks depend on the face bounding box, which has to be found first. If the frame is very large, face detection is very slow because the detector has to search for the face in a high-resolution image, and this is the main reason the whole pipeline is slow.

    If we use a low-resolution frame, face detection is fast, but the detector may fail to find small faces. The solution is to detect the face on a downscaled copy of the frame and then map the face rectangle (x, y, w, h) back to the original frame by multiplying its coordinates by the scale that was used for resizing. The landmark detector then runs on the original high-resolution frame inside that rectangle, so we keep the detail of the original frame and still get fast detection.
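
    The coordinate mapping can be sketched without any camera or model. The helpers below are hypothetical names used only for illustration. They assume the rectangle is given as (left, top, right, bottom) in the small frame and that the frame was resized to a fixed height of 480 pixels, as in the script later in this post.

```python
def resize_scale(frame_height, target_height=480):
    """Scale factor between the original frame and the resized frame."""
    return float(frame_height) / target_height


def scale_rect(rect, scale):
    """Map a (left, top, right, bottom) rectangle from the small frame
    back to the original frame."""
    left, top, right, bottom = rect
    return (int(left * scale), int(top * scale),
            int(right * scale), int(bottom * scale))


scale = resize_scale(1080)           # 1080 / 480 = 2.25
small_rect = (100, 80, 220, 200)     # face found on the 480-pixel-high frame
print(scale_rect(small_rect, scale)) # (225, 180, 495, 450)
```

    Detection runs on the small frame, and only four multiplications are needed to recover the rectangle on the full-size frame.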

    2 Skip Frame:

    Generally, a webcam records video at 30 fps (frames per second). When we sit in front of the camera without moving much, consecutive frames are almost identical, so there is no need to detect the face on every frame. The idea is to skip some frames, say 3, and reuse the last detected face rectangle for them. If we skip 3 frames and run the detector again on the 4th, the detector runs on only one frame in four, so the time spent on face detection drops to roughly a quarter of what it was without skipping.
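
    The frame-skipping idea can be sketched with a stand-in detector, so it runs without a webcam. The names run_with_skipping and fake_detect are hypothetical. The stand-in simply records which frames it was called on, and the frames in between reuse the most recent result.

```python
def run_with_skipping(frames, detect, frames_skipping=2):
    """Run `detect` only on every `frames_skipping`-th frame and reuse the
    last detection for the frames in between."""
    faces = []
    results = []
    for count, frame in enumerate(frames):
        if count % frames_skipping == 0:
            faces = detect(frame)
        results.append(faces)
    return results


calls = []

def fake_detect(frame):
    # Stand-in for a real face detector: remembers which frames it saw.
    calls.append(frame)
    return [("face-from-frame", frame)]


results = run_with_skipping(range(8), fake_detect, frames_skipping=4)
print(len(calls))   # 2 detector calls for 8 frames
print(results[3])   # [('face-from-frame', 0)], reused from frame 0
```

    The trade-off is that a fast-moving face can drift away from the reused rectangle until the next detection, so the skip value should stay small.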

    Below is the full Python code for the above discussion, with explanatory comments:

    #USAGE: python fastFaceLandmarkDetection.py
    
    # import the necessary libraries
    import cv2
    import dlib
    import sys
    import numpy as np
    
    # Helper called by faceLandmarkPoints: collects the (x, y) coordinates of the landmarks with
    # indices start..end and draws them on the image as a polyline with cv2.polylines.
    def drawPointsFace(image, faceLandmarks, start, end, isClosed=False):
      facePoints = []
      for i in range(start, end+1):
        point = [faceLandmarks.part(i).x, faceLandmarks.part(i).y]
        facePoints.append(point)
    
      points = np.array(facePoints, dtype=np.int32)
      cv2.polylines(image, [points], isClosed, (150, 150, 0), thickness=2, lineType=cv2.LINE_8)
    
    # Checks that the shape has exactly 68 landmarks and then calls drawPointsFace once per facial
    # region to draw it on the image, which is a frame coming from the web camera.
    def faceLandmarkPoints(image, faceLandmarks):
        assert(faceLandmarks.num_parts == 68)
        drawPointsFace(image, faceLandmarks, 0, 16)           # Jaw line
        drawPointsFace(image, faceLandmarks, 17, 21)          # Left eyebrow
        drawPointsFace(image, faceLandmarks, 22, 26)          # Right eyebrow
        drawPointsFace(image, faceLandmarks, 27, 30)          # Nose bridge
        drawPointsFace(image, faceLandmarks, 30, 35, True)    # Lower nose
        drawPointsFace(image, faceLandmarks, 36, 41, True)    # Left eye
        drawPointsFace(image, faceLandmarks, 42, 47, True)    # Right Eye
        drawPointsFace(image, faceLandmarks, 48, 59, True)    # Outer lip
        drawPointsFace(image, faceLandmarks, 60, 67, True)    # Inner lip
    
    # Path to the Dlib 68-point face landmark model
    modelPath = "shape_predictor_68_face_landmarks.dat"
    
    # As explained above, face detection runs on a frame resized to a fixed height, and the detected
    # coordinates are later scaled back to the original frame. Here the resized height is 480 pixels.
    heightResize = 480
    
    # Detect faces only on every framesSkipping-th frame; the frames in between reuse the last detection.
    framesSkipping = 2
    
    try:
      # Name of the display window
      windowName = "Detecting the Facial Landmark Fast"
    
      # Create the VideoCapture object for the default webcam.
      cameraObject = cv2.VideoCapture(0)
    
      # Check that the camera could be opened. If the webcam is off or not working, tell the user
      # and exit instead of failing later when a frame is read.
      if not cameraObject.isOpened():
        print("Unable to connect to camera, kindly check your web camera.")
        sys.exit()
    
      # Initial placeholder value; the actual value is calculated after every 100 frames.
      framePerSecond = 30.0
    
      # Read the first frame from the camera.
      ret, image = cameraObject.read()
    
    
      # Use the height of the first frame to compute the scale between the original frame and the
      # resized frame. If the frame could not be read, exit.
      if ret:
        height = image.shape[0]
        # calculate resize scale
        frame_resize_scale = float(height)/heightResize
        size = image.shape[0:2]
      else:
        print("Unable to read frame")
        sys.exit()
    
      # Load Dlib's face detector and the shape predictor (landmark) model
      faceDetector = dlib.get_frontal_face_detector()
      shapePredictor = dlib.shape_predictor(modelPath)
      # Initialize the tick counter, which is used to calculate the actual framePerSecond value.
      time = cv2.getTickCount()
    
      # Frame counter: framePerSecond is recalculated after every 100 frames, so this variable
      # tracks when 100 frames have been processed.
      count = 0
    
      # Grab and process frames until the user presses ESC.
      while(True):
        if count==0:
          time = cv2.getTickCount()
    
        # Grab a frame and store it in the variable image; stop if no frame could be read.
        ret, image = cameraObject.read()
        if not ret:
          break
    
        # Convert the full-size frame from BGR to RGB, the format Dlib expects
        imageDlib = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
        # Create imageSmall by resizing the frame by the resize scale
        imageSmall= cv2.resize(image, None, fx = 1.0/frame_resize_scale, fy = 1.0/frame_resize_scale, interpolation = cv2.INTER_LINEAR)
    
        # Convert the small frame from BGR to RGB as well
        imageSmallDlib = cv2.cvtColor(imageSmall, cv2.COLOR_BGR2RGB)
    
        # Run face detection only on every framesSkipping-th frame. A good value depends on your
        # hardware and on the frame rate of the camera. This is the main idea that reduces the
        # computation.
    
        if (count % framesSkipping == 0):
          # Detect faces on the resized frame.
          faces = faceDetector(imageSmallDlib,0)
    
        # Iterate over the detected faces
        for face in faces:
          # Face detection ran on the resized image, so scale the rectangle coordinates
          # (left, top, right, bottom) back up to the original frame.
          newRectValues = dlib.rectangle(int(face.left() * frame_resize_scale),
                                   int(face.top() * frame_resize_scale),
                                   int(face.right() * frame_resize_scale),
                                   int(face.bottom() * frame_resize_scale))
    
          # Pass the original frame (converted to RGB above) and the scaled rectangle to the
          # shape predictor to find the landmarks for this face.
          shape = shapePredictor(imageDlib, newRectValues)
          # Draw facial landmarks
          faceLandmarkPoints(image, shape)
    
        # Show the current framePerSecond value on the output frame
        cv2.putText(image, "{0:.2f}-framePerSecond".format(framePerSecond), (50, size[0]-50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 4)
        # Display the frame on the screen
        cv2.imshow(windowName, image)
        # Wait for a key press
        key = cv2.waitKey(1) & 0xFF
    
        # If ESC is pressed, leave the loop so that the cleanup code below runs.
        if key==27:  # ESC
          break
    
        # increment frame counter
        count = count + 1
        # calculate framePerSecond at an interval of 100 frames
        if (count == 100):
          time = (cv2.getTickCount() - time)/cv2.getTickFrequency()
          framePerSecond = 100.0/time
          count = 0
    
      # cv2.destroyAllWindows() destroys all the windows created so far. To destroy one particular
      # window, use cv2.destroyWindow() and pass the exact window name as the argument.
      cv2.destroyAllWindows()
    
      # Release the camera so that other programs can use it.
      cameraObject.release()
    
    except Exception as e:
      print(e)
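
    The script measures fps by timing blocks of 100 frames with cv2.getTickCount(). The same idea can be sketched as a small helper that is easy to test on its own. This is an illustrative variant, not part of the script above: the class name FpsCounter is hypothetical, and it uses Python's time.perf_counter instead of OpenCV's tick counter.

```python
import time


class FpsCounter:
    """Recompute frames per second once every `window` frames."""

    def __init__(self, window=100, initial_fps=30.0, clock=time.perf_counter):
        self.window = window
        self.fps = initial_fps      # placeholder until the first window completes
        self._clock = clock
        self._count = 0
        self._start = clock()

    def tick(self):
        """Call once per processed frame; returns the latest fps estimate."""
        self._count += 1
        if self._count == self.window:
            elapsed = self._clock() - self._start
            if elapsed > 0:
                self.fps = self.window / elapsed
            self._count = 0
            self._start = self._clock()
        return self.fps


counter = FpsCounter(window=100)
# Inside the capture loop one would call: framePerSecond = counter.tick()
```

    Averaging over a window of frames keeps the displayed number steady; a per-frame measurement would jump around, especially when detection runs only on some frames.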

    Conclusion:

    In this blog we explored techniques that make facial landmark detection *FAST*. We saw how resizing frames and skipping frames help to achieve this. With these optimizations it is possible to reach 65+ fps, which is more than enough. On my system we get 24-35 fps, because the recording rate of the webcam is the limit. The fps you get also depends on the time needed to fetch each frame from the webcam or video.

    Author:
    I am an Artificial Intelligence Engineer doing research work in Virtual Reality and Augmented Reality. Apart from my professional life, I like to write articles and share my work with others.