
600 papers with code • 23 benchmarks • 64 datasets

Facial Recognition is the task of making a positive identification of a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then works on identification of those detected faces.

The state-of-the-art tables for this task are organized mainly around its two constituent sub-tasks: face verification and face identification.



Benchmarks

Best-performing models across these benchmarks include GhostFaceNetV2-1 (MS1MV3), SFace (MS1MV2, R100), fine-tuned ArcFace, ArcFace+CSFM, PIC-QMagFace, PIC-MagFace, PIC-ArcFace, Prodpoly, FaceNet with an adaptive threshold, models with up-convolution and DoG filters (aligned and unaligned), a multi-task model, FaceTransformer with octuplet loss, Partial FC, and MCN.


Most implemented papers

FaceNet: A Unified Embedding for Face Recognition and Clustering

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.

ArcFace: Additive Angular Margin Loss for Deep Face Recognition


Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability.

VGGFace2: A dataset for recognising faces across pose and age

The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.

SphereFace: Deep Hypersphere Embedding for Face Recognition

This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space.

A Light CNN for Deep Face Representation with Noisy Labels

This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels.

Learning Face Representation from Scratch

The current situation in the field of face recognition is that data is more important than algorithm.

Circle Loss: A Unified Perspective of Pair Similarity Optimization

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base.

DeepID3: Face Recognition with Very Deep Neural Networks

Very deep neural networks recently achieved great success on general object recognition because of their superb learning capacity.

Can we still avoid automatic face detection?

In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition?

Student Attendance Monitoring System Using Face Recognition

7 Pages Posted: 24 May 2021

E CHARAN SAI

Jain University, Faculty of Engineering & Technology, School of Engineering and Technology

SHAIK ALTHAF HUSSAIN

AMARA SHYAM

Date Written: May 22, 2021

There is no reason that a critical educational practice like attendance should be handled in the old, tedious manner in this age of rapidly evolving technologies. In the conventional method, it is difficult to manage large groups of students in a classroom; entering attendance data into a system by hand takes time and carries a high risk of error. Real-time face recognition is a practical way to handle the daily attendance of a large number of students. Many algorithms and techniques have been developed to improve face recognition performance; our proposed model employs the Haar cascade classifier to determine the positive and negative characteristics of the face, together with the LBPH (local binary pattern histogram) algorithm for face recognition, all implemented in Python with the OpenCV library. For the user interface, we use the tkinter GUI toolkit.
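A minimal sketch of this kind of pipeline (not the authors' code), assuming OpenCV with the opencv-contrib-python package, which provides the cv2.face module; the image files and student ids below are hypothetical placeholders:

```python
# Sketch: Haar-cascade face detection + LBPH recognition with OpenCV.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()

def extract_face(path, size=(100, 100)):
    """Detect the largest face in an image and return it as a grayscale crop."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return cv2.resize(gray[y:y + h, x:x + w], size)

# Enrolment: one label (student id) per training image (hypothetical files).
train = {"student_001.jpg": 1, "student_002.jpg": 2}
faces, labels = [], []
for path, label in train.items():
    face = extract_face(path)
    if face is not None:
        faces.append(face)
        labels.append(label)
recognizer.train(faces, np.array(labels))

# Attendance: predict the identity of a newly captured face (hypothetical file).
probe = extract_face("classroom_capture.jpg")
if probe is not None:
    label, distance = recognizer.predict(probe)   # lower distance = better match
    print("recognized student", label, "distance", distance)
```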

Keywords: Local Binary Pattern Histogram (LBPH), Face Detection, Face Recognition, Haar Cascade Classifier, Python, Student Attendance.


E CHARAN SAI (Contact Author)

Jain University, Faculty of Engineering & Technology, School of Engineering and Technology (email)

Bangalore, 572112 India


Face Recognition Systems: A Survey


1. Introduction

  • We first introduced face recognition as a biometric technique.
  • We presented the state of the art of the existing face recognition techniques classified into three approaches: local, holistic, and hybrid.
  • The surveyed approaches were summarized and compared under different conditions.
  • We presented the most popular face databases used to test these approaches.
  • We highlighted some new promising research directions.

2. Face Recognition Systems Survey

2.1. Essential Steps of Face Recognition Systems

  • Face Detection : The face recognition system begins first with the localization of the human faces in a particular image. The purpose of this step is to determine if the input image contains human faces or not. The variations of illumination and facial expression can prevent proper face detection. In order to facilitate the design of a further face recognition system and make it more robust, pre-processing steps are performed. Many techniques are used to detect and locate the human face image, for example, Viola–Jones detector [ 24 , 25 ], histogram of oriented gradient (HOG) [ 13 , 26 ], and principal component analysis (PCA) [ 27 , 28 ]. Also, the face detection step can be used for video and image classification, object detection [ 29 ], region-of-interest detection [ 30 ], and so on.
  • Feature Extraction : The main function of this step is to extract the features of the face images detected in the detection step. This step represents a face with a set of features vector called a “signature” that describes the prominent features of the face image such as mouth, nose, and eyes with their geometry distribution [ 31 , 32 ]. Each face is characterized by its structure, size, and shape, which allow it to be identified. Several techniques involve extracting the shape of the mouth, eyes, or nose to identify the face using the size and distance [ 3 ]. HOG [ 33 ], Eigenface [ 34 ], independent component analysis (ICA), linear discriminant analysis (LDA) [ 27 , 35 ], scale-invariant feature transform (SIFT) [ 23 ], gabor filter, local phase quantization (LPQ) [ 36 ], Haar wavelets, Fourier transforms [ 31 ], and local binary pattern (LBP) [ 3 , 10 ] techniques are widely used to extract the face features.
  • Face Recognition : This step compares the features extracted during the feature extraction step with the known faces stored in a specific database. There are two general applications of face recognition: identification and verification. During identification, a test face is compared with a set of faces with the aim of finding the most likely match. During verification, a test face is compared with a known face in the database in order to make an acceptance or rejection decision [ 7 , 19 ]. Correlation filters (CFs) [ 18 , 37 , 38 ], convolutional neural networks (CNNs) [ 39 ], and k-nearest neighbor (k-NN) [ 40 ] classifiers are known to effectively address this task.
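To make the distinction concrete, here is a small illustrative sketch of the two decision modes (not taken from the survey), assuming feature vectors from any extractor and an arbitrary verification threshold:

```python
# Identification vs. verification over feature vectors.
import numpy as np

def identify(probe, gallery):
    """Identification: return the gallery identity whose feature vector is closest."""
    names = list(gallery)
    dists = [np.linalg.norm(probe - gallery[n]) for n in names]
    best = int(np.argmin(dists))
    return names[best], dists[best]

def verify(probe, claimed, threshold=0.8):
    """Verification: accept the claimed identity if the distance is below a threshold."""
    return np.linalg.norm(probe - claimed) < threshold

# Hypothetical 128-D feature vectors for two enrolled identities.
gallery = {"alice": np.random.rand(128), "bob": np.random.rand(128)}
probe = gallery["alice"] + 0.01 * np.random.rand(128)
print(identify(probe, gallery))          # most likely match in the gallery
print(verify(probe, gallery["alice"]))   # accept/reject the claimed identity
```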

2.2. Classification of Face Recognition Systems

3. Local Approaches

3.1. Local Appearance-Based Techniques

  • Local binary pattern (LBP) and it’s variant: LBP is a great general texture technique used to extract features from any object [ 16 ]. It has widely performed in many applications such as face recognition [ 3 ], facial expression recognition, texture segmentation, and texture classification. The LBP technique first divides the facial image into spatial arrays. Next, within each array square, a 3 × 3 pixel matrix ( p 1 … … p 8 ) is mapped across the square. The pixel of this matrix is a threshold with the value of the center pixel ( p 0 ) (i.e., use the intensity value of the center pixel i ( p 0 ) as a reference for thresholding) to produce the binary code. If a neighbor pixel’s value is lower than the center pixel value, it is given a zero; otherwise, it is given one. The binary code contains information about the local texture. Finally, for each array square, a histogram of these codes is built, and the histograms are concatenated to form the feature vector. The LBP is defined in a matrix of size 3 × 3, as shown in Equation (1). LBP = ∑ p = 1 8 2 p s ( i 0 − i p ) ,      w i t h   s ( x ) = { 1 x ≥ 0 0 x < 0 , (1) where i 0 and i p are the intensity value of the center pixel and neighborhood pixels, respectively. Figure 3 illustrates the procedure of the LBP technique. Khoi et al. [ 20 ] propose a fast face recognition system based on LBP, pyramid of local binary pattern (PLBP), and rotation invariant local binary pattern (RI-LBP). Xi et al. [ 15 ] have introduced a new unsupervised deep learning-based technique, called local binary pattern network (LBPNet), to extract hierarchical representations of data. The LBPNet maintains the same topology as the convolutional neural network (CNN). The experimental results obtained using the public benchmarks (i.e., LFW and FERET) have shown that LBPNet is comparable to other unsupervised techniques. Laure et al. [ 40 ] have implemented a method that helps to solve face recognition issues with large variations of parameters such as expression, illumination, and different poses. This method is based on two techniques: LBP and K-NN techniques. Owing to its invariance to the rotation of the target image, LBP become one of the important techniques used for face recognition. Bonnen et al. [ 42 ] proposed a variant of the LBP technique named “multiscale local binary pattern (MLBP)” for features’ extraction. Another LBP extension is the local ternary pattern (LTP) technique [ 43 ], which is less sensitive to the noise than the original LBP technique. This technique uses three steps to compute the differences between the neighboring ones and the central pixel. Hussain et al. [ 36 ] develop a local quantized pattern (LQP) technique for face representation. LQP is a generalization of local pattern features and is intrinsically robust to illumination conditions. The LQP features use the disk layout to sample pixels from the local neighborhood and obtain a pair of binary codes using ternary split coding. These codes are quantized, with each one using a separately learned codebook.
  • Histogram of oriented gradients (HOG) [ 44 ]: The HOG is one of the best descriptors used for shape and edge description. The HOG technique can describe the face shape using the distribution of edge direction or light intensity gradient. The process of this technique done by sharing the whole face image into cells (small region or area); a histogram of pixel edge direction or direction gradients is generated of each cell; and, finally, the histograms of the whole cells are combined to extract the feature of the face image. The feature vector computation by the HOG descriptor proceeds as follows [ 10 , 13 , 26 , 45 ]: firstly, divide the local image into regions called cells, and then calculate the amplitude of the first-order gradients of each cell in both the horizontal and vertical direction. The most common method is to apply a 1D mask, [–1 0 1]. G x ( x ,   y ) = I ( x + 1 ,   y ) − I ( x − 1 ,   y ) , (2) G y ( x ,   y ) = I ( x ,   y + 1 ) − I ( x ,   y − 1 ) , (3) where I ( x ,   y ) is the pixel value of the point ( x ,   y ) and G x ( x ,   y ) and G y ( x ,   y ) denote the horizontal gradient amplitude and the vertical gradient amplitude, respectively. The magnitude of the gradient and the orientation of each pixel ( x , y ) are computed as follows: G ( x ,   y ) = G x ( x ,   y ) 2 + G y ( x ,   y ) 2 , (4) θ ( x ,   y ) = tan − 1 ( G y ( x ,   y ) G x ( x ,   y ) ) . (5) The magnitude of the gradient and the orientation of each pixel in the cell are voted in nine bins with the tri-linear interpolation. The histograms of each cell are generated pixel based on direction gradients and, finally, the histograms of the whole cells are combined to extract the feature of the face image. Karaaba et al. [ 44 ] proposed a combination of different histograms of oriented gradients (HOG) to perform a robust face recognition system. This technique is named “multi-HOG”. The authors create a vector of distances between the target and the reference face images for identification. Arigbabu et al. [ 46 ] proposed a novel face recognition system based on the Laplacian filter and the pyramid histogram of gradient (PHOG) descriptor. In addition, to investigate the face recognition problem, support vector machine (SVM) is used with different kernel functions.
  • Correlation filters: Face recognition systems based on the correlation filter (CF) have given good results in terms of robustness, location accuracy, efficiency, and discrimination. In the field of facial recognition, the correlation techniques have attracted great interest since the first use of an optical correlator [ 47 ]. These techniques provide the following advantages: high ability for discrimination, desired noise robustness, shift-invariance, and inherent parallelism. On the basis of these advantages, many optoelectronic hybrid solutions of correlation filters (CFs) have been introduced such as the joint transform correlator (JTC) [ 48 ] and VanderLugt correlator (VLC) [ 47 ] techniques. The purpose of these techniques is to calculate the degree of similarity between target and reference images. The decision is taken by the detection of a correlation peak. Both techniques (VLC and JTC) are based on the “ 4 f ” optical configuration [ 37 ]. This configuration is created by two convergent lenses ( Figure 4 ). The face image F is processed by the fast Fourier transform (FFT) based on the first lens in the Fourier plane S F . In this Fourier plane, a specific filter P is applied (for example, the phase-only filter (POF) filter [ 2 ]) using optoelectronic interfaces. Finally, to obtain the filtered face image F ′ (or the correlation plane), the inverse FFT (IFFT) is made with the second lens in the output plane. For example, the VLC technique is done by two cascade Fourier transform structures realized by two lenses [ 4 ], as presented in Figure 5 . The VLC technique is presented as follows: firstly, a 2D-FFT is applied to the target image to get a target spectrum S . After that, a multiplication between the target spectrum and the filter obtain with the 2D-FFT of a reference image is affected, and this result is placed in the Fourier plane. Next, it provides the correlation result recorded on the correlation plane, where this multiplication is affected by inverse FF. The correlation result, described by the peak intensity, is used to determine the similarity degree between the target and reference images. C = F F T − 1 { S ∗ ∘ P O F } , (6) where F F T − 1 stands for the inverse fast FT (FFT) operation, * represents the conjugate operation, and ∘ denotes the element-wise array multiplication. To enhance the matching process, Horner and Gianino [ 49 ] proposed a phase-only filter (POF). The POF filter can produce correlation peaks marked with enhanced discrimination capability. The POF is an optimized filter defined as follows: H P O F ( u , v ) = S ∗ ( u , v ) | S ( u , v ) | , (7) where S ∗ ( u , v ) is the complex conjugate of the 2D-FFT of the reference image. To evaluate the decision, the peak to correlation energy (PCE) is defined as the energy in the correlation peaks’ intensity normalized to the overall energy of the correlation plane. P C E = ∑ i , j N E p e a k ( i , j ) ∑ i , j M E c o r r e l a t i o n − p l a n e ( i , j ) , (8) where i , j are the coefficient coordinates; M and N are the size of the correlation plane and the size of the peak correlation spot, respectively; E p e a k is the energy in the correlation peaks; and E c o r r e l a t i o n − p l a n e is the overall energy of the correlation plane. Correlation techniques are widely applied in recognition and identification applications [ 4 , 37 , 50 , 51 , 52 , 53 ]. 
For example, in the work of [ 4 ], the authors demonstrated the efficiency of the VLC technique based on the “4f” configuration for identification using an Nvidia GeForce 8400 GS GPU, with the POF filter used for the decision. Another important work in this area is presented by Leonard et al. [ 50 ], which demonstrated the good performance and simplicity of correlation filters for face recognition. In addition, several specific filters such as POF, BPOF, Ad, and IF are compared in order to select the best filter according to its sensitivity to rotation, scale, and noise. Napoléon et al. [ 3 ] introduced a novel system for identification and verification based on optimized 3D modeling under different illumination conditions, which allows faces to be reconstructed in different poses. In particular, to deform the synthetic model, an active shape model for detecting a set of key points on the face is proposed ( Figure 6 ). The VanderLugt correlator is used to perform the identification, and the LBP descriptor is used to optimize the performance of the correlation technique under different illumination conditions. The experiments are performed on the Pointing Head Pose Image Database (PHPID) with elevations ranging from −30° to +30°.
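As an illustration of Equations (6)–(8) (a sketch, not the implementation used in the cited works), a POF-based correlation plane and its PCE score can be computed with NumPy FFTs; the image size and peak-window radius are arbitrary choices:

```python
# Digital emulation of a VLC-style correlation with a phase-only filter (POF).
import numpy as np

def pof_filter(reference):
    """Phase-only filter of Equation (7): conjugate reference spectrum over its magnitude."""
    S = np.fft.fft2(reference)
    return np.conj(S) / (np.abs(S) + 1e-12)

def correlate(target, reference):
    """Correlation plane of Equation (6): inverse FFT of the target spectrum times the POF."""
    T = np.fft.fft2(target)
    plane = np.fft.ifft2(T * pof_filter(reference))
    return np.abs(np.fft.fftshift(plane)) ** 2   # correlation energy

def pce(plane, peak_radius=2):
    """Peak-to-correlation energy of Equation (8): peak-spot energy over total plane energy."""
    py, px = np.unravel_index(np.argmax(plane), plane.shape)
    spot = plane[max(py - peak_radius, 0):py + peak_radius + 1,
                 max(px - peak_radius, 0):px + peak_radius + 1]
    return spot.sum() / plane.sum()

# Hypothetical 64x64 grayscale face crops; a true match should give a sharper peak.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
print(pce(correlate(ref, ref)))                    # autocorrelation: high PCE
print(pce(correlate(rng.random((64, 64)), ref)))   # impostor: lower PCE
```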

3.2. Key-Points-Based Techniques

  • Scale-invariant feature transform (SIFT) [ 56 , 57 ]: SIFT is an algorithm used to detect and describe the local features of an image. It is widely used to link two images through their local descriptors, which contain the information needed to match them. The main idea of the SIFT descriptor is to convert the image into a representation composed of points of interest that carry the characteristic information of the face image. SIFT is invariant to scale and rotation and is fast enough for real-time applications, although the matching of key points remains its most time-consuming stage. The algorithm is realized in four steps: (1) detection of extrema in scale-space, (2) localization of characteristic points, (3) orientation assignment, and (4) computation of a descriptor for each characteristic point. A framework to detect key points based on the SIFT descriptor was proposed by Lenc et al. [ 56 ], who use the SIFT technique in combination with a Kepenekci approach for face recognition (a minimal SIFT matching sketch is given after this list).
  • Speeded-up robust features (SURF) [ 29 , 57 ]: the SURF technique is inspired by SIFT, but uses wavelets and an approximation of the Hessian determinant to achieve better performance [ 29 ]. SURF is a detector and descriptor that claims to achieve the same, or even better, results in terms of repeatability, distinction, and robustness compared with the SIFT descriptor. The main advantage of SURF is the execution time, which is less than that used by the SIFT descriptor. Besides, the SIFT descriptor is more adapted to describe faces affected by illumination conditions, scaling, translation, and rotation [ 57 ]. To detect feature points, SURF seeks to find the maximum of an approximation of the Hessian matrix using integral images to dramatically reduce the processing computational time. Figure 7 shows an example of SURF descriptor for face recognition using AR face datasets [ 58 ].
  • Binary robust independent elementary features (BRIEF) [ 30 , 57 ]: BRIEF is a binary descriptor that is simple and fast to compute. This descriptor is based on the differences between the pixel intensity that are similar to the family of binary descriptors such as binary robust invariant scalable (BRISK) and fast retina keypoint (FREAK) in terms of evaluation. To reduce noise, the BRIEF descriptor smoothens the image patches. After that, the differences between the pixel intensity are used to represent the descriptor. This descriptor has achieved the best performance and accuracy in pattern recognition.
  • Fast retina keypoint (FREAK) [ 57 , 59 ]: the FREAK descriptor proposed by Alahi et al. [ 59 ] uses a circular retinal sampling grid. This descriptor uses 43 sampling points based on retinal receptive fields, as shown in Figure 8 . The receptive fields are sampled with a density that decreases with the distance from the patch centre, yielding on the order of a thousand potential sampling pairs. Each pair is smoothed with a Gaussian function. Finally, the binary descriptor is obtained by thresholding, i.e., by taking the sign of the intensity difference of each pair.
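The SIFT matching sketch referred to above, assuming OpenCV's built-in SIFT implementation; the file names and the Lowe ratio threshold are illustrative placeholders:

```python
# Key-point matching between two face images with SIFT + Lowe's ratio test.
import cv2

def sift_similarity(path_a, path_b, ratio=0.75):
    """Count ratio-test matches between the SIFT descriptors of two face images."""
    sift = cv2.SIFT_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return len(good)   # more surviving matches suggests the same identity

print(sift_similarity("probe_face.jpg", "gallery_face.jpg"))   # hypothetical files
```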

3.3. Summary of Local Approaches

4. Holistic Approaches

4.1. Linear Techniques

  • Eigenface [ 34 ] and principal component analysis (PCA) [ 27 , 62 ]: Eigenfaces is one of the popular methods of holistic approaches used to extract features points of the face image. This approach is based on the principal component analysis (PCA) technique. The principal components created by the PCA technique are used as Eigenfaces or face templates. The PCA technique transforms a number of possibly correlated variables into a small number of incorrect variables called “principal components”. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to the smaller intrinsic dimensionality of feature space (independent variables), which are needed to describe the data economically. Figure 9 shows how the face can be represented by a small number of features. PCA calculates the Eigenvectors of the covariance matrix, and projects the original data onto a lower dimensional feature space, which are defined by Eigenvectors with large Eigenvalues. PCA has been used in face representation and recognition, where the Eigenvectors calculated are referred to as Eigenfaces (as shown in Figure 10 ). An image may also be considering the vector of dimension M × N , so that a typical image of size 4 × 4 becomes a vector of dimension 16. Let the training set of images be { X 1 , X 2 ,   X 3 …   X N } . The average face of the set is defined by the following: X ¯ = 1 N ∑ i = 1 N X   i . (9) Calculate the estimate covariance matrix to represent the scatter degree of all feature vectors related to the average vector. The covariance matrix Q is defined by the following: Q = 1 N ∑ i = 1 N ( X ¯ − X i ) ( X ¯ − X i ) T . (10) The Eigenvectors and corresponding Eigen-values are computed using C V = λ V ,       ( V ϵ R n ,   V ≠ 0 ) , (11) where V is the set of eigenvectors matrix Q associated with its eigenvalue λ . Project all the training images of i t h person to the corresponding Eigen-subspace: y k i = w T    ( x i ) ,       ( i = 1 ,   2 ,   3   …   N ) , (12) where the y k i are the projections of x and are called the principal components, also known as eigenfaces. The face images are represented as a linear combination of these vectors’ “principal components”. In order to extract facial features, PCA and LDA are two different feature extraction algorithms that are used. Wavelet fusion and neural networks are applied to classify facial features. The ORL database is used for evaluation. Figure 10 shows the first five Eigenfaces constructed from the ORL database [ 63 ].
  • Fisherface and linear discriminative analysis (LDA) [ 64 , 65 ]: The Fisherface method is based on the same principle of similarity as the Eigenfaces method. The objective of this method is to reduce the high dimensional image space based on the linear discriminant analysis (LDA) technique instead of the PCA technique. The LDA technique is commonly used for dimensionality reduction and face recognition [ 66 ]. PCA is an unsupervised technique, while LDA is a supervised learning technique and uses the data information. For all samples of all classes, the within-class scatter matrix S W and the between-class scatter matrix S B are defined as follows: S B = ∑ I = 1 C M i ( x i − μ ) ( x i − μ ) T , (13) S w = ∑ I = 1 C ∑ x k ϵ X i M i ( x k − μ ) ( x k − μ ) T , (14) where μ is the mean vector of samples belonging to class i , X i represents the set of samples belonging to class i with x k being the number image of that class, c is the number of distinct classes, and M i is the number of training samples in class i . S B describes the scatter of features around the overall mean for all face classes and S w describes the scatter of features around the mean of each face class. The goal is to maximize the ratio d e t | S B | / d e t | S w |, in other words, minimizing S w while maximiz ing   S B . Figure 11 shows the first five Eigenfaces and Fisherfaces obtained from the ORL database [ 63 ].
  • Independent component analysis (ICA) [ 35 ]: The ICA technique is used for the calculation of the basic vectors of a given space. The goal of this technique is to perform a linear transformation in order to reduce the statistical dependence between the different basic vectors, which allows the analysis of independent components. It is determined that they are not orthogonal to each other. In addition, the acquisition of images from different sources is sought in uncorrelated variables, which makes it possible to obtain greater efficiency, because ICA acquires images within statistically independent variables.
  • Improvements of the PCA, LDA, and ICA techniques: To improve the linear subspace techniques, many types of research are developed. Z. Cui et al. [ 67 ] proposed a new spatial face region descriptor (SFRD) method to extract the face region, and to deal with noise variation. This method is described as follows: divide each face image in many spatial regions, and extract token-frequency (TF) features from each region by sum-pooling the reconstruction coefficients over the patches within each region. Finally, extract the SFRD for face images by applying a variant of the PCA technique called “whitened principal component analysis (WPCA)” to reduce the feature dimension and remove the noise in the leading eigenvectors. Besides, the authors in [ 68 ] proposed a variant of the LDA called probabilistic linear discriminant analysis (PLDA) to seek directions in space that have maximum discriminability, and are hence most suitable for both face recognition and frontal face recognition under varying pose.
  • Gabor filters: Gabor filters are spatial sinusoids located by a Gaussian window that allows for extracting the features from images by selecting their frequency, orientation, and scale. To enhance the performance under unconstrained environments for face recognition, Gabor filters are transformed according to the shape and pose to extract the feature vectors of face image combined with the PCA in the work of [ 69 ]. The PCA is applied to the Gabor features to remove the redundancies and to get the best face images description. Finally, the cosine metric is used to evaluate the similarity.
  • Frequency domain analysis [ 70 , 71 ]: Finally, the analysis techniques in the frequency domain offer a representation of the human face as a function of low-frequency components that present high energy. The discrete Fourier transform (DFT), discrete cosine transform (DCT), or discrete wavelet transform (DWT) techniques are independent of the data, and thus do not require training.
  • Discrete wavelet transform (DWT): Another linear technique used for face recognition. In the work of [ 70 ], the authors used a two-dimensional discrete wavelet transform (2D-DWT) method for face recognition using a new patch strategy. A non-uniform patch strategy for the top-level’s low-frequency sub-band is proposed by using an integral projection technique for two top-level high-frequency sub-bands of 2D-DWT based on the average image of all training samples. This patch strategy is better for retaining the integrity of local information, and is more suitable to reflect the structure feature of the face image. When constructing the patching strategy using the testing and training samples, the decision is performed using the neighbor classifier. Many databases are used to evaluate this method, including Labeled Faces in Wild (LFW), Extended Yale B, Face Recognition Technology (FERET), and AR.
  • Discrete cosine transform (DCT) [ 71 ]: the DCT can be used in both global and local face recognition systems. It is a transformation that represents a finite sequence of data as the sum of cosine functions oscillating at different frequencies. It is widely used in many applications, from audio and image compression to spectral methods for the numerical solution of differential equations, as well as in face recognition systems [ 71 ].
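A minimal sketch of DCT-based feature extraction (an illustration, not the survey's original algorithm listing), assuming SciPy's dct routine; the crop size and the number of retained coefficients are arbitrary choices:

```python
# 2D DCT of a face crop, keeping the low-frequency top-left coefficients as features.
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Orthonormal 2D type-II DCT, applied along rows and then columns."""
    return dct(dct(block, type=2, norm="ortho", axis=0), type=2, norm="ortho", axis=1)

def dct_features(gray_face, keep=8):
    """Return the keep x keep low-frequency DCT coefficients as a feature vector."""
    coeffs = dct2(gray_face.astype(np.float64))
    return coeffs[:keep, :keep].ravel()

# Hypothetical 64x64 grayscale face crop.
face = np.random.default_rng(0).random((64, 64))
print(dct_features(face).shape)   # (64,)
```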

4.2. Nonlinear Techniques

Kernel PCA algorithm: the input samples are implicitly mapped into a high-dimensional feature space through a kernel function; PCA is then performed in that space, the resulting components are normalized, and new samples are projected using the same kernel function.
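For illustration, a hedged sketch of kernel PCA feature extraction using scikit-learn's KernelPCA; the RBF kernel, its gamma, and the component count are arbitrary choices rather than values from the survey:

```python
# Nonlinear feature extraction with kernel PCA.
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical data: 40 flattened 32x32 face images.
faces = np.random.default_rng(0).random((40, 32 * 32))

kpca = KernelPCA(n_components=20, kernel="rbf", gamma=1e-3)
features = kpca.fit_transform(faces)   # nonlinear projections used as face features
print(features.shape)                  # (40, 20)
```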
  • Kernel discriminant analysis (KDA) [ 73 ]: the KDA technique is a kernel extension of the linear LDA technique, in the same way that kernel PCA extends PCA. Arashloo et al. [ 73 ] proposed a nonlinear binary class-specific kernel discriminant analysis classifier (CS-KDA) based on spectral regression kernel discriminant analysis. Other nonlinear techniques have also been used in the context of facial recognition:
  • Gabor-KLDA [ 74 ].
  • Evolutionary weighted principal component analysis (EWPCA) [ 75 ].
  • Kernelized maximum average margin criterion (KMAMC), SVM, and kernel Fisher discriminant analysis (KFD) [ 76 ].
  • Wavelet transform (WT), radon transform (RT), and cellular neural networks (CNN) [ 77 ].
  • Joint transform correlator-based two-layer neural network [ 78 ].
  • Kernel Fisher discriminant analysis (KFD) and KPCA [ 79 ].
  • Locally linear embedding (LLE) and LDA [ 80 ].
  • Nonlinear locality preserving with deep networks [ 81 ].
  • Nonlinear DCT and kernel discriminative common vector (KDCV) [ 82 ].

4.3. Summary of Holistic Approaches

5. Hybrid Approaches

5.1. Technique Presentation

  • Gabor wavelet and linear discriminant analysis (GW-LDA) [ 91 ]: Fathima et al. [ 91 ] proposed a hybrid approach combining Gabor wavelets and linear discriminant analysis (HGWLDA) for face recognition. The grayscale face image is approximated and reduced in dimension, and is then convolved with a bank of Gabor filters of varying orientations and scales. After that, the 2D-LDA subspace technique is used to maximize the inter-class separation and reduce the intra-class spread. To classify and recognize the test face image, the k-nearest neighbour (k-NN) classifier is used; recognition is performed by comparing the test face feature with each of the training set features. The experimental results show the robustness of this approach under different lighting conditions (a rough Gabor + LDA sketch is given after this list).
  • Over-complete LBP (OCLBP), LDA, and within class covariance normalization (WCCN): Barkan et al. [ 92 ] proposed a new representation of face image based over-complete LBP (OCLBP). This representation is a multi-scale modified version of the LBP technique. The LDA technique is performed to reduce the high dimensionality representations. Finally, the within class covariance normalization (WCCN) is the metric learning technique used for face recognition.
  • Advanced correlation filters and Walsh LBP (WLBP): Juefei et al. [ 93 ] implemented a single-sample periocular-based alignment-robust face recognition technique based on high-dimensional Walsh LBP (WLBP). This technique utilizes only one sample per subject class and generates new face images under a wide range of 3D rotations using the 3D generic elastic model, which is both accurate and computationally inexpensive. The LFW database is used for evaluation, and the proposed method outperformed the state-of-the-art algorithms under four evaluation protocols with a high accuracy of 89.69%.
  • Multi-sub-region-based correlation filter bank (MS-CFB): Yan et al. [ 94 ] propose an effective feature extraction technique for robust face recognition, named multi-sub-region-based correlation filter bank (MS-CFB). MS-CFB extracts the local features independently for each face sub-region. After that, the different face sub-regions are concatenated to give optimal overall correlation outputs. This technique reduces the complexity, achieves higher recognition rates, and provides a better feature representation for recognition compared with several state-of-the-art techniques on various public face databases.
  • SIFT features, Fisher vectors, and PCA: Simonyan et al. [ 64 ] have developed a novel method for face recognition based on the SIFT descriptor and Fisher vectors. The authors propose a discriminative dimensionality reduction owing to the high dimensionality of the Fisher vectors. After that, these vectors are projected into a low dimensional subspace with a linear projection. The objective of this methodology is to describe the image based on dense SIFT features and Fisher vectors encoding to achieve high performance on the challenging LFW dataset in both restricted and unrestricted settings.
  • CNNs and stacked auto-encoder (SAE) techniques: Ding et al. [ 95 ] proposed multimodal deep face representation (MM-DFR) framework based on convolutional neural networks (CNNs) technique from the original holistic face image, rendered frontal face by 3D face model (stand for holistic facial features and local facial features, respectively), and uniformly sampled image patches. The proposed MM-DFR framework has two steps: a CNNs technique is used to extract the features and a three-layer stacked auto-encoder (SAE) technique is employed to compress the high-dimensional deep feature into a compact face signature. The LFW database is used to evaluate the identification performance of MM-DFR. The flowchart of the proposed MM-DFR framework is shown in Figure 12 .
  • PCA and ANFIS: Sharma et al. [ 96 ] propose an efficient pose-invariant face recognition system based on PCA technique and ANFIS classifier. The PCA technique is employed to extract the features of an image, and the ANFIS classifier is developed for identification under a variety of pose conditions. The performance of the proposed system based on PCA–ANFIS is better than ICA–ANFIS and LDA–ANFIS for the face recognition task. The ORL database is used for evaluation.
  • DCT and PCA: Ojala et al. [ 97 ] develop a fast face recognition system based on DCT and PCA techniques. Genetic algorithm (GA) technique is used to extract facial features, which allows to remove irrelevant features and reduces the number of features. In addition, the DCT–PCA technique is used to extract the features and reduce the dimensionality. The minimum Euclidian distance (ED) as a measurement is used for the decision. Various face databases are used to demonstrate the effectiveness of this system.
  • PCA, SIFT, and iterative closest point (ICP): Mian et al. [ 98 ] present a multimodal (2D and 3D) face recognition system based on hybrid matching to achieve efficiency and robustness to facial expressions. The Hotelling transform is performed to automatically correct the pose of a 3D face using its texture. After that, in order to form a rejection classifier, a novel 3D spherical face representation (SFR) in conjunction with the SIFT descriptor is used, which provide efficient recognition in the case of large galleries by eliminating a large number of candidates’ faces. A modified iterative closest point (ICP) algorithm is used for the decision. This system is less sensitive and robust to facial expressions, which achieved a 98.6% verification rate and 96.1% identification rate on the complete FRGC v2 database.
  • PCA, local Gabor binary pattern histogram sequence (LGBPHS), and GABOR wavelets: Cho et al. [ 99 ] proposed a computationally efficient hybrid face recognition system that employs both holistic and local features. The PCA technique is used to reduce the dimensionality. After that, the local Gabor binary pattern histogram sequence (LGBPHS) technique is employed to realize the recognition stage, which proposed to reduce the complexity caused by the Gabor filters. The experimental results show a better recognition rate compared with the PCA and Gabor wavelet techniques under illumination variations. The Extended Yale Face Database B is used to demonstrate the effectiveness of this system.
  • PCA and Fisher linear discriminant (FLD) [ 100 , 101 ]: Sing et al. [ 101 ] propose a novel hybrid technique for face representation and recognition, which exploits both local and subspace features. In order to extract the local features, the whole image is divided into a sub-regions, while the global features are extracted directly from the whole image. After that, PCA and Fisher linear discriminant (FLD) techniques are introduced on the fused feature vector to reduce the dimensionality. The CMU-PIE, FERET, and AR face databases are used for the evaluation.
  • SPCA–KNN [ 102 ]: Kamencay et al. [ 102 ] develop a new face recognition method based on SIFT features, as well as PCA and KNN techniques. The Hessian–Laplace detector along with SPCA descriptor is performed to extract the local features. SPCA is introduced to identify the human face. KNN classifier is introduced to identify the closest human faces from the trained features. The results of the experiment have a recognition rate of 92% for the unsegmented ESSEX database and 96% for the segmented database (700 training images).
  • Convolution operations, LSTM recurrent units, and ELM classifier [ 103 ]: Sun et al. [ 103 ] propose a hybrid deep structure called CNN–LSTM–ELM in order to achieve sequential human activity recognition (HAR). Their proposed CNN–LSTM–ELM structure is evaluated using the OPPORTUNITY dataset, which contains 46,495 training samples and 9894 testing samples, and each sample is a sequence. The model training and testing runs on a GPU with 1536 cores, 1050 MHz clock speed, and 8 GB RAM. The flowchart of the proposed CNN–LSTM–ELM structure is shown in Figure 13 [ 103 ].
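The rough Gabor + LDA sketch referred to in the first item of this list, assuming OpenCV's Gabor kernels and scikit-learn's LDA; the filter parameters, image sizes, and random data are illustrative only and do not reproduce the cited HGWLDA method:

```python
# Gabor filter bank responses -> discriminant subspace.
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gabor_features(gray, scales=(7, 11), orientations=4):
    """Concatenate the responses of a small bank of Gabor filters into one feature vector."""
    feats = []
    for ksize in scales:
        for k in range(orientations):
            theta = k * np.pi / orientations
            kernel = cv2.getGaborKernel((ksize, ksize), sigma=2.0, theta=theta,
                                        lambd=8.0, gamma=0.5)
            response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
            feats.append(cv2.resize(response, (16, 16)).ravel())  # downsample to keep it small
    return np.concatenate(feats)

# Hypothetical training data: 20 grayscale crops from four identities.
rng = np.random.default_rng(0)
X = [rng.random((64, 64)) for _ in range(20)]
y = [i % 4 for i in range(20)]
features = np.stack([gabor_features(img) for img in X])
lda = LinearDiscriminantAnalysis(n_components=3).fit(features, y)
print(lda.transform(features).shape)   # (20, 3) discriminant features
```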

5.2. Summary of Hybrid Approaches

6. Assessment of Face Recognition Approaches

6.1. Measures of Similarity or Distances

  • Peak-to-correlation energy (PCE) or peak-to-sidelobe ratio (PSR) [ 18 ]: The PCE was introduced in (8).
  • Euclidean distance [ 54 ]: The Euclidean distance is one of the most basic measures used to compute the direct distance between two points in a plane. If we have two points P 1 and P 2 , with the coordinates ( x 1 , y 1 ) and ( x 2 , y 2 ) , respectively, the Euclidean distance between them is as follows: d E ( P 1 , P 2 ) = √( ( x 2 − x 1 ) 2 + ( y 2 − y 1 ) 2 ) . (15) In general, the Euclidean distance between two points P = ( p 1 , p 2 , … , p n ) and Q = ( q 1 , q 2 , … , q n ) in n-dimensional space is defined by the following: d E ( P , Q ) = √( ∑ i n ( p i − q i ) 2 ) . (16)
  • Bhattacharyya distance [ 104 , 105 ]: The Bhattacharyya distance is a statistical measure that quantifies the similarity between two discrete or continuous probability distributions. This distance is particularly known for its low processing time and its low sensitivity to noise. For the probability distributions p and q defined on the same domain, the distance of Bhattacharyya is defined as follows: D B ( p ,   q ) = − l n ( B C ( p ,   q ) ) , (17) B C ( p ,   q ) = ∑ x ∈ X p ( x ) q ( x )   ( a ) ;   B C ( p ,   q ) = ∫ p ( x ) q ( x ) d x   ( b ) , (18) where B C is the Bhattacharyya coefficient, defined as Equation (18a) for discrete probability distributions and as Equation (18b) for continuous probability distributions. In both cases, 0 ≤ BC ≤ 1 and 0 ≤ DB ≤ ∞. In its simplest formulation, the Bhattacharyya distance between two classes that follow a normal distribution can be calculated from a mean ( μ ) and the variance ( σ 2 ): D B ( p ,   q ) = 1 4 l n ( 1 4 ( σ p 2 σ q 2 + σ q 2 σ p 2 + 2 ) ) + 1 4 ( ( μ p − μ q ) σ q 2 + σ p 2 ) . (19)
  • Chi-squared distance [ 106 ]: The chi-squared ( χ 2 ) distance weights each difference by the magnitude of the samples, so that differences between bins with few occurrences are given the same relevance as differences between bins with many occurrences. To compare two histograms S 1 = ( u 1 , … , u m ) and S 2 = ( w 1 , … , w m ) , the chi-squared distance can be defined as follows: χ 2 = D ( S 1 , S 2 ) = (1/2) ∑ i = 1 m ( u i − w i ) 2 / ( u i + w i ) . (20)
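For reference, a short NumPy sketch of the three distances defined above; the sample histograms are made-up values, not data from the survey:

```python
# Distance measures between two histogram-like feature vectors.
import numpy as np

def euclidean(p, q):
    """Equation (16): straight-line distance between two feature vectors."""
    return np.sqrt(np.sum((p - q) ** 2))

def bhattacharyya(p, q, eps=1e-12):
    """Equations (17)-(18a): -ln of the Bhattacharyya coefficient of two discrete distributions."""
    bc = np.sum(np.sqrt(p * q))
    return -np.log(bc + eps)

def chi_squared(p, q, eps=1e-12):
    """Equation (20): chi-squared distance between two histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# Hypothetical normalized LBP histograms of two face images.
h1 = np.array([0.2, 0.3, 0.5])
h2 = np.array([0.25, 0.25, 0.5])
print(euclidean(h1, h2), bhattacharyya(h1, h2), chi_squared(h1, h2))
```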

6.2. Classifiers

  • Support vector machines (SVMs) [ 13 , 26 ]: The feature vectors extracted by any descriptor can be classified by a linear or nonlinear SVM. The SVM classifier separates the classes with an optimal hyperplane. To determine this hyperplane, only the closest points of the training set are needed; these points are called support vectors ( Figure 14 ). There is an infinite number of hyperplanes capable of perfectly separating two classes, so the SVM selects the hyperplane that maximizes the minimal distance between the training examples and the hyperplane (i.e., the distance between the support vectors and the hyperplane). This distance is called the “margin”. The SVM classifier thus computes the optimal hyperplane that assigns a set of labeled training data to the correct classes. The training set is given by D = { ( x i , y i ) | x i ∈ R n , y i ∈ { − 1 , 1 } , i = 1 … l } , (21) where the x i are the training feature vectors and the y i are the corresponding l labels (1 or −1). The SVM seeks the hyperplane that separates the samples with the smallest error, and the classification function is obtained from the distance between the input vector and the hyperplane, w x i − b = C f , (22) where w and b are the parameters of the model. Shen et al. [ 108 ] used Gabor filters to extract the face features and applied an SVM for classification. For comparison, the FaceNet method achieves record accuracies of 99.63% and 95.12% on the LFW and YouTube Faces DB datasets, respectively.
  • k-nearest neighbor (k-NN) [ 17 , 91 ]: k-NN is a lazy learning algorithm: during training it does little more than store the examples, and thus it does not build an explicit model, unlike, for example, decision trees.
  • K-means [ 9 , 109 ]: It is called K-means because it represents each of the groups by the average (or weighted average) of its points, called the centroid. In the K-means algorithm, it is necessary to specify a priori the number of clusters k that one wishes to form in order to start the process.
  • Deep learning (DL): An automatic learning technique that uses neural network architectures. The term “deep” refers to the number of hidden layers in the network: while conventional neural networks contain one or two hidden layers, deep neural networks (DNNs) can contain many, as presented in Figure 15 .
  • Convolutional layer : sometimes called the feature extractor layer because features of the image are extracted within this layer. Convolution preserves the spatial relationship between pixels by learning image features using small squares of the input image. The input image is convoluted by employing a set of learnable neurons. This produces a feature map or activation map in the output image, after which the feature maps are fed as input data to the next convolutional layer. The convolutional layer also contains rectified linear unit (ReLU) activation to convert all negative value to zero. This makes it very computationally efficient, as few neurons are activated each time.
  • Pooling layer: used to reduce dimensions, with the aim of reducing processing times by retaining the most important information after convolution. This layer basically reduces the number of parameters and computation in the network, controlling overfitting by progressively reducing the spatial size of the network. There are two operations in this layer: average pooling and maximum pooling: - Average-pooling takes all the elements of the sub-matrix, calculates their average, and stores the value in the output matrix. - Max-pooling searches for the highest value found in the sub-matrix and saves it in the output matrix.
  • Fully-connected layer : in this layer, the neurons have a complete connection to all the activations from the previous layers. It connects neurons in one layer to neurons in another layer. It is used to classify images between different categories by training.
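A toy network mirroring the layer types described above, written in PyTorch as an illustration; the framework choice, layer sizes, and number of identities are assumptions, not values from the survey:

```python
# Convolution + ReLU -> max pooling -> fully-connected classification.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer: feature extraction
    nn.ReLU(),                                    # negative activations set to zero
    nn.MaxPool2d(2),                              # pooling layer: keep the strongest responses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 10),                  # fully-connected layer: 10 hypothetical identities
)

# One dummy 64x64 grayscale face crop -> class scores for each identity.
scores = model(torch.randn(1, 1, 64, 64))
print(scores.shape)   # torch.Size([1, 10])
```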

6.3. Databases Used

  • LFW (Labeled Faces in the Wild) database was created in October 2007. It contains 13,233 images of 5749 subjects, with 1680 subjects having at least two images and the rest a single image. The face images were collected from the Internet, pre-processed, and localized by the Viola–Jones detector at a resolution of 250 × 250 pixels. Most of them are in color, although some are in grayscale; they are provided in JPG format and organized by folders.
  • FERET (Face Recognition Technology) database was created in 15 sessions in a semi-controlled environment between August 1993 and July 1996. It contains 1564 sets of images, with a total of 14,126 images. The duplicate series belong to subjects already present in the series of individual images, which were generally captured one day apart. Some images taken from the same subject vary overtime for a few years and can be used to treat facial changes that appear over time. The images have a depth of 24 bits, RGB, so they are color images, with a resolution of 512 × 768 pixels.
  • AR face database was created by Aleix Martínez and Robert Benavente in the computer vision center (CVC) of the Autonomous University of Barcelona in June 1998. It contains more than 4000 images of 126 subjects, including 70 men and 56 women. They were taken at the CVC under a controlled environment. The images were taken frontally to the subjects, with different facial expressions and three different lighting conditions, as well as several accessories: scarves, glasses, or sunglasses. Two imaging sessions were performed with the same subjects, 14 days apart. These images are a resolution of 576 × 768 pixels and a depth of 24 bits, under the RGB RAW format.
  • ORL Database of Faces was performed between April 1992 and April 1994 at the AT & T laboratory in Cambridge. It consists of a total of 10 images per subject, out of a total of 40 images. For some subjects, the images were taken at different times, with varying illumination and facial expressions: eyes open/closed, smiling/without a smile, as well as with or without glasses. The images were taken under a black homogeneous background, in a vertical position and frontally to the subject, with some small rotation. These are images with a resolution of 92 × 112 pixels in grayscale.
  • Extended Yale Face B database contains 16,128 images of 640 × 480 grayscale of 28 individuals under 9 poses and 64 different lighting conditions. It also includes a set of images made with the face of individuals only.
  • Pointing Head Pose Image Database (PHPID) is one of the most widely used for face recognition. It contains 2790 monocular face images of 15 persons with tilt angles from −90° to +90° and variations of pan. Every person has two series of 93 different poses (93 images). The face images were taken under different skin color and with or without glasses.

6.4. Comparison between Holistic, Local, and Hybrid Techniques

7. Discussion about Future Directions and Conclusions

7.1. Discussion

  • Local approaches: use features that describe the face only partially. For example, a system might extract local features such as the eyes, mouth, and nose, with feature values computed from lines or points that can be located on the face image for the recognition step.
  • Holistic approaches: use features that globally describe the complete face as a model, including the background (although it is desirable to occupy the smallest possible surface).
  • Hybrid approaches: combine local and holistic approaches.
  • Three-dimensional face recognition: In 2D image-based techniques, some information is lost owing to the 3D structure of the face. Lighting and pose variations are two major unresolved problems of 2D face recognition. Recently, 3D face recognition has been widely studied by the scientific community to overcome these unresolved problems and to achieve significantly higher accuracy by measuring the geometry of rigid features on the face. For this reason, several recent systems based on 3D data have been developed [ 3 , 93 , 95 , 128 , 129 ].
  • Multimodal facial recognition: sensors developed in recent years can acquire not only two-dimensional texture information but also the facial shape, that is, three-dimensional information. For this reason, some recent studies have merged the two types of 2D and 3D information to take advantage of each of them and obtain a hybrid system that improves recognition compared with using a single modality [ 98 ].
  • Deep learning (DL): a very broad concept, which means that it has no exact definition, but studies [ 14 , 110 , 111 , 112 , 113 , 121 , 130 , 131 ] agree that DL includes a set of algorithms that attempt to model high level abstractions, by modeling multiple processing layers. This field of research began in the 1980s and is a branch of automatic learning where algorithms are used in the formation of deep neural networks (DNN) to achieve greater accuracy than other classical techniques. In recent progress, a point has been reached where DL performs better than people in some tasks, for example, to recognize objects in images.

7.2. Conclusions

Author Contributions

Conflicts of Interest

  • Liao, S.; Jain, A.K.; Li, S.Z. Partial face recognition: Alignment-free approach. IEEE Trans. Pattern Anal. Mach. Intell. 2012 , 35 , 1193–1205. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Jridi, M.; Napoléon, T.; Alfalou, A. One lens optical correlation: Application to face recognition. Appl. Opt. 2018 , 57 , 2087–2095. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Napoléon, T.; Alfalou, A. Pose invariant face recognition: 3D model from single photo. Opt. Lasers Eng. 2017 , 89 , 150–161. [ Google Scholar ] [ CrossRef ]
  • Ouerhani, Y.; Jridi, M.; Alfalou, A. Fast face recognition approach using a graphical processing unit “GPU”. In Proceedings of the 2010 IEEE International Conference on Imaging Systems and Techniques, Thessaloniki, Greece, 1–2 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 80–84. [ Google Scholar ]
  • Yang, W.; Wang, S.; Hu, J.; Zheng, G.; Valli, C. A fingerprint and finger-vein based cancelable multi-biometric system. Pattern Recognit. 2018 , 78 , 242–251. [ Google Scholar ] [ CrossRef ]
  • Patel, N.P.; Kale, A. Optimize Approach to Voice Recognition Using IoT. In Proceedings of the 2018 International Conference on Advances in Communication and Computing Technology (ICACCT), Sangamner, India, 8–9 February 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 251–256. [ Google Scholar ]
  • Wang, Q.; Alfalou, A.; Brosseau, C. New perspectives in face correlation research: A tutorial. Adv. Opt. Photonics 2017 , 9 , 1–78. [ Google Scholar ] [ CrossRef ]
  • Alfalou, A.; Brosseau, C.; Kaddah, W. Optimization of decision making for face recognition based on nonlinear correlation plane. Opt. Commun. 2015 , 343 , 22–27. [ Google Scholar ] [ CrossRef ]
  • Zhao, C.; Li, X.; Cang, Y. Bisecting k-means clustering based face recognition using block-based bag of words model. Opt. Int. J. Light Electron Opt. 2015 , 126 , 1761–1766. [ Google Scholar ] [ CrossRef ]
  • HajiRassouliha, A.; Gamage, T.P.B.; Parker, M.D.; Nash, M.P.; Taberner, A.J.; Nielsen, P.M. FPGA implementation of 2D cross-correlation for real-time 3D tracking of deformable surfaces. In Proceedings of the 2013 28th International Conference on Image and Vision Computing New Zealand (IVCNZ 2013), Wellington, New Zealand, 27–29 November 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 352–357. [ Google Scholar ]
  • Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. A comparative study of CFs, LBP, HOG, SIFT, SURF, and BRIEF techniques for face recognition. In Pattern Recognition and Tracking XXIX ; International Society for Optics and Photonics; SPIE: Bellingham, WA, USA, 2018; Volume 10649, p. 106490M. [ Google Scholar ]
  • Dehai, Z.; Da, D.; Jin, L.; Qing, L. A pca-based face recognition method by applying fast fourier transform in pre-processing. In 3rd International Conference on Multimedia Technology (ICMT-13) ; Atlantis Press: Paris, France, 2013. [ Google Scholar ]
  • Ouerhani, Y.; Alfalou, A.; Brosseau, C. Road mark recognition using HOG-SVM and correlation. In Optics and Photonics for Information Processing XI ; International Society for Optics and Photonics; SPIE: Bellingham, WA, USA, 2017; Volume 10395, p. 103950Q. [ Google Scholar ]
  • Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017 , 234 , 11–26. [ Google Scholar ] [ CrossRef ]
  • Xi, M.; Chen, L.; Polajnar, D.; Tong, W. Local binary pattern network: A deep learning approach for face recognition. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 3224–3228. [ Google Scholar ]
  • Ojala, T.; Pietikäinen, M.; Harwood, D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 1996 , 29 , 51–59. [ Google Scholar ] [ CrossRef ]
  • Gowda, H.D.S.; Kumar, G.H.; Imran, M. Multimodal Biometric Recognition System Based on Nonparametric Classifiers. Data Anal. Learn. 2018 , 43 , 269–278. [ Google Scholar ]
  • Ouerhani, Y.; Jridi, M.; Alfalou, A.; Brosseau, C. Optimized pre-processing input plane GPU implementation of an optical face recognition technique using a segmented phase only composite filter. Opt. Commun. 2013 , 289 , 33–44. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Mousa Pasandi, M.E. Face, Age and Gender Recognition Using Local Descriptors. Ph.D. Thesis, Université d’Ottawa/University of Ottawa, Ottawa, ON, Canada, 2014. [ Google Scholar ]
  • Khoi, P.; Thien, L.H.; Viet, V.H. Face Retrieval Based on Local Binary Pattern and Its Variants: A Comprehensive Study. Int. J. Adv. Comput. Sci. Appl. 2016 , 7 , 249–258. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Zeppelzauer, M. Automated detection of elephants in wildlife video. EURASIP J. Image Video Process. 2013 , 46 , 2013. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Parmar, D.N.; Mehta, B.B. Face recognition methods & applications. arXiv 2014 , arXiv:1403.0485. [ Google Scholar ]
  • Vinay, A.; Hebbar, D.; Shekhar, V.S.; Murthy, K.B.; Natarajan, S. Two novel detector-descriptor based approaches for face recognition using SIFT and SURF. Procedia Comput. Sci. 2015 , 70 , 185–197. [ Google Scholar ]
  • Yang, H.; Wang, X.A. Cascade classifier for face detection. J. Algorithms Comput. Technol. 2016 , 10 , 187–197. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001. [ Google Scholar ]
  • Rettkowski, J.; Boutros, A.; Göhringer, D. HW/SW Co-Design of the HOG algorithm on a Xilinx Zynq SoC. J. Parallel Distrib. Comput. 2017 , 109 , 50–62. [ Google Scholar ] [ CrossRef ]
  • Seo, H.J.; Milanfar, P. Face verification using the lark representation. IEEE Trans. Inf. Forensics Secur. 2011 , 6 , 1275–1286. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Shah, J.H.; Sharif, M.; Raza, M.; Azeem, A. A Survey: Linear and Nonlinear PCA Based Face Recognition Techniques. Int. Arab J. Inf. Technol. 2013 , 10 , 536–545. [ Google Scholar ]
  • Du, G.; Su, F.; Cai, A. Face recognition using SURF features. In MIPPR 2009: Pattern Recognition and Computer Vision ; International Society for Optics and Photonics; SPIE: Bellingham, WA, USA, 2009; Volume 7496, p. 749628. [ Google Scholar ]
  • Calonder, M.; Lepetit, V.; Ozuysal, M.; Trzcinski, T.; Strecha, C.; Fua, P. BRIEF: Computing a local binary descriptor very fast. IEEE Trans. Pattern Anal. Mach. Intell. 2011 , 34 , 1281–1298. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Smach, F.; Miteran, J.; Atri, M.; Dubois, J.; Abid, M.; Gauthier, J.P. An FPGA-based accelerator for Fourier Descriptors computing for color object recognition using SVM. J. Real-Time Image Process. 2007 , 2 , 249–258. [ Google Scholar ] [ CrossRef ]
  • Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. A novel face detection approach using local binary pattern histogram and support vector machine. In Proceedings of the 2018 International Conference on Advanced Systems and Electric Technologies (IC_ASET), Hammamet, Tunisia, 22–25 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 28–33. [ Google Scholar ]
  • Wang, Q.; Xiong, D.; Alfalou, A.; Brosseau, C. Optical image authentication scheme using dual polarization decoding configuration. Opt. Lasers Eng. 2019 , 112 , 151–161. [ Google Scholar ] [ CrossRef ]
  • Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991 , 3 , 71–86. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Annalakshmi, M.; Roomi, S.M.M.; Naveedh, A.S. A hybrid technique for gender classification with SLBP and HOG features. Clust. Comput. 2019 , 22 , 11–20. [ Google Scholar ] [ CrossRef ]
  • Hussain, S.U.; Napoléon, T.; Jurie, F. Face Recognition Using Local Quantized Patterns ; HAL: Bengaluru, India, 2012. [ Google Scholar ]
  • Alfalou, A.; Brosseau, C. Understanding Correlation Techniques for Face Recognition: From Basics to Applications. In Face Recognition ; Oravec, M., Ed.; IntechOpen: Rijeka, Croatia, 2010. [ Google Scholar ]
  • Napoléon, T.; Alfalou, A. Local binary patterns preprocessing for face identification/verification using the VanderLugt correlator. In Optical Pattern Recognition XXV ; International Society for Optics and Photonics; SPIE: Bellingham, WA, USA, 2014; Volume 9094, p. 909408. [ Google Scholar ]
  • Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [ Google Scholar ]
  • Kambi Beli, I.; Guo, C. Enhancing face identification using local binary patterns and k-nearest neighbors. J. Imaging 2017 , 3 , 37. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Benarab, D.; Napoléon, T.; Alfalou, A.; Verney, A.; Hellard, P. Optimized swimmer tracking system by a dynamic fusion of correlation and color histogram techniques. Opt. Commun. 2015 , 356 , 256–268. [ Google Scholar ] [ CrossRef ]
  • Bonnen, K.; Klare, B.F.; Jain, A.K. Component-based representation in automated face recognition. IEEE Trans. Inf. Forensics Secur. 2012 , 8 , 239–253. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ren, J.; Jiang, X.; Yuan, J. Relaxed local ternary pattern for face recognition. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 3680–3684. [ Google Scholar ]
  • Karaaba, M.; Surinta, O.; Schomaker, L.; Wiering, M.A. Robust face recognition by computing distances from multiple histograms of oriented gradients. In Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 203–209. [ Google Scholar ]
  • Huang, C.; Huang, J. A fast HOG descriptor using lookup table and integral image. arXiv 2017 , arXiv:1703.06256. [ Google Scholar ]
  • Arigbabu, O.A.; Ahmad, S.M.S.; Adnan, W.A.W.; Yussof, S.; Mahmood, S. Soft biometrics: Gender recognition from unconstrained face images using local feature descriptor. arXiv 2017 , arXiv:1702.02537. [ Google Scholar ]
  • Vander Lugt, A. Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory 1964 , 10 , 139. [ Google Scholar ]
  • Weaver, C.S.; Goodman, J.W. A technique for optically convolving two functions. Appl. Opt. 1966 , 5 , 1248–1249. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Horner, J.L.; Gianino, P.D. Phase-only matched filtering. Appl. Opt. 1984 , 23 , 812–816. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Leonard, I.; Alfalou, A.; Brosseau, C. Face recognition based on composite correlation filters: Analysis of their performances. In Face Recognition: Methods, Applications and Technology ; Nova Science Pub Inc.: London, UK, 2012. [ Google Scholar ]
  • Katz, P.; Aron, M.; Alfalou, A. A Face-Tracking System to Detect Falls in the Elderly ; SPIE Newsroom; SPIE: Bellingham, WA, USA, 2013. [ Google Scholar ]
  • Alfalou, A.; Brosseau, C.; Katz, P.; Alam, M.S. Decision optimization for face recognition based on an alternate correlation plane quantification metric. Opt. Lett. 2012 , 37 , 1562–1564. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Elbouz, M.; Bouzidi, F.; Alfalou, A.; Brosseau, C.; Leonard, I.; Benkelfat, B.E. Adapted all-numerical correlator for face recognition applications. In Optical Pattern Recognition XXIV ; International Society for Optics and Photonics; SPIE: Bellingham, WA, USA, 2013; Volume 8748, p. 874807. [ Google Scholar ]
  • Heflin, B.; Scheirer, W.; Boult, T.E. For your eyes only. In Proceedings of the 2012 IEEE Workshop on the Applications of Computer Vision (WACV), Breckenridge, CO, USA, 9–11 January 2012; pp. 193–200. [ Google Scholar ]
  • Zhu, X.; Liao, S.; Lei, Z.; Liu, R.; Li, S.Z. Feature correlation filter for face recognition. In Advances in Biometrics, Proceedings of the International Conference on Biometrics, Seoul, Korea, 27–29 August 2007 ; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4642, pp. 77–86. [ Google Scholar ]
  • Lenc, L.; Král, P. Automatic face recognition system based on the SIFT features. Comput. Electr. Eng. 2015 , 46 , 256–272. [ Google Scholar ] [ CrossRef ]
  • Işık, Ş. A comparative evaluation of well-known feature detectors and descriptors. Int. J. Appl. Math. Electron. Comput. 2014 , 3 , 1–6. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Mahier, J.; Hemery, B.; El-Abed, M.; El-Allam, M.; Bouhaddaoui, M.; Rosenberger, C. Computation evabio: A tool for performance evaluation in biometrics. Int. J. Autom. Identif. Technol. 2011 , 24 , hal-00984026. [ Google Scholar ]
  • Alahi, A.; Ortiz, R.; Vandergheynst, P. Freak: Fast retina keypoint. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 510–517. [ Google Scholar ]
  • Arashloo, S.R.; Kittler, J. Efficient processing of MRFs for unconstrained-pose face recognition. In Proceedings of the 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), Arlington, VA, USA, 29 September–2 October 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1–8. [ Google Scholar ]
  • Ghorbel, A.; Tajouri, I.; Aydi, W.; Masmoudi, N. A comparative study of GOM, uLBP, VLC and fractional Eigenfaces for face recognition. In Proceedings of the 2016 International Image Processing, Applications and Systems (IPAS), Hammamet, Tunisia, 5–7 November 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5. [ Google Scholar ]
  • Lima, A.; Zen, H.; Nankaku, Y.; Miyajima, C.; Tokuda, K.; Kitamura, T. On the use of kernel PCA for feature extraction in speech recognition. IEICE Trans. Inf. Syst. 2004 , 87 , 2802–2811. [ Google Scholar ]
  • Devi, B.J.; Veeranjaneyulu, N.; Kishore, K.V.K. A novel face recognition system based on combining eigenfaces with fisher faces using wavelets. Procedia Comput. Sci. 2010 , 2 , 44–51. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Simonyan, K.; Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Fisher vector faces in the wild. In Proceedings of the BMVC 2013—British Machine Vision Conference, Bristol, UK, 9–13 September 2013. [ Google Scholar ]
  • Li, B.; Ma, K.K. Fisherface vs. eigenface in the dual-tree complex wavelet domain. In Proceedings of the 2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kyoto, Japan, 12–14 September 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 30–33. [ Google Scholar ]
  • Agarwal, R.; Jain, R.; Regunathan, R.; Kumar, C.P. Automatic Attendance System Using Face Recognition Technique. In Proceedings of the 2nd International Conference on Data Engineering and Communication Technology ; Springer: Singapore, 2019; pp. 525–533. [ Google Scholar ]
  • Cui, Z.; Li, W.; Xu, D.; Shan, S.; Chen, X. Fusing robust face region descriptors via multiple metric learning for face recognition in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, Portland, OR, USA, 23–28 June 2013; pp. 3554–3561. [ Google Scholar ]
  • Prince, S.; Li, P.; Fu, Y.; Mohammed, U.; Elder, J. Probabilistic models for inference about identity. IEEE Trans. Pattern Anal. Mach. Intell. 2011 , 34 , 144–157. [ Google Scholar ]
  • Perlibakas, V. Face recognition using principal component analysis and log-gabor filters. arXiv 2006 , arXiv:cs/0605025. [ Google Scholar ]
  • Huang, Z.H.; Li, W.J.; Shang, J.; Wang, J.; Zhang, T. Non-uniform patch based face recognition via 2D-DWT. Image Vision Comput. 2015 , 37 , 12–19. [ Google Scholar ] [ CrossRef ]
  • Sufyanu, Z.; Mohamad, F.S.; Yusuf, A.A.; Mamat, M.B. Enhanced Face Recognition Using Discrete Cosine Transform. Eng. Lett. 2016 , 24 , 52–61. [ Google Scholar ]
  • Hoffmann, H. Kernel PCA for novelty detection. Pattern Recognit. 2007 , 40 , 863–874. [ Google Scholar ] [ CrossRef ]
  • Arashloo, S.R.; Kittler, J. Class-specific kernel fusion of multiple descriptors for face verification using multiscale binarised statistical image features. IEEE Trans. Inf. Forensics Secur. 2014 , 9 , 2100–2109. [ Google Scholar ] [ CrossRef ]
  • Vinay, A.; Shekhar, V.S.; Murthy, K.B.; Natarajan, S. Performance study of LDA and KFA for gabor based face recognition system. Procedia Comput. Sci. 2015 , 57 , 960–969. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Sivasathya, M.; Joans, S.M. Image Feature Extraction using Non Linear Principle Component Analysis. Procedia Eng. 2012 , 38 , 911–917. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Zhang, B.; Chen, X.; Shan, S.; Gao, W. Nonlinear face recognition based on maximum average margin criterion. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 554–559. [ Google Scholar ]
  • Vankayalapati, H.D.; Kyamakya, K. Nonlinear feature extraction approaches with application to face recognition over large databases. In Proceedings of the 2009 2nd International Workshop on Nonlinear Dynamics and Synchronization, Klagenfurt, Austria, 20–21 July 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 44–48. [ Google Scholar ]
  • Javidi, B.; Li, J.; Tang, Q. Optical implementation of neural networks for face recognition by the use of nonlinear joint transform correlators. Appl. Opt. 1995 , 34 , 3950–3962. [ Google Scholar ] [ CrossRef ]
  • Yang, J.; Frangi, A.F.; Yang, J.Y. A new kernel Fisher discriminant algorithm with application to face recognition. Neurocomputing 2004 , 56 , 415–421. [ Google Scholar ] [ CrossRef ]
  • Pang, Y.; Liu, Z.; Yu, N. A new nonlinear feature extraction method for face recognition. Neurocomputing 2006 , 69 , 949–953. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.; Fei, P.; Fan, X.; Li, H. Face recognition using nonlinear locality preserving with deep networks. In Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, Hunan, China, 19–21 August 2015; ACM: New York, NY, USA, 2015; p. 66. [ Google Scholar ]
  • Li, S.; Yao, Y.F.; Jing, X.Y.; Chang, H.; Gao, S.Q.; Zhang, D.; Yang, J.Y. Face recognition based on nonlinear DCT discriminant feature extraction using improved kernel DCV. IEICE Trans. Inf. Syst. 2009 , 92 , 2527–2530. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Khan, S.A.; Ishtiaq, M.; Nazir, M.; Shaheen, M. Face recognition under varying expressions and illumination using particle swarm optimization. J. Comput. Sci. 2018 , 28 , 94–100. [ Google Scholar ] [ CrossRef ]
  • Hafez, S.F.; Selim, M.M.; Zayed, H.H. 2d face recognition system based on selected gabor filters and linear discriminant analysis lda. arXiv 2015 , arXiv:1503.03741. [ Google Scholar ]
  • Shanbhag, S.S.; Bargi, S.; Manikantan, K.; Ramachandran, S. Face recognition using wavelet transforms-based feature extraction and spatial differentiation-based pre-processing. In Proceedings of the 2014 International Conference on Science Engineering and Management Research (ICSEMR), Chennai, India, 27–29 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–8. [ Google Scholar ]
  • Fan, J.; Chow, T.W. Exactly Robust Kernel Principal Component Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2019 . [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Vinay, A.; Cholin, A.S.; Bhat, A.D.; Murthy, K.B.; Natarajan, S. An Efficient ORB based Face Recognition framework for Human-Robot Interaction. Procedia Comput. Sci. 2018 , 133 , 913–923. [ Google Scholar ]
  • Lu, J.; Plataniotis, K.N.; Venetsanopoulos, A.N. Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Netw. 2003 , 14 , 117–126. [ Google Scholar ] [ PubMed ] [ Green Version ]
  • Yang, W.J.; Chen, Y.C.; Chung, P.C.; Yang, J.F. Multi-feature shape regression for face alignment. EURASIP J. Adv. Signal Process. 2018 , 2018 , 51. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ouanan, H.; Ouanan, M.; Aksasse, B. Non-linear dictionary representation of deep features for face recognition from a single sample per person. Procedia Comput. Sci. 2018 , 127 , 114–122. [ Google Scholar ] [ CrossRef ]
  • Fathima, A.A.; Ajitha, S.; Vaidehi, V.; Hemalatha, M.; Karthigaiveni, R.; Kumar, R. Hybrid approach for face recognition combining Gabor Wavelet and Linear Discriminant Analysis. In Proceedings of the 2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS), Bhubaneswar, India, 2–3 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 220–225. [ Google Scholar ]
  • Barkan, O.; Weill, J.; Wolf, L.; Aronowitz, H. Fast high dimensional vector multiplication face recognition. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1960–1967. [ Google Scholar ]
  • Juefei-Xu, F.; Luu, K.; Savvides, M. Spartans: Single-sample periocular-based alignment-robust recognition technique applied to non-frontal scenarios. IEEE Trans. Image Process. 2015 , 24 , 4780–4795. [ Google Scholar ] [ CrossRef ]
  • Yan, Y.; Wang, H.; Suter, D. Multi-subregion based correlation filter bank for robust face recognition. Pattern Recognit. 2014 , 47 , 3487–3501. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ding, C.; Tao, D. Robust face recognition via multimodal deep face representation. IEEE Trans. Multimed. 2015 , 17 , 2049–2058. [ Google Scholar ] [ CrossRef ]
  • Sharma, R.; Patterh, M.S. A new pose invariant face recognition system using PCA and ANFIS. Optik 2015 , 126 , 3483–3487. [ Google Scholar ] [ CrossRef ]
  • Moussa, M.; Hmila, M.; Douik, A. A Novel Face Recognition Approach Based on Genetic Algorithm Optimization. Stud. Inform. Control 2018 , 27 , 127–134. [ Google Scholar ] [ CrossRef ]
  • Mian, A.; Bennamoun, M.; Owens, R. An efficient multimodal 2D-3D hybrid approach to automatic face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2007 , 29 , 1927–1943. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Cho, H.; Roberts, R.; Jung, B.; Choi, O.; Moon, S. An efficient hybrid face recognition algorithm using PCA and GABOR wavelets. Int. J. Adv. Robot. Syst. 2014 , 11 , 59. [ Google Scholar ] [ CrossRef ]
  • Guru, D.S.; Suraj, M.G.; Manjunath, S. Fusion of covariance matrices of PCA and FLD. Pattern Recognit. Lett. 2011 , 32 , 432–440. [ Google Scholar ] [ CrossRef ]
  • Sing, J.K.; Chowdhury, S.; Basu, D.K.; Nasipuri, M. An improved hybrid approach to face recognition by fusing local and global discriminant features. Int. J. Biom. 2012 , 4 , 144–164. [ Google Scholar ] [ CrossRef ]
  • Kamencay, P.; Zachariasova, M.; Hudec, R.; Jarina, R.; Benco, M.; Hlubik, J. A novel approach to face recognition using image segmentation based on spca-knn method. Radioengineering 2013 , 22 , 92–99. [ Google Scholar ]
  • Sun, J.; Fu, Y.; Li, S.; He, J.; Xu, C.; Tan, L. Sequential Human Activity Recognition Based on Deep Convolutional Network and Extreme Learning Machine Using Wearable Sensors. J. Sens. 2018 , 2018 , 10. [ Google Scholar ] [ CrossRef ]
  • Soltanpour, S.; Boufama, B.; Wu, Q.J. A survey of local feature methods for 3D face recognition. Pattern Recognit. 2017 , 72 , 391–406. [ Google Scholar ] [ CrossRef ]
  • Sharma, G.; ul Hussain, S.; Jurie, F. Local higher-order statistics (LHS) for texture categorization and facial analysis. In European Conference on Computer Vision ; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–12. [ Google Scholar ]
  • Zhang, J.; Marszałek, M.; Lazebnik, S.; Schmid, C. Local features and kernels for classification of texture and object categories: A comprehensive study. Int. J. Comput. Vis. 2007 , 73 , 213–238. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Leonard, I.; Alfalou, A.; Brosseau, C. Spectral optimized asymmetric segmented phase-only correlation filter. Appl. Opt. 2012 , 51 , 2638–2650. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Shen, L.; Bai, L.; Ji, Z. A svm face recognition method based on optimized gabor features. In International Conference on Advances in Visual Information Systems ; Springer: Berlin/Heidelberg, Germany, 2007; pp. 165–174. [ Google Scholar ]
  • Pratima, D.; Nimmakanti, N. Pattern Recognition Algorithms for Cluster Identification Problem. Int. J. Comput. Sci. Inform. 2012 , 1 , 2231–5292. [ Google Scholar ]
  • Zhang, C.; Prasanna, V. Frequency domain acceleration of convolutional neural networks on CPU-FPGA shared memory system. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2017; ACM: New York, NY, USA, 2017; pp. 35–44. [ Google Scholar ]
  • Nguyen, D.T.; Pham, T.D.; Lee, M.B.; Park, K.R. Visible-Light Camera Sensor-Based Presentation Attack Detection for Face Recognition by Combining Spatial and Temporal Information. Sensors 2019 , 19 , 410. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the BMVC 2015—British Machine Vision Conference, Swansea, UK, 7–10 September 2015. [ Google Scholar ]
  • Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision ; Springer: Berlin/Heidelberg, Germany, 2016; pp. 499–515. [ Google Scholar ]
  • Passalis, N.; Tefas, A. Spatial bag of features learning for large scale face image retrieval. In INNS Conference on Big Data ; Springer: Berlin/Heidelberg, Germany, 2016; pp. 8–17. [ Google Scholar ]
  • Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220. [ Google Scholar ]
  • Amato, G.; Falchi, F.; Gennaro, C.; Massoli, F.V.; Passalis, N.; Tefas, A.; Vairo, C. Face Verification and Recognition for Digital Forensics and Information Security. In Proceedings of the 2019 7th International Symposium on Digital Forensics and Security (ISDFS), Barcelos, Portugal, 10–12 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [ Google Scholar ]
  • Taigman, Y.; Yang, M.; Ranzato, M.A.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, Washington, DC, USA, 23–28 June 2014; pp. 1701–1708. [ Google Scholar ]
  • Ma, Z.; Ding, Y.; Li, B.; Yuan, X. Deep CNNs with Robust LBP Guiding Pooling for Face Recognition. Sensors 2018 , 18 , 3876. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Koo, J.; Cho, S.; Baek, N.; Kim, M.; Park, K. CNN-Based Multimodal Human Recognition in Surveillance Environments. Sensors 2018 , 18 , 3040. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Cho, S.; Baek, N.; Kim, M.; Koo, J.; Kim, J.; Park, K. Face Detection in Nighttime Images Using Visible-Light Camera Sensors with Two-Step Faster Region-Based Convolutional Neural Network. Sensors 2018 , 18 , 2995. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Koshy, R.; Mahmood, A. Optimizing Deep CNN Architectures for Face Liveness Detection. Entropy 2019 , 21 , 423. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Elmahmudi, A.; Ugail, H. Deep face recognition using imperfect facial data. Future Gener. Comput. Syst. 2019 , 99 , 213–225. [ Google Scholar ] [ CrossRef ]
  • Seibold, C.; Samek, W.; Hilsmann, A.; Eisert, P. Accurate and robust neural networks for security related applications exampled by face morphing attacks. arXiv 2018 , arXiv:1806.04265. [ Google Scholar ]
  • Yim, J.; Jung, H.; Yoo, B.; Choi, C.; Park, D.; Kim, J. Rotating your face using multi-task deep neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 676–684. [ Google Scholar ]
  • Bajrami, X.; Gashi, B.; Murturi, I. Face recognition performance using linear discriminant analysis and deep neural networks. Int. J. Appl. Pattern Recognit. 2018 , 5 , 240–250. [ Google Scholar ] [ CrossRef ]
  • Gourier, N.; Hall, D.; Crowley, J.L. Estimating Face Orientation from Robust Detection of Salient Facial Structures. Available online: venus.inrialpes.fr/jlc/papers/Pointing04-Gourier.pdf (accessed on 15 December 2019).
  • Gonzalez-Sosa, E.; Fierrez, J.; Vera-Rodriguez, R.; Alonso-Fernandez, F. Facial soft biometrics for recognition in the wild: Recent works, annotation, and COTS evaluation. IEEE Trans. Inf. Forensics Secur. 2018 , 13 , 2001–2014. [ Google Scholar ] [ CrossRef ]
  • Boukamcha, H.; Hallek, M.; Smach, F.; Atri, M. Automatic landmark detection and 3D Face data extraction. J. Comput. Sci. 2017 , 21 , 340–348. [ Google Scholar ] [ CrossRef ]
  • Ouerhani, Y.; Jridi, M.; Alfalou, A.; Brosseau, C. Graphics processor unit implementation of correlation technique using a segmented phase only composite filter. Opt. Commun. 2013 , 289 , 33–44. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Su, C.; Yan, Y.; Chen, S.; Wang, H. An efficient deep neural networks training framework for robust face recognition. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3800–3804. [ Google Scholar ]
  • Coşkun, M.; Uçar, A.; Yildirim, Ö.; Demir, Y. Face recognition based on convolutional neural network. In Proceedings of the 2017 International Conference on Modern Electrical and Energy Systems (MEES), Kremenchuk, Ukraine, 15–17 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 376–379. [ Google Scholar ]


Author/Technique Used | Database | Matching | Limitation | Advantage | Result
Local Appearance-Based Techniques
Khoi et al. [ ], LBP | TDF; CF1999; LFW | MAP | Skewness in face image | Robust feature in frontal face | 5%; 13.03%; 90.95%
Xi et al. [ ], LBPNet | FERET; LFW | Cosine similarity | Complexities of CNN | High recognition accuracy | 97.80%; 94.04%
Khoi et al. [ ], PLBP | TDF; CF; LFW | MAP | Skewness in face image | Robust feature in frontal face | 5.50%; 9.70%; 91.97%
Laure et al. [ ], LBP and KNN | LFW; CMU-PIE | KNN | Illumination conditions | Robust | 85.71%; 99.26%
Bonnen et al. [ ], MRF and MLBP | AR (Scream); FERET (Wearing sunglasses) | Cosine similarity | Landmark extraction fails or is not ideal | Robust to changes in facial expression | 86.10%; 95%
Ren et al. [ ], Relaxed LTP | CMU-PIE; Yale B | Chi-square distance | Noise level | Superior performance compared with LBP, LTP | 95.75%; 98.71%
Hussain et al. [ ], LPQ | FERET; LFW | Cosine similarity | Lot of discriminative information | Robust to illumination variations | 99.20%; 75.30%
Karaaba et al. [ ], HOG and MMD | FERET; LFW | MMD/MLPD | Low recognition accuracy | Aligning difficulties | 68.59%; 23.49%
Arigbabu et al. [ ], PHOG and SVM | LFW | SVM | Complexity and time of computation | Head pose variation | 88.50%
Leonard et al. [ ], VLC correlator | PHPID | ASPOF | Low number of reference images used | Robustness to noise | 92%
Napoléon et al. [ ], LBP and VLC | YaleB; YaleB Extended | POF | Illumination | Rotation + translation | 98.40%; 95.80%
Heflin et al. [ ], correlation filter | LFW/PHPID | PSR | Some pre-processing steps | More effort on the eye localization stage | 39.48%
Zhu et al. [ ], PCA–FCF | CMU-PIE; FRGC2.0 | Correlation filter | Uses only a linear method | Occlusion-insensitive | 96.60%; 91.92%
Seo et al. [ ], LARK + PCA | LFW | Cosine similarity | Face detection | Reducing computational complexity | 78.90%
Ghorbel et al. [ ], VLC + DoG | FERET | PCE | Low recognition rate | Robustness | 81.51%
Ghorbel et al. [ ], uLBP + DoG | FERET | Chi-square distance | Robustness | Processing time | 93.39%
Ouerhani et al. [ ], VLC | PHPID | PCE | Power | Processing time | 77%
Lenc et al. [ ], SIFT | FERET; AR; LFW | A posteriori probability | Still far from perfect | Sufficiently robust on lower-quality real data | 97.30%; 95.80%; 98.04%
Du et al. [ ], SURF | LFW | FLANN distance | Processing time | Robustness and distinctiveness | 95.60%
Vinay et al. [ ], SURF + SIFT | LFW; Face94 | FLANN distance | Processing time | Robust in unconstrained scenarios | 78.86%; 96.67%
Calonder et al. [ ], BRIEF | — | KNN | Low recognition rate | Low processing time | 48%
Author/Technique Used | Database | Matching | Limitation | Advantage | Result
Linear Techniques
Seo et al. [ ], LARK and PCA | LFW | L2 distance | Detection accuracy | Reducing computational complexity | 85.10%
Annalakshmi et al. [ ], ICA and LDA | LFW | Bayesian classifier | Sensitivity | Good accuracy | 88%
Annalakshmi et al. [ ], PCA and LDA | LFW | Bayesian classifier | Sensitivity | Specificity | 59%
Hussain et al. [ ], LQP and Gabor | FERET; LFW | Cosine similarity | Lot of discriminative information | Robust to illumination variations | 99.2%; 75.3%
Gowda et al. [ ], LPQ and LDA | MEPCO | SVM | Computation time | Good accuracy | 99.13%
Z. Cui et al. [ ], BoW | AR; ORL; FERET | ASM | Occlusions | Robust | 99.43%; 99.50%; 82.30%
Khan et al. [ ], PSO and DWT | CK; MMI; JAFFE | Euclidean distance | Noise | Robust to illumination | 98.60%; 95.50%; 98.80%
Huang et al. [ ], 2D-DWT | FERET; LFW | KNN | Pose | Frontal or near-frontal facial images | 90.63%; 97.10%
Perlibakas [ ], PCA and Gabor filter | FERET | Cosine metric | Precision | Pose | 87.77%
Hafez et al. [ ], Gabor filter and LDA | ORL; C. YaleB | 2D-NCC | Pose | Good recognition performance | 98.33%; 99.33%
Sufyanu et al. [ ], DCT | ORL; Yale | NCC | High memory | Controlled and uncontrolled databases | 93.40%
Shanbhag et al. [ ], DWT and BPSO | — | — | Rotation | Significant reduction in the number of features | 88.44%
Ghorbel et al. [ ], Eigenfaces and DoG filter | FERET | Chi-square distance | Processing time | Reduced representation | 84.26%
Zhang et al. [ ], PCA and FFT | YALE | SVM | Complexity | Discrimination | 93.42%
Zhang et al. [ ], PCA | YALE | SVM | Recognition rate | Reduce the dimensionality | 84.21%
Fan et al. [ ], RKPCA | MNIST; ORL | RBF kernel | Complexity | Robust to sparse noises | —
Vinay et al. [ ], ORB and KPCA | ORL | FLANN matching | Processing time | Robust | 87.30%
Vinay et al. [ ], SURF and KPCA | ORL | FLANN matching | Processing time | Reduce the dimensionality | 80.34%
Vinay et al. [ ], SIFT and KPCA | ORL | FLANN matching | Low recognition rate | Complexity | 69.20%
Lu et al. [ ], KPCA and GDA | UMIST face | SVM | High error rate | Excellent performance | 48%
Yang et al. [ ], PCA and MSR | HELEN face | ESR | Complexity | Utilizes color, gradient, and regional information | 98.00%
Yang et al. [ ], LDA and MSR | FRGC | ESR | Low performance | Utilizes color, gradient, and regional information | 90.75%
Ouanan et al. [ ], FDDL | AR | CNN | Occlusion | Orientations, expressions | 98.00%
Vankayalapati and Kyamakya [ ], CNN | ORL | — | Poses | High recognition rate | 95%
Devi et al. [ ], 2FNN | ORL | — | Complexity | Low error rate | 98.5%
Author/Technique Used | Database | Matching | Limitation | Advantage | Result
Fathima et al. [ ], GW-LDA | AT&T; FACES94; MITINDIA | k-NN | High processing time | Illumination invariant and reduces the dimensionality | 88%; 94.02%; 88.12%
Barkan et al. [ ], OCLBP, LDA, and WCCN | LFW | WCCN | — | Reduce the dimensionality | 87.85%
Juefei et al. [ ], ACF and WLBP | LFW | — | Complexity | Pose conditions | 89.69%
Simonyan et al. [ ], Fisher + SIFT | LFW | Mahalanobis matrix | Single feature type | Robust | 87.47%
Sharma et al. [ ], PCA–ANFIS | ORL | ANFIS | Sensitivity–specificity | — | 96.66%
Sharma et al. [ ], ICA–ANFIS | ORL | ANFIS | Pose conditions | — | 71.30%
Sharma et al. [ ], LDA–ANFIS | ORL | ANFIS | — | — | 68%
Ojala et al. [ ], DCT–PCA | ORL; UMIST; YALE | Euclidean distance | Complexity | Reduce the dimensionality | 92.62%; 99.40%; 95.50%
Mian et al. [ ], Hotelling transform, SIFT, and ICP | FRGC | ICP | Processing time | Facial expressions | 99.74%
Cho et al. [ ], PCA–LGBPHS and PCA–Gabor wavelets | Extended Yale Face | Bhattacharyya distance | Illumination condition | Complexity | 95%
Sing et al. [ ], PCA–FLD | CMU; FERET; AR | SVM | Robustness | Pose, illumination, and expression | 71.98%; 94.73%; 68.65%
Kamencay et al. [ ], SPCA-KNN | ESSEX | KNN | Processing time | Expression variation | 96.80%
Sun et al. [ ], CNN–LSTM–ELM | OPPORTUNITY | LSTM/ELM | High processing time | Automatically learns feature representations | 90.60%
Ding et al. [ ], CNNs and SAE | LFW | — | Complexity | High recognition rate | 99%
Approaches | Databases Used | Advantages | Disadvantages | Performances | Challenges Handled
Local appearance-based techniques | TDF, CF1999, LFW, FERET, CMU-PIE, AR, Yale B, PHPID, YaleB Extended, FRGC2.0, Face94 | — | — | — | Various lighting conditions, facial expressions, low resolution
Linear and nonlinear techniques | LFW, FERET, MEPCO, AR, ORL, CK, MMI, JAFFE, C. Yale B, Yale, MNIST, UMIST face, HELEN face, FRGC | — | — | — | Poses, scaling, facial expressions
Hybrid techniques | AT&T, FACES94, MITINDIA, LFW, ORL, UMIST, YALE, FRGC, Extended Yale, CMU, FERET, AR, ESSEX | — | — | — | —

Share and Cite

Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020 , 20 , 342. https://doi.org/10.3390/s20020342





Title: Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Journal reference: International Journal of Computer Vision (IJCV), 2024



  • Open access
  • Published: 23 June 2024

Image-based facial emotion recognition using convolutional neural network on emognition dataset

Erlangga Satrio Agung, Achmad Pratama Rifai & Titis Wijayanto

Scientific Reports, volume 14, Article number: 14429 (2024)


  • Computer science
  • Human behaviour
  • Information technology

Detecting emotions from facial images is difficult because facial expressions can vary significantly. Previous research on using deep learning models to classify emotions from facial images has been carried out on various datasets that contain a limited range of expressions. This study expands the use of deep learning for facial emotion recognition (FER) based on the Emognition dataset, which includes ten target emotions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness, and neutral. A series of data preprocessing steps was carried out to convert the video data into images and augment the data. This study proposes Convolutional Neural Network (CNN) models built through two approaches: transfer learning (fine-tuning) with the pre-trained Inception-V3 and MobileNet-V2 models, and building from scratch using the Taguchi method to find a robust combination of hyperparameter settings. The proposed model demonstrated favorable performance over a series of experimental processes, with an accuracy and an average F1-score of 96% and 0.95, respectively, on the test data.


Introduction

Humans use emotions to express their feelings to others and as a communication tool to convey information. Emotions reflect human mood as a psychophysiological state and result from human interactions and internal or external factors 1 . Dynamic changes in emotion play an important role in human life because they directly affect most daily activities and habits 2 . Emotions are the dominant driver of human decisions 3 . Positive emotions foster good communication and increase productivity, whereas negative emotions can harm both mental and physical health. Therefore, automated systems based on human emotions are an important area for continued development 4 .

Humans can express emotions through hands, voice, gestures, and facial expressions, with 55% of emotions conveyed through facial expressions 5 . The human face displays information cues that are relevant to provide expression of an emotional state or behavior. Facial expressions play an important role in human communication, as they help us understand the intentions of others. Hence, facial recognition emerges as an important domain in understanding human emotion. Among various facial recognition techniques, facial emotion recognition (FER) has seen substantial advancement 6 . Using machine learning, FER can help humans distinguish emotions through facial expressions by analyzing images or video data to obtain information about emotional states 7 , which is important for social interaction because it can help humans understand the feelings and intentions of others. FER is commonly used in various fields such as health, education, labor, robotics, communication, psychology, and others 8 .

Recent advancements in FER-based automation systems can be classified into two main approaches to feature generation: conventional extraction and automatic extraction via deep neural networks 9 . While the conventional approach holds an advantage in computation time and is commonly used for real-time classification problems 10 , it lacks flexibility because it requires predefined feature extraction and classifiers 11 . As such, deeper knowledge of feature extraction and classifiers is required to obtain meaningful features for the models’ input without discarding important information. This issue can be an obstacle to developing and implementing detection models. On the other hand, an automated approach employing deep learning algorithms, such as the convolutional neural network (CNN), reduces or eliminates dependencies on other models and/or existing preprocessing techniques by carrying out end-to-end learning directly from the input data 12 . However, CNNs require extensive data to achieve a high level of classification performance 13 . Researchers worldwide have built CNN models to address the FER problem, using various image datasets as input, such as Facial Emotion Recognition 2013 (FER 2013) and the Extended Cohn-Kanade Dataset (CK +). However, these datasets still focus on the seven basic human emotions, so further development is needed to address FER with a wider range of emotion classes.

This study is motivated by the need to address the limitations of existing FER systems, which predominantly recognize a limited set of basic emotions. Utilizing the Emognition dataset, which encompasses ten distinct emotions: neutral, amusement, anger, awe, disgust, enthusiasm, fear, liking, sadness, and surprise 14 , this study aims to develop FER models capable of handling a wider spectrum of human emotions. The Emognition dataset not only includes common emotions but also introduces four new emotion classes: awe, enthusiasm, amusement, and liking, providing a richer foundation for enhancing FER applications in areas such as mobile application development, education, product marketing, and tourism management. For example, the inclusion of the amusement emotion improves interaction with entertainment devices like games and movies 15 . Recognizing enthusiasm can help educators enhance learning environments and manage student engagement effectively 16 . In other fields, the emotion of enthusiasm can also be used to determine the suitability of the workload given to a worker; the awe emotion can significantly increase consumer willingness to share 17 ; and the liking emotion also plays a role in shaping consumer preferences and brand affinity in marketing. Despite its potential, the use of the Emognition dataset in FER models remains relatively unexplored, representing a significant gap in current research that this study seeks to address.

This study also explores whether CNNs, trained on the Emognition dataset 14 , can more effectively classify a more diverse range of emotions. This involves assessing the benefits of processing image data extracted from video sequences compared to direct video inputs, which could potentially allow for selecting relevant data to use and discarding irrelevant data, thus optimizing the performance of the FER models. Overall, this study aims to improve the accuracy and applicability of FER systems and broaden the scope of emotions that these systems can recognize. Such advancements in emotion recognition technology could have significant implications across various aspects of human life, including social interactions, mental health, education, and employment. This ongoing development of FER technology highlights its critical role as a necessary knowledge domain in the modern world.

The remainder of this paper is organized as follows. In the next section, previous related work is reviewed. In Sect. “ Methodology ”, the proposed method and the background theories are described. In Sect. “ Experimental setup ”, the experimental work and obtained results are examined and analyzed. In the last section, conclusions and future work are discussed.

Related work

Research in FER has evolved through conventional and automated approaches involving various datasets and methodologies. Thus far, numerous studies have utilized datasets such as Cohn-Kanade (CK) 18 , Extended Cohn-Kanade (CK +) 19 , Facial Expression Recognition 2013 (FER 2013) 20 , Japanese Female Facial Expression (JAFFE) 21 , FACES 22 , and Radboud Faces Database (RaFD) 23 . These datasets primarily include six to eight basic emotion classes. For example, CK and CK + have seven emotion classes: neutral, anger, contempt, disgust, fear, happiness, and sadness 24 . Similarly, FER 2013 and JAFFE introduce seven classes with slightly different categories: neutral, angry, disgusted, fearful, happy, sad, and surprised 24 . While the aforementioned datasets introduced seven classes, FACES introduced six categories of emotions: neutral, sad, disgust, fear, anger, and happy 22 , and RaFD has eight emotion classes: anger, disgust, fear, happiness, sadness, surprise, disdain, and neutral. Although these datasets have provided insightful data, the emotions they cover are basic and relatively simple.

Recently, Saganowski et al. 14 introduced the Emognition dataset, which includes ten distinct emotions: neutral, amusement, anger, awe, disgust, enthusiasm, fear, liking, sadness, and surprise. In addition to these emotions, the dataset offers several enhancements: it captures physiological signals, represents emotional states using both discrete and dimensional scales, and highlights differences among positive emotions. These improvements facilitate emotion recognition from both facial expression analysis and physiological perspectives, accommodating variances that might occur with specific emotions. While several studies have utilized the physiological signals from the Emognition dataset to classify emotions (e.g., 25 , 26 , 27 , 28 ), there has been relatively little research examining the facial expression data within the same dataset for emotion recognition and classification (e.g., 29 ). This gap highlights a key area for further investigation, aiming to fully leverage the dataset's capabilities in enhancing FER technologies.

Concerning FER techniques, researchers use several conventional methods to extract features from input image data. Some of these methods have been applied in several publications, including cropping faces and converting them into grayscale images 30 , using an optical flow-based approach 31 , and using a histogram of oriented gradients (HOG) 32 . In addition, several publications use an automated approach in extracting features based on the CNN algorithm, including the development of a model with an architectural configuration from scratch 33 , transfer learning with MobileNet 34 , MobileNet-V2 35 , VGG19 36 , DenseNet121 37 , and others.

In the conventional approach, Gupta 30 preprocessed the CK and Extended CK + datasets by cropping the facial region using the Haar cascade from OpenCV and then converting the crops into grayscale images. In addition, detected facial landmark points were normalized. The data were randomly split into 80% training and 20% validation. Training was carried out with a support vector machine classifier and achieved an accuracy of 93%. Using the same type of classifier, Anthwal and Ganotra 31 performed dense optical flow calculations on facial images to extract vertical and horizontal components. Preprocessing in that study used the Viola-Jones algorithm to crop the facial area and resize it to a grayscale image with a resolution of 256 × 256. Using Extended Cohn-Kanade (CK +) as the dataset, the best results were obtained for 6 class categories (excluding the contempt class) with 90.52% accuracy; for 7 classes, only 83.38% accuracy was achieved, indicating that the classifier performed better when trained on 6 class categories than on 7.

Using the JAFFE and Cohn-Kanade (CK) datasets, Supta et al. 32 built an FER system based on the HOG and support vector machine (SVM). Preprocessing is carried out on the detected parts using histogram equalization techniques and image sharpening to reduce lighting effects. Then, HOG extracts distinctive image features from faces and combines them into feature vectors. Finally, SVM is used to classify expressions using polynomial kernel functions. The proposed system is evaluated on JAFFE and CK data and shows that the proposed FER system provides up to 97.62% and 98.61% accuracy for JAFFE and CK data, respectively.
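For readers unfamiliar with this kind of pipeline, the following is a generic sketch of a HOG-plus-SVM expression classifier along the lines described above, assuming OpenCV, scikit-image, and scikit-learn; the image size, HOG parameters, and kernel degree are illustrative assumptions, not the exact settings of Supta et al.

```python
# Generic HOG + SVM sketch (illustrative parameters, not the authors' exact setup).
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_descriptor(gray_face):
    """Equalize lighting, then describe a grayscale face crop with a HOG vector."""
    face = cv2.resize(gray_face, (64, 64))
    face = cv2.equalizeHist(face)                      # reduce lighting effects
    return hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_expression_svm(X, y):
    """X: list of grayscale face crops, y: expression labels (assumed available)."""
    features = np.array([hog_descriptor(img) for img in X])
    clf = SVC(kernel="poly", degree=3)                 # polynomial kernel, as in the text
    clf.fit(features, y)
    return clf
```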

The conventional feature extraction method for FER requires complex image preprocessing and manual feature extraction, which take a long time 38 . Manually extracted features depend heavily on the prior knowledge of the researchers, which exposes the resulting features to high bias and can cause the loss of implicit patterns. The effectiveness of the extracted features also depends on how well the manual feature extraction technique is applied. In contrast, an automated approach based on deep learning is very good at classifying images but requires extensive data to train and perform recognition efficiently 13 .

Extensive data requirements are one of the crucial factors in the context of deep learning model development. Ramalingam and Garzia 33 developed a CNN using a combination of convolution, rectified linear unit (ReLU), and pooling layers applied twice for feature extraction, ending with a fully connected layer. This CNN achieved an accuracy of only 60% on the FER 2013 dataset, which includes 35,887 sample images. One deep learning approach that can improve training accuracy is transfer learning. This technique takes advantage of features learned by models trained on ImageNet to overcome the problem of the lack of large datasets available online 35 .

Ramalingam and Garzia 33 also used transfer learning with the pretrained VGG16 model. In the FER2013 dataset, the accuracy of this transfer learning algorithm reaches 78%, so there is an increase in accuracy of 18% compared to the CNN model without transfer learning. With the same dataset, Raja Sekaran et al. 13 implemented transfer learning using AlexNet as a pretrained model. This research also implements an early stopping mechanism to overcome the problem of overfitting AlexNet. The proposed model only requires preprocessing in the form of image conversion to grayscale to reduce the effects of lighting and human skin color on the classification results. The model managed to achieve 70.52% accuracy for the FER dataset.

In FER 2013, Abdulsattar and Hussain 37 evaluated six well-known deep learning models for FER problems. These models are MobileNet, VGG16, VGG19, DenseNet121, Inception-v3, and Xception. Model performance was evaluated and compared using transfer learning and fine-tuning strategies on the FER2013 dataset. In transfer learning, all layers in the pretrained model are frozen, i.e., not retrained; in fine-tuning, all layers in the pretrained model are retrained. The results show that the fine-tuning strategy performs better than transfer learning, with differences ranging from 1 to 10%. The pretrained VGG16 model achieved the highest accuracy, 0.6446.

Transfer learning using VGG16 was also carried out by Oztel et al. 39 using the RaFD dataset. The treatment of the pretrained VGG16 model was divided into two scenarios: with transfer learning and without transfer learning. In the scenario without transfer learning, the 39th and 41st layers of the model structure were modified and initialized with random weights, while the scenario with transfer learning kept the model structure intact. The VGG16 scenario with transfer learning produced the best accuracy compared to the scenario without transfer learning. The less-optimal results in the scenario without transfer learning were due to a lack of training data and input data imbalance.

Gondkar et al. 35 also carried out research using transfer learning on CK + . The models used in the transfer learning technique are Xception, VGG16, VGG19, ResNet50, ResNet101, ResNet152, ResNet50V2, ResNet101V2, ResNet152V2, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, and DenseNet121. A comparative analysis of these models was performed using various evaluation metrics, such as model size, training accuracy, validation accuracy, training loss, and validation loss. The results showed that the pretrained models ResNet50V2, ResNet101V2, ResNet152V2, and MobileNet achieved training and validation accuracy of more than 90%. Most pretrained models have demonstrated outstanding performance, with ResNet101V2 achieving a training accuracy of 93.08% and a validation accuracy of 92.87%. MobileNet achieved training and validation accuracies higher than 90%. Regarding model size, MobileNet was the smallest yet more efficient than most other models.

Undeniably, the type of preprocessing applied to the input data can impact the quality of the resulting model. This is evidenced by the preprocessing experiment conducted by Sajjanhar et al. 36 . In that study, three treatments were applied to the CK + , JAFFE, and FACES datasets. The first preprocessing step involved retaining the region of interest (ROI) by cropping the face area from the image. The second preprocessing step calculated the difference between the gray-level intensities of the ROI image pixels at neutral and peak expressions. The third preprocessing step formed a local binary pattern (LBP) from the image. These three types of preprocessing were tested with the CNN algorithm, and the second preprocessing gave the best accuracy: 85.19% on CK + data, 65.17% on JAFFE data, and 84.38% on FACES data.

Agobah et al. 34 applied transfer learning using MobileNetV1 across multiple datasets for training, validation, and testing. They optimize CNN training by combining center loss and softmax loss, using the FER 2013 dataset for training and the JAFFE and CK + datasets for validation and testing. This addition improved accuracies on CK + and JAFFE by 2.94% and 5.08%, respectively, with JAFFE achieving 96.43% precision and 95.24% recall and F1 score. While increasing the number of classes in the CK + dataset initially reduced accuracy due to complexity and data limitations, using a larger dataset significantly enhanced performance, raising accuracy by 4.41% over a smaller dataset. However, Agobah et al. 34 highlighted that some misclassifications occurred due to the inherent complexity of distinguishing emotions like anger and sadness.

Meena et al. 40 explored the use of CNNs for sentiment identification from facial expressions in the CK + and FER-2013 datasets. Several architectures were developed to evaluate the efficiency of the proposed models on those datasets. The models were categorized into two types based on the data classes: the CNN-3 model considered positive, negative, and neutral expressions, while the CNN-7 model considered happy, neutral, sad, surprise, fear, angry, and disgust expressions. The CNN-3 model yielded accuracies of 79% and 95% for the FER-2013 and CK + databases, respectively. Meanwhile, the CNN-7 model resulted in slightly lower accuracies of 69% and 92% for the same datasets.

Recent FER studies explored advanced models for emotion recognition from video data 29 , 41 . Bilotti et al. 41 developed a multimodal CNN approach integrating facial frames, optical flow, and Mel Spectrograms, which achieved impressive accuracies of approximately 95% on the BAUM-1 and 95.5% on the RAVDESS datasets. In contrast, Manalu and Rifai 29 focused on hybrid CNN-RNN models, comparing a custom model with transfer learning models based on InceptionV3 and MobileNetV2 architectures. Their custom model achieved a maximum accuracy of 63%, less than the 66% by InceptionV3-RNN but higher than the 59% by MobileNetV2-RNN, with the custom model also offering significantly quicker processing times. Both studies highlight the potential of combining different data inputs and model architectures to enhance the accuracy and efficiency of facial expression recognition systems.

Overall, the studies above show the effectiveness of the transfer learning method in building models given the limited data availability in the FER domain. Several pre-trained models have also been used to obtain optimal accuracy on the FER, JAFFE, CK+, CK, FACES, and RaFD datasets. However, among the various studies conducted, none has discussed the application of CNN-based deep learning to the Emognition dataset. Considering that the Emognition dataset covers a larger number of emotions, some of which have never been explored before, this study develops FER models using CNNs to address the Emognition dataset. Besides exploring the transfer learning and fine-tuning strategy, this study also proposes a novel network with the aim of developing a more efficient model for FER. Building a CNN from scratch for facial emotion recognition provides a deep understanding of the network’s inner workings, allowing the architecture to be customized and optimized for the specifics of the task. Since the FER images are not covered by standard pre-trained models, it is worth developing a full learning model that can be customized according to the problem specification.

Methodology

In this work, we use two types of methods for building the CNN models: transfer learning using MobileNet-V2 and Inception-V3 with fine-tuning strategies, and building a model from scratch by designing a new network with better efficiency. We also apply a series of preprocessing steps to the Emognition dataset to suit the research goal.

Data pre-processing

This process is carried out in several sequential stages: converting video to frames, cropping faces from frames, data cleaning, data splitting, rescaling, resizing, and data augmentation, as depicted in Fig. 1. Following the selected deep learning method, the half-body video data is transformed into image data (frame sequences). This task is performed in the Process Video to Frame stage. Once image data in video frames is obtained, the facial region is automatically detected and then cropped within those frames. This activity takes place in the Face Cropping from Frame stage.

Figure 1. Stages of data pre-processing.

Upon obtaining the facial data, data cleaning is performed by retaining facial images corresponding to emotions while discarding those not accurately representing the intended emotions for their respective classes. During the data input process into the model, a dataframe is employed to manage the data with greater flexibility and transparency. Using the created dataframe, the data are shuffled to achieve a more balanced data distribution and then split into training, validation, and test datasets. The training and validation data play a direct role in training the model, while the test data is involved solely in the testing process.

Subsequently, we apply a rescaling process to transform the pixel values of the input images into a range between 0 and 1, and resizing is performed to standardize the input image dimensions. These rescaling and resizing procedures are undertaken to conform to the input requirements of the CNN models. Augmentation techniques are also employed to diversify the dataset, reducing the overfitting risk.

Transfer learning approach

In the transfer learning of the proposed models, the pre-trained weights of the feature extraction layers from both MobileNet-V2 and Inception-V3 are loaded, and fine-tuning is then performed on several layers of the pre-trained model. The choice of Inception-V3 is based on its high accuracy in previous studies. Inception-V3 introduces more complex and efficient architectural designs, including asymmetric convolutions. This means it uses convolutions of different sizes within the same module, allowing it to capture patterns over various spatial hierarchies more effectively. As a result, the network can learn more complex features with fewer parameters, reducing the risk of overfitting. However, Inception-V3 still has more than 24 million parameters, which may require a longer training time. Meanwhile, MobileNet-V2 is selected for its small size, with around 3.4 million parameters, and relatively good accuracy, making it suitable for application on smaller devices. It uses depthwise separable convolutions as its basic building block, significantly reducing the number of parameters and computational cost compared to traditional convolutional layers without a substantial loss in accuracy. MobileNet-V2 also uses an architectural feature known as the inverted residual structure with linear bottlenecks. This design optimizes the flow of information through the network, ensuring that the model remains lightweight while still capturing essential features necessary for accurate predictions, albeit not as powerful as Inception-V3, which can capture more complex features.

Fine-tuning is achieved by first freezing all pre-trained weights and training for several epochs, after which the last half of the pre-trained model is unfrozen and training continues for the same number of epochs. As such, the training process of the transfer learning model is divided into two scenarios: scenario 1, which covers the stage before fine-tuning, and scenario 2, which contains the fine-tuning process. Dense layers are also added with a certain number of neurons. These transfer learning and fine-tuning stages are adopted from Elgendy 42 . The two-stage training process of the transfer learning models is described as follows:

Scenario 1. In this scenario, all convolutional layers of the selected pretrained model are frozen, and the classification head from the pretrained model is not utilized. A new classification head tailored to the emotion expression classification task is added, and training is conducted. By freezing the pretrained model, the weights in the convolutional network are not updated during the training process. This scenario aims to train only the classification head of the model while maintaining the weights that have already been trained in the pretrained model.

Scenario 2. Unlike the first scenario, the last 50% of the pretrained model's layers are unfrozen, and training continues from where it left off in scenario 1. The first scenario occupies the first half of the training process; for example, with 100 epochs, the first 50 epochs are dedicated to the first stage, and training then continues under the second scenario for the remaining epochs. The main focus here is to train a portion of the layers in the convolutional network to better suit the emotion expression classification task. The last 50% of layers are chosen for unfreezing because these layers contain features that are specific to the dataset the model was previously trained on.

The two-stage training process using scenarios 1 and 2 is intended to enhance the effectiveness of the transfer learning process. In scenario 1, the focus is on leveraging the pretrained model, which contains previously trained weights; however, these weights remain specific to the features of the previous dataset. Therefore, in scenario 2, part of the pretrained model, namely the last 50% of its layers, is retrained to align with the current case. This optimizes the weights of the pretrained model for better utilization in the transfer learning process.

MobileNet-V2

MobileNet is a very lightweight image classification model with minimal operations, initiated by Sandler et al. 43 . MobileNet has three variants: MobileNet-V1, MobileNet-V2, and MobileNet-V3. In this study, we use MobileNet-V2 as one of the pre-trained models in the transfer learning process. MobileNet-V2 is the smallest model in size and has the second fastest GPU processing time after MobileNet-V1 on the ImageNet dataset. In addition, this model has higher top-1 and top-5 accuracy than MobileNet-V1 and is composed of the smallest number of parameters compared to other existing models.

MobileNet-V2 has two types of residual blocks: one with a stride of 1 and a second with a stride of 2. Each block consists of three layers: a 1 × 1 pointwise convolution, a depthwise convolution, and a 1 × 1 linear convolution layer, with ReLU6 activation. The MobileNet-V2 architecture is demonstrated in Fig. 2.

Figure 2. MobileNet-V2 architecture 44 .

Inception-V3

Inception-V3 is the successor of Inception-V2 and Inception-V1 35 , initiated by Szegedy et al. 45 . In this model, five inception modules with 5 × 5 convolutions are replaced by two 3 × 3 convolution layers and an efficient grid reduction block to reduce the number of parameters without sacrificing overall model efficiency. In addition, four inception modules with 7 × 7 convolutions are replaced with two 1 × 7 and 7 × 1 convolution layers, followed by another grid reduction block.

Inception-V3 has an auxiliary classifier connected to the end of these four inception modules. At the end of the model, two inception modules use 3 × 1 and 1 × 3 convolution layers in parallel to increase dimensionality, followed by an average pooling layer, a fully connected layer, dropout, and softmax to produce the output. Compared to MobileNet-V2, Inception-V3 has a greater number of parameters but achieves better top-1 and top-5 accuracy on ImageNet. The Inception-V3 architecture is shown in Fig. 3.

Figure 3. Inception-V3 architecture 46 .

Full learning approach

In the full learning approach, where the model is constructed from the ground up, we employ multiple feature extraction layers. Each of these layers is composed of a convolutional layer paired with a subsequent pooling layer. The convolutional layers vary in the quantity of filters employed and each has a predefined filter size. For the pooling layers, we utilize max pooling. Additionally, a flatten layer is incorporated to transform the feature matrix into a vector form. To diminish the risk of overfitting during the training phase, a dropout layer is also integrated into the model.

To optimize the result, a design of experiments (DoE) using the Taguchi method is performed to find a robust combination of the number of feature extraction layers and the number of epochs to be used in building the full learning model. The stages of the design of experiments using the Taguchi method are shown in Fig. 4.

Figure 4. Steps in the design of experiments using the Taguchi method.

The process begins with a clear definition of the problem at hand, which in this case is to determine a robust combination of the number of feature extraction layers and epochs for the model. Following this, the output performance characteristic that will gauge the success of the model is defined. The output characteristic to be compared is the validation accuracy at the end of each epoch, since it enables direct comparisons between different models. The higher the accuracy on the validation data, the better the model performs.

The next step involves pinpointing the control factors, which include the network architecture details (number of convolutional layers) and the training parameter (number of epochs). The number of convolutional layers determines the depth of feature extraction, which can affect a model's ability to learn from complex data: more layers can capture intricate patterns but also risk overfitting. The number of epochs affects how well the model learns from the data; too few epochs can lead to underfitting, while too many may lead to overfitting. Both factors directly influence the model's learning capacity and generalization to new data, making them critical control factors for optimizing performance.

Next, the levels of each control factor to test are selected, considering potential interactions between them and ensuring the appropriate degrees of freedom for the experiment's statistical validity. Here, each factor has three levels (number of convolutional layers: 4, 5, 6; number of epochs: 50, 100, 150). The most efficient combination is obtained by conducting 9 runs in total, utilizing an L9 orthogonal array. Afterward, the experiments are executed on the same platform and device to minimize noise. Statistical analysis is conducted by analyzing the S/N ratio of the experimental results. Since the goal of this experiment is to find a combination of factors that maximizes the validation accuracy of the expected output, the larger-is-better S/N ratio is chosen. The output of the DoE is the best combination of the number of convolutional layers and the number of epochs, which is then used to build and train the full learning model.
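For reference, the larger-is-better S/N ratio can be computed directly from the run results. The Python sketch below illustrates the main-effects calculation; the accuracy values are placeholders, not the actual results reported in Table 4.

```python
import numpy as np

def sn_larger_is_better(y):
    """Larger-is-better S/N ratio: -10 * log10(mean(1 / y_i^2))."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y**2))

# Hypothetical validation accuracies (%) for the nine L9 runs, keyed by
# (number of convolutional layers, number of epochs). Placeholder values only.
runs = {
    (4, 50): 78.0, (4, 100): 80.5, (4, 150): 81.2,
    (5, 50): 80.1, (5, 100): 83.4, (5, 150): 85.0,
    (6, 50): 79.3, (6, 100): 82.0, (6, 150): 83.1,
}

# Main-effects analysis: average the per-run S/N ratios at each factor level;
# the level with the highest mean S/N is taken as the robust setting.
for name, idx in (("layers", 0), ("epochs", 1)):
    for level in sorted({k[idx] for k in runs}):
        sn = np.mean([sn_larger_is_better([acc])
                      for k, acc in runs.items() if k[idx] == level])
        print(f"{name}={level}: mean S/N = {sn:.2f} dB")
```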

Evaluation criteria

After constructing the models using both techniques, this study analyzes the outcomes based on the evaluation metrics employed, namely accuracy, precision, and recall. The model creation process can be considered complete if these three metrics yield satisfactory results. However, if the outcomes are not deemed satisfactory, we undertake hyperparameter tuning of the CNN model to enhance the training outcomes. For the evaluation of the alternative models, several indicators have been set. These criteria are shown in Table 1.

Based on Table 1, the best model should have good accuracy, precision, and recall on the training and validation datasets with the smallest loss, show no overfitting or underfitting, have a good confusion matrix, and achieve the best accuracy, precision, recall, and F1-score on the test results. In addition, the computation time, especially the testing (inference) time, is also assessed to ensure the practicability of applying the model in real-world implementations.
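As an illustration of how these criteria can be computed, the snippet below uses scikit-learn to derive the test metrics from predictions; the variable names (y_true, y_pred) are assumed, and macro averaging is one reasonable choice for multi-class FER rather than a setting specified by this study.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def evaluate_predictions(y_true, y_pred):
    """Summarize test performance; y_true and y_pred are integer class labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```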

Consent for publication

We hereby provide consent for the publication of the manuscript detailed above, including any accompanying images or data contained within the manuscript.

Experimental setup

The Emognition dataset contains physiological signals and upper body video recordings from 43 participants who watched emotionally validated movie clips designed to elicit nine discrete emotions. According to Saganowski et al. 14 , the Emognition dataset offers several advantages compared to other datasets, including nine discrete emotions (amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness) plus one neutral emotion, emphasizing the differences between positive emotions and enabling diverse analysis in emotion recognition (ER) from the physiology and facial expression domains. This study uses only the half-body video data of the Emognition dataset, with a total of 387 videos. There are two frame rates: 287 videos were recorded at 60 FPS, while the remaining 100 videos are at 30 FPS. Generally, the videos in the Emognition dataset vary in length across different videos and classes.

Data preprocessing

Converting video to frames

At this juncture, frames from the video data are extracted by taking into account both the frame rate and the duration of the footage. The Emognition dataset encompasses two distinct frame rates: 60 frames per second (FPS) and 30 FPS. Employing a sampling method, frames are collected at regular one-second intervals throughout the length of the video. Accordingly, for videos recorded at 60 FPS, every 60th frame is extracted, and for those at 30 FPS, every 30th frame is selected.
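A minimal OpenCV sketch of this sampling step is shown below; the function and file names are illustrative, and the exact extraction code used in this study is not published.

```python
import os
import cv2

def extract_frames(video_path, out_dir):
    """Save one frame per second by keeping every fps-th frame (60 or 30 here)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS)))  # 60 FPS or 30 FPS in Emognition
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % fps == 0:  # every 60th frame at 60 FPS, every 30th at 30 FPS
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:04d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```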

Cropping face from frame

Subsequent to frame extraction, we proceed to isolate the facial region in each frame. This segmentation utilizes the Cascade Classifier function from the OpenCV library to precisely delineate the face in every sequence of frames acquired from the previous stage.
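A sketch of this step using OpenCV's CascadeClassifier is given below; the Haar cascade file and the detector parameters (scaleFactor, minNeighbors) are common defaults assumed here, not values reported by the study.

```python
import cv2

# Frontal-face Haar cascade bundled with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame):
    """Return the first detected face region of a BGR frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return frame[y:y + h, x:x + w]
```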

Data cleaning

According to Saganowski et al. 14 , certain high-intensity emotional expressions manifest under multiple conditions within the film sequences, and a single sequence may elicit multiple emotions. In light of these findings, subjective data pruning is conducted. Each image is individually inspected, with those deemed suitable retained and the unsuitable ones removed. This pruning process significantly reduces the total dataset volume. Additionally, the absence of a distinct 'surprise' emotion classification led to its exclusion, narrowing down the emotion categories to nine, including a neutral category. This reduction yields a final count of 2,535 facial images, constituting a mere 6.12% of the original image dataset.

Shuffling and splitting

The data randomization and splitting processes are executed simultaneously. Randomized shuffling is managed via the random state parameter, and partitioning is dictated by the specified test size. Random shuffling in train-test splitting is employed to ensure that the training and testing datasets are representative of the overall dataset. This method mitigates the risk of bias in the model training process and enhances the model's ability to generalize from its training to unseen data. By shuffling, the data is randomized, preventing the model from learning potential patterns that may be due to the order of the data rather than the underlying relationships between the variables.

The datasets are then segmented into 80% for training and 10% each for validation and testing. The aim of this segmentation is to balance the need for the model to learn effectively from the data (requiring a substantial training set) against the need to prevent overfitting and to accurately estimate the model's predictive performance on new, unseen data (requiring separate validation and testing sets). A larger training set allows the model to better understand the complex patterns and relationships within the data, which is vital for developing a model that performs well. Moreover, the relatively high proportion of training data is deliberately chosen considering the limited amount of available data, at only 2,535 facial images.
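The 80/10/10 split with shuffling can be realized, for example, with scikit-learn's train_test_split applied twice; the dataframe and column names below are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# df is the dataframe of cropped face images, assumed to have
# 'filepath' and 'label' columns.
train_df, temp_df = train_test_split(df, test_size=0.20,
                                     shuffle=True, random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.50,
                                   shuffle=True, random_state=42)
# Resulting proportions: 80% training, 10% validation, 10% testing.
```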

Rescale and resize

Normalization of input pixel values is accomplished through the use of a rescale parameter set at 1/255 within the image data generator, converting the pixel values to a normalized matrix ranging between 0 and 1. Concurrently, resizing of images is administered through the target size parameter, aligning with the predetermined image dimensions for this study, which are set at 300 × 300 pixels.

Data augmentation

To further enrich the dataset, we apply data augmentation strategies using an image data generator. This involves applying a series of transformations to the input images, such as height shifts, shear intensity variations, zoom alterations, and horizontal flipping. These modifications enable the generation of new image variants, thereby enhancing the diversity of the training dataset.
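A sketch of the rescaling, resizing, and augmentation pipeline with Keras' ImageDataGenerator is shown below; the augmentation ranges, batch size, and column names are illustrative assumptions, while the 1/255 rescale factor and the 300 × 300 target size follow the description above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    height_shift_range=0.1,   # illustrative augmentation ranges
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)
eval_gen = ImageDataGenerator(rescale=1.0 / 255)  # no augmentation for validation/test

train_flow = train_gen.flow_from_dataframe(
    train_df, x_col="filepath", y_col="label",
    target_size=(300, 300), class_mode="categorical", batch_size=32)
val_flow = eval_gen.flow_from_dataframe(
    val_df, x_col="filepath", y_col="label",
    target_size=(300, 300), class_mode="categorical", batch_size=32)
```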

Parameter configuration and model implementation

Several hyperparameters of the training process are pre-determined; details of these parameters are shown in Table 2. Meanwhile, the number of epochs for the full learning model is determined based on the Taguchi experiments. During training, the same hyperparameter settings are used for both the transfer learning and full learning models to ensure a fair comparison, except for the image size, which follows the input size of the respective architecture, and the number of epochs. The number of epochs for the transfer learning models is 100, divided equally between the first and second scenarios.

The proposed method is developed and implemented using TensorFlow written in Python. The models are fully trained and tested using Google Colab, accessed through a computer with Intel(R) Core(TM) i5-10200H CPU with 2.40 GHz, 8 GB RAM, and GeForce GTX 1650 Ti with Max-Q Design.

Results and discussion

Model architecture

Transfer learning with MobileNet-V2 and Inception-V3

In the first scenario, we freeze all the layers of the pretrained models, meaning that the parameters of the pretrained model are not retrained. The input data flows through the feature extraction layers in the forward and backward passes without any weight updates being performed. This first scenario allows the model to train only the fully connected layers. In the second scenario, we perform fine-tuning by unfreezing the last half of the pretrained layers. The total numbers of layers in MobileNet-V2 and Inception-V3 are 154 and 311, respectively; therefore, we unfreeze the last 77 layers in MobileNet-V2 and the last 155 layers in Inception-V3. The Adam optimizer is applied with a learning rate of 0.0001 to help prevent overfitting. The difference in the number of parameters is shown in Table 3, while the transfer learning architectures are illustrated in Fig. 5.

Figure 5. Architectures of transfer learning models.
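A condensed Keras sketch of the two scenarios, shown here for MobileNet-V2, is given below. The size of the added dense head and the loss/metric configuration are assumptions, and the data generators (train_flow, val_flow) are assumed to be built with the architecture's own input size; the frozen/unfrozen layer counts, the Adam learning rate of 0.0001, and the 50 + 50 epoch split follow the description above.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")

# Scenario 1: freeze the whole feature extractor, train only the new head.
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(512, activation="relu"),   # head size is an assumption
    tf.keras.layers.Dense(9, activation="softmax"),  # nine emotion classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_flow, validation_data=val_flow, epochs=50)

# Scenario 2: unfreeze the last half of the pretrained layers
# (77 of 154 for MobileNet-V2, 155 of 311 for Inception-V3) and continue training.
base.trainable = True
for layer in base.layers[: len(base.layers) // 2]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_flow, validation_data=val_flow, epochs=100, initial_epoch=50)
```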

Full learning model

For the build-from-scratch technique, we used the Taguchi method to find a robust combination of the number of feature extraction layers and the number of epochs. The Taguchi method has been widely applied to obtain optimal architecture designs and hyperparameter settings for CNNs, thus avoiding time-consuming trial-and-error methods 47 . Here, the number of convolution layers has three levels: 4, 5, and 6 feature extraction layers. The first three feature extraction layers were identical across configurations, while the 4th, 5th, and 6th convolution layers each had 64 filters, with a filter size of 3 × 3 and the ReLU activation function. Each convolution layer is followed by a pooling layer with a pool size of 2 × 2. The epoch parameter has levels of 50, 100, and 150 epochs.

The validation accuracy was used as the output, with the larger-is-better type of S/N ratio. The combination design and its results are shown in Table 4, while the data analysis output is presented in Fig. 6.

Figure 6. Taguchi results.

Based on the main effects plots for the S/N ratio and for the mean in Fig. 6, the robust design combines five feature extraction layers with 150 epochs. This combination provides higher accuracy and produces a system that is not sensitive to changes.

Next, we used five convolution layers, with 16 filters in the first convolution, 32 in the second, and 64 in the third, fourth, and fifth convolutions. The filter size for all convolution layers is 3 × 3, and each convolution layer is followed by a max-pooling layer with a pool size of 2 × 2. We also used a global average pooling layer to convert the feature matrix into a vector for a fully connected part with two dense layers. The first dense layer has 512 neurons, and the second dense layer (output layer) has 9 neurons, according to the number of classes in the input data. The activation function used for each layer is ReLU, except for the output layer, which uses softmax activation to suit the multiclass classification problem.
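A Keras sketch of this architecture is shown below; with a global average pooling layer and the stated filter and neuron counts, it reproduces the 135,337 trainable parameters reported next. The 300 × 300 × 3 input size follows the preprocessing description, and the dropout layer mentioned earlier is omitted here because its placement is not specified.

```python
import tensorflow as tf
from tensorflow.keras import layers

full_model = tf.keras.Sequential([
    layers.Input(shape=(300, 300, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(512, activation="relu"),
    layers.Dense(9, activation="softmax"),
])
full_model.summary()  # about 135k trainable parameters with this configuration
```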

With that architecture, the model has a total of 135,337 trainable parameters, which were trained for 150 epochs. The architecture of the full learning model is depicted in Fig. 7.

Figure 7. Full learning architecture.

Training result

Training processes were executed using the hyperparameter values described in Sect. "Experimental setup". Specifically, the number of convolutional layers and epochs for the full learning model was determined based on the Taguchi results. We used the two-scenario approach for the transfer learning models during the training process. The training processes for the MobileNet-V2, Inception-V3, and full learning models are presented in Figs. 8, 9 and 10, respectively.

Figure 8. Training process for the transfer learning MobileNet-V2 model.

Figure 9. Training process for the transfer learning Inception-V3 model.

Figure 10. Training process for the full learning model.

The horizontal green line in Figs. 8 and 9 represents the boundary between the first and second scenarios. The training process was conducted for 100 epochs, with each scenario lasting 50 epochs. The model was evaluated on the validation data before and after the training process, and a significant improvement in accuracy and a notable reduction in loss can be observed. Thus, through the training process using both the first and second scenarios, the model's performance was successfully enhanced, as evidenced by increased accuracy and decreased loss. This indicates that the model has effectively learned from the training data and can generalize well on the validation data.

Based on the evaluation results for the accuracy, precision, and recall metrics from epoch to epoch on the training and validation data for each model, it can be concluded that the transfer learning model with Inception-V3 gives the best results up to the last epoch. The transfer learning model with Inception-V3 shows a consistent positive trend on the training and validation data without significant fluctuations in the validation data, so no overfitting is identified. Indications of overfitting and underfitting are also not found in the training graphs of MobileNet-V2 and the full learning model, thus indicating proper training for all models.

Further, the training results for the evaluation metrics of accuracy, precision, recall, and loss for each model at the end of the training process can be seen in Fig. 11. Here, the transfer learning model with Inception-V3 provides optimal results on the training and validation data, while the transfer learning model with MobileNet-V2 shows less consistent results between training and validation. In contrast, the full learning model performs less optimally than the other models, scoring the lowest precision and accuracy as well as the highest loss on both the training and validation datasets. In addition, the evaluation results on the validation data conducted after the training process can be seen in Fig. 12.

Figure 11. Training results comparison between all models.

Figure 12. Evaluation of validation data.

Based on Fig. 12, the transfer learning model with Inception-V3 has the highest accuracy and the smallest loss in the evaluation on the validation data. This model shows the best performance, indicating that the training process of the transfer learning model with Inception-V3 is more effective than that of the other models. The training results are also compared in terms of the training time of the three models, as shown in Fig. 13.

Figure 13. Training time (in minutes) comparison.

Based on Fig. 13, the transfer learning model with Inception-V3 requires a shorter training time compared to the other models, at 72.25 min. Note that, although Inception-V3 needed a longer training time than MobileNet-V2 in the original training on ImageNet, here the training times of both transfer learning models are not significantly different. Meanwhile, the training time of the full learning model is the longest at 110.57 min. This is because the model requires more time to learn from the beginning with all layers activated, and it was trained for 150 epochs.

Overall, the transfer learning model with Inception-V3 and the fine-tuning process carried out produced optimal results in the training process. Fine-tuning in Inception-V3 successfully adapts the model to the input data effectively with a shorter training time.

Testing results

After the training and validation processes, testing is conducted to evaluate the performance of the detection models in generalizing to new data. Based on the testing process on the same test data, the total accuracies of the three models are compared. The results indicate that the transfer learning model with Inception-V3 has a better total accuracy than the other models, at 0.96, compared with 0.89 for MobileNet-V2 and 0.87 for the full learning model. This indicates that the ability of the transfer learning model with Inception-V3 to recognize the true class of the entire data is better than that of the other models. The accuracy comparison was also conducted for each class; the results are shown in Fig. 14.

Figure 14. Testing accuracy.

Based on Fig. 14, the transfer learning model with Inception-V3 achieves better accuracy than the other models for each class in the input data. This indicates that the transfer learning model with Inception-V3 can make predictions that generate a large number of true positives and true negatives. Only in the awe class does the class accuracy of the transfer learning model with Inception-V3 have the same value as that of the transfer learning model with MobileNet-V2.

The comparison of the testing results is also carried out on the recall metric for the three models, as shown in Fig. 15. The results indicate that the recall of the transfer learning model with Inception-V3 is superior in most classes. Only in the fear class is the recall of the Inception-V3 model smaller than that of the transfer learning model with MobileNet-V2, whereas in the anger and awe classes, the recall of the transfer learning models with MobileNet-V2 and Inception-V3 is the same. In the amusement class, the recall of the model built from scratch is superior to the other models, and in the enthusiasm and neutral classes, the recalls are the same. In the other classes, the recall of the transfer learning model with Inception-V3 is superior to the other models.

Figure 15. Testing recall.

A comparison of the testing results is also carried out from the perspective of the precision of each class, with the results shown in Fig. 16. Based on Fig. 16, the precision of the transfer learning model with Inception-V3 dominates in the enthusiasm and neutral classes compared to the other models. In the amusement, awe, disgust, and liking classes, its precision is the same as that of the transfer learning model with MobileNet-V2, while in the sadness class it has lower precision, with the transfer learning model with MobileNet-V2 dominating. In the anger class, the precision is the same for the model built from scratch and the transfer learning model with Inception-V3.

Figure 16. Testing precision.

Through the F1-score, each class's recall and precision values are combined so that a more comprehensive analysis can be obtained. Based on Fig. 17, the transfer learning model with Inception-V3 performs better than the other models for each class in the input data. However, in the awe class, the F1-score of Inception-V3 is the same as that of MobileNet-V2, which means both have comparable performance in that class.

Figure 17. F1-score in testing data.

The assessment also includes a review of the testing duration of each model when evaluating a single image. The testing time is crucial since it influences the applicability of the developed model in a real system. These findings on testing time are depicted in Fig. 18. According to the data presented in Fig. 18, the Inception-V3 model exhibits a longer testing time compared to the other models, due to the greater complexity of the transfer learning model with Inception-V3. In contrast, the build-from-scratch model, which has much lower model complexity, has a quicker processing time. Nevertheless, the testing time of all models is fast enough for real-time detection.

Figure 18. Testing time (in seconds) comparison.
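Per-image testing time can be measured with a simple timing loop such as the sketch below; this illustrates the measurement idea rather than the exact procedure used in the study.

```python
import time
import numpy as np

def mean_inference_time(model, images):
    """Average per-image prediction time (seconds) over a list of preprocessed images."""
    times = []
    for img in images:
        x = np.expand_dims(img, axis=0)      # add the batch dimension
        start = time.perf_counter()
        model.predict(x, verbose=0)
        times.append(time.perf_counter() - start)
    return float(np.mean(times))
```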

The testing results confirm that the transfer learning model with Inception-V3 is the best choice for the task of classifying emotions in the Emognition dataset. Better performance on these evaluation metrics indicates the model's ability to deliver a balanced overall classification performance and recognize certain emotion classes. Although the transfer learning model with Inception-V3 has a longer testing time, the testing time is not significantly different from that of the other models.

Experiments on JAFFE and KDEF datasets

To further validate the robustness and versatility of the proposed models, extensive testing was conducted using two well-established facial emotion recognition datasets: the Japanese Female Facial Expression (JAFFE) dataset 48 and the Karolinska Directed Emotional Faces (KDEF) dataset 49 . These datasets are frequently utilized in the FER field due to their diverse representation of facial emotions and have historically served as benchmarks for evaluating the effectiveness of FER algorithms.

The models used for these experiments were adapted from the best-performing models on the Emognition dataset, as detailed in Sect. “ Testing results ”. This approach leverages the sophisticated feature extraction capabilities already developed for the Emognition models, thus providing a strong foundation for recognizing emotions in JAFFE and KDEF images. The transfer learning technique was applied, utilizing the same hyperparameter configurations as outlined in Sects. “ Model architecture ” and “ Training result ”, ensuring consistency in model training and evaluation.

The performance of the adapted models on the JAFFE and KDEF datasets was assessed based on their testing accuracy and F1-scores. These metrics are critical for comparing the efficacy of the proposed models against existing models documented in recent literature. The results are systematically presented in Table 5 , which includes comparative data from other recent studies.

The analysis demonstrates that the proposed models, especially the Inception-V3 transfer learning model, are effective for facial emotion recognition tasks across different datasets. When compared to other research, the proposed models are competitive, often outperforming or matching other state-of-the-art results. The models of Dada et al. 53 , Lasri et al. 55 , and Baygin et al. 56 stand out with slightly higher metrics on JAFFE. However, it should be noted that the proposed models are trained and tuned using the pre-processing and hyperparameters specifically tailored for the Emognition dataset, while the existing models were developed and trained for the JAFFE and KDEF datasets. The proposed models could be further refined to enhance their accuracy and F1-scores, potentially by incorporating techniques from the best-performing models in the literature or by further tuning their architectures and training parameters for the respective datasets.

To provide a clearer visual representation of model effectiveness, Figs.  19 and 20 display the confusion matrices for the JAFFE and KDEF datasets, respectively. These matrices illustrate the precision of emotion classification across different emotions provided by the datasets, highlighting the models' strengths and areas for improvement in recognizing specific emotional expressions.

Figure 19. Confusion matrix of the models on JAFFE dataset.

Figure 20. Confusion matrix of the models on KDEF dataset.
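Confusion matrices such as those in Figs. 19 and 20 can be produced with scikit-learn, as in the sketch below; the label list and prediction arrays are hypothetical placeholders.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# y_true and y_pred are hypothetical integer labels from evaluating a model on
# JAFFE or KDEF; the emotion list below matches typical seven-class FER datasets.
emotions = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
cm = confusion_matrix(y_true, y_pred, normalize="true")  # row-normalized per true class
ConfusionMatrixDisplay(cm, display_labels=emotions).plot(cmap="Blues", values_format=".2f")
plt.show()
```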

The comparative analysis of these models on two different datasets provides important insights into their respective strengths and weaknesses. Inception-V3 emerges as the most consistent and reliable model, particularly advantageous in settings where high accuracy across a diverse range of emotions is required. The MobileNet-V2 and full learning models, while effective, demonstrate specific areas where enhancement is needed, particularly in the accurate classification of negative emotions. The lower accuracy of the MobileNet-V2 transfer learning model was notable for the sad expression on the JAFFE dataset, with a 25% misclassification rate primarily involving confusion with angry and disgust; it also showed less precision for the angry expression on the KDEF dataset. The full learning model showed some limitations with disgust, where it achieved only 75% accuracy on the JAFFE dataset with notable misclassifications involving fear and sad, while on the KDEF dataset the misclassifications primarily involved angry and disgust, indicating a recurring challenge in distinguishing between closely related negative emotions. Further improvement may include fine-tuning and hyperparameter optimization, which could offer significant benefits, particularly for models showing potential yet inconsistent performance across emotional categories.

Analyzing the training and testing outcomes shows that the transfer learning model utilizing Inception-V3 outperforms the other models. The transfer learning model with MobileNet-V2 shows less optimal performance compared to the transfer learning model with Inception-V3. This is caused by overfitting after the fine-tuning technique is applied (scenario 2) during the training process; as such, the fine-tuning technique is less suitable when applied to MobileNet-V2 on the Emognition dataset. In addition, the number of parameters used in the training process of the transfer learning model with MobileNet-V2 is smaller than that of the transfer learning model with Inception-V3, and MobileNet-V2 requires more data in the training process. This is also in line with the finding of Abdulsattar and Hussain 37 , who found that the transfer learning model with MobileNet-V2 had less than optimal results compared to other models.

Although the results indicate that the full learning model shows less optimal performance compared to the transfer learning models overall, the difference in testing accuracy with MobileNet-V2 is not significant. Moreover, the full learning model consumes the lowest testing time among the three models. The full learning model has 135,337 trainable parameters, while the MobileNet-V2 transfer learning model has 1,320,969 trainable parameters in scenario 1 and 3,578,953 in scenario 2. This stark difference in complexity results in notably faster training and testing times for the full learning model. Hence, these results indicate that the full learning model is still promising enough to be developed for specific tasks such as FER. Several improvements in the datasets, architectural design, and training setting can be made to increase the performance of the full learning model.

One possible reason for the full learning model's lower performance is the lack of training data available for starting the fitting process from scratch. The limited data for each category within the Emognition dataset may have hindered the full learning model from reaching a high level of accuracy. Transfer learning models, which benefit from pre-training on extensive and varied datasets, are better equipped for identifying and learning new features and patterns. They come with a foundational understanding of class features, simplifying the classification task. To overcome the data scarcity issue, subsequent research could create an expansive database specifically for FER images to train full learning models more efficiently. This is in line with Raja Sekaran et al. 13 , who noted that large amounts of data are needed to successfully create a CNN model from scratch. Moreover, future research could aggregate existing datasets to form a comprehensive and sizable dataset for this purpose.

Another possible reason could be that the hyperparameters employed were not ideally suited to the problem addressed in this research. Although this study has considered determining the optimal number of convolutional layers and number of epochs, there are several other hyperparameters whose optimal values can be explored, such as the number of filters, filter sizes, number of neurons, activation functions, regularization, learning rate, and batch size. As such, comprehensive experimentation on these hyperparameters would be necessary.

In addition, improving the architectural design of the full learning model can be explored through several approaches. Normalization techniques such as batch or layer normalization can be instrumental in stabilizing the training process, thereby accelerating convergence. To combat overfitting and foster generalization, dropout layers or regularization methods can be strategically integrated into the model. Residual connections, inspired by architectures such as ResNet, can be added to allow the training of deeper networks by enabling the direct flow of gradients. Ensemble methods, which combine the strengths of various models or architectures, can also be employed to improve overall accuracy. Incorporating these enhancements into subsequent research has the potential to pave the way for the development of FER models that are not only more sophisticated and precise but also offer greater clarity in their decision-making processes.

Further, this study possesses several limitations that could provide opportunities for future development. The developed models are based on static images for making predictions. The use of static images has several disadvantages, as it lacks temporal context: static images do not capture the dynamic nature of facial expressions, missing out on temporal cues and changes over time that can be critical for accurately recognizing emotions. In addition, real-world emotional expressions often involve subtle movements and transitions. Static images cannot fully represent these micro-expressions, potentially leading to oversimplified models that struggle with the complexity of real facial expressions. A single static image may also not adequately represent the range of expressions associated with a particular emotion, leading to reduced generalizability of the model. Therefore, future studies can be directed toward developing video-based deep learning models for FER. Utilizing video datasets offers a significant benefit by recording facial expressions as they evolve over several frames, delivering a richer, more detailed basis for emotion classification than static image datasets can provide. This dynamic capture of facial changes enhances the model's ability to accurately recognize a wider range of emotions 57 . Based on the analysis on the JAFFE and KDEF datasets, the proposed models offer promising performance. Hence, future studies can further implement and evaluate the proposed architectures on several other datasets to assess the robustness of the model, especially its ability to learn different emotions.

This study utilizes a form of CNN that is unexplainable, meaning the internal decision-making process of the model is not transparent to the researchers. Future research could focus on explainable artificial intelligence (XAI) to enhance interpretability and understand how the model processes and classifies input images. The implementation of XAI for facial emotion recognition holds particular significance for critical and sensitive sectors such as police investigations, psychology, the judiciary, and healthcare for several key reasons. In sensitive applications, understanding how decisions are made is crucial for establishing trust: XAI provides transparency into the decision-making process of FER systems, enabling stakeholders to comprehend why a particular emotion was recognized, which is vital for building confidence in the system's outputs. In environments such as the judiciary or police investigations, the accuracy of emotion recognition can have profound implications; XAI helps ensure that the conclusions drawn are based on valid, understandable reasoning, which is essential for accountability, especially in legal contexts where decisions can affect the outcomes of cases or investigations. The ethical use of FER in psychology and healthcare requires careful consideration of privacy and consent, as well as the potential consequences of misinterpretation, and XAI enables deeper scrutiny of these ethical implications by making the operational logic accessible and comprehensible.

Conclusions

This study introduces an image-based computer vision approach for developing deep learning techniques to automate facial emotion recognition (FER) using the Emognition dataset. The dataset consists of ten expressions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness, and neutral. First, the dataset is subjected to a series of preprocessing steps to transform it into a clean dataset containing 2,535 facial images. Subsequently, this dataset is split into 2,028 images for training, 253 for validation, and 254 for testing. The development of the CNN models involves two distinct methods: transfer learning with fine-tuning using the pre-trained models Inception-V3 and MobileNet-V2, and the creation of a CNN model from scratch.

The experimental results demonstrate that all three proposed CNN models perform admirably in classifying emotions across the nine emotion classes. The training and testing outcomes consistently support the conclusion that the transfer learning model with Inception-V3 exhibits superior performance compared to the other models. This finding also underscores the effectiveness of the fine-tuning process applied to Inception-V3 in adapting the model to the input data. The detailed analysis also indicates that the developed models successfully predicted several new emotions unique to the Emognition dataset, namely amusement, enthusiasm, awe, and liking, with high accuracy. Furthermore, this research holds promise for practical implementation in various domains, including marketing, mental health, education, application development, and beyond.

Data availability

The datasets in this study are provided by the Emognition team. Please refer to the following article for the datasets: S. Saganowski, J. Komoszyńska, M. Behnke, B. Perz, D. Kunc, B. Klich, L. D. Kaczmarek, and P. Kazienko, “Emognition dataset: emotion recognition with self-reports, facial expressions, and physiology using wearables,” Sci Data, vol. 9, no. 1, pp. 1–9, Dec. 2022, https://doi.org/10.1038/s41597-022-01262-0 .

Krishna, A. H., Sri, A. B., Priyanka, K. Y. V. S., Taran, S. & Bajaj, V. Emotion classification using EEG signals based on tunable-Q wavelet transform. IET Sci. Meas. Technol. 13 (3), 375–380. https://doi.org/10.1049/iet-smt.2018.5237 (2019).

Ismael, A. M., Alçin, Ö. F., Abdalla, K. H. & Şengür, A. Two-stepped majority voting for efficient EEG-based emotion classification. Brain Inform. 7 (1), 1–12. https://doi.org/10.1186/s40708-020-00111-3 (2020).

Lerner, J. S., Li, Y., Valdesolo, P. & Kassam, K. S. Emotion and decision making. Annu. Rev. Psychol. 66 , 799–823. https://doi.org/10.1146/annurev-psych-010213-115043 (2015).

Aslan, M. CNN based efficient approach for emotion recognition. J King Saud Univ. Comput. Inf. Sci. 34 (9), 7335–7346. https://doi.org/10.1016/j.jksuci.2021.08.021 (2022).

Mehrabian, A. Nonverbal Communication 1st edn. (Routledge, 1972).

Gautam, C. & Seeja, K. R. Facial emotion recognition using Handcrafted features and CNN. Procedia Comput. Sci. 218 , 1295–1303. https://doi.org/10.1016/j.procs.2023.01.108 (2023).

Andalibi, N. & Buss, J. The human in emotion recognition on social media: attitudes, outcomes, risks. In: Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery, (2020). https://doi.org/10.1145/3313831.3376680 .

Jacintha, V., Simon, J., Tamilarasu, S., Thamizhmani, R., Thanga Yogesh, K. & Nagarajan, J. A review on facial emotion recognition techniques. In Proceedings of the 2019 IEEE International Conference on Communication and Signal Processing, ICCSP 2019, Institute of Electrical and Electronics Engineers Inc., pp. 517–521 (2019). https://doi.org/10.1109/ICCSP.2019.8698067 .

Ko, B. C. A brief review of facial emotion recognition based on visual information. Sensors 18 (2), 1–20. https://doi.org/10.3390/s18020401 (2018).

Suk, M. & Prabhakaran, B.: Real-time mobile facial expression recognition system-a case study In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 132–137 (2014). https://doi.org/10.1109/CVPRW.2014.25

Tian, Y., Luo, P., Wang, X. & Tang, X. Pedestrian Detection aided by deep learning semantic tasks. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5079–5087, 2015. https://doi.org/10.1109/CVPR.2015.7299143

Walecki, R., Ognjen, R., Pavlovic, V., Schuller, B., Pantic, M. Deep structured learning for facial action unit intensity estimation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3405–3414 (2017). https://doi.org/10.1109/CVPR.2017.605

Raja Sekaran, S.A.P.C., Lee, P., Lim, K.M. Facial emotion recognition using transfer learning of AlexNet. In 2021 9th International Conference on Information and Communication Technology, ICoICT 2021, pp. 170–174, 2021. https://doi.org/10.1109/ICoICT52021.2021.9527512 .

Saganowski, S. et al. Emognition dataset: emotion recognition with self-reports, facial expressions, and physiology using wearables. Sci Data 9 (1), 1–9. https://doi.org/10.1038/s41597-022-01262-0 (2022).

Eilenberger, S.D. Amusement device for registering emotion (1943). https://patents.google.com/patent/US2379955 .

Piroozfar, P., Farooqi, I., Judd, A., Boseley, S., Farr, E.R.P. VR-enabled participatory design of educational spaces: an experiential approach. In: International Conference on Construction in the 21st Century, pp. 496–502 (2022).

Zhu, H., Duan, X. & Su, Y. Is the sense of awe an effective emotion to promote product sharing: Based on the type of awe and tie strength. J. Contemp. Mark. Sci. 4 (3), 325–340. https://doi.org/10.1108/jcmars-10-2021-0036 (2021).

Kanade, T., Cohn, J.F., Tian, Y. Comprehensive database for facial expression analysis. In Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000 (2000). https://doi.org/10.1109/AFGR.2000.840611 .

Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z. & Matthews, I. (2010) The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, CVPRW 2010 (2010). https://doi.org/10.1109/CVPRW.2010.5543262 .

Agostinelli, F., Anderson, M.R. & Lee, H. Adaptive multi-column deep neural networks with application to robust image denoising. Adv. Neural Inf. Process. Syst. (2013).

Lyons, M., Akamatsu, S., Kamachi, M. & Gyoba, J. Coding facial expressions with Gabor wavelets. In: Proceedings - 3rd IEEE International Conference on Automatic Face and Gesture Recognition, FG 1998 (1998). https://doi.org/10.1109/AFGR.1998.670949 .

Ebner, N. C., Riediger, M. & Lindenberger, U. FACES-a database of facial expressions in young, middle-aged, and older women and men: Development and validation. Behav. Res. Methods 42 (1), 351–362. https://doi.org/10.3758/BRM.42.1.351 (2010).

Langner, O. et al. Presentation and validation of the radboud faces database. Cogn. Emot. 24 (8), 1377. https://doi.org/10.1080/02699930903485076 (2010).

Li, S. & Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 13 (3), 1195–1215. https://doi.org/10.1109/TAFFC.2020.2981446 (2022).

Kunc, D., Komoszyńska, J., Perz, B., Kazienko, P. & Saganowski, S. Real-life validation of emotion detection system with wearables. Lecture Notes Comput. Sci. https://doi.org/10.1007/978-3-031-06527-9_5 (2022).

Kune, D. Unsupervised learning for physiological signals in real-life emotion recognition using wearables. In 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022 (2022). https://doi.org/10.1109/ACIIW57231.2022.10086004 .

Perz, B. Personalization of emotion recognition for everyday life using physiological signals from wearables. In 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022 (2022). https://doi.org/10.1109/ACIIW57231.2022.10086031 .

Kunc, D., Komoszynska, J., Perz, B., Saganowski, S., Kazienko, P. Emognition system—Wearables, physiology, and machine learning for real-life emotion capturing. In 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2023 (2023). https://doi.org/10.1109/ACIIW59127.2023.10388097 .

Manalu, H. V. & Rifai, A. P. Detection of human emotions through facial expressions using hybrid convolutional neural network-recurrent neural network algorithm. Intell. Syst. Appl. 21 , 200339. https://doi.org/10.1016/j.iswa.2024.200339 (2024).


Acknowledgements

We would like to express our sincere thanks to Prof. Stanislaw Saganowski and Dr. Joanna Komoszyńska for granting us access to the Emognition Wearable Dataset.

Author information

Authors and Affiliations

Department of Mechanical and Industrial Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia

Erlangga Satrio Agung, Achmad Pratama Rifai & Titis Wijayanto


Contributions

Erlangga Satrio Agung: Methodology, software, formal analysis, investigation, visualization, writing—original draft, writing—review and editing; Achmad Pratama Rifai: Conceptualization, methodology, formal analysis, investigation, resources, validation, visualization, writing—original draft, writing—review and editing; Titis Wijayanto: Data curation, conceptualization, writing—original draft, writing—review and editing.

Corresponding author

Correspondence to Achmad Pratama Rifai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Agung, E.S., Rifai, A.P. & Wijayanto, T. Image-based facial emotion recognition using convolutional neural network on emognition dataset. Sci Rep 14, 14429 (2024). https://doi.org/10.1038/s41598-024-65276-x


Received: 13 December 2023

Accepted: 18 June 2024

Published: 23 June 2024

DOI: https://doi.org/10.1038/s41598-024-65276-x


Keywords

  • Facial emotion recognition
  • Convolutional neural network
  • Deep learning
  • Emognition dataset


IMAGES

  1. Final
  2. (PDF) DESIGN OF A FACE RECOGNITION SYSTEM
  3. Thesis on Face Recognition using Matlab (PDF)
  4. (PDF) A Review Paper on Face Recognition Techniques
  5. (PDF) A Case Study on Analysis of Face Recognition Techniques
  6. Top 10 PhD Research Topics in Face Recognition [Innovative Ideas]

VIDEO

  1. Face Recognition using Tensor Flow, Open CV, FaceNet, Transfer Learning

  2. Preparing for thesis paper

  3. PhD Defence: Chinmaya Mishra

  4. What is Thesis Paper?

  5. Working for a THESIS PAPER

  6. # 11 Facerecognition

COMMENTS

  1. (PDF) A Review of Face Recognition Technology

    The paper introduces related research on face recognition from different perspectives and describes the development stages and related technologies of face recognition.

  2. A Face Recognition Method Using Deep Learning To Identify Mask And

    …facial recognition is known as the Karhunen-Loève method. It is the most thoroughly studied method for face recognition, its main use being a reduction in the dimensionality of the image. The method was first applied to face recognition and subsequently used for facial reconstruction. (A minimal Eigenfaces-style sketch of this idea appears after this list.)

  3. (PDF) Face Recognition: A Literature Review

    The task of face recognition has been actively researched in recent years. This paper provides an up-to-date review of major human face recognition research. We first present ...

  4. (PDF) DEVELOPMENT OF A FACE RECOGNITION SYSTEM

    A face recognition system is designed, implemented and tested in this thesis study. The system utilizes a combination of techniques in two topics; face detection and recognition.

  5. Design and Evaluation of a Real-Time Face Recognition System using

    In this paper, the design of a real-time face recognition system using a CNN is proposed, followed by an evaluation of the system while varying the CNN parameters to enhance its recognition accuracy. An overview of the proposed real-time face recognition system using a CNN is shown in Fig. 1. The organization of the paper is as follows.

  6. Past, Present, and Future of Face Recognition: A Review

    Face recognition is one of the most active research fields of computer vision and pattern recognition, with many practical and commercial applications including identification, access control, forensics, and human-computer interactions. However, identifying a face in a crowd raises serious questions about individual freedoms and poses ethical issues. Significant methods, algorithms, approaches ...

  7. DEEP LEARNING FOR FACE RECOGNITION: A CRITICAL ANALYSIS

    …face recognition relate to occlusion, illumination and pose invariance, which cause a notable decline in ... this paper will review all relevant literature for the period 2003-2018, focusing on the contribution of deep neural networks in drastically improving accuracy. Furthermore, it will …

  8. PDF Real-Time Face Detection and Recognition Based on Deep Learning

    …networks for face recognition improves the speed of recognition. The contributions of this thesis are: (1) the use of elliptical markers can identify a human face, including rotation and position; (2) the confidence of human face recognition is … Keywords: CNNs, face recognition, data augmentation, SSD, Inception v2.

  9. Face Recognition

    600 papers with code • 23 benchmarks • 64 datasets. Facial Recognition is the task of making a positive identification of a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then works on identification of those detected faces.

  10. Face Detection in Extreme Conditions: A Machine-learning Approach

    Face detectors are trained with 2,500 photos of left or right eyes together with negative (non-eye) sample sets. Overall, 94 percent true positives and 13 percent false positives are reported for face detection, and eyes are detected at a rate of 88 percent with only a 1 percent false-positive outcome.

  11. PDF Face Recognition Student Attendance System

    Bachelor's thesis by Anil Shrestha, "Face recognition student attendance system", 35 pages, 10 April 2021; degree: Bachelor of Engineering, degree programme: Information Technology. ... Facial recognition records the biometrics of the face, and different face recognition methods measure those facial biometrics.

  12. A comprehensive study on face recognition: methods and challenges

    Illumination, pose variation, facial expressions, occlusions, aging, etc. are the key challenges to the success of face recognition. Pre-processing, face detection, feature extraction, optimal feature selection, and classification are the primary steps in any face recognition system; this paper provides a detailed review of each. (A minimal sketch of the pre-processing and detection steps appears after this list.)

  13. Student Attendance Monitoring System Using Face Recognition

    Keywords: Local Binary Pattern Histogram (LBPH), face detection, face recognition, Haar cascade classifier, Python, student attendance. Suggested citation: Sai, E. Charan, Hussain, Shaik Althaf, Khaja, Syed & Shyam, Amara, Student Attendance Monitoring System Using Face Recognition (May 22, 2021). (A brief LBPH sketch appears after this list.)

  14. PDF AI Facial Recognition System

    This thesis project aimed to build a facial recognition system that could recognize people through the camera and unlock the door locks. Recognized results were sent to the database and could be analyzed by users after a successful login. The project consists of building a facial recognition system, electronics operation, and …

  15. (PDF) Face detection and Recognition: A review

    Thesis, full text available, June 2023. ... An efficient algorithm and a database of face images are needed to solve the face recognition problem. In this paper, the Eigenfaces method is ...

  16. PDF 2010:040 CIV MASTER'S THESIS Face Recognition in Mobile Devices

    Master's thesis: Face Recognition in Mobile Devices, by Mattias Junered. Luleå University of Technology, MSc Programmes in Engineering (Media Technology), Department of Computer Science and Electrical Engineering, Division of Signal Processing. 2010:040 CIV, ISSN 1402-1617, ISRN LTU-EX--10/040--SE.

  17. Sensors

    Over the past few decades, interest in theories and algorithms for face recognition has been growing rapidly. Video surveillance, criminal identification, building access control, and unmanned and autonomous vehicles are just a few examples of concrete applications that are gaining attraction among industries. Various techniques are being developed including local, holistic, and hybrid ...

  18. PDF Setting Intentions: Considering Racial Justice Implications of Facial

    This thesis brings attention to facial recognition technology's (FRT) intersection with systems of race, power, and surveillance. Little research is reported on this specific technology's relationship with these systems, and even more so as it relates to public safety when both law enforcement …

  19. PDF What's in Your Face? Discrimination in Facial Recognition Technology

    Discrimination in Facial Recognition Technology. A thesis submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment of the requirements for the degree of Master of Arts in Communication, Culture, and Technology, by Jieshu Wang, M.Eng.

  20. Deep Face Recognition for Biometric Authentication

    Face is one of the most widely used biometrics for human identity authentication. Facial recognition has remained an interesting and active research area in the past several decades due to its ever growing applications in biometric authentication, content based data retrieval, video surveillance, access control and social media. Unlike other biometric systems, facial recognition based systems ...

  21. (PDF) Face Recognition using Deep Learning

    Face Recognition using Deep Learning. Banumalar Koodalsamy, Manikandan Bairavan Veerayan, and Vanaja Narayanasamy; Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India. Abstract ...

  22. Facial Emotion Recognition Using Machine Learning

    Human emotions can be classified as fear, contempt, disgust, anger, surprise, sadness, happiness, and neutral. These emotions are very subtle; facial muscle contortions are very minimal, and detecting these differences can be very challenging, as even a small difference results in different expressions [4].

  23. [2406.18144] Artificial Immune System of Secure Face Recognition

    Artificial Immune System of Secure Face Recognition Against Adversarial Attacks. Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan.

  24. (PDF) Face Recognition and Face Detection Benefits and ...

    …personal computers (PC), or traffic surveillance. Facial recognition computer programming first started in 1964, which measured the size of the mouth and eyes [1]. An additional 21 ...

  25. Image-based facial emotion recognition using convolutional neural

    This study expands the use of deep learning for facial emotion recognition (FER) based on the Emognition dataset, which includes ten target emotions: amusement, awe, enthusiasm, liking, surprise, anger ... (A hedged fine-tuning sketch in this spirit appears after this list.)
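
The Karhunen-Loève/Eigenfaces approach mentioned in items 2 and 15 above boils down to projecting flattened face images onto a small number of principal components and comparing the resulting codes. The sketch below is a minimal illustration of that idea with scikit-learn; the `gallery`, `labels`, and `probe` arrays are assumptions for the example, not data from either thesis.

```python
# Minimal Eigenfaces-style sketch: PCA projection plus nearest-neighbour matching.
import numpy as np
from sklearn.decomposition import PCA

def build_eigenface_model(gallery: np.ndarray, n_components: int = 50):
    """gallery: (n_samples, height*width) array of flattened grayscale faces."""
    pca = PCA(n_components=n_components, whiten=True)
    codes = pca.fit_transform(gallery)          # low-dimensional face codes
    return pca, codes

def identify(probe: np.ndarray, pca: PCA, codes: np.ndarray, labels: np.ndarray):
    """Return the gallery label whose code is closest to the probe's code."""
    probe_code = pca.transform(probe.reshape(1, -1))   # project the probe face
    distances = np.linalg.norm(codes - probe_code, axis=1)
    return labels[int(np.argmin(distances))]
```

Keeping only the first few principal components is exactly the "reduction in the dimensionality of the image" that the snippets describe; the same projection can also reconstruct an approximate face from its code.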
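
Item 12 lists pre-processing, face detection, feature extraction, feature selection, and classification as the primary steps of a face recognition system. The fragment below sketches only the first two steps with OpenCV's bundled Haar cascade; the crop size and cascade choice are illustrative assumptions, and the returned crops would then feed the feature-extraction and classification stages.

```python
# Pre-processing and face detection front end using OpenCV's Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_crops(image_path: str, size=(64, 64)):
    """Detect faces in an image and return normalised grayscale crops."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)            # pre-processing
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], size) for (x, y, w, h) in boxes]
```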
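
Item 13's keywords (LBPH, Haar cascade, Python) point at the classic OpenCV attendance pipeline. A hedged sketch of the recognition side follows; it assumes `opencv-contrib-python` is installed (which provides the `cv2.face` module), and the training crops, student IDs, and distance threshold are placeholders rather than values from the cited paper.

```python
# LBPH-based attendance sketch using OpenCV's contrib face module.
import cv2
import numpy as np

def train_attendance_model(face_crops, student_ids):
    """face_crops: list of equal-sized grayscale crops; student_ids: matching ints."""
    recognizer = cv2.face.LBPHFaceRecognizer_create()
    recognizer.train(face_crops, np.array(student_ids))
    return recognizer

def mark_attendance(recognizer, probe_crop, threshold: float = 60.0):
    """Return the recognised student ID, or None if the face is too dissimilar."""
    student_id, distance = recognizer.predict(probe_crop)   # lower = closer match
    return student_id if distance < threshold else None
```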
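
The final item summarises the Emognition FER study whose back-matter appears earlier on this page. In the same spirit, the sketch below fine-tunes an ImageNet-pretrained MobileNetV2 head on ten emotion classes with Keras; the input size, optimiser, dropout rate, and layer choices are assumptions for illustration, not the authors' reported configuration.

```python
# Hedged transfer-learning sketch: MobileNetV2 backbone with a 10-class emotion head.
import tensorflow as tf

NUM_CLASSES = 10  # e.g. amusement, awe, enthusiasm, liking, surprise, anger, ...

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained backbone for the first training phase

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed
```

Freezing the backbone first and only training the new classification head is the usual first phase of transfer learning; selected backbone layers can later be unfrozen for fine-tuning at a lower learning rate.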