\documentclass[conference]{IEEEtran}
\IEEEoverridecommandlockouts
% The preceding line is only needed to identify funding in the first footnote. If that is unneeded, please comment it out.
\usepackage{cite}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{algorithmic}
\usepackage{graphicx}
\usepackage{float}
\usepackage{textcomp}
\usepackage{xcolor}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\begin{document}

\title{Interpreting Facial Expressions and Emotional States Using Machine Learning}

\author{\IEEEauthorblockN{Adrian Del Bosque}
\IEEEauthorblockA{\textit{Computer Science Department} \\
\textit{University of Texas Rio Grande Valley}\\
Edinburg, United States \\
adrian.delbosque01@utrgv.edu}
\and
\IEEEauthorblockN{Kevin Jackson}
\IEEEauthorblockA{\textit{Computer Science Department} \\
\textit{University of Texas Rio Grande Valley}\\
Edinburg, United States \\
kevin.jackson01@utrgv.edu}
}

\maketitle

\begin{abstract}
In interpersonal interactions, body language accounts for 60 percent of nonverbal communication. Facial expressions, one form of nonverbal communication, are among the strongest indicators of a person's emotional state, so modeling such expressions can aid in the understanding of human behavior. Models already exist that determine a person's emotional state using facial recognition. In this report the authors share and discuss their own model and explore different machine learning techniques to further improve it.
\end{abstract}

\section{Introduction}
Humans naturally pick up on the facial expressions conveyed by others in order to determine how they feel, what they want, or what they intend. An interesting idea is that these expressions (fear, anger, happiness, sadness, etc.) may be biologically hardwired within all of us. Supporting this claim, the very same expressions are found in people all over the world and across different cultures, which contradicts the view that emotional expressions are purely a product of social learning.

With the current advancement in technology it is no surprise that researchers have been trying to push the ``limitations'' of today's computers. A popular practice in the computer science field is machine learning, which allows a program to learn and improve from experience without being explicitly programmed.

Before the program could learn, we first needed to understand for ourselves which facial features make up each of the seven basic emotions: anger, fear, disgust, happiness, sadness, surprise, and neutral.
\section{Background}

\subsection{Facial Landmarks}

The Facial Action Coding System (FACS), defined by Ekman and Friesen in 1978, is a system used to characterize facial expressions of human emotions. Changes in facial landmarks (FLs), such as the ends of the eyebrows, the bridge of the nose, the eyes, and the corners of the mouth, are described by action units (AUs) and can be used to determine which emotion is being expressed [1] (Fig.~1).

\begin{figure}[h!]
\centering
\includegraphics[scale=0.8]{facialLandmarks.jpg}
\caption{Facial landmarks.}
\label{fig:landmarks}
\end{figure}

\subsection{Existing Research}
Conventional facial expression recognition (FER) systems use geometric features, appearance features, or a mix of the two. The geometric approach builds a feature vector from facial components in image sequences and classifies it with multi-class AdaBoost, whereas appearance features are extracted from the global facial region and recognized using stepwise linear discriminant analysis. The overall recognition performance of conventional FER systems averages around 59 to 70 percent accuracy on still-frame images, whereas deep-learning-based FER systems average between 69 and 77 percent. The best-performing deep learning FER systems use a hybrid approach, capturing spatial image characteristics with a CNN and spatio-temporal features with an LSTM. Even though deep learning FER shows great success, limitations such as computing power, available datasets, and solid algorithmic theory still keep it from a higher success rate.

\section{Motivation and Goals}
After bouncing ideas off of one another, we concluded that we wanted to base our project on image recognition. To ensure that our project was not ``generic'' and was as original as possible, we continued brainstorming until we came up with the idea of creating a model that could detect human emotions. After searching online we stumbled upon the existing research mentioned in the previous section.
Even though our ``original idea'' already had years of established research behind it, we decided to continue with the project. In our eyes, if two university students with no previous experience in machine learning could create a model that achieved at least 45 percent test accuracy, we would, by our personal standards, consider the project a success. The only thing left to do was to scour the internet in search of a dataset we could use.

\section{Dataset}
The data we used came from a machine learning competition hosted on Kaggle [2] back in 2013 and consisted of roughly 32,000 grayscale images. Luckily for us, there was no need to preprocess the images in any way: each image was $48 \times 48$ pixels and already aligned with the others, meaning that every subject's face was centered. The CSV file contained three columns: emotion, pixels, and usage. The emotion column gave each image's facial expression as a value from 0 to 6, with each value corresponding to one of the seven emotions. The pixels column held the grayscale value of each of the image's 2,304 pixels (refer to Fig.~2). Lastly, the usage column determined whether the image was to be used for training or testing. In total there were around 27,000 training and 5,000 testing images.
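
As an illustration, a minimal sketch of how a CSV in this format might be parsed into training and testing sets is shown below. The file name, column names, and usage labels follow the description above and are illustrative rather than an exact reproduction of our code.

\begin{verbatim}
import csv
import numpy as np

train_images, train_labels = [], []
test_images, test_labels = [], []

# File and column names are illustrative; adjust them
# to match the actual CSV header.
with open('fer2013.csv') as f:
    for row in csv.DictReader(f):
        pixels = np.array(row['pixels'].split(),
                          dtype=np.uint8)
        image = pixels.reshape(48, 48)
        if row['usage'] == 'Training':
            train_images.append(image)
            train_labels.append(int(row['emotion']))
        else:
            test_images.append(image)
            test_labels.append(int(row['emotion']))
\end{verbatim}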

\begin{figure}[h!]
\centering
\includegraphics[scale = 0.3]{pixelvalues.jpg}
\caption{Pixel values.}
\label{fig:pixelvalue}
\end{figure}

\section{Methods}
Our goal was to find a way to connect an emotion to a facial expression using facial landmarks. The two approaches that came to mind were a CNN and an RNN. While both are well-respected methods, each has its own use: an RNN is well suited to sequential data that feeds back into itself, while a CNN operates only on the current input.

We therefore determined that a CNN would best fit our needs, since such networks are able to learn nonlinear relationships from an input. In our case the input would be a photo and the output would be a classification number.

\begin{figure}[H]
\centering
\includegraphics[scale = 1.0]{cnnimage.jpg}
\caption{CNN example.}
\label{fig:cnnimage}
\end{figure}

In the creation of the neural network, factors such as the number of convolution and pooling layers, the activation functions, the window size, and the optimizer all affected the accuracy. To manage the complexity of the project, we broke the problem down into sub-problems defined by the number of classes. For example, we started with two classes and a three-layer network, then continued to increase the number of classes, making changes to the network after each iteration.

\subsection{Libraries Used}
\begin{itemize}
\item TensorFlow is an open-source math library that handled our data flow.
\item Keras is an open-source neural-network library that we used to build our CNN and compile our results.
\item NumPy is a library that adds support for large, multi-dimensional arrays and matrices.
\item Matplotlib is a plotting library that we used to graph our results after each iteration.
\end{itemize}

\subsection{Appending Data}
In order to obtain results for each class iteration, we first needed to prepare our data for each pass. We did this by creating a variable that stored the number of classes to be tested at the time, then running a loop to select and append only those classes into four separate NumPy arrays: two for training and two for testing, where each pair consisted of one array of emotion labels and one array of pixels. The pixel arrays were then reshaped into $48 \times 48$ two-dimensional arrays and were ready to pass through the CNN. A sketch of this selection step follows.
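
The sketch below is a simplified, vectorized equivalent of the loop described above; variable and function names are illustrative.

\begin{verbatim}
import numpy as np

def select_classes(images, labels, num_classes):
    """Keep only samples whose label is below num_classes
    and reshape them for the CNN."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    mask = labels < num_classes
    x = images[mask].reshape(-1, 48, 48, 1)
    y = labels[mask]
    return x, y

# Example: the two-class pass keeps only emotions 0 and 1.
num_classes = 2
x_train, y_train = select_classes(train_images,
                                  train_labels, num_classes)
x_test, y_test = select_classes(test_images,
                                test_labels, num_classes)
\end{verbatim}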

\subsection{CNN}
The model we designed was a fully connected three-layer convolutional neural network. It uses ReLU (Rectified Linear Unit), $f(x) = \max(0, x)$, which outputs the input directly if it is positive and zero otherwise. We chose ReLU because it has become the default activation function for many types of neural networks, achieving higher performance and being easier to train. Max pooling was used on the first layer to select the maximum value in each window, and average pooling was used on the remaining two layers. We also found that the best number of epochs for our model was 25: increasing the number of epochs did not change the training or testing accuracy significantly, and we did not want to run too many epochs for fear that the model would over-fit, essentially memorizing the training data. We also trained with mini-batches rather than feeding the model all of the data at once, which gave us faster training times and saved memory in the process.
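
For concreteness, a simplified Keras sketch of a network in this spirit is shown below. Layer widths, kernel sizes, the dense layer, and the optimizer are illustrative assumptions rather than the exact values of our final model; the pooling arrangement, 25 epochs, and mini-batch training follow the description above.

\begin{verbatim}
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 7  # classes in the current iteration

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu',
                  input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.AveragePooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.AveragePooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=25,
                    batch_size=64,
                    validation_data=(x_test, y_test))
\end{verbatim}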

\section{Results}
Below are the results for each class iteration run through our finished model. Each figure (4 through 9) contains two plots for the corresponding number of classes: one of the training (train) and testing (val) accuracy, and one of the model loss.
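
These curves were drawn from the Keras training history with Matplotlib. A minimal sketch of that plotting step, using the history object returned by \texttt{model.fit} above, is shown below (in older Keras versions the history keys are \texttt{acc} and \texttt{val\_acc}).

\begin{verbatim}
import matplotlib.pyplot as plt

# Accuracy curves: training ('train') vs. testing ('val').
plt.figure()
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='val')
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Loss curves for the same run.
plt.figure()
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
\end{verbatim}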

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph2.jpg}
\caption{2 Classes}
\label{fig:graph2}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph3.jpg}
\caption{3 Classes}
\label{fig:graph3}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph4.jpg}
\caption{4 Classes}
\label{fig:graph4}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph5.jpg}
\caption{5 Classes}
\label{fig:graph5}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph6.jpg}
\caption{6 Classes}
\label{fig:graph6}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[scale = 0.7]{graph7.jpg}
\caption{7 Classes}
\label{fig:graph7}
\end{figure}

As the graphs show, the training and testing accuracy dropped with each class iteration, which supported our expectation that accuracy would decline as more classes were added. We were also able to surpass our original goal of 45 percent test accuracy for all 7 classes, reaching around 56 percent.

\section{Future Goals}
We would like to achieve an accuracy of around 70 percent, since that is what the original winner of the Kaggle competition produced. In the next iteration of the project we would like to add far more layers, include batch normalization on the layers, and apply random rotations to the training images so the model can generalize better. During our poster presentation we had the chance to talk to professors and students about our results. One of the key ideas we left with was to gather and compare the results we would get if we rotated (round-robin style) which pair of emotions we used for our 2-class iteration, since our 2-class pass only tested the emotions corresponding to 0 and 1.
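
As a rough sketch of what those additions might look like in Keras (illustrative only, not an implementation we have tested), batch normalization can be inserted after each convolution and random rotations can be applied as an augmentation step:

\begin{verbatim}
from tensorflow import keras
from tensorflow.keras import layers

# One convolutional block with batch normalization;
# sizes are illustrative.
block = keras.Sequential([
    layers.Conv2D(64, (3, 3), padding='same',
                  input_shape=(48, 48, 1)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
])

# Random rotations applied to the training images.
augmenter = keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15)
# model.fit(augmenter.flow(x_train, y_train, batch_size=64),
#           epochs=25, validation_data=(x_test, y_test))
\end{verbatim}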

\section*{Acknowledgment}
We would like to thank Dr. Dongchul Kim for his help and guidance throughout this project, as well as for the idea of improving our model after each class iteration and for helping us understand our results. We would also like to thank the students and professors who came to the poster presentation and took the time to talk with us and suggest ideas for our future project goals.

\begin{thebibliography}{00}
\bibitem{b1} G. N. Foley and J. P. Gentile, ``Nonverbal communication in psychotherapy,'' Psychiatry (Edgmont), vol. 7, no. 6, pp. 38--44, 2010.
\bibitem{b2} ``Challenges in Representation Learning: Facial Expression Recognition Challenge,'' Kaggle, 2013. https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
\bibitem{b3} A. S. M. Sohail and P. Bhattacharya, ``Classifying facial expressions using point-based analytic face model and support vector machines,'' Semantic Scholar.
\bibitem{b4} T. Hinz, P. Barros, and S. Wermter, ``The effects of regularization on learning facial expressions with convolutional neural networks,'' 2016.
\bibitem{b5} ``Keras: The Python Deep Learning Library,'' Keras documentation. https://keras.io/
\bibitem{b6} M. Peixeiro, ``Step-by-Step Guide to Building Your Own Neural Network From Scratch,'' Towards Data Science, Feb. 21, 2019.
\bibitem{b7} Algorithmia, ``Introduction to Emotion Recognition,'' Feb. 28, 2018, accessed Sep. 30, 2019. https://blog.algorithmia.com/introduction-to-emotion-recognition
\bibitem{b8} B. C. Ko, NCBI, Jan. 30, 2018, accessed Sep. 30, 2019. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856145/B72-sensors-18-00401
\end{thebibliography}

\end{document}