Scientists at Russia’s National Research University Higher School of Economics have found a way around a problem that’s been plaguing programmers for years: how to teach a computer to hear emotions. The results show a 71 per cent success rate in correctly identifying a person’s emotional state just by the sound of their voice.
Admit it: it’s been a while since your phone’s virtual assistant last impressed you. Siri, Alexa and the rest are great at predicting the weather but much less so at comforting you because it won’t stop raining. Clever as they may be, their emotional intelligence is still pretty low.
Emotion detection itself has been around for some time now, of the facial-recognition variety. There are now dozens of application programming interfaces (APIs) that can triangulate the relationships between various points on your face and conclude, with a high degree of accuracy, how you’re feeling, and not just faked feeling, either.
A study back in 2014 pitted facial recognition software against regular humans in a test to see who was better at telling when someone in a video was feigning emotions such as pain or really feeling them. We mere mortals guessed right about 50 per cent of the time, basically a coin flip, but the program, which could zero in on 20 different facial muscles at a time, turned out to be right in 85 per cent of cases.
“Human facial expressions sometimes convey genuinely felt emotions, and some other times convey emotions not felt but required by a particular social context, for example expressing gratitude after receiving a terrible gift or sadness at a funeral,” said the University of Toronto’s Kang Lee, co-author of the study. “We can envisage in the very near future a widely available and inexpensive computer vision system that is capable of recognizing subtle emotions,” Lee added.
So that’s visual processing. Emotion recognition from sound alone, however, has so far been more of a challenge, say the researchers behind the new work, which was presented at the International Conference on Neuroinformatics in Moscow last month.
“The emotion classification problem has great potential for use in many applied industries such as robotics, tracking systems and other interactive [systems],” say the study’s authors. “Solving this problem allows for the reception of users’ feedback in a natural way … simplifying and accelerating the interaction between computer and person.”
The researchers concentrated on the peaks and valleys of a person’s voice, converting the aural pattern into a digital image (a spectrogram) to which they could apply methods similar to those used in image recognition. Working with a dataset of 24 actors depicting eight different emotions (neutral, calm, happy, sad, angry, fearful, disgust and surprised), the neural network was eventually able to correctly identify the emotion in 71 per cent of cases.
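The spectrogram step can be sketched in a few lines of Python. The version below is a minimal pure-NumPy illustration, not the study’s actual pipeline: the frame length, hop size and synthetic test tone are all assumptions made here for demonstration. The idea is simply to slice the waveform into overlapping frames, apply a window, take the FFT of each frame, and stack the log-magnitudes into a 2-D array that a network can treat like a grayscale image.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128):
    """Convert a 1-D waveform into a 2-D log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not parameters
    taken from the study.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 frequency bins per frame
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitudes).T  # shape: (freq_bins, time_frames)

# A one-second synthetic 440 Hz tone at 16 kHz stands in for a voice clip.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = log_spectrogram(tone)
print(spec.shape)  # a 2-D "image" an image-recognition network can classify
```

The resulting array has one row per frequency bin and one column per time frame, which is exactly the image-like representation that lets standard convolutional image-classification techniques be reused for audio.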
The program did exceptionally well at identifying neutral and calm emotions, while it had more difficulty separating happy from angry. “Most likely the reason for this is that they are the strongest emotions and as a result, their spectrograms are slightly similar,” say the study’s authors.
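Per-emotion results like these are typically read off a confusion matrix, where each row holds the true label and each column the predicted one. The sketch below uses made-up counts for four of the eight classes (not the study’s data), chosen only to mimic the pattern described above: neutral and calm are identified reliably, while happy and angry bleed into each other.

```python
import numpy as np

# Rows = true class, columns = predicted class.
# Illustrative counts only, not results from the study.
labels = ["neutral", "calm", "happy", "angry"]
confusion = np.array([
    [18,  1,  1,  0],   # neutral: rarely misclassified
    [ 1, 18,  0,  1],   # calm: rarely misclassified
    [ 0,  1, 12,  7],   # happy: often mistaken for angry
    [ 0,  0,  6, 14],   # angry: often mistaken for happy
])

overall = np.trace(confusion) / confusion.sum()      # diagonal / total
per_class = np.diag(confusion) / confusion.sum(axis=1)
for name, acc in zip(labels, per_class):
    print(f"{name:8s} {acc:.2f}")
print(f"overall  {overall:.2f}")
```

Diagonal entries are correct classifications; large off-diagonal entries in the happy/angry block are what “slightly similar spectrograms” looks like in practice.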