A.I. Voice Recognition Can Only Get Better


New Advances In A.I. Voice Recognition.

Speech recognition technology has been around for many years, starting as a way to reduce the number of people needed to answer and forward phone calls. For consumers, this has been a source of irritation: being directed and redirected from one place to another when you really want to take your frustrations out on a human.

As this funny video shows, recognizing human speech is far from easy for a computer.

As computers become smaller and more powerful, and as artificial intelligence programs improve, their ability to recognize words with all the nuances of human speech has greatly improved. A.I. will help these systems recognize speech even through strong and varied accents and slang.

Speech recognition research gained real momentum in the 1970s, once computer development had reached the required level. As with many computer-based research programs, speech recognition owes much of its early progress to funding from the U.S. Department of Defense. The DoD's DARPA Speech Understanding Research (SUR) program, which ran from 1971 to 1976, was one of the largest of its kind in the history of speech recognition.

Today it is not just the military that is investing time and money to improve computer voice recognition systems. Companies from Google to Microsoft have project teams actively working on this science.

However, if computers are ever to truly communicate with humans, these systems will have to become even smarter. So much of our verbal communication relies on the emotion carried in the voice. New research is making strides on this challenge using the latest developments in supercomputing and artificial intelligence.

Researchers from the Faculty of Informatics, Mathematics, and Computer Science at the Higher School of Economics (HSE) have created an automatic system capable of identifying emotions in the sound of a voice.

Computer recognizes emotion in human speech


For a long time, computers have successfully converted speech into text. However, the emotional component, which is important for conveying meaning, has been neglected. For example, to the same question, 'Is everything okay?', people can answer 'Of course it is!' in different tones: calm, provocative, cheerful, and so on. Each would draw a completely different reaction.

Recognizing emotion in speech is a huge challenge


A neural network is a set of interconnected processing units capable of learning, analysis, and synthesis. Such a system goes beyond traditional algorithms by making the exchange between a person and a computer more interactive.

HSE researchers Anastasia Popova, Alexander Rassadin and Alexander Ponomarenko have trained a neural network to recognize eight different emotions: neutral, calm, happy, sad, angry, scared, disgusted, and surprised. In 70% of cases the computer identified the emotion correctly, say the researchers.
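To put the 70% figure in perspective, here is a short Python sketch (our own illustration, not the HSE team's code) that computes accuracy as the share of clips whose predicted emotion matches the true one, and compares it with random guessing. The function name `accuracy` is hypothetical, and the 12.5% chance level assumes eight equally frequent classes and uniform guessing; the article does not say how balanced the test data was.

```python
# The eight emotion classes listed by the HSE researchers.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "scared", "disgusted", "surprised"]


def accuracy(true_labels, predicted_labels):
    """Fraction of clips whose predicted emotion matches the true emotion."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one clip")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Baseline (assumption: balanced classes, uniform random guessing):
# a guesser picking one of 8 emotions is right 1/8 of the time.
chance_level = 1 / len(EMOTIONS)
print(f"chance level: {chance_level:.1%}")  # chance level: 12.5%
```

On that assumption, 70% correct is more than five times better than guessing.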

The researchers transformed the sound into images (spectrograms), which allowed them to work with sound using the methods applied in image recognition. A deep convolutional neural network with the VGG-16 architecture was used in the research.
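Turning sound into a spectrogram means slicing the signal into short overlapping frames and measuring how much energy each frame carries at each frequency. The Python sketch below illustrates the idea with a naive discrete Fourier transform. It is not the researchers' pipeline: the frame size, hop length, Hann window and log scaling are our assumptions, and `hann` and `spectrogram` are hypothetical names. Real code would use a fast library FFT such as `numpy.fft.rfft`.

```python
import cmath
import math


def hann(n):
    """Hann window of length n; tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_size=256, hop=128):
    """Short-time Fourier transform magnitude on a log10 scale.

    Returns one row per frame (time axis) with frame_size // 2 + 1
    values per row (frequency axis), i.e. a 2-D "image" of the sound.
    """
    window = hann(frame_size)
    n_bins = frame_size // 2 + 1
    image = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        row = []
        for k in range(n_bins):
            # Naive DFT coefficient for bin k: O(N^2) per frame, fine for a demo.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            # Small offset avoids log10(0) in silent bins.
            row.append(math.log10(abs(coeff) + 1e-10))
        image.append(row)
    return image


sample_rate = 8000  # Hz
# 0.25 s of a 1000 Hz tone; 1000 Hz falls exactly on bin 1000 / (8000 / 256) = 32.
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(2000)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # 14 129 32
```

Each row is one moment in time and each column one frequency band, so the result can be treated like a grayscale image and handed to an image classifier such as a VGG-16 convolutional network.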

The researchers note that the program reliably distinguishes neutral and calm tones, while happiness and surprise are not always recognized well. Happiness is often mistaken for fear and sadness, and surprise for disgust.
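Error patterns like these are usually read off a confusion matrix, in which each row is a true emotion and each column a predicted one. The sketch below builds such a matrix, reports per-class recall (the share of each true emotion recognized correctly), and lists the most frequent mix-ups. The helper names are hypothetical, and the labels in the usage lines are invented for illustration; they are not the study's data.

```python
from collections import Counter

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "scared", "disgusted", "surprised"]


def confusion_matrix(true_labels, predicted_labels, classes=EMOTIONS):
    """matrix[i][j] counts clips of true class classes[i] predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def recall_per_class(matrix, classes=EMOTIONS):
    """Diagonal count divided by row total; None for classes with no clips."""
    result = {}
    for i, c in enumerate(classes):
        total = sum(matrix[i])
        result[c] = matrix[i][i] / total if total else None
    return result


def top_confusions(matrix, classes=EMOTIONS, k=3):
    """The k most frequent off-diagonal (true, predicted) pairs with their counts."""
    pairs = Counter()
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and count:
                pairs[(classes[i], classes[j])] = count
    return pairs.most_common(k)


# Invented example labels, for illustration only.
true = ["happy", "happy", "happy", "surprised", "surprised", "calm"]
pred = ["happy", "scared", "sad", "disgusted", "surprised", "calm"]
m = confusion_matrix(true, pred)
recall = recall_per_class(m)
print(recall["happy"], recall["surprised"], recall["calm"])  # 0.3333333333333333 0.5 1.0
print(top_confusions(m))
```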


Their report was presented at a major international conference – Neuroinformatics-2017. https://link.springer.com/chapter/10.1007/978-3-319-66604-4_18

