Why Does Computer Speech Recognition Suck?

Please Make Speech Recognition Work Right!

We recently bought a new car from Toyota. We love the car and it is perfect for our lifestyle, but we have one major complaint: the voice recognition system for the navigation cannot reliably recognize either my voice or my wife's. It is so frustrating, and in fact so useless, that we turn to my iPhone for directions and search instead.

So what is the problem? Why does this voice recognition system work so poorly when the previous system in my Jeep Grand Cherokee worked great? This is still a major stumbling block for voice becoming the primary method of communicating with a computer and the internet. Home systems such as Amazon's Echo and Google's Home rely on being able to understand a user's commands, and advanced artificial intelligence programs are at the heart of successful voice recognition.

Speech synthesis, on the other hand, seems to be well advanced and getting better every day. If we are ever going to interact fully with computers and robots, two-way communication needs to be seamless. An example is the cover picture we chose from Hanson Robotics of their latest A.I., named Sophia. Honestly, her voice recognition, powered by the latest in A.I. science, is pretty darn good.

Here is some of the latest research into machine learning of human speech and language, better known as speech recognition.

A group of researchers at Osaka University has developed a new method for dialogue systems*1. This new method, lexical acquisition through implicit confirmation, is a method for a computer to acquire the category of an unknown word over multiple dialogues by confirming whether or not its predictions are correct in the flow of conversation.

Teaching a computer to understand the spoken word

Example of implicit confirmation:

1. Predict the category of the unknown word.
2. Generate an implicit confirmation request using the predicted category c.
3. Determine from the user's response whether category c is correct.
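The three steps above can be sketched in code. This is only an illustrative mock-up, not the Osaka University system itself: every function name, the keyword check, and the example word "Umeda" are assumptions made for the sake of the sketch.

```python
# Illustrative sketch of the implicit-confirmation loop.
# All names and heuristics here are stand-ins, not from the actual paper.

def predict_category(unknown_word, candidate_categories):
    """Step 1: guess a category for an unknown word (placeholder heuristic).
    A real system would use dialogue context and a trained model;
    here we simply pick the first candidate."""
    return candidate_categories[0]

def implicit_confirmation_request(word, category):
    """Step 2: embed the predicted category in a natural follow-up
    question instead of bluntly asking 'What is <word>?'."""
    return f"Oh, {word} -- is that a good {category}?"

def category_confirmed(user_response):
    """Step 3: crude check -- an agreeing response implicitly
    confirms the predicted category."""
    return any(token in user_response.lower() for token in ("yes", "yeah", "it is"))

# One pass of the loop:
word = "Umeda"
c = predict_category(word, ["place", "restaurant", "person"])
prompt = implicit_confirmation_request(word, c)
confirmed = category_confirmed("Yeah, I go there every week.")
```

The point of the design is that the confirmation is folded into the flow of conversation, so the user answers naturally rather than being quizzed.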

Many conversation robots, chatbots, and voice assistant apps have appeared in recent years; however, in these systems, computers basically answer questions based on what has been preprogrammed. There is another method in which a computer learns from humans by asking simple repetitive questions; however, if the computer asks only questions such as “What is xyz?” in order to acquire knowledge, users will lose interest in talking with the computer.

The group led by Professor Komatani developed an implicit confirmation method by which the computer acquires the category of an unknown word during conversation with humans. This method aims for the system to predict the category of an unknown word from user input during conversation, to make implicit confirmation requests to the user, and to have the user respond to these requests. In this way, the system acquires knowledge about words during dialogues.

In this method, the system decides whether its prediction is correct by applying machine learning*2 techniques to the user response that follows each request, together with its context. In addition, the system's decision performance improved when classification results gained from dialogues with other users were taken into consideration.
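One way to picture "taking other users' dialogues into consideration" is to combine per-dialogue decisions across users. The sketch below uses a simple majority vote as a stand-in; the actual system applies machine learning to response and context features, so this is an assumed simplification, not the paper's method.

```python
# Illustrative sketch: combine confirmation decisions gathered from
# separate dialogues (possibly with different users) by majority vote.
# The real system learns this decision; the vote is a stand-in.

from collections import Counter

def aggregate_decisions(decisions):
    """decisions: list of (user_id, confirmed) pairs, one per dialogue.
    Returns True if most dialogues supported the predicted category."""
    votes = Counter(confirmed for _, confirmed in decisions)
    return votes[True] > votes[False]

# Two of three dialogues supported the prediction, so it is accepted.
decisions = [("user_a", True), ("user_b", True), ("user_c", False)]
accepted = aggregate_decisions(decisions)
```

The intuition matches the paper's claim: evidence pooled across many conversations is more reliable than a single, possibly ambiguous, user response.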

More about computer speech recognition science from Microsoft

Chatbots on the market today speak to everyone in the same manner. However, as dialogue systems become more common, computers will be expected to adapt their speech by learning from each conversational partner and situation. This group's research results are a new approach toward dialogue systems in which a computer becomes smarter through conversation with humans, and they point toward systems that can customize responses to each user's situation.


*1 Dialogue system

A dialogue system (or conversational system), a part of artificial intelligence, is a computer system intended to converse with a human in natural language. Many speech-enabled IVR (interactive voice response) applications, humanoid robots, and text-based chatbots have been developed in recent years.

*2 Machine learning (AI)

Machine learning is a method that uses algorithms to analyze data, learn from it, and then make a determination or prediction. It includes supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the computer is trained on a set of examples (a dataset) containing the correct answers, through which it becomes able to make judgements in new situations.



