Feb 4, 2019
Neuro-engineers from Columbia University have developed an Artificial Intelligence-based system that can translate thoughts into recognizable speech.
Sounds interesting? Yes. The technology is based on Artificial Intelligence, which has made it possible for computers to communicate directly with the brain. It works by monitoring a person’s brain activity and reconstructing words from that activity even when the person has not spoken aloud.
A few days ago, while I was delivering an Artificial Intelligence lecture, one of my students asked me whether it is possible to capture a person’s brain signals and understand them before he speaks. I was astonished by his question, and we discussed at length the research going on in this field. Researchers have been working in this domain for decades. They have found that when people speak, or listen to someone speaking, distinctive patterns of signals form in their brains. Much work has been done in the past to record and decode these patterns in order to understand them. Researchers have long believed that the thoughts inside the brain can be translated into verbal speech, but this has always remained a challenging goal to achieve.
This was not the first time Dr. Mesgarani and his team had tried to accomplish the goal of speech synthesis. Their previous attempts relied on conventional computer models that analyzed spectrograms (visual representations of sound frequencies), but these could not produce anything resembling intelligible speech. That failure led them to try a vocoder to achieve better accuracy.
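To make the earlier approach concrete, here is a minimal Python sketch of spectrogram analysis, the kind of time-frequency representation those models worked with. The synthetic signal stands in for recorded speech, and the window settings are illustrative choices, not the study’s:

```python
# Minimal sketch: computing a spectrogram, the time-frequency picture of
# audio that the earlier models analyzed. Uses only NumPy/SciPy.
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                              # sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)            # one second of samples
# Synthetic stand-in for speech: a tone whose pitch glides upward.
audio = np.sin(2 * np.pi * (200 + 300 * t) * t)

# Short-time Fourier analysis: 25 ms windows with 10 ms hops.
freqs, times, power = spectrogram(audio, fs=fs,
                                  nperseg=int(0.025 * fs),
                                  noverlap=int(0.015 * fs))

print(power.shape)   # (frequency bins, time frames)
```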
Vocoder
A vocoder is an audio processor that can synthesize speech by characterizing the elements of an audio signal. In the experiment above, a vocoder algorithm was used that synthesizes speech based on its training on recordings of people talking. According to Dr. Mesgarani, associate professor of electrical engineering at Columbia’s Fu Foundation School of Engineering and Applied Science, “This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions.” To accomplish the goal of training the vocoder to interpret brain activity, Dr. Mesgarani’s team took advantage of the expertise of Dr. Dinesh Mehta, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute.
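To give a feel for what “characterizing the elements of an audio signal” means, here is a rough Python sketch of a classic channel vocoder. This is a generic illustration of the idea, not the algorithm used in the study: the analysis stage reduces a signal to per-band amplitude envelopes, and the synthesis stage re-imposes those envelopes on a carrier signal.

```python
# A minimal channel-vocoder sketch (a generic illustration, not the
# study's algorithm): analyze a signal into per-band amplitude envelopes,
# then resynthesize by modulating a noise carrier with those envelopes.
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def channel_vocoder(signal, fs, n_bands=16, f_lo=80.0, f_hi=7000.0):
    """Analyze `signal` into band envelopes, then resynthesize from noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    carrier = np.random.randn(len(signal))          # white-noise carrier
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, signal)                 # isolate one band
        envelope = np.abs(hilbert(band))            # per-band amplitude envelope
        out += envelope * sosfilt(sos, carrier)     # modulate filtered carrier
    return out / np.max(np.abs(out))                # normalize

fs = 16_000
speech = np.random.randn(fs)    # stand-in for one second of recorded speech
resynth = channel_vocoder(speech, fs)
```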
To train the vocoder, epilepsy patients undergoing brain surgery were asked to listen to sentences spoken by different people while their neural patterns were measured. The same patients were then asked to listen to speakers reciting digits between 0 and 9 while their brain signals were recorded, and those signals were run through the vocoder. The sound the vocoder produced in response was analyzed and cleaned up by artificial neural networks that mimic the structure of biological neurons.
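A hypothetical sketch of this decoding step is shown below: a neural-network regressor learns a mapping from per-frame neural features to vocoder parameters. Every shape, feature, and model choice here is an invented placeholder; the study’s actual architecture and features differ.

```python
# Hypothetical sketch of the training setup: a regression network maps
# neural-activity features to vocoder parameters (here, band envelopes).
# All data and shapes are invented placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

n_frames, n_electrodes, n_vocoder_params = 1000, 128, 16
X = rng.standard_normal((n_frames, n_electrodes))      # neural features per frame
Y = rng.standard_normal((n_frames, n_vocoder_params))  # target vocoder parameters

# Deep feed-forward regressor standing in for the paper's neural network.
model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=100)
model.fit(X, Y)

# At test time, brain activity recorded while a patient listens to digits
# is decoded into vocoder parameters, which the vocoder renders as audio.
predicted_params = model.predict(X[:10])
print(predicted_params.shape)   # (10, 16)
```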
The end result of the experiment was a robotic-sounding voice reciting the digits that had been captured and decoded. The system’s accuracy was tested by asking listeners to compare the reconstructed speech with what the patients had actually heard.
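As a toy illustration of such an intelligibility check, listeners report the digit they heard in each reconstructed clip, and accuracy is simply the fraction that matches what the patient actually heard (the responses below are invented):

```python
# Toy sketch of the intelligibility check; all responses are invented.
actual = [3, 7, 0, 9, 2, 5, 1, 8, 4, 6]      # digits the patients heard
reported = [3, 7, 0, 9, 2, 5, 1, 3, 4, 6]    # digits listeners identified

accuracy = sum(a == r for a, r in zip(actual, reported)) / len(actual)
print(f"identification accuracy: {accuracy:.0%}")   # 90% in this toy example
```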
Applications
The team behind the speech synthesis system sees a variety of applications. It could prove helpful for people who cannot speak, giving them the capability to communicate with the world.
Findings
Switching to a vocoder produced improved results compared with the spectrogram-based attempts. The sensitivity of the vocoder, combined with the power of the neural networks, brought the team surprising accuracy. The findings of the experiment are published in Scientific Reports.
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” said Nima Mesgarani, PhD, the paper’s senior author and a principal investigator at Columbia’s Mortimer B. Zuckerman Mind Brain Behavior Institute. “With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”