Researchers at Northeastern University have developed a way to extract audio from both still photos and muted videos using artificial intelligence.
The research project is called Side Eye.
“Most of the cameras today have what’s called image stabilization hardware,” said Kevin Fu, a professor of electrical and computer engineering at Northeastern University. “It turns out that when you speak near a camera lens that has some of these functions, a camera lens will move ever so slightly, what’s called modulating your voice, onto the image and it changes the pixels.”
In essence, these small movements can be converted into rudimentary audio, which Side Eye’s artificial intelligence can then interpret as individual words with high accuracy, according to the research team.
“You’re able to get thousands of samples per second. What does this mean? It means you basically get a very rudimentary microphone,” Fu said.
Even though the recovered audio sounds muffled, some pieces of information can be extracted.
“Things like understanding what is the gender of the speaker, not on camera but in the room while the photograph or video is being taken, that’s nearly 100% accurate,” he said.
So what can technology like this be used for?
“For instance in legal cases or in investigations of either proving or disproving somebody’s presence, it gives you evidence that can be backed up by science of whether somebody was likely in the room speaking or not,” Fu said.
“This is one more tool we can use to bring authenticity to evidence, potentially to investigations, but also trying to solve criminal applications,” he said.