Voice-controlled hardware requires four capabilities: (1) vocal response to trigger events (sensors/calculations-to-brain), (2) speech generation (brain-to-mouth), (3) speech recognition (ear-to-brain), and (4) speech understanding (brain-to-database, aka learning). These capabilities can increasingly be implemented with off-the-shelf modules, thanks to the arrival of low-cost silicon capable of digital signal processing (DSP) and statistical learning, machine learning, and AI.
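To make that decomposition concrete, here is a minimal C++ sketch framing the four capabilities as a single device interface. This is purely illustrative: the interface and every name in it are our assumptions, not part of any particular module or library.

```cpp
// Hypothetical interface mapping the four capabilities of a
// voice-controlled device. All names here are illustrative.
#include <cstddef>
#include <string>

struct VoiceDevice {
    virtual ~VoiceDevice() = default;

    // (1) sensors/calculations-to-brain: react to a trigger event
    virtual void onTrigger(int sensorId) = 0;

    // (2) brain-to-mouth: synthesise or play back a phrase
    virtual void speak(const std::string& phrase) = 0;

    // (3) ear-to-brain: turn captured audio samples into text
    virtual std::string recognise(const float* samples, std::size_t count) = 0;

    // (4) brain-to-database: learn from what was understood
    virtual void learn(const std::string& utterance) = 0;
};
```

A simple talking sensor, such as the one built in this article, exercises only capabilities (1) and (2); the other two come into play for full voice control.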
In this article we look at the value chain involved in building voice control into hardware, cover highlights from the history of artificial speech, and show how to convert an ordinary sensor into a talking sensor for less than £5. We demonstrate the approach by building a Talking Passive Infra-Red (PIR) motion sensor, deployed as part of an April Fool’s Day prank (jump to the design video and demonstration video).
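As a preview of the design pattern, the sketch below shows the core logic in Arduino-style C++: the PIR's digital output gates a pulse on a sound module's play pin. The pin numbers, the active-high PIR output, and the pulse-to-play trigger behaviour are assumptions for illustration, not the exact parts or wiring used in the build:

```cpp
// Minimal Arduino-style sketch: when the PIR sees motion, pulse the
// sound module's play pin to speak a pre-recorded phrase.
// Pin assignments and the active-high, pulse-to-play behaviour are
// illustrative assumptions.
const int PIR_PIN  = 2;   // PIR module digital output (HIGH on motion)
const int PLAY_PIN = 3;   // sound module trigger input (pulse to play)

void setup() {
  pinMode(PIR_PIN, INPUT);
  pinMode(PLAY_PIN, OUTPUT);
  digitalWrite(PLAY_PIN, LOW);
}

void loop() {
  if (digitalRead(PIR_PIN) == HIGH) {
    // Pulse the play pin to start the recorded phrase.
    digitalWrite(PLAY_PIN, HIGH);
    delay(100);
    digitalWrite(PLAY_PIN, LOW);
    // Simple lockout so one visitor triggers one phrase.
    delay(5000);
  }
}
```

Many low-cost builds can even dispense with the microcontroller and wire the PIR output directly to the sound module's trigger input; the sketch simply makes the trigger-and-lockout logic explicit.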
The same design pattern can be used to create any talking sensor, with applications everywhere from the home, school, and workplace to shops, factories, industrial sites, mass transit, public spaces, and interactive art/engineering/museum displays.