Voice-controlled Hardware: Making Sensors Talk for Under £5


Voice-controlled hardware requires four capabilities: (1) vocal response to trigger events (sensors/calculations-to-brain), (2) speech generation (brain-to-mouth), (3) speech recognition (ear-to-brain), and (4) speech understanding (brain-to-database, aka learning). These capabilities can increasingly be implemented using off-the-shelf modules, thanks to progress in low-cost silicon capable of digital signal processing (DSP) and statistical learning/machine learning/AI.

In this article we look at the value chain involved in building voice control into hardware. We cover highlights in the history of artificial speech. And we show how to convert an ordinary sensor into a talking sensor for less than £5. We demonstrate this by building a Talking Passive Infra-Red (PIR) motion sensor deployed as part of an April Fool’s Day prank (jump to the design video and demonstration video).

The same design pattern can be used to create any talking sensor, with applications abounding in the home, school, workplace, shop, factory, industrial site, mass transit, public spaces, and interactive art/engineering/museum displays.

Bringing Junk Model Robots to life with Talking Motion Sensors (April Fools Prank, 2021)


1. The rise of low-cost voice controlled hardware

Three technology streams have converged in the past 10 years to enable low-cost voice-controlled devices:

(1) An explosion of highly capable silicon for sensors and data processing. Mass-produced application-specific integrated circuits (ASICs) and ultra-low-cost microcontrollers (including the multimedia processors required for voice generation) now cost 3-30 cents apiece at scale (typically 10k+ pieces). This has led to the creation of dozens of niche sensors available at retail for 50p to £5.00 each.

(2) Full-stack WiFi has been added to hobbyist microcontrollers while keeping the cost per unit between $3 and $5. Integrated WiFi lets devices send input data to the cloud, where servers handle the computationally intensive DSP/AI tasks and return the outputs for the device to act on. This computational load sharing allows sophisticated voice capability on low-cost devices that can maintain a continuous connection.

(3) Advances in statistical learning/machine learning/AI have enabled more effective approaches to voice processing and speech understanding, including deep learning models that boost training and accuracy. Digital voice assistants have been embedded into a slew of devices, initially phones and computers, and then smart speakers, TVs, home appliances, and even cars: Apple’s Siri (2011), Amazon’s Alexa and Microsoft’s Cortana (2014), Google’s Assistant (2016), and Samsung’s Bixby (2017). We’ve gotten used to interacting with technology through voice.

These trends have been amplified by four additional developments: (a) the rise of the Maker movement since 2005 (Dale Dougherty), (b) the widespread adoption of Arduino since 2005, making it the world’s largest open-source hardware platform, (c) the acceleration of e-commerce in consumer electronics and components (Amazon and eBay), and (d) the flattening of global supply chains and greater pricing efficiency, with manufacturers selling directly to consumers (B2C) via the e-commerce outlets. The result is that voice integration and voice control are now within reach using low-cost devices.

Let’s take a look at how this has come about.


2. A Short History of Speech Synthesis

The artificial generation of speech has gone through five stages: (0) mechanical generation of sounds before the 1930s; (1) electromechanical generation, from 1939 to the 1970s, combining hisses and buzzes passed through various filters and pitch adjusters; (2) digital combination of pre-recorded phonemes stored in databases and strung together to form speech, kicking off a genre of toys-that-talk alongside other talking devices; (3) formant synthesis, from the late 1990s, in which three frequencies are mixed to create speech elements and to control pitch, prosody, and duration (formants are the spectral patterns making up the sounds of human speech); and (4) machine learning / AI / neural networks / deep learning models applied to speech technology.

Below are some highlights:

  • 1779-1930s – Prior to 1939, advances in mechanical speech came in 1779, when Kratzenstein built a system capable of producing the 5 long vowel sounds, followed in 1791 by von Kempelen’s bellows-operated device, which added consonants. Advances continued with Wheatstone’s device of 1837, Faber’s Euphonia talking machine of 1846, and Paget’s work in 1923.
  • 1939 – Homer Dudley of Bell Labs produced the Voder, an electromechanical keyboard that combined hissing and buzzing through 10 filters and 2 energy sources, capable of producing 20 independent sounds which a trained operator could use to generate sound sequences resembling speech. Demonstration of singing Auld Lang Syne. Voder operator Helen Harper explains how the sounds are made: “to produce the word ‘concentration’, I have to form 13 different sounds in succession, make 5 up/down movements of the wrist bar, and vary the pitch using 3-5 presses of the foot pedal to provide intonation/expression.” Because the Voder models the human vocal tract, any spoken language can be generated, as can animal sounds. High-school student Vaughn Rhinehart built a working replica system which he demonstrates here

    1939 – The Voder, Bell Labs. Schematic illustrating how the synthetic model of the human vocal tract was to be preserved, with only the mechanism for operating it changing. On the left, the human brain causes the muscles in the human vocal tract to adjust and sound comes out of the mouth; this is analyzed into constituent parts and then reconstructed on the right with switches and pedals.

    1939 – The Voder, Bell Labs. An operator would train for a year or more, and could then create 20 sounds from which, with perfect timing, they could form words, then sentences, and songs.

  • 1961 – John Kelly and Carol Lochbaum of Bell Labs programmed the IBM 704 mainframe computer to speak and to sing ‘Daisy Bell’. This was heard by Arthur C. Clarke, who worked the song into his epic 2001: A Space Odyssey, where HAL, the futuristic computer, sings it as it is shut down.

    The next phase used Linear Predictive Coding, and then in the 1990s formant synthesis, which built upon earlier frequency modulation (FM) synthesis and other synthesis models.

  • 1978 – Larry Brantingham of Texas Instruments develops the TMS5100 IC, which generated speech on-the-fly using a 10-pole Linear Predictive Coding (LPC) filter, implemented as a lattice filter, as its Vocal Tract Model (VTM). This earned the chip the distinction of marking the first time the human vocal tract had been modeled on a single piece of silicon. The chip was used in the educational toy ‘Speak & Spell’. Pre-recorded phonemes stored on a separate ROM were combined through the chip. In 2013, ‘Talkie’, a software emulator of the TMS5100’s speech capability, was developed for Arduino. Technical details can be found for the TI-99 speech synthesizer, which used the TMS5200. The TMS chips went through various versions until 1986, when an on-chip 8-bit microcontroller was added; the line culminated in the MSP50C6xxx low data rate speech synthesis chips, which were sold to Sensory Inc. in 2001.

    1978 – Texas Instruments TMS5100, the chip that modelled speech using phonemes strung together to form words and sentences.

  • 1980s – Competing LPC-based speech synthesizer ICs were developed, e.g. General Instrument’s SP0256 IC.
  • 1984 – Digital Equipment Corporation (DEC) releases DECtalk, a famous speech synthesizer and text-to-speech system.
  • 1994 – gnuspeech is released, containing research from Steve Jobs’ NeXT (before his return to Apple) and Trillium, which incorporated research from the University of Calgary.
  • 1998 – Yamaha produces the first formant-based synthesizer, the FS1R. Example of a female voice patch on the FS1R. This came after the GS-1 and DX7 FM synthesizer releases of 1980 and 1983 respectively. (The latest FM synth is F’em.)

    The current stage is focused on statistical learning techniques (neural networks, AI, machine learning, deep learning), and the delivery of voice interactive digital assistants.

  • 1995 – Sensory Inc. produced the RSC-164 (databook PDF), the first commercially successful speech recognition IC. The chip featured an integrated 8-bit microcontroller and used neural networks to recognize speech with 96% accuracy in speaker-independent mode (any speaker) and 99% accuracy in speaker-dependent mode (a voice on which the system has been pre-trained), using onboard processing. The RSC-164 was used in the Mars Microphone project deployed on the Mars Polar Lander, which lost contact with Earth during its descent to the Martian surface in 1999.
  • 2000s – Human-realistic computer-generated voices appear in Windows 2000 and Windows XP: Microsoft Mike, Microsoft Mary
  • 2009 – Sensory Inc. produces the smallest text-to-speech (TTS) system
  • 2011 – Apple introduces Siri
  • 2014 – Amazon launches Alexa. William Tunstall-Pedoe and his company Evi (founded in 2005) developed the underlying AI to enable speech understanding; it was acquired by Amazon in 2012, and Alexa was released in 2014.
  • 2016 – Adobe experiments with Voice Conversion (VoCo), dubbed the Photoshop of speech, which, with 20 minutes of training on a person’s speech, can produce any utterance in that person’s voice through a text-to-speech (TTS) interface. AI-based methods are used to generate speech matching human samples.
  • 2016 – Google’s DeepMind produces WaveNet, which uses convolutional neural networks (CNNs) that learn to speak by being trained on human speech.
  • 2018 – Resemble AI (current website) develops voice-cloning capability, picking up where Adobe’s VoCo stopped. Once a voice is cloned, text-to-speech can be used to generate arbitrary speech. There are ethical concerns about the ability to create voice clones that cannot be distinguished from real voices/utterances. Resemble AI embeds technologically detectable markers into the voice signal to tag it as manufactured. Concerns about the capability remain nonetheless, joining similar concerns about the ability to make hyper-realistic 3D edits to objects in photos.

3. Converting Ordinary Sensors into Talking Sensors for under £5

Vocal response to trigger events can be added to an existing sensor for less than £5 using off-the-shelf modules that offer record/playback functionality in either single-track (e.g. YS41F) or multi-track (e.g. ISD1760) mode. There are also modules to which pre-recorded MP3s can be uploaded for playback, again in single-track or multi-track variants (e.g. the WT2003S multi-message MP3 chip). Typical cost for record/playback modules is £3-£4 ea., with prices dropping to 50p ($0.85) ea. at scale (100,000+ units). There are also OTP (one-time programmable) systems commonly found in talking plush toys (e.g. this one for £8.75 with 20s record time, or 2pc for £12.25, i.e. £6.12 ea.), cars, action figures, dolls, and greeting cards; these feature pre-recorded material that plays back following a trigger event. The price at scale for OTP chips drops to $0.15-$0.22 per unit (500-10,000 units).

Record/playback chips offer single-track and multi-track modes which can be used to turn ordinary sensors into talking sensors for less than £5 ea (less than 50p at scale).

4. Examining the Value Chain, from Vocal Response to Voice Generation, Voice Recognition and Voice Understanding

Vocal Response
The YS41F module can record up to 30s of high-quality audio and play it back with hi-fidelity on an 8Ω 0.5W speaker integrated into the module. What enables this sophisticated functionality so cheaply is the ultra-low-cost 16-bit multimedia processor, the Shenzhen TR16F064B (datasheet), which functions as a mini voice recorder with the ability to reprogram the recording. This highly capable microprocessor costs $0.25-$0.35 per unit at scale (500-10,000 units) and provides built-in DSP operations, a 10-bit ADC, I2C/SPI, audio PWM output, 64K of SRAM (for data) and 1MB of Flash ROM (for program). (What’s cool about this controller is that it uses a von Neumann rather than a Harvard architecture, meaning all memory spaces are visible to the program counter, so it is possible to implement a real 3-instruction Forth without wearing out Flash memory or requiring pre-compiled instructions as with my 3iForth for Arduino. But of course, all of these low-cost MCUs have more complicated development systems requiring specialized hardware for their programming.)

If we need to play multiple tracks, we can use the ISD1760 chip (pictured above right) with a programmer module that has a microphone and buttons to record the tracks individually into onboard Flash memory. The programmer board conveniently has a DIP-28 socket so that an ISD1760 chip can be inserted just for recording the sounds, then removed and installed as a bare chip in the application circuit; this allows the chips to be bought independently (£3 ea. in single quantities, 40% cheaper than with the programming module). The special capability of the ISD1760 is an SPI interface that lets a microcontroller select at runtime which of the multiple tracks will be played. See Nuvoton’s ISD ChipCorder range and NuVoice range.

Speech Generation on the Cheap
The ISD1760 multi-track recorder with SPI playback interface is perhaps the simplest approach to speech synthesis/generation, using a technique called concatenative speech synthesis. The user pre-records phonemes and the microcontroller creates speech by dynamically selecting which phonemes to play, at what pitch, and for how long. With more memory space, one could in principle record whole words or phrases. Examples are talking clocks and other number readers.
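To make the idea concrete, here is a minimal Arduino-style sketch of runtime track selection over SPI. The pin assignment and the command byte are placeholders; the real opcodes, bit order, and framing must be taken from the ISD1760 datasheet.

```cpp
// Play pre-recorded tracks on a multi-track record/playback chip, selected
// at runtime over SPI. The chip-select pin and CMD_PLAY_TRACK value below
// are illustrative placeholders: consult the ISD1760 datasheet for the
// actual SPI commands and framing.
#include <SPI.h>

const int  SS_PIN         = 10;    // chip-select line to the ISD1760 (assumed wiring)
const byte CMD_PLAY_TRACK = 0x00;  // placeholder: replace with the datasheet opcode

void playTrack(byte track) {
  digitalWrite(SS_PIN, LOW);       // select the chip
  SPI.transfer(CMD_PLAY_TRACK);    // command byte (placeholder)
  SPI.transfer(track);             // which pre-recorded track/phoneme to play
  digitalWrite(SS_PIN, HIGH);      // deselect; the chip plays the track
}

void setup() {
  pinMode(SS_PIN, OUTPUT);
  digitalWrite(SS_PIN, HIGH);
  SPI.begin();
}

void loop() {
  // Concatenative speech: string pre-recorded tracks together in sequence.
  byte phrase[] = {1, 4, 2};       // e.g. "the" + "time" + "is"
  for (byte i = 0; i < sizeof(phrase); i++) {
    playTrack(phrase[i]);
    delay(800);                    // crude wait for the track to finish; a real
  }                                // design would poll the chip's status
  delay(5000);
}
```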

Text-to-Speech parsing front-ends
A step up in capability would have a text-to-speech front-end parse user-entered text into phonemes, annotate it for pitch, intonation (prosody) and duration, and then generate speech waveforms using the pre-recorded phonemes. This is how the more expensive EMIC2 module (c.£65) works (demo video), as does the latest S1V30120 from Epson, at £39.

Formant Synthesis for ultra-low memory requirements
To get away from pre-recorded phonemes, the next step is so-called formant synthesis, which creates the phonemes of human speech by mixing frequencies and judiciously selecting band-pass filters. Both vowels and consonants can be created. More sophisticated control is required to generate the speech, but only frequency data and lookup tables need to be stored instead of waveforms, significantly reducing the amount of memory needed.

Formant Synthesis – using 3 or more frequencies passed through band-pass filters of varying widths (Q factors) to shape sound and create vowels (left). Adding control over gain (volume) and Q (filter width) makes the sound more human (right).
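The sketch below (plain C, intended to run on a PC rather than the microcontroller) illustrates the idea: an impulse-train “buzz” at the pitch frequency is passed through three two-pole resonators acting as band-pass filters centred on the formant frequencies. The formant values are illustrative, roughly an “ah” vowel; notice that only a handful of frequency and bandwidth numbers are stored, no waveforms.

```c
/* Formant synthesis sketch: buzz source -> three resonant band-pass filters.
 * Formant frequencies/bandwidths below are illustrative (roughly an "ah").
 * Output is one second of raw sample values at 8kHz, printed to stdout. */
#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define FS 8000.0                        /* sample rate, Hz */

typedef struct { double b0, a1, a2, y1, y2; } Resonator;

/* Two-pole resonator centred on frequency f (Hz) with bandwidth bw (Hz). */
void resonator_init(Resonator *r, double f, double bw) {
  double rad = exp(-M_PI * bw / FS);
  r->a1 = 2.0 * rad * cos(2.0 * M_PI * f / FS);
  r->a2 = -rad * rad;
  r->b0 = 1.0 - rad;                     /* rough gain normalisation */
  r->y1 = r->y2 = 0.0;
}

double resonator_step(Resonator *r, double x) {
  double y = r->b0 * x + r->a1 * r->y1 + r->a2 * r->y2;
  r->y2 = r->y1;
  r->y1 = y;
  return y;
}

int main(void) {
  Resonator f1, f2, f3;
  resonator_init(&f1,  700.0,  80.0);    /* F1 */
  resonator_init(&f2, 1200.0,  90.0);    /* F2 */
  resonator_init(&f3, 2600.0, 120.0);    /* F3 */

  double pitch = 120.0;                  /* voice fundamental, Hz */
  int period = (int)(FS / pitch);

  for (int n = 0; n < 8000; n++) {
    double src = (n % period == 0) ? 1.0 : 0.0;  /* impulse-train "buzz" */
    double s = resonator_step(&f3, resonator_step(&f2, resonator_step(&f1, src)));
    printf("%f\n", s);                   /* raw (unscaled) sample value */
  }
  return 0;
}
```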

Speech Understanding through the Cloud
The sections above have been about speech generation. But what about the other end of the value chain, speech understanding? The most advanced approach is to leverage the cutting-edge statistical/machine learning/AI available in the cloud, which underpins high-performance voice processing, speech understanding, and speech generation (think Alexa’s vocal clarity and exceptional speech recognition and understanding). This is now accessible using low-cost WiFi-enabled microcontrollers that can be continuously connected to the cloud, e.g. Espressif’s ESP family of WiFi-capable microcontrollers intended for internet-of-things devices. Building in Alexa integration via Alexa’s cloud-based Skills API gives true voice-controlled hardware. Example: Microchip’s Alexa Development Kit, sold for £480 ea. (MOQ 20)
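To illustrate the load-sharing pattern (not the Alexa Skills API itself, which goes through Amazon’s own SDKs and cloud services), here is a sketch for an ESP32 that reports each PIR detection to a cloud endpoint over WiFi. The network credentials, endpoint URL, and JSON payload are placeholders.

```cpp
// ESP32 sketch: hand each motion detection to a cloud service over WiFi,
// letting the server do the heavy DSP/AI work. SSID, password, URL and
// payload are placeholders for illustration only.
#include <WiFi.h>
#include <HTTPClient.h>

const char *WIFI_SSID = "your-network";
const char *WIFI_PASS = "your-password";
const char *EVENT_URL = "http://example.com/motion-event";  // hypothetical endpoint
const int  PIR_PIN    = 13;                                 // assumed wiring

void setup() {
  pinMode(PIR_PIN, INPUT);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) delay(250);          // wait for the network
}

void loop() {
  if (digitalRead(PIR_PIN) == HIGH) {          // PIR raised its signal line
    HTTPClient http;
    http.begin(EVENT_URL);                     // hand the event to the cloud service
    http.addHeader("Content-Type", "application/json");
    http.POST("{\"sensor\":\"pir-1\",\"event\":\"motion\"}");
    http.end();
    delay(10000);                              // simple rate limit between reports
  }
}
```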

Offline Speech Understanding
Can acceptable speech understanding be performed offline? There are a few off-the-shelf low-cost ASICs that do offline speech recognition and speech understanding and appear to have reasonable performance. Examples:

One especially interesting example is Peter Balch’s speech understanding experiments using an Arduino Nano. The Nano is an 8-bit microcontroller with up to 32KB of flash memory and processing of up to 10 MIPS, which is akin to that of a 1970s mainframe computer (2-32KB, 0.5-1 MIPS). Using this, he was able to obtain a recognition rate of 90-95% on simple single words uttered by a single speaker, against a training set of the same words from the same speaker. This is comparable to the capabilities of 1970s speech recognition.

Several microcontroller design and manufacturing companies are bringing voice-on-chip solutions to market. With the price of custom ASICs continuing to fall, the next few years may see these state-of-the-art devices, which currently retail for £15-£40, follow the same trend and reach the sub-£10 and sub-£5 price points.


5. Building a Talking PIR Motion Detection Sensor

In the last part of this article, we’ll look at implementing the first step: pre-recorded vocal response to event triggers. Let’s build!

What we’re after is a talking version of a Passive Infra-Red (PIR) motion detection sensor. Here’s what we need:

Bill of Materials (catalog here)
Total: £8.55

  1. HC-SR501 PIR motion sensor (£1.10)
  2. Record/Playback Module: YS41F or ISD1760 (c.£3.50)
  3. Boost Converter: Super Micro CE012 provides regulated 5VDC (£1.30)
  4. AAA battery & holder (65p)
  5. ATTiny85 8-bit microcontroller provides decision logic and good usability (£1.30)
  6. a dozen or so auxiliary electronic components, resistors, caps, etc. (30p)
  7. solderless breadboard, 170 tie-points (40p)

Notice that, at scale, the price could be reduced by £3 (on the record/playback module), and £1 (on the boost converter), dropping the prototype price to £4.55.

Making and testing the talking PIR motion sensor should take approx. 2 hours (excluding tool setup) assuming intermediate electronics experience (breadboarding components, some light soldering, and a little Arduino programming).

Schematics and Prototype layout. [link]

Let’s step through the key components for the build.

Design Video

5.1. Infra-Red Motion Sensor

We start with a low-cost but effective PIR motion sensor, the HC-SR501, available for £2.45, or £1.10 ea. in a 10-pack.

PIR motion sensors are complicated devices but easy to interface to. Simply provide 5V DC power (60mA of continuous current, 300mW); after 60 seconds of scanning the IR background, the sensor begins real-time signal processing, differentially comparing the IR levels received on its A and B sensing elements. When the pattern suggests motion (left-to-right or right-to-left), the sensor outputs an active-HIGH signal (3.3V logic level).
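A minimal Arduino sketch makes the point: power the module, wait out the warm-up, and watch the output pin. The pin choice here is arbitrary.

```cpp
// Minimal HC-SR501 interface: the output pin goes HIGH on detected motion.
const int PIR_PIN = 2;           // PIR signal line (3.3V logic)

void setup() {
  pinMode(PIR_PIN, INPUT);
  Serial.begin(9600);
  delay(60000UL);                // let the sensor scan its IR background
  Serial.println("PIR active");
}

void loop() {
  if (digitalRead(PIR_PIN) == HIGH) {
    Serial.println("Motion detected");
    delay(500);
  }
}
```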

Building a Passive Infra-Red (PIR) Motion Sensor interfaced through Arduino Nano – Sep 13th, 2020

Especially note the left-right axis of the sensor to ensure positioning for maximum detection sensitivity.

HC-SR501 Passive Infra-Red (PIR) Motion Sensor Detail

How can such a sensor be made so cheaply? Ultra-low-cost ASICs: in this case the heart of the PIR sensor is the BISS0001 dedicated PIR controller, which does all the DSP handling the detection, processing, and signalling. It is available for $0.01 to $0.10 in volume, or for experimentation in a DIP-16 package for £1.20 (single qty). The HC-SR501 PIR sensor draws 60mA.


5.2. Boost Converter to 5VDC for portable power

The next step is powering the sensor. To keep things portable, we will use a boost converter to provide regulated 5VDC from a battery. The Super Micro DC-DC converter CE012 is compact (approx. 1.1cm x 1.1cm), cheap (less than £1 in a pack of 10), has no parasitic LED indicator to draw power, and can readily provide the required 60mA output current from a single 1.5V AAA battery, on which it can run for 8 hours or more. It keeps boosting down to an input of 0.75V, i.e. to about 50% charge on the battery, at which point the boosted output becomes insufficient to keep the PIR sensor’s detections reliable. Soldering on a right-angle 3-pin header allows the CE012 to be inserted vertically into the breadboard for minimum space usage.

5V DC to DC boost converter, supplies 60mA down to 0.75V

Sizing the battery
What battery to use? I used a single 1.5V AAA battery (550mAh) to ensure at least a full night’s reliable detection (550mAh / 60mA ≈ 9 hrs), but could also have used a 3V CR2032 coin cell (225mAh) if only a few hours were required. Reference: (battery power table)

Electrical capacity of common battery types.
Source: https://daycounter.com/LabBook/Battery-Capacity-Comparison.phtml


5.3. Microcontroller for better usability

It is possible to use the PIR motion sensor in stand-alone mode, but a microcontroller (uC) enables a better user experience. For instance, the PIR sensor requires 60 seconds of background scanning before entering detection mode; the uC can indicate when scanning is complete and the sensor is active, using a timer and red/green LEDs. Similarly, it can run more complex response decisions taking inputs from other sensors or context. For instance, if it is dark (via input from a photoresistor), turn on the light, otherwise ignore (this is the logic behind motion-sensing security lights). Alternatively, if detections come within seconds of each other, the warnings could be escalated, culminating in sounding a 100dB ear-splitting alarm (though be warned, this alarm consumes 40mA by itself).
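Here is a sketch of that kind of decision logic: only switch the light on when a photoresistor says it is dark, and escalate to the loud alarm when detections arrive within a few seconds of each other. Pin numbers and thresholds are illustrative.

```cpp
// Decision logic layered on the bare PIR output: light only when dark,
// alarm only on rapid repeat detections. (PIR warm-up handling omitted.)
const int PIR_PIN   = 2;
const int LDR_PIN   = A0;    // photoresistor voltage divider
const int LIGHT_PIN = 3;     // lamp / LED output
const int ALARM_PIN = 4;     // 100dB sounder (draws ~40mA when on)

const int DARK_THRESHOLD = 300;    // below this ADC reading we call it "dark"
unsigned long lastDetection = 0;

void setup() {
  pinMode(PIR_PIN, INPUT);
  pinMode(LIGHT_PIN, OUTPUT);
  pinMode(ALARM_PIN, OUTPUT);
}

void loop() {
  if (digitalRead(PIR_PIN) == HIGH) {
    unsigned long now = millis();
    if (analogRead(LDR_PIN) < DARK_THRESHOLD) {
      digitalWrite(LIGHT_PIN, HIGH);                          // dark: turn the light on
    }
    if (lastDetection != 0 && now - lastDetection < 5000UL) { // repeat within 5s:
      digitalWrite(ALARM_PIN, HIGH);                          // escalate to the alarm
      delay(2000);
      digitalWrite(ALARM_PIN, LOW);
    }
    lastDetection = now;
  } else {
    digitalWrite(LIGHT_PIN, LOW);
  }
}
```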

Which microcontroller to use?

Microcontroller Comparison table: Arduino Nano, Attiny, ESP, etc.

I typically start with the Arduino Nano, based on the ATmega328P processor. It has a small form factor (4.32 x 1.78cm), is breadboard-able (DIP-30 package), is relatively economical at c.£3.70 per unit, consumes 20mA without load, has 18 GPIO lines (not counting Rx/Tx and A6/A7, which are analogRead-only), and is easy to prototype software applications with, using my Arduino 328P version of Frank Sergeant’s 3-instruction Forth.

But for this application, the Arduino Nano is overkill. We need at most 5 I/O lines: 1 for the sensor, 2 more for independently driving the red/green LEDs, 1 for triggering the YS41F voice alert module, and 1 more, optionally, for the 100dB alarm. Given that we want to run it continuously for many hours, the Nano’s 20mA no-load consumption is an additional 30% drain on the battery.

Shrinking the PIR Sensor Footprint by moving from Arduino Nano to ATTiny 85.

The ATTiny85 is just right. It has a tiny form factor (1.40 x 0.76cm), is breadboard-able (DIP-8 package), costs less than half as much as the Nano (c.£1.50 ea. with MOQ 10), has no parasitic LEDs consuming power, draws less than 1mA (300uA), and in most cases can run Arduino Nano code with almost no modification (apart from pin definitions and library inclusion). The downside? Only 5 I/O pins, but for our application this is as much as we need (and if we needed more, we could trade up to the ATTiny84 in a DIP-14 package with 11 I/O lines).

The Code
The code for the Arduino Nano can be as simple as 30 lines of C to demonstrate the functionality of a PIR sensor. But we want to do a bit more, so our final code is 60 lines of C, twice the size. Measured as executable size, the code occupies 1.3kB of flash memory and uses just 9 bytes (yes, bytes) of SRAM for data/variables. It’s pretty small.

Arduino Nanos are programmed from the Arduino IDE in the usual way: select the Board (Nano) and Processor (ATmega328P, Old Bootloader), set the COM port, don’t worry about the programmer (none needed), and then Upload (Ctrl+U).

What does the code do? We instruct the microcontroller (uC) to wait a minute (60,000 ms) while the PIR sensor completes its background scan, flashing the red LED every 6 seconds to give a visual cue that scanning is underway. When background scanning is complete, we turn on the green LED (and turn off the red LED). The PIR sensor is now active, and the uC sits in a continuous loop checking the PIR sensor’s signal line for a detection. Any infra-red differential detection, typically caused by body-heat movement, will cause the PIR sensor to raise the signal line, which the uC detects, running the alarm(ON) code. We have configured the uC to toggle the LEDs (red now ON, green OFF) and activate a buzzer (later we will hack the buzzer signal to trigger the voice alert). More complex response behaviour could be programmed if desired.
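A sketch of that logic is below. Pin numbers are illustrative and the final 60-line version does a little more, but the structure is as described: a 60-second background scan with the red LED flashed every 6 seconds, then continuous monitoring with the LEDs and buzzer toggled on each detection.

```cpp
// Talking PIR sensor core logic (pin numbers are illustrative).
const int PIR_PIN    = 2;   // signal line from the HC-SR501
const int RED_LED    = 3;
const int GREEN_LED  = 4;
const int BUZZER_PIN = 5;   // later re-purposed to trigger the YS41F voice alert

void setup() {
  pinMode(PIR_PIN, INPUT);
  pinMode(RED_LED, OUTPUT);
  pinMode(GREEN_LED, OUTPUT);
  pinMode(BUZZER_PIN, OUTPUT);

  // Background scan: flash the red LED every 6 seconds for 60 seconds.
  for (int i = 0; i < 10; i++) {
    digitalWrite(RED_LED, HIGH);
    delay(200);
    digitalWrite(RED_LED, LOW);
    delay(5800);
  }
  digitalWrite(GREEN_LED, HIGH);       // scanning complete: sensor active
}

void loop() {
  if (digitalRead(PIR_PIN) == HIGH) {  // infra-red differential detected
    digitalWrite(GREEN_LED, LOW);
    digitalWrite(RED_LED, HIGH);
    digitalWrite(BUZZER_PIN, HIGH);    // alarm(ON): buzzer / voice-alert trigger
    delay(3000);                       // hold the alert for a few seconds
    digitalWrite(BUZZER_PIN, LOW);
    digitalWrite(RED_LED, LOW);
    digitalWrite(GREEN_LED, HIGH);     // back to active monitoring
  }
}
```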

Programming the ATTiny85
Migrating code from the Nano to the ATTiny85 is in most cases simply a matter of replacing pin definitions; the rest is the same. It is best to use preprocessor definitions (#define, #ifdef/#endif) to encapsulate the different pin definitions (e.g. #define NANO and #define Tiny85 or #define Tiny84). This allows maintaining one source file and avoids duplicating source code.
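For example (pin assignments are illustrative):

```cpp
// One source file, two pin maps, selected with a preprocessor switch.
#define Tiny85                 // or: #define NANO, #define Tiny84

#ifdef NANO
  #define PIR_PIN    2
  #define RED_LED    3
  #define GREEN_LED  4
  #define BUZZER_PIN 5
#endif

#ifdef Tiny85
  #define PIR_PIN    0
  #define RED_LED    1
  #define GREEN_LED  2
  #define BUZZER_PIN 3
#endif
// ...the rest of the sketch is identical for both targets.
```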

It is easiest to program ATTiny controllers if you use a programmer, e.g. the TinyAVR USB Programmer board which uses SPI to program the Tiny chips. The programmer has an onboard DIP-8 socket for the Tiny85 uC, but I find it easier to externalize the connections to a breadboard where they can be mapped to either Tiny85 or Tiny84 (see wiring diagram below).

Physical Wiring of the ATTiny 85/84 Microprocessor to the TinyAVR USB Programmer

Additionally, you’ll need to add ATTiny board support to the Arduino IDE (an ATTiny core installed via the Boards Manager).

After restarting the Arduino IDE, the Board Menu should show some new boards at the bottom. Select Board ATTiny85 and Processor (ATTiny85), set the Programmer to USBTinyAVR, and then Upload using Programmer (Ctrl+Shift+U).

Programming ATTiny85/84 bare chips from Arduino IDE using Tiny AVR USB Programmer.

5.4. Hacking the YS41F record/playback module to trigger off the sensor

The YS41F recordable voice alert module is powered by 3x LR1130 button batteries in series (providing 4.5V). There are several variations available on Amazon/eBay with different trigger events, including press-button-to-play, photoresistor (plays when light shines on it), and reed switch or other mechanical sensors. You’ll need to snip off the existing trigger and replace it with a trigger from the PIR sensor or, in our case, from the microcontroller that is monitoring the PIR for a detection.

Hacking the YS-41F module to trigger playback when the sensor detects an event.

The key is that the module’s detection trigger is wired to pin 4 of the TR16F and receives some small current when a detection event occurs. We wire this (black dot/wire) to the emitter of a transistor whose base is connected to the PIR sensor’s detection line and whose collector (red dot/wire) is the 4.5V power from the YS41F module itself.

6. Demonstrating the Talking Sensor with an April Fool’s Prank!

The final sensor package consists of three parts shown below:

  • the “eye” (PIR sensor) is on a 20cm umbilical allowing it to be positioned optimally for detection.
  • the small breadboard holding the Tiny85 microcontroller (centre), CE012 DC to DC converter (top right), AAA battery with holder (bottom), transistor for the YS41F hack, and the green/red LEDs signalling background data collection state (flashing red), active monitoring (green), and detection (red).
  • the YS41F record/playback module, which is powered independently through its 3x LR1130 1.5V button batteries (4.5V total), and directly drives the 0.5W speaker (the volume is surprisingly loud!).

Talking Sensor: HC-SR501 Passive Infra-Red (PIR) Sensor triggering YS41F Recordable Alert Module via ATTiny85 microcontroller, powered by 1 AAA battery boost converted using CE012 DC-DC converter

April Fool’s Day — A little fun with the Talking Motion Sensor

Ahead of April Fool’s Day this year, my daughter (9yo) had been cajoling me to come up with some pranks, as she was planning some of her own. For my part, I thought to catch her and her brother (5.5yo) by surprise by bringing their junk-model robots to life…

After the kids went to bed, I wired up the Talking PIR motion sensor described above. I had previously written some compact Arduino Nano code for the PIR sensor, c. 30 lines of Arduino C which compiled down to a mere 900 bytes. But the Nano, while relatively small (compared to the Uno or Raspberry Pi), nonetheless has a bigger form factor than I wanted for this stealthy mission. In addition, with its built-in LEDs always on and many more peripherals than I needed, it was consuming a hefty 100mW of power (20mA @ 5VDC) continuously, on top of the thirsty PIR sensor’s 200mW draw (couldn’t do anything about the latter). With a typical AAA battery holding approx. 500mAh of charge, or 750mWh over its life, a Nano would drain the battery after 3-4 hours. I wanted this to run comfortably all night on a single AAA battery.

Porting to the ATTiny85 8-bit microcontroller takes a few minutes, and the 980 compiled bytes fit easily into the Tiny85’s 8KB flash memory (12% usage), requiring just 9 bytes of SRAM. The 5 GPIO lines on the Tiny85 are more than enough to read the data signal from the PIR, toggle the RED and GREEN status LEDs, and trigger the voice alert module (with one line left over). Nicest of all, the Tiny85 is a small form factor (DIP-8) chip, which allows the whole device to be put together on a 170 tie-point solderless breadboard, with a little hot glue holding the AAA battery holder. The result is a lower-power sensor, consuming just 90mW all-in (60mA * 1.5VDC) for microcontroller + PIR sensor, which should be able to run continuously for 8 hours on a single AAA battery. (The YS41F is separately powered through its 3x LR1130 1.5V batteries in series.)

Where to hide the sensor? This was the most fun part. The previous weekend, Jasmine (9yo) and Adam (6yo) had built some 2-foot-high junk-modeling robots, and we had been talking about how cool it would be if they could see and talk. After a bit of testing and fixing (nothing ever goes right first time!), I set up their now-enhanced seeing-and-talking robots at the bottom of the stairs, turned off the lights, and went to bed.

Adhesive on the backs of the YS41F and the 170 tie-point breadboard allows both to be mounted easily and (if desired) out of sight. In the photo, we deployed the system conspicuously on the shoulder of the junk-model robot being animated (eyes and voice).

The next morning, I heard them excitedly chattering in the hallway. Apparently, Jasmine had been caught red-handed as she crept down the stairs in the darkness to set up her gags before the rest of us awoke. To her surprise, she was called out by her own robot. Mission accomplished 🙂

Success! The Talking Motion Sensor ran all night and made the critical detections on April Fool’s morning.


Demonstration Video
Watch the video to see the Talking PIR Sensor in action on April Fool’s Day!


7. Epilogue: Looking Ahead

The potential for voice control in low-cost devices is not yet market-saturated. At scale, the electronics for the talking PIR sensor could likely be manufactured onto a surface-mount PCB for £2-£3 ea. Are there comparable products on the market? Not at this price point or ease of use. Comparable products start at £20; they are bulkier (though with a nice plastic housing) and lose the ability to aim the sensor and hide the electronics discreetly, which this prototype allows.

Application ideas

  • Unmanned reception desk or check-out station plays a welcome/greeting message as you approach, and a human is sent to assist.
  • Information/instructions as you approach a vending machine or elevator.
  • A Halloween display where spooky sounds or a blood-curdling shriek play as you approach the pumpkin or door.
  • Cuddly toy “wakes up” and speaks as child approaches
  • COVID-safe Bathroom reminds you to wash your hands
  • Eco-friendly Light switch reminds you to shut off the light.
  • Security gate/door reminds the user to please shut it.
  • Sound effects for a game (shooting, dodging, running, I’m hit!, different types of weaponry, car/motorbike/helicopter/cannon). Or sounds based on game/simulation state e.g. acceleration/deceleration/skid/crash.
  • Child-minding sensor that checks the time when detecting a child and, if it’s not yet morning, says “Go back to bed”.
  • Sensory’s list of favorite Voice products (these are not cheap)

Other ideas? Feel free to share yours, and your designs/builds/applications, in the comments below.

