
Rebuilding My Smart Speaker: NVMe, Voice Parsing, and Clean API Power

By Joe Stasio on May 13, 2025

The Project Finally Snapped Into Place

I’ve been building this Raspberry Pi-based smart speaker in bursts—between other projects, experiments, and burnout cycles. This past week, everything locked in. It was supposed to be a bug fix. Instead, it became a full-stack voice overhaul.

The NVMe Backbone: iUniker HAT+ on the Pi 5

First, hardware. The iUniker PCIe M.2 HAT+ turned out to be a game-changer. With it, my Raspberry Pi 5 boots and runs off a full-blown NVMe SSD—no more SD card bottlenecks. That alone made `yt_dlp` and `mpv` feel snappier, streams initiate faster, and boot-up time dropped by half. I added extra standoffs and used all 4 screw positions for stability. Zero rattle. All power.

Audio Was Broken—Then ALSA Fixed It

For a while, nothing would play. I was debugging `mpv` output thinking it was the stream URL or `yt_dlp` formatting, but it turned out PulseAudio was messing up the routing. The fix? Pass `--ao=alsa` to `mpv` and route everything straight through ALSA, skipping PulseAudio entirely. Music came back—loud, clean, and responsive.
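
Stripped down, the playback call now looks something like this (a trimmed sketch; the wrapper function name is just illustrative):

```python
import subprocess

def play_stream(url: str) -> None:
    """Play an audio stream with mpv, forcing ALSA output to bypass PulseAudio."""
    subprocess.run(
        [
            "mpv",
            "--no-video",   # audio only
            "--ao=alsa",    # route straight through ALSA instead of PulseAudio
            url,
        ],
        check=True,
    )
```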

Dropping URL Streaming Entirely

I rebuilt `StreamController` to kill off the old prompt-based remote URL streaming. Everything resolves locally now—prompts go to `yt_dlp`, stream URLs get extracted, and `mpv` handles playback. The benefit: fewer moving parts, zero dependency on `stream.php`, and better control over fallback and retry logic.
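
Condensed, the new flow is roughly this (a sketch; the real `StreamController` wraps it with the retry and fallback handling):

```python
import subprocess
import yt_dlp

def resolve_and_play(prompt: str) -> None:
    """Turn a free-text prompt into a direct stream URL and hand it to mpv."""
    opts = {
        "format": "bestaudio/best",  # prefer an audio-only stream
        "noplaylist": True,
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        # "ytsearch1:" resolves a free-text prompt to a single search result
        info = ydl.extract_info(f"ytsearch1:{prompt}", download=False)
        stream_url = info["entries"][0]["url"]

    subprocess.run(["mpv", "--no-video", "--ao=alsa", stream_url], check=True)
```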

Natural TTS with Piper

Goodbye `espeak-ng`, you robotic bastard. After many failed installs, I compiled Piper manually and paired it with the Lessac voice model. The difference is night and day. It sounds almost human, uses no cloud APIs, and plays instantly through `aplay` or `mpv` after synthesis. Now, the assistant actually feels like one.
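
Synthesis plus playback is two subprocess calls. A minimal sketch (the binary and model paths are examples—use whatever your install produced):

```python
import subprocess

PIPER = "/usr/local/bin/piper"                       # wherever the compiled binary landed
VOICE = "/home/pi/voices/en_US-lessac-medium.onnx"   # Lessac voice model (example path)

def speak(text: str, wav_path: str = "/tmp/tts.wav") -> None:
    """Synthesize speech with Piper, then play the wav back through aplay."""
    # Piper reads text on stdin and writes the synthesized audio to --output_file
    subprocess.run(
        [PIPER, "--model", VOICE, "--output_file", wav_path],
        input=text.encode("utf-8"),
        check=True,
    )
    subprocess.run(["aplay", wav_path], check=True)
```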

Exposing the Intent Router as an API

Internally, my `IntentRouter` was doing solid work—parsing stuff like "set a timer for five minutes" into a dict with action, domain, and parameters. But it was locked in the CLI. So I added `/api/intent/parse` to FastAPI. Now I can feed strings from a web UI, Postman, or curl, and get back the structured intent in JSON. It's testable, hackable, and future-proofed for whatever frontend I glue on top of it.
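
The endpoint itself is tiny. A stripped-down sketch (the import path, request model name, and `parse()` call are stand-ins for the real wiring):

```python
from fastapi import FastAPI
from pydantic import BaseModel

from assistant.intent import IntentRouter  # stand-in import path for the existing router

app = FastAPI()
router = IntentRouter()

class IntentParseRequest(BaseModel):
    prompt: str  # the raw utterance, e.g. "set a timer for five minutes"

@app.post("/api/intent/parse")
def parse_intent(req: IntentParseRequest) -> dict:
    # Returns the structured intent: action, domain, and parameters
    return router.parse(req.prompt)
```

POSTing `{"prompt": "set a timer for five minutes"}` gives back the same action/domain/parameters dict the CLI was already producing, just as JSON.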

The Frontend Hook

I added a basic form to the UI—just a textarea and a button. It submits prompts via fetch to the new intent endpoint and renders the result. No guessing what the assistant will do anymore. I see it. I can validate it. And I can extend it later to simulate full dispatch chains.

Swagger, But Useful

FastAPI gives a Swagger doc by default, but I had to clean up the models, especially around `PromptRequest`. I was wrongly reusing that for intent parsing when `pre_prompt` didn’t apply. Fixed that. Now `/api/intent/parse` expects only what it should—and the Swagger doc reflects reality.
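
The split is just two models now (field names approximate):

```python
from typing import Optional
from pydantic import BaseModel

class PromptRequest(BaseModel):
    """For the endpoints where a pre_prompt actually applies."""
    prompt: str
    pre_prompt: Optional[str] = None

class IntentParseRequest(BaseModel):
    """For /api/intent/parse: just the raw prompt, nothing else."""
    prompt: str
```

FastAPI generates the Swagger schema straight from these, so the example request body finally matches what the endpoint accepts.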

The Stack, Stacked

  • Raspberry Pi 5
  • iUniker M.2 NVMe HAT+
  • FastAPI for all backend logic
  • MPV for audio playback via ALSA
  • Piper TTS with Lessac voice
  • IntentRouter exposed as API
  • Local CLI control + live Web UI

No More Guessing

This rebuild hit everything: speed, sound quality, API ergonomics, clarity. I don’t need to ssh into the Pi and trace threads manually just to figure out why a song didn’t play. I see it. I test it. I control it. This wasn’t just a fix. It was evolution.