The Project Finally Snapped Into Place
I’ve been building this Raspberry Pi-based smart speaker in bursts—between other projects, experiments, and burnout cycles. This past week, everything locked in. It was supposed to be a bug fix. Instead, it became a full-stack voice overhaul.
The NVMe Backbone: iUniker HAT+ on the Pi 5
First, hardware. The iUniker PCIe M.2 HAT+ turned out to be a game-changer. With it, my Raspberry Pi 5 boots and runs off a full-blown NVMe SSD, so the SD card bottleneck is gone. That alone made `yt_dlp` and `mpv` feel snappier, got streams starting faster, and cut boot time in half. I added extra standoffs and used all four screw positions for stability. Zero rattle. All power.
Audio Was Broken—Then ALSA Fixed It
For a while, nothing would play. I was debugging `mpv` output thinking it was the stream URL or `yt_dlp` formatting, but it turned out PulseAudio was mangling the routing. The fix? Pass `--ao=alsa` to `mpv` and send everything directly through the Pi’s ALSA layer. Music came back: loud, clean, and responsive.
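For reference, the playback call now boils down to something like this (a minimal sketch; the function name is a stand-in, not the exact code in my player):

```python
import subprocess

def play_stream(url: str) -> subprocess.Popen:
    """Play a direct stream URL with mpv, forcing ALSA output.

    --ao=alsa bypasses PulseAudio entirely; --no-video skips the video
    decode path a headless speaker never needs.
    """
    return subprocess.Popen(
        ["mpv", "--ao=alsa", "--no-video", "--really-quiet", url],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```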
Dropping URL Streaming Entirely
I rebuilt `StreamController` to kill off all prompt-based remote URLs. Everything’s local now: prompts go to `yt_dlp`, stream URLs get extracted, and `mpv` handles playback. The benefit: fewer moving parts, zero dependency on `stream.php`, and better control over fallback and retry logic.
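Stripped of that fallback and retry handling, the core of the new flow looks roughly like this (names are illustrative; the real `StreamController` wraps this in a class):

```python
import subprocess
import yt_dlp

YDL_OPTS = {"format": "bestaudio/best", "quiet": True, "noplaylist": True}

def resolve_and_play(prompt: str) -> subprocess.Popen:
    """Resolve a prompt to a direct stream URL with yt_dlp, then hand it to mpv."""
    with yt_dlp.YoutubeDL(YDL_OPTS) as ydl:
        # ytsearch1: turns a free-form prompt into a single search result
        info = ydl.extract_info(f"ytsearch1:{prompt}", download=False)
        entry = info["entries"][0] if "entries" in info else info
        stream_url = entry["url"]
    return subprocess.Popen(["mpv", "--ao=alsa", "--no-video", stream_url])
```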
Natural TTS with Piper
Goodbye `espeak-ng`, you robotic bastard. After many failed installs, I compiled Piper manually and paired it with the Lessac voice model. The difference is night and day. It sounds almost human, uses no cloud APIs, and plays instantly through `aplay` or `mpv` after synthesis. Now, the assistant actually feels like one.
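The synthesis-then-play step is just two subprocess calls (a rough sketch; the model path is whatever your install uses, and flag spellings can vary between Piper builds):

```python
import subprocess
import tempfile

PIPER_MODEL = "/opt/piper/en_US-lessac-medium.onnx"  # assumed path to the Lessac model

def speak(text: str) -> None:
    """Synthesize text to a temp WAV with Piper, then play it with aplay."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as wav:
        subprocess.run(
            ["piper", "--model", PIPER_MODEL, "--output_file", wav.name],
            input=text.encode("utf-8"),
            check=True,
        )
        subprocess.run(["aplay", "-q", wav.name], check=True)
```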
Exposing the Intent Router as an API
Internally, my `IntentRouter` was doing solid work, parsing stuff like "set a timer for five minutes" into a dict with action, domain, and parameters. But it was locked in the CLI. So I added `/api/intent/parse` to FastAPI. Now I can feed it strings from a web UI, Postman, or curl, and get back the structured intent as JSON. It’s testable, hackable, and future-proofed for whatever frontend I glue on top of it.
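The endpoint itself is tiny (a sketch; the import path and `parse()` method name stand in for whatever the real `IntentRouter` exposes):

```python
from fastapi import FastAPI
from pydantic import BaseModel

from assistant.intent import IntentRouter  # illustrative import path

app = FastAPI()
router = IntentRouter()

class IntentParseRequest(BaseModel):
    prompt: str

@app.post("/api/intent/parse")
def parse_intent(req: IntentParseRequest) -> dict:
    # Same parser the CLI uses; the return value is the structured intent,
    # e.g. {"action": "set", "domain": "timer", "parameters": {...}}
    return router.parse(req.prompt)
```

A one-line curl POST with `{"prompt": "set a timer for five minutes"}` gives back the same dict the CLI would have produced.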
The Frontend Hook
I added a basic form to the UI—just a textarea and a button. It submits prompts via fetch to the new intent endpoint and renders the result. No guessing what the assistant will do anymore. I see it. I can validate it. And I can extend it later to simulate full dispatch chains.
Swagger, But Useful
FastAPI gives you a Swagger doc by default, but I had to clean up the request models, especially around `PromptRequest`. I had been wrongly reusing it for intent parsing even though `pre_prompt` didn’t apply there. Fixed that. Now `/api/intent/parse` expects only what it should, and the Swagger doc reflects reality.
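The split is just two request models instead of one overloaded one (sketched from memory, so field details may differ):

```python
from typing import Optional
from pydantic import BaseModel

class PromptRequest(BaseModel):
    """Playback-style requests, where a pre_prompt actually means something."""
    prompt: str
    pre_prompt: Optional[str] = None

class IntentParseRequest(BaseModel):
    """Intent parsing only needs the prompt, so Swagger stops advertising
    a pre_prompt field the endpoint never reads."""
    prompt: str
```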
The Stack, Stacked
- Raspberry Pi 5
- iUniker M.2 NVMe HAT+
- FastAPI for all backend logic
- MPV for audio playback via ALSA
- Piper TTS with Lessac voice
- IntentRouter exposed as API
- Local CLI control + live Web UI
No More Guessing
This rebuild hit everything: speed, sound quality, API ergonomics, clarity. I don’t need to ssh into the Pi and trace threads manually just to figure out why a song didn’t play. I see it. I test it. I control it. This wasn’t just a fix. It was evolution.