Crafted RC | Maker Gear & Tools Curated by Code

Intent Routing on a Pi5 Smart Speaker

By Joe Stasio on May 13, 2025

After months of tinkering, cursing, and late-night debugging sessions, my Raspberry Pi 5-powered smart speaker is finally operational. While commercial options like Amazon Echo and Google Home dominate the market, there's something deeply satisfying about building your own system from scratch—one that you control completely, with no data being shipped off to corporate servers.

In this post, I'll break down the intent routing system that powers my speaker, with special focus on how I integrated WiZ smart bulbs using nothing but simple UDP commands. I'll also dive into the challenges of running everything locally on a Pi5, and how I structured the system to handle multiple types of intents efficiently.

The Architecture

At a high level, the system works like this:

  1. Local speech recognition captures voice commands
  2. A compact language model detects intent and extracts parameters
  3. The intent router dispatches commands to appropriate handlers
  4. Handlers execute specific functions (lights, music, weather, etc.)
  5. Text-to-speech reads back responses
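To make the flow concrete, here's a runnable toy version of that loop. Every stage is a stub (keyword matching instead of the real intent model, print instead of TTS); only the shape matches the actual system.

```python
def transcribe(audio):
    # Stand-in for local speech recognition: pretend audio is already text
    return audio

def classify_intent(text):
    # Stand-in for the fine-tuned intent model: crude keyword matching
    category = "lights" if "light" in text else "unknown"
    return {"category": category, "command": "turn_on", "parameters": {}}

def dispatch(intent):
    # Stand-in for the intent router
    if intent["category"] == "lights":
        return {"status": "success", "message": "Lights on"}
    return {"status": "error", "message": "Unknown intent category"}

def speak(message):
    # Stand-in for text-to-speech
    print(message)

def handle_utterance(audio):
    intent = classify_intent(transcribe(audio))
    response = dispatch(intent)
    speak(response["message"])
    return response

handle_utterance("turn on the living room lights")  # prints "Lights on"
```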

The whole stack runs on a Raspberry Pi 5 with 8GB RAM, which provides enough horsepower to handle local speech processing without cloud dependencies. I'm using a custom 3D-printed enclosure with an integrated microphone array and a decent 3W speaker.

Intent Recognition and Routing

For intent recognition, I initially tried running a full-sized language model on the Pi, but quickly discovered that was too ambitious. Instead, I fine-tuned a smaller model specifically for intent detection. The model takes transcribed speech and outputs a structured JSON object containing:

  • Intent category (lights, music, weather, etc.)
  • Command type (on/off, color change, volume, etc.)
  • Parameters (which lights, what color, etc.)

{
  "category": "lights",
  "command": "change_color",
  "parameters": {
    "room": "living_room",
    "color": "daylight"
  }
}
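One practical note: small models occasionally emit malformed output, so it's worth a shape check before routing. A minimal validator (field names from the example above; the real system may check more):

```python
REQUIRED_KEYS = {"category", "command", "parameters"}

def is_valid_intent(obj):
    # Accept only dicts with the three expected fields and a dict of parameters
    return (isinstance(obj, dict)
            and REQUIRED_KEYS <= obj.keys()
            and isinstance(obj["parameters"], dict))
```

Anything that fails the check gets a fallback spoken response asking the user to repeat the command.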

The intent router is a FastAPI service that receives this output and dispatches to the appropriate handler class. Here's a simplified version of my router:

from fastapi import FastAPI

app = FastAPI()

@app.post("/process-intent")
async def process_intent(intent_data: dict):
    category = intent_data.get("category")
    if category == "lights":
        return await light_handler.process(intent_data)
    elif category == "music":
        return await music_handler.process(intent_data)
    elif category == "weather":
        return await weather_handler.process(intent_data)
    return {"status": "error", "message": "Unknown intent category"}
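The if/elif chain is fine at three categories, but a dispatch table scales better as handlers accumulate. A sketch with a stub handler standing in for the real handler classes (which share the same async process interface):

```python
import asyncio

class StubHandler:
    # Minimal stand-in for LightHandler, MusicHandler, etc.
    async def process(self, intent_data):
        return {"status": "success", "category": intent_data["category"]}

HANDLERS = {"lights": StubHandler(), "music": StubHandler(), "weather": StubHandler()}

async def process_intent(intent_data):
    # Look up the handler by category; unknown categories fall through to an error
    handler = HANDLERS.get(intent_data.get("category"))
    if handler is None:
        return {"status": "error", "message": "Unknown intent category"}
    return await handler.process(intent_data)

result = asyncio.run(process_intent({"category": "lights"}))
```

Adding a new device type then means registering one entry instead of editing the router.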

The Beauty of WiZ Lights: Just UDP

The most elegant part of this project has been integrating WiZ smart bulbs. Unlike many smart home devices that require cloud services, complicated APIs, or proprietary bridges, WiZ bulbs communicate over your local network using simple UDP messages. This was a revelation.

After discovering their protocol through some reverse engineering and community documentation, I was able to create a clean Python class that communicates directly with the bulbs:

import ipaddress, json, random, socket, threading, time

class WiZController:
    def __init__(self, ip_list=None, discover=False, timeout=3.0):
        self.PORT = 38899            # WiZ bulbs listen for JSON-over-UDP here
        self.bulbs = ip_list or []
        self.bulb_info = {}          # getPilot state, keyed by bulb IP
        self.timeout = timeout       # discovery listen window, in seconds
        self.rave_running = False
        self.rave_thread = None
        if discover:
            self.start()

    def send(self, ip, payload):
        # Fire-and-forget: serialize the command and send it straight to the bulb
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(json.dumps(payload).encode(), (ip, self.PORT))

The commands themselves are remarkably simple JSON structures. Want to turn on a light? It's just:

{
  "method": "setState",
  "params": {
    "state": true
  }
}

Want to change the color temperature to daylight? Just:

{
  "method": "setState",
  "params": {
    "state": true,
    "temp": 6500
  }
}
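Since every command is a variation on setState, a tiny helper can assemble the payloads (a sketch using the field names shown above; r/g/b is the same trio the rave-mode code uses):

```python
def build_set_state(on=True, temp=None, rgb=None):
    # Assemble a WiZ setState payload, including only the fields that are set
    params = {"state": on}
    if temp is not None:
        params["temp"] = temp
    if rgb is not None:
        params["r"], params["g"], params["b"] = rgb
    return {"method": "setState", "params": params}

build_set_state(temp=6500)
# {'method': 'setState', 'params': {'state': True, 'temp': 6500}}
```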

Challenges and Solutions

Challenge 1: Bulb Discovery

WiZ bulbs don't announce themselves, so the controller sweeps the local /24 subnet: it sends a getPilot probe to every address, then listens for a few seconds and records whichever hosts answer:

def start(self):
    local_ip = self._get_local_ip()
    network = ipaddress.IPv4Interface(f"{local_ip}/24").network
    discovery_msg = {"method": "getPilot", "params": {}}
    discovered = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(0.5)   # short per-recv timeout; overall window below
        sock.bind(('', 0))
        for ip in network.hosts():
            try:
                sock.sendto(json.dumps(discovery_msg).encode(), (str(ip), self.PORT))
            except OSError:
                pass
        start_time = time.time()
        while time.time() - start_time < self.timeout:
            try:
                data, addr = sock.recvfrom(1024)
            except socket.timeout:
                continue
            ip = addr[0]
            if ip in discovered or ip in self.bulbs:
                continue
            try:
                response = json.loads(data.decode())
            except (ValueError, UnicodeDecodeError):
                continue   # not a WiZ reply; ignore
            if "result" in response:
                discovered.append(ip)
                self.bulb_info[ip] = response["result"]
    self.bulbs.extend(discovered)

Challenge 2: Color vs. Temperature Confusion

The intent model often labels white-point names like "daylight" as colors, so I keep a map from spoken names to Kelvin values and route those requests through setState's temp field instead of RGB:

TEMP_MAP = {
  "warm white": 2700,
  "soft white": 3000,
  "neutral": 4000,
  "cool white": 5000,
  "daylight": 6500
}

def set_temperature(self, ip_list, temp):
    temp = max(2200, min(6500, temp))
    for ip in ip_list:
        self.send(ip, {
            "method": "setState",
            "params": {
                "state": True,
                "temp": temp
            }
        })
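Connecting the map to the model's output just takes normalizing the spoken name before lookup. A sketch (TEMP_MAP repeated here so the snippet stands alone; the neutral fallback is my choice):

```python
TEMP_MAP = {
    "warm white": 2700,
    "soft white": 3000,
    "neutral": 4000,
    "cool white": 5000,
    "daylight": 6500,
}

def resolve_temp(name, default=4000):
    # Lowercase and strip underscores so "Warm_White" matches "warm white"
    return TEMP_MAP.get(name.strip().lower().replace("_", " "), default)

resolve_temp("daylight")    # 6500
resolve_temp("Warm_White")  # 2700
```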

Challenge 3: Rave Mode and Threading

Cycling colors continuously would block the voice loop, so rave mode runs in a daemon thread that a simple flag can shut down:

def stop_rave(self):
    # Tell the worker loop to exit, then wait briefly for the thread
    self.rave_running = False
    thread = getattr(self, "rave_thread", None)
    if thread is not None:
        thread.join(timeout=1.0)
        self.rave_thread = None

def rave_mode(self, ip_list, interval=1.0):
    self.stop_rave()   # never run two rave threads at once
    self.rave_running = True
    self.rave_thread = threading.Thread(
        target=self._rave_worker,
        args=(ip_list, interval),
        daemon=True
    )
    self.rave_thread.start()

def _rave_worker(self, ip_list, interval):
    try:
        while self.rave_running:
            for ip in ip_list:
                r, g, b = [random.randint(0, 255) for _ in range(3)]
                self.send(ip, {
                    "method": "setState",
                    "params": {
                        "state": True,
                        "r": r, "g": g, "b": b
                    }
                })
            time.sleep(interval)
    except Exception as e:
        print(f"Rave mode error: {e}")
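A boolean flag works, but threading.Event makes the stop signal interruptible mid-sleep, so the lights stop the instant you ask rather than at the end of the current interval. A self-contained sketch with an injected send function (so it runs without real bulbs):

```python
import random
import threading
import time

class Blinker:
    """Event-based variant of the rave loop."""

    def __init__(self, send):
        self.send = send                  # callable(ip, payload), injected
        self._stop = threading.Event()
        self._thread = None

    def start(self, ip_list, interval=0.05):
        self._stop.clear()
        self._thread = threading.Thread(
            target=self._run, args=(ip_list, interval), daemon=True)
        self._thread.start()

    def _run(self, ip_list, interval):
        while not self._stop.is_set():
            for ip in ip_list:
                r, g, b = (random.randint(0, 255) for _ in range(3))
                self.send(ip, {"method": "setState",
                               "params": {"state": True, "r": r, "g": g, "b": b}})
            self._stop.wait(interval)     # sleeps, but wakes the moment stop is set

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)

sent = []
b = Blinker(lambda ip, payload: sent.append((ip, payload)))
b.start(["10.0.0.5"], interval=0.01)
time.sleep(0.05)
b.stop()
```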

Integration With the Voice Assistant

The light handler is where the router's JSON meets the WiZ controller: it resolves a room name to a list of bulb IPs, then maps each command onto a controller call:

class LightHandler:
    def __init__(self):
        self.controller = WiZController(discover=True)

    async def process(self, intent_data):
        command = intent_data.get("command")
        params = intent_data.get("parameters", {})
        room = params.get("room", "all")
        if room == "all":
            target_bulbs = self.controller.bulbs
        else:
            room_map = {
                "living_room": ["192.168.1.107", "192.168.1.143"],
                "bedroom": ["192.168.1.158", "192.168.1.166"]
            }
            target_bulbs = room_map.get(room, [])

        if command == "turn_on":
            self.controller.set_state(target_bulbs, True)
            return {"status": "success", "message": f"Turned on {room} lights"}
        elif command == "turn_off":
            self.controller.set_state(target_bulbs, False)
            return {"status": "success", "message": f"Turned off {room} lights"}
        elif command == "change_color":
            color = params.get("color", "white")
            self.controller.set_color(target_bulbs, color)
            return {"status": "success", "message": f"Changed {room} lights to {color}"}
        elif command == "set_brightness":
            level = params.get("level", 50)
            self.controller.set_brightness(target_bulbs, level)
            return {"status": "success", "message": f"Set {room} brightness to {level}%"}
        elif command == "rave_mode":
            self.controller.rave_mode(target_bulbs)
            return {"status": "success", "message": "Rave mode activated"}
        return {"status": "error", "message": "Unknown light command"}

Performance on the Pi 5

One of the most pleasant surprises has been how well the Raspberry Pi 5 handles this workload. With its improved CPU and 8GB of RAM, it manages the following simultaneously with minimal lag:

  • Local speech recognition using the Whisper small model
  • Intent classification
  • UDP communication with up to 18 WiZ bulbs
  • Local music playback via MPD
  • Text-to-speech response generation

The system typically responds within 1–2 seconds of a command, which is impressively close to commercial smart speakers—all while keeping data private and running 100% locally.

What's Next

  1. Adding room presence detection to automatically contextualize commands
  2. Implementing scenes that combine lighting, music, and other elements
  3. Creating a web dashboard for configuration and monitoring
  4. Expanding to control more device types using similar direct-communication approaches

Final Thoughts

Building this system has reinforced something I've long suspected: much of the complexity in commercial smart home products is unnecessary, serving business models rather than users. The WiZ bulbs demonstrate that smart home technology can be simultaneously powerful and simple.

There's something deeply satisfying about saying "Computer, set living room to daylight" and watching the lights respond—knowing that the command never left my local network, wasn't processed by a corporation, and doesn't depend on someone else's servers staying online.

That's the beauty of DIY smart home technology: it's truly yours.