voice-assistant.io is a DIY voice terminal designed for custom hardware integration. It runs on the ESP32-S3 microcontroller and connects to the Gemini Live API, so users can build a programmable voice interface that links physical devices to AI-driven functionality. Its modular architecture allows flexible control over smart home systems, automation workflows, and external services.
The project provides open-source firmware, PCB designs, and tools to simplify the development process. The system is split into three distinct layers (audio capture, local logic execution, and AI model interaction), which keeps each part simple while the whole remains capable. Developers can work on one layer at a time without compromising performance or scalability.
The system operates in three primary layers:
| Layer | Function |
|---|---|
| ESP32 Voice Frontend | Captures audio, runs UI, detects wake words |
| Local Assistant Agent | Executes functions and runs logic locally |
| AI Model Integration | Provides reasoning, speech understanding, and real-time responses |
This separation keeps the hardware lightweight: the ESP32 handles only audio capture, the UI, and wake-word detection, while function execution and reasoning run in the other two layers. Users can train the system to recognize custom commands and execute actions via MQTT, webhooks, or direct function calls.
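
As a rough illustration of how a local agent might route recognized commands to MQTT, webhooks, or direct function calls, here is a minimal Python sketch. The `Agent` class, its method names, the topic, and the URL are all hypothetical and are not taken from the project's firmware or tools. The transports are injected as plain callables, so any MQTT or HTTP client could be plugged in behind them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Hypothetical transport signatures (assumptions, not the project's API):
Publish = Callable[[str, str], None]              # (topic, message)
PostJson = Callable[[str, Dict[str, Any]], None]  # (url, JSON-like body)
Handler = Callable[[Dict[str, Any]], str]         # args -> short result string


@dataclass
class Agent:
    """Illustrative command router for a local assistant agent."""
    publish: Publish
    post_json: PostJson
    handlers: Dict[str, Handler] = field(default_factory=dict)

    def on_mqtt(self, command: str, topic: str) -> None:
        """Route `command` to an MQTT topic, sending args['value'] as the message."""
        def handler(args: Dict[str, Any]) -> str:
            self.publish(topic, str(args.get("value", "")))
            return f"mqtt:{topic}"
        self.handlers[command] = handler

    def on_webhook(self, command: str, url: str) -> None:
        """Route `command` to a webhook, sending the args as the request body."""
        def handler(args: Dict[str, Any]) -> str:
            self.post_json(url, args)
            return f"webhook:{url}"
        self.handlers[command] = handler

    def on_function(self, command: str, fn: Handler) -> None:
        """Route `command` to a direct local function call."""
        self.handlers[command] = fn

    def dispatch(self, command: str, args: Optional[Dict[str, Any]] = None) -> str:
        """Run the handler registered for `command`, if any."""
        handler = self.handlers.get(command)
        if handler is None:
            return f"unknown command: {command}"
        return handler(args or {})


if __name__ == "__main__":
    # Stub transports that only record what would have been sent.
    log = []
    agent = Agent(
        publish=lambda topic, msg: log.append(("mqtt", topic, msg)),
        post_json=lambda url, body: log.append(("http", url, body)),
    )
    agent.on_mqtt("set_light", "home/livingroom/light")
    agent.on_function("echo", lambda args: str(args.get("text", "")))
    print(agent.dispatch("set_light", {"value": "on"}))
    print(agent.dispatch("echo", {"text": "hello"}))
    print(log)
```

In this sketch the AI layer would only need to produce a command name and an argument dictionary; which transport carries the action is decided locally at registration time, which mirrors the layer separation described above.
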
voice-assistant.io suits a wide range of applications, including smart home automation, developer tooling, custom hardware integration, and workflow automation. Its modular design allows it to be adapted to various use cases without requiring deep technical expertise.