Voice Assistant
Flibbert includes a built-in voice assistant powered by OpenAI's Realtime API that allows you to control your ESP32 devices using natural speech. Talk to your devices directly through the Flibbert dashboard and get real-time responses.
Features
- Real-time voice interaction - Speak naturally to control your devices
- Context awareness - The assistant knows which devices you own
- Secure API key management - Your OpenAI API key is stored securely
- Low-latency communication - Instant voice responses
Prerequisites
Before using the voice assistant, you need:
- OpenAI API key - Get one from OpenAI Platform
- OpenAI Realtime API access - Ensure your API key has access to the Realtime API
- Connected ESP32 device - Your device should be online and connected to Flibbert
- Modern web browser - Chrome, Firefox, Safari, or Edge with microphone support
- Microphone access - Grant microphone permissions to your browser
Setup
1. Configure OpenAI API Key
- Navigate to Preferences in the Flibbert dashboard
- Click the "Connect to OpenAI" button under OpenAI Integration
- Enter your OpenAI API key when prompted
- The button will show "✓ Connected to OpenAI" when successful
2. Configure Actions with Input Schema
To enable actions on your device, add an input-schema.json file to your project. This file defines the form fields that will appear on the device Actions page in the dashboard.
Example `input-schema.json`:

```json
{
  "type": "object",
  "title": "LED Control",
  "properties": {
    "ledState": {
      "type": "string",
      "enum": ["on", "off"],
      "description": "Turn LED on or off"
    }
  }
}
```
Adding this file to your project renders a form on the device Actions page with a dropdown field for `ledState` (on/off). When you select a value and submit, an event containing the JSON data is sent to the device. For more details, see the react-jsonschema-form documentation and playground.
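For the schema above, selecting "on" and submitting would deliver a JSON payload like the following to the device (the form fields become top-level keys; the exact value depends on what you select):

```json
{
  "ledState": "on"
}
```

This is the string your device code receives and parses in the next step.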
3. Prepare Your Device Code
Your ESP32 code should handle action requests and parse the JSON data. Here's a complete example using cJSON:
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include "env.h"
#include "gpio.h"
#include "cJSON.h"

#define LED_PIN 2

int main() {
    // Configure GPIO pin as output for LED
    gpio_set_direction(LED_PIN, 2);

    while (1) {
        // Check for incoming action requests
        unsigned int request = dequeue_action_request();
        if (request > 1) {
            // Get the JSON data from the action request
            char* data = (char*)get_action_request_data(request);
            printf("Received action data: %s\n", data);

            // Parse JSON data
            cJSON* json = cJSON_Parse(data);
            if (json != NULL) {
                // Extract ledState field
                cJSON* ledState = cJSON_GetObjectItem(json, "ledState");
                if (ledState != NULL && cJSON_IsString(ledState)) {
                    const char* state = cJSON_GetStringValue(ledState);
                    if (strcmp(state, "on") == 0) {
                        printf("Turning LED ON\n");
                        gpio_set_level(LED_PIN, 1);
                    } else if (strcmp(state, "off") == 0) {
                        printf("Turning LED OFF\n");
                        gpio_set_level(LED_PIN, 0);
                    }
                }
                // Clean up JSON object
                cJSON_Delete(json);
            } else {
                printf("Failed to parse JSON data\n");
            }
            // Clean up action request
            destroy_action_request(request);
        }
        delay(100);
    }
    return 0;
}
```
Key points:
- Use `dequeue_action_request()` to check for incoming actions
- Use `get_action_request_data(request)` to retrieve the JSON string
- Parse the JSON with the cJSON library to extract the `ledState` field
- Set the GPIO pin high (1) for "on" or low (0) for "off"
- Always call `destroy_action_request(request)` when done
Using the Voice Assistant
1. Starting a Conversation
- Click the microphone button to start the voice assistant
- Grant microphone permissions if prompted
- Start speaking when the connection is established
2. Voice Commands
You can give natural language commands like:
- "Turn on the LED" - Sets ledState to "on"
- "Turn off the light" - Sets ledState to "off"
- "Switch on the LED" - Sets ledState to "on"
- "Switch it off" - Sets ledState to "off"
3. Ending the Session
- Click the X button to stop the voice assistant
- The microphone will be released and the connection closed
API Costs
The OpenAI Realtime API has usage-based pricing. Voice assistant sessions consume credits based on audio input and output duration. Monitor your usage on the OpenAI Platform to track costs.
Security
Your OpenAI API key is stored securely:
- Encrypted storage - API keys are encrypted in the database
- User isolation - Each user's keys are isolated and private
- No logging - API keys are never logged or displayed in plain text
- Secure transmission - All API communications use HTTPS
Examples
Basic LED Control
You: "Turn on the LED"
Assistant: "I'll turn on the LED for you"

You: "Turn it off now"
Assistant: "The LED is now turned off"
The voice assistant understands natural speech and can populate the action form fields based on your commands.