Voice Assistant
Flibbert includes a built-in voice assistant powered by OpenAI's Realtime API that allows you to control your ESP32 devices using natural speech. Talk to your devices directly through the Flibbert dashboard and get real-time responses.
Features
- Real-time voice interaction - Speak naturally to control your devices
- Context awareness - The assistant knows which devices you own
- Secure API key management - Your OpenAI API key is stored securely
- Low-latency communication - Instant voice responses
Prerequisites
Before using the voice assistant, you need:
- OpenAI API key - Get one from OpenAI Platform
- OpenAI Realtime API access - Ensure your API key has access to the Realtime API
- Connected ESP32 device - Your device should be online and connected to Flibbert
- Modern web browser - Chrome, Firefox, Safari, or Edge with microphone support
- Microphone access - Grant microphone permissions to your browser
Setup
1. Configure OpenAI API Key
- Navigate to Preferences in the Flibbert dashboard
- Click the "Connect to OpenAI" button under OpenAI Integration
- Enter your OpenAI API key when prompted
- The button will show "✓ Connected to OpenAI" when successful
2. Configure Actions with Input Schema
To enable actions on your device, add an input-schema.json file to your project. This file defines the form fields that will appear on the device Actions page in the dashboard.
Example `input-schema.json`:

```json
{
  "type": "object",
  "title": "LED Control",
  "properties": {
    "ledState": {
      "type": "string",
      "enum": ["on", "off"],
      "description": "Turn LED on or off"
    }
  }
}
```
Adding this file to your project renders a form on the device Actions page with a dropdown field for `ledState` (on/off). When you select a value and submit, an event containing the JSON data is sent to the device. For more details, see the react-jsonschema-form documentation and playground.
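For the schema above, selecting "on" and submitting would deliver a JSON payload like the following to the device (the form fields become top-level keys; the exact value depends on what you select):

```json
{
  "ledState": "on"
}
```

This is the string your device code receives and parses in the next step.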
3. Prepare Your Device Code
Your ESP32 code should handle action requests and parse the JSON data. Here's a complete example using cJSON:
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include "env.h"
#include "gpio.h"
#include "cJSON.h"

#define LED_PIN 2

int main() {
    // Configure GPIO pin as output for LED
    gpio_set_direction(LED_PIN, 2);

    while (1) {
        // Check for incoming action requests
        unsigned int request = dequeue_action_request();
        if (request > 1) {
            // Get the JSON data from the action request
            char* data = (char*)get_action_request_data(request);
            printf("Received action data: %s\n", data);

            // Parse JSON data
            cJSON* json = cJSON_Parse(data);
            if (json != NULL) {
                // Extract ledState field
                cJSON* ledState = cJSON_GetObjectItem(json, "ledState");
                if (ledState != NULL && cJSON_IsString(ledState)) {
                    const char* state = cJSON_GetStringValue(ledState);
                    if (strcmp(state, "on") == 0) {
                        printf("Turning LED ON\n");
                        gpio_set_level(LED_PIN, 1);
                    } else if (strcmp(state, "off") == 0) {
                        printf("Turning LED OFF\n");
                        gpio_set_level(LED_PIN, 0);
                    }
                }
                // Clean up JSON object
                cJSON_Delete(json);
            } else {
                printf("Failed to parse JSON data\n");
            }
            // Clean up action request
            destroy_action_request(request);
        }
        delay(100);
    }
    return 0;
}
```
Key points:
- Use `dequeue_action_request()` to check for incoming actions
- Use `get_action_request_data(request)` to retrieve the JSON string
- Parse the JSON with the cJSON library to extract the `ledState` field
- Set the GPIO pin high (1) for "on" or low (0) for "off"
- Always call `destroy_action_request(request)` when done
Using the Voice Assistant
1. Starting a Conversation
- Click the microphone button to start the voice assistant
- Grant microphone permissions if prompted
- Start speaking when the connection is established
2. Voice Commands
You can give natural language commands like:
- "Turn on the LED" - Sets ledState to "on"
- "Turn off the light" - Sets ledState to "off"
- "Switch on the LED" - Sets ledState to "on"
- "Switch it off" - Sets ledState to "off"
3. Ending the Session
- Click the X button to stop the voice assistant
- The microphone will be released and the connection closed
API Costs
The OpenAI Realtime API has usage-based pricing. Voice assistant sessions consume credits based on audio input and output duration. Monitor your usage on the OpenAI Platform to track costs.
Security
Your OpenAI API key is stored securely:
- Encrypted storage - API keys are encrypted in the database
- User isolation - Each user's keys are isolated and private
- No logging - API keys are never logged or displayed in plain text
- Secure transmission - All API communications use HTTPS
Examples
Basic LED Control
You: "Turn on the LED"
Assistant: "I'll turn on the LED for you"

You: "Turn it off now"
Assistant: "The LED is now turned off"
The voice assistant understands natural speech and can populate the action form fields based on your commands.