Voice AI

Bilingual Voice Agent on LiveKit

LiveKit · Python · FastAPI · Whisper · ElevenLabs · Deepgram · VAD

The Challenge

The client needed a production voice assistant that felt natural in both English and Urdu, two languages with very different phonetic patterns, speech rhythms, and user expectations. Off-the-shelf voice pipelines lacked Urdu support, had unacceptable latency, or produced robotic TTS output that broke trust with end users.

The Solution

LiveKit Agents Framework: Built the entire voice pipeline on LiveKit Agents, handling real-time audio streaming, room management, and agent dispatch. This eliminated the need for custom WebRTC infrastructure.
STT Pipeline: Used Whisper large-v3 for transcription, fine-tuned on domain-specific vocabulary. Deepgram served as a low-latency fallback for English-only flows where speed was prioritized over accuracy.
VAD-Based Turn Detection: Implemented Silero VAD for natural end-of-turn detection, eliminating the awkward fixed-timeout pauses of most voice bots. The agent only responds when the user has genuinely finished speaking.
TTS Output: Used ElevenLabs for English (natural, expressive voice) and a custom Urdu TTS pipeline to meet the bilingual requirement. Both voices were configured to match the client's brand tone.
LLM Orchestration: A FastAPI backend orchestrates the STT → LLM → TTS pipeline with async processing to keep latency from compounding across stages.
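
The turn-detection and STT-routing ideas above can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes a VAD (such as Silero) has already produced a speech probability per audio frame, and the class name EndOfTurnDetector, the function choose_stt, the language codes, and every threshold are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EndOfTurnDetector:
    """Signals end of turn once enough silence follows real speech.

    Hypothetical helper: thresholds are illustrative, not the project's values.
    """
    speech_threshold: float = 0.5   # probability at or above which a frame is speech
    min_speech_frames: int = 3      # ignore blips shorter than this many frames
    min_silence_frames: int = 15    # trailing silent frames that close a turn

    _speech_run: int = field(default=0, init=False)
    _silence_run: int = field(default=0, init=False)
    _in_turn: bool = field(default=False, init=False)

    def push(self, speech_prob: float) -> bool:
        """Feed one frame's speech probability; return True when the turn ends."""
        if speech_prob >= self.speech_threshold:
            self._speech_run += 1
            self._silence_run = 0
            if self._speech_run >= self.min_speech_frames:
                self._in_turn = True
            return False

        # Silent frame: only counts toward end of turn if the user was speaking.
        self._speech_run = 0
        if not self._in_turn:
            return False
        self._silence_run += 1
        if self._silence_run >= self.min_silence_frames:
            self._in_turn = False
            self._silence_run = 0
            return True
        return False


def choose_stt(language: str, prefer_speed: bool) -> str:
    """Pick an STT backend label (assumed routing rule, assumed language codes)."""
    if language == "en" and prefer_speed:
        return "deepgram"
    return "whisper-large-v3"


if __name__ == "__main__":
    detector = EndOfTurnDetector(min_speech_frames=2, min_silence_frames=3)
    for prob in [0.1, 0.9, 0.9, 0.8, 0.2, 0.1, 0.1]:
        if detector.push(prob):
            print("end of turn detected")
    print(choose_stt("ur", prefer_speed=True))
```

Unlike a fixed timeout, the silence counter here only starts after confirmed speech and resets whenever the user resumes talking, which is the behaviour the VAD-based approach relies on.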

Key Results

Sub-200ms end-to-end voice response latency (STT + LLM inference + TTS) in production.
Natural bilingual conversation in English and Urdu with consistent persona and tone.
VAD-based turn detection eliminated the robotic pause patterns common in fixed-timeout systems.
Deployed on LiveKit Cloud with automatic room scaling for concurrent sessions.
No custom WebRTC infrastructure required: all media handling goes through the LiveKit Agents SDK.
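
An end-to-end latency figure like the one above is a sum of per-stage times, so one way to check it is to time each stage of a turn separately. The sketch below shows the idea with asyncio. The stage functions fake_stt, fake_llm, and fake_tts are placeholders invented for this example (they only sleep briefly), and the helper names timed and run_turn are likewise hypothetical, so the numbers it prints say nothing about the real system.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple


async def timed(
    name: str,
    stage: Callable[[Any], Awaitable[Any]],
    arg: Any,
    timings: Dict[str, float],
) -> Any:
    """Await one stage and record its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    result = await stage(arg)
    timings[name] = (time.perf_counter() - start) * 1000.0
    return result


# Placeholder stages: stand-ins for real STT, LLM, and TTS calls.
async def fake_stt(audio: bytes) -> str:
    await asyncio.sleep(0.01)
    return "hello"


async def fake_llm(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"reply to: {text}"


async def fake_tts(text: str) -> bytes:
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run STT -> LLM -> TTS for one user turn and collect per-stage timings."""
    timings: Dict[str, float] = {}
    text = await timed("stt", fake_stt, audio, timings)
    reply = await timed("llm", fake_llm, text, timings)
    speech = await timed("tts", fake_tts, reply, timings)
    timings["total"] = timings["stt"] + timings["llm"] + timings["tts"]
    return speech, timings


if __name__ == "__main__":
    _, stage_ms = asyncio.run(run_turn(b"\x00\x00"))
    for stage_name, ms in stage_ms.items():
        print(f"{stage_name}: {ms:.1f} ms")
```

Logging these per-stage numbers for every turn makes it clear which stage dominates the budget before any tuning effort is spent.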

Project Details

Category: Voice AI

Role: Lead Developer

Client: Enterprise · Bilingual

Completed: March 2025

Related case studies

Agentic RAG

Malakah: Agentic RAG Legal Assistant

97–98%

Agentic RAG legal assistant for Saudi regulatory law. 97–98% recall@10 across a bilingual English-Arabic corpus validated by legal domain experts.

Agentic RAG

Enterprise AI Search and Chatbot: Energy Sector

300+ GB

Intelligent search system processing 300+ GB of unstructured enterprise data for a major energy company, including 1K–6K page technical documents.

AI SaaS

Multi-Tenant AI SaaS Platform: Document Intelligence

AI-powered document intelligence platform with multi-tenant architecture, RBAC, and hybrid retrieval.

Pakistani Currency Classifier

Real-time EfficientNet-based classification of Pakistani currency notes achieving 99% accuracy.