Core Concepts
Call Initiation Responsibility
The SDK supports two approaches for call initiation, differing in who is responsible for creating the call:
Client-Side Initiation
- Responsibility: The SDK directly creates calls with Hidoba Research
- Process: SDK uses your API key to create and manage the entire call lifecycle
- Use Case: Development, prototyping, simple applications
Server-Side Initiation
- Responsibility: Your backend creates calls, SDK connects using provided credentials
- Process: Your server creates the call and returns a signed URL for the SDK to use
- Use Case: Production applications, enhanced security, user management
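The practical difference between the two approaches is where the call is created and which secret the browser holds. The sketch below models that choice as a TypeScript discriminated union. Every name in it (`CallInitiation`, `CreateCall`, `resolveCallUrl`, `fetchSignedUrl`) is hypothetical and not part of the SDK's documented API; the call-creation functions are injected as stand-ins.

```typescript
// Hypothetical types: none of these names come from the SDK itself.
type CallInitiation =
  | { mode: "client"; apiKey: string } // the SDK creates the call
  | { mode: "server"; fetchSignedUrl: () => Promise<string> }; // your backend creates the call

// Stand-in for whatever creates a call directly from an API key.
type CreateCall = (apiKey: string) => Promise<string>;

// Returns the URL the audio connection should use, whichever side created the call.
async function resolveCallUrl(
  init: CallInitiation,
  createCall: CreateCall,
): Promise<string> {
  switch (init.mode) {
    case "client":
      // The API key is present in the browser: convenient for prototypes.
      return createCall(init.apiKey);
    case "server":
      // The browser only receives a signed URL; the API key stays on the server.
      return init.fetchSignedUrl();
  }
}
```

With this shape, moving from a prototype to production means swapping the `client` variant for the `server` one; the code that opens the connection does not change.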
Call Flow
Understanding the typical flow of an AI voice call helps you implement proper event handling:
- Call Initiation: User clicks call button → onCallStart triggered
- Call Creation: Either SDK creates call directly (client-side) or your backend creates call (server-side)
- Permission Request: Browser requests microphone access automatically
- Permission Granted: onCallStarting callback triggered
- WebSocket Connection: SDK establishes audio streaming connection
- Connected: onConnected callback triggered → conversation can begin
- Active Conversation: Real-time audio streaming and optional transcript display
- Call End: User hangs up → onHangUp callback triggered
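The flow above can be read as a small state machine. The following sketch simulates it so the callback order is explicit. The four callback names come from the flow itself; the `CallFlow` class, its method names, and the state names are hypothetical and only illustrate the sequence, not how the SDK is implemented.

```typescript
// Callback names are taken from the call flow; everything else is illustrative.
interface CallCallbacks {
  onCallStart?: () => void;
  onCallStarting?: () => void;
  onConnected?: () => void;
  onHangUp?: () => void;
}

type CallState =
  | "idle"
  | "creating"
  | "awaitingPermission"
  | "connecting"
  | "connected"
  | "ended";

class CallFlow {
  state: CallState = "idle";
  private callbacks: CallCallbacks;

  constructor(callbacks: CallCallbacks) {
    this.callbacks = callbacks;
  }

  // User clicks the call button; call creation begins.
  start(): void {
    this.expect("idle");
    this.callbacks.onCallStart?.();
    this.state = "creating";
  }

  // The call now exists (created by the SDK or by your backend).
  callCreated(): void {
    this.expect("creating");
    this.state = "awaitingPermission";
  }

  // The browser granted microphone access.
  permissionGranted(): void {
    this.expect("awaitingPermission");
    this.callbacks.onCallStarting?.();
    this.state = "connecting";
  }

  // The audio streaming connection is open; conversation can begin.
  socketOpened(): void {
    this.expect("connecting");
    this.callbacks.onConnected?.();
    this.state = "connected";
  }

  // User hangs up (this sketch only models hanging up an active call).
  hangUp(): void {
    this.expect("connected");
    this.callbacks.onHangUp?.();
    this.state = "ended";
  }

  // Rejects transitions that arrive out of order.
  private expect(expected: CallState): void {
    if (this.state !== expected) {
      throw new Error(`Expected state "${expected}" but was "${this.state}"`);
    }
  }
}
```

Driving the simulation through `start()`, `callCreated()`, `permissionGranted()`, `socketOpened()`, and `hangUp()` fires the four callbacks in the order listed above, which is the order your UI handlers should be prepared for.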
Callback Architecture
The SDK uses an event-driven architecture with callbacks to handle different states:
- Status Callbacks: Keep your UI updated with current call state
- Error Handling: Gracefully handle connection issues and denied permissions
- RAG Integration: Display external document sources when AI references them
- User Feedback: Show connection progress and call quality information
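One way to keep a UI in sync with these callbacks is to funnel them all into a single reducer. The sketch below is an illustration under stated assumptions: the event and state shapes are hypothetical, and the `error` and `sources` events stand in for whatever error and RAG callbacks the SDK actually exposes, whose names and payloads are not given here.

```typescript
// Hypothetical event shapes. Only the first four correspond to callbacks named
// in the call flow (onCallStart, onCallStarting, onConnected, onHangUp).
type CallEvent =
  | { type: "callStart" }
  | { type: "callStarting" }
  | { type: "connected" }
  | { type: "hangUp" }
  | { type: "error"; message: string } // assumed error payload
  | { type: "sources"; titles: string[] }; // assumed RAG payload

interface UiState {
  status: string; // text shown next to the call button
  sources: string[]; // document sources the AI has referenced
}

const initialUi: UiState = { status: "Ready", sources: [] };

// Pure function: given the current UI state and an event, return the next state.
function reduceUi(state: UiState, event: CallEvent): UiState {
  switch (event.type) {
    case "callStart":
      return { status: "Creating call...", sources: [] };
    case "callStarting":
      return { ...state, status: "Connecting..." };
    case "connected":
      return { ...state, status: "Connected" };
    case "hangUp":
      return { ...state, status: "Call ended" };
    case "error":
      return { ...state, status: `Error: ${event.message}` };
    case "sources":
      return { ...state, sources: event.titles };
  }
}
```

Each SDK callback would then dispatch one event, for example `onConnected: () => { ui = reduceUi(ui, { type: "connected" }); }`, which keeps status text, error display, and source lists in one place.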
Permission Management
Microphone access is handled automatically by the SDK:
- Automatic Request: Permission requested immediately after backend call creation
- Early Optimization: Stream acquired during backend polling to reduce perceived latency
- No Manual Setup: No need to request permissions separately; the SDK handles it
- HTTPS Required: Browsers only allow microphone access in a secure context (HTTPS, or localhost during development)
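The latency optimization described above amounts to starting the microphone request and the backend polling at the same time rather than one after the other. The sketch below shows that pattern in isolation. `prepareCall`, `acquireMic`, and `pollUntilReady` are hypothetical names; both operations are injected so the example does not assume anything about the SDK's internals.

```typescript
// Hypothetical helper: both operations are passed in by the caller.
async function prepareCall<Stream, Call>(
  acquireMic: () => Promise<Stream>,
  pollUntilReady: () => Promise<Call>,
): Promise<{ stream: Stream; call: Call }> {
  // Start both operations before awaiting either, so the permission prompt
  // overlaps with backend polling instead of following it.
  const micPromise = acquireMic();
  const callPromise = pollUntilReady();

  // Resolves once both are done; rejects as soon as either one fails.
  const [stream, call] = await Promise.all([micPromise, callPromise]);
  return { stream, call };
}
```

In a browser, `acquireMic` could be `() => navigator.mediaDevices.getUserMedia({ audio: true })`. That call rejects with a `NotAllowedError` if the user denies access, and `navigator.mediaDevices` is undefined outside a secure context, which is why HTTPS is required. A fuller version would also stop the acquired stream's tracks if polling fails.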
Audio Processing
The SDK includes advanced audio processing capabilities:
- Real-time Processing: Uses AudioWorklet for low-latency audio handling
- Automatic Optimization: Built-in echo cancellation and noise reduction
- Device Flexibility: Switch microphones and speakers during active calls
- Quality Control: Automatic audio format optimization for best performance
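For context, browsers expose echo cancellation, noise suppression, and device selection through the audio constraints passed to `getUserMedia`. The sketch below builds such a constraints object. It is an assumption that the SDK configures audio this way internally; `AudioConstraints` and `buildAudioConstraints` are hypothetical names, and the interface is a local stand-in for the relevant subset of the browser's `MediaTrackConstraints` type.

```typescript
// Local stand-in for a subset of the browser's MediaTrackConstraints,
// declared here so the sketch compiles without DOM typings.
interface AudioConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  deviceId?: { exact: string };
}

// Builds the value for the `audio` field of getUserMedia's constraints.
function buildAudioConstraints(deviceId?: string): AudioConstraints {
  const constraints: AudioConstraints = {
    echoCancellation: true, // ask the browser to remove speaker echo
    noiseSuppression: true, // ask the browser to reduce background noise
  };
  if (deviceId !== undefined) {
    // Pin a specific microphone instead of the system default.
    constraints.deviceId = { exact: deviceId };
  }
  return constraints;
}
```

In a browser this would be used as `navigator.mediaDevices.getUserMedia({ audio: buildAudioConstraints(selectedId) })`, where `selectedId` is a `deviceId` obtained from `navigator.mediaDevices.enumerateDevices()`. Switching microphones during a call typically means requesting a new stream with a different `deviceId` and swapping it in.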