
Closed
Paid on delivery
We are building a production-grade AI Calling Agent using a 2-VM architecture:
- VM-1 → Telephony, SIP trunk, media handling, STT & TTS (THIS TASK)
- VM-2 → AI Brain (LLM + Vector DB), handled separately

This project is strictly limited to VM-1. The freelancer will set up telephony, real-time audio processing, STT/TTS, and API connectivity, and will perform full fine-tuning so that live calls sound natural, stable, and low-latency.

Scope of Work (VM-1 Only)

VM & Network Setup
- OS: Ubuntu (preferred) on Proxmox
- Configure:
  - Public interface → SIP + RTP
  - Private interface → AI Brain API
- Optimize the OS for:
  - Low-latency audio
  - High concurrent call volume
- Firewall configuration:
  - SIP (TLS)
  - RTP port range
  - Internal API access only

SIP Trunk & Telephony Configuration
- Configure the SIP trunk (provider details will be shared)
- SIP over TLS preferred
- Handle:
  - Incoming & outgoing calls
  - NAT traversal
  - DTMF
  - Call start / end events
- Codec setup:
  - Opus (primary)
  - G.711 (fallback)
- Telephony stack: FreeSWITCH / Kamailio / OpenSIPS (freelancer must justify the choice)

Media & Call Flow Handling
- Handle RTP audio streams in real time
- Call flow:
  1. Receive caller audio
  2. Stream audio to STT
  3. Receive AI response
  4. Convert to TTS
  5. Play back audio to caller
- Support:
  - Voice Activity Detection (VAD)
  - Silence detection
  - Caller barge-in (interrupting AI speech)

STT (Speech-to-Text) Integration
- Integrate real-time STT
- Stream audio in chunks
- Handle:
  - Partial transcripts
  - Final transcripts
- Tune for:
  - Indian accents
  - Background noise
  - Fast & slow speakers
- Ensure no loss of first/last words

TTS (Text-to-Speech) Integration
- Integrate low-latency TTS
- Use telephony-optimized voices
- Tune:
  - Speed
  - Pitch
  - Volume normalization
- Playback must:
  - Start quickly
  - Sound natural
  - Stop immediately on barge-in

API Layer for AI Brain Connectivity
- Expose secure APIs for VM-2:
  - Send transcript + call context
  - Receive response text / intent / action
- API requirements:
  - HTTPS
  - Token-based authentication
  - Low-latency response handling
- Provide API documentation covering:
  - Endpoints
  - Payload structure
  - Timeout & retry logic

Fine-Tuning & Optimization (MANDATORY)

A. Telephony & Audio Fine-Tuning
- Optimize:
  - SIP timers
  - RTP jitter buffers
  - Packet loss handling
- Eliminate:
  - Audio clipping
  - Echo
  - One-way audio
- Tune Opus bitrate & fallback behavior

B. STT Accuracy Fine-Tuning
- Tune:
  - Sample rate (16 kHz mono preferred)
  - Chunk size
  - Silence thresholds
- Reduce:
  - False silence detection
  - Missed or cut words
- Validate with real call recordings

C. TTS Naturalness Fine-Tuning
- Optimize:
  - Pause timing
  - Natural speech gaps
  - Voice clarity over phone lines
- Ensure human-like response timing

D. End-to-End Latency Optimization
- Optimize the full loop: Caller → STT → AI Brain → TTS → Playback
- Remove bottlenecks in:
  - Audio pipeline
  - Network calls
  - API responses
- Goal: near-human conversational delay

E. Failure & Fallback Handling
- Graceful handling of:
  - AI Brain timeout
  - STT/TTS failure
- Play fallback audio instead of dropping calls
- Ensure VM-1 never crashes due to external API delays

Real-World Call Testing (REQUIRED)
The freelancer must test & fine-tune for:
- Silent caller
- Continuous speech
- Caller interrupting AI
- Background noise
- Long calls (10–15 minutes)
- Multiple back-to-back calls

Deliverables
- Fully working VM-1
- SIP trunk tested with live calls
- End-to-end demo call
- API documentation
- Fine-tuning parameters & recommendations
- Deployment & restart guide
- Ownership of source/config files transferred

Acceptance Criteria
- Calls connect reliably
- Audio is clear & natural
- No noticeable lag in AI responses
- STT accuracy acceptable for Indian callers
- TTS stops instantly on barge-in
- System remains stable under repeated calls

Required Skills
- SIP / RTP / VoIP
- FreeSWITCH / Kamailio / OpenSIPS
- Real-time audio streaming
- STT & TTS systems
- Linux server hardening
- API design
- AI voice bot experience (strong plus)

Out of Scope
- Frontend or UI
- LLM prompt engineering
- Vector DB setup
- AI Brain logic (VM-2)

Proposal Must Include
- Proposed telephony stack and reason
- Similar AI voice/IVR projects done
- Estimated timeline
- Assumptions & risks

Estimated Timeline
- Setup & integration: 7–10 days
- Fine-tuning & testing: 5–7 days
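To illustrate the STT tuning items above (16 kHz mono, chunk size, silence thresholds, no lost first/last words), here is a minimal sketch of frame sizing plus an energy-threshold VAD with pre-roll and hangover. It assumes 16-bit linear PCM delivered in 20 ms frames; the class name, threshold, and frame counts are hypothetical starting points rather than values from the spec, and a real build would more likely rely on the chosen stack's or STT provider's own VAD.

```python
import math
from array import array

SAMPLE_RATE_HZ = 16_000   # 16 kHz mono, as preferred in the spec
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed linear PCM
FRAME_MS = 20             # assumption: 20 ms frames, a common telephony packet time

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 320 samples
BYTES_PER_FRAME = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE   # 640 bytes


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of one frame of 16-bit PCM in native byte order."""
    samples = array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class EnergyVad:
    """Hypothetical energy-threshold VAD with pre-roll and hangover.

    Pre-roll buffers a few frames from before speech onset so the first word
    is not clipped; hangover keeps forwarding audio for a while after the
    energy drops so the last word is not cut off.
    """

    def __init__(self, threshold: float = 500.0,
                 preroll_frames: int = 10, hangover_frames: int = 25):
        self.threshold = threshold              # RMS level treated as speech
        self.preroll_frames = preroll_frames    # 10 x 20 ms = 200 ms kept before onset
        self.hangover_frames = hangover_frames  # 25 x 20 ms = 500 ms kept after speech
        self.in_speech = False
        self._preroll: list[bytes] = []
        self._silent_run = 0

    def process(self, frame: bytes) -> list[bytes]:
        """Return the frames to forward to STT after seeing this frame."""
        loud = frame_rms(frame) >= self.threshold
        if not self.in_speech:
            if loud:
                # Speech onset: flush the buffered pre-roll together with this frame.
                self.in_speech = True
                self._silent_run = 0
                out, self._preroll = self._preroll + [frame], []
                return out
            self._preroll.append(frame)
            if len(self._preroll) > self.preroll_frames:
                self._preroll.pop(0)            # keep only the most recent frames
            return []
        if loud:
            self._silent_run = 0
        else:
            self._silent_run += 1
            if self._silent_run > self.hangover_frames:
                # End of utterance: stop forwarding and start a new pre-roll buffer.
                self.in_speech = False
                self._silent_run = 0
                self._preroll = [frame]
                return []
        return [frame]
```

With 20 ms frames, 10 pre-roll frames keep 200 ms of audio before onset and 25 hangover frames keep streaming for 500 ms after the last loud frame. These, together with the threshold, are the knobs that would be tuned against real call recordings.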
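The API requirements ask for documented timeout & retry logic, and section E requires that VM-1 never crash because of AI Brain delays. A minimal asyncio sketch of that behavior, assuming a hypothetical `call_brain` factory that performs one token-authenticated HTTPS request to VM-2; the timeout, retry count, backoff, and fallback text are placeholders to be tuned, not values from the spec.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical fallback line; in production this would map to pre-recorded audio.
FALLBACK_TEXT = "Sorry, could you please repeat that?"


async def ask_brain_with_retry(
    call_brain: Callable[[], Awaitable[str]],
    timeout_s: float = 1.5,
    retries: int = 2,
    backoff_s: float = 0.1,
) -> tuple[str, bool]:
    """Query the VM-2 AI Brain with a per-attempt timeout and bounded retries.

    `call_brain` is a zero-argument factory returning a fresh awaitable for
    each attempt. Returns (text, used_fallback) and never raises for timeouts
    or network errors, so the call loop on VM-1 can play fallback audio
    instead of dropping the call.
    """
    for attempt in range(retries + 1):
        try:
            text = await asyncio.wait_for(call_brain(), timeout=timeout_s)
            return text, False
        except (asyncio.TimeoutError, OSError):  # OSError also covers ConnectionError
            if attempt < retries:
                await asyncio.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    return FALLBACK_TEXT, True
```

Passing a factory rather than a coroutine lets each retry issue a fresh request, since a Python coroutine can only be awaited once. With these defaults the worst case is about 3 × 1.5 s plus 0.3 s of backoff, which is already long for a live call, so the budget would likely be tightened during latency tuning.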
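For "TTS stops instantly on barge-in", one simple pattern is to send TTS audio frame by frame and check a barge-in flag before each frame. This sketch assumes 20 ms frames and a hypothetical `send_frame` callback standing in for whatever media path the chosen telephony stack exposes; the VAD would set the `barge_in` event on speech onset.

```python
import asyncio
from typing import Callable, Iterable

FRAME_S = 0.02  # assumption: 20 ms frames (640 bytes of 16 kHz, 16-bit mono PCM)


async def play_tts(frames: Iterable[bytes],
                   send_frame: Callable[[bytes], None],
                   barge_in: asyncio.Event) -> int:
    """Send TTS audio frame by frame, stopping as soon as barge_in is set.

    Returns the number of frames actually sent to the caller.
    """
    sent = 0
    for frame in frames:
        if barge_in.is_set():
            break                       # caller is talking: stop playback now
        send_frame(frame)               # hypothetical hook into the media path
        sent += 1
        await asyncio.sleep(FRAME_S)    # pace output at real time
    return sent


async def demo() -> tuple[int, int]:
    """Play 1 s of (silent) audio and simulate the caller speaking after ~100 ms."""
    played: list[bytes] = []
    barge_in = asyncio.Event()
    frames = [bytes(640)] * 50
    task = asyncio.create_task(play_tts(frames, played.append, barge_in))
    await asyncio.sleep(0.1)
    barge_in.set()                      # in a real call, the VAD would do this
    sent = await task
    return sent, len(played)
```

Running `asyncio.run(demo())` should report only a handful of the 50 frames as sent. Because the flag is checked before every frame, the caller hears at most about one extra frame after the event is set, plus any audio already buffered further down the media path, which would also need to be flushed.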
Project ID: 40206286
6 proposals
Remote project
Active 4 days ago
6 freelancers are bidding on average ₹28,000 INR for this job

As an experienced engineer specializing in Robotics and Automation, I am well equipped for this sophisticated AI Voice System Configuration project. Over the years, I have honed my skills in Linux server hardening, SIP, RTP, VoIP, and real-time audio streaming, precisely the core proficiencies your project demands. Since robust voice activity detection, accurate STT, and human-like TTS responses are key to your success, I can leverage my expertise to fine-tune and optimize the system so it works smoothly with Indian accents and in varied noisy environments. My previous hands-on experience in similar projects has not only exposed me to different real-world variables but also equipped me with the skills required to build a reliable system that handles calls interrupted by the caller and keeps running without crashing even when an external API fails or times out. I believe my proficiency in Kamailio/OpenSIPS/FreeSWITCH is just what you need for the telephony stack requirement. Also, owing to my background in Robotics & Automation, which involves hands-on work with platforms such as Proxmox and Ubuntu, I can efficiently handle OS optimization to ensure low-latency audio together with high concurrent call volume, as you've specified. And finally, my systematic approach to problem-solving ensures …
₹25,000 INR in 7 days
Rating: 0.0

With my diverse skills and three years of experience in full-stack development, I am ready to take on your AI Voice System Configuration & Fine-Tuning project. I am proficient with Ubuntu and have a strong understanding of Linux server hardening, which will ensure your OS is optimized for low-latency audio and high concurrent call volume, with a secure firewall configuration. I've also worked extensively with VoIP systems built on the FreeSWITCH/Kamailio/OpenSIPS stack, which would be vital in handling your telephony stack and fine-tuning codecs such as Opus and G.711. My experience extends to developing robust APIs that efficiently handle real-time audio streaming, such as the STT/TTS integration you require. My familiarity with API design will ensure I provide you with a low-latency, highly secure API layer with token-based authentication for your VM-2 connectivity. Additionally, being adept in database management, I'll effectively transfer ownership of source/config files to you after completion. What sets me apart is not just technical expertise but also a strong commitment to quality assurance through real-world testing. From silent callers to disruptive background noise, I have experienced it all and ensured smooth functioning. Lastly, I prioritize excellent communication and timely delivery, which are key to the success of any project. Let's work together to build a reliable, natural-sounding AI voice system that meets all your needs!
₹18,000 INR in 4 days
Rating: 0.0

Hello, I trust you're doing well. I have extensive experience with machine learning algorithms, with nearly a decade of hands-on practice. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using MATLAB, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss it in detail. Warm regards. Please check my portfolio link: https://www.freelancer.com/u/sajjadtaghvaeifr
₹35,000 INR in 7 days
Rating: 1.6

Hi, I’m an AI expert with professional experience in computer vision and a proven track record of working on complex image processing and AI/ML model development. My skill sets:
• Algorithm Development: Strong understanding of computer vision algorithms and techniques, including convolutional neural networks (CNNs), object detection, image segmentation, and feature extraction.
• Model Training & Fine-Tuning: Developing and training machine learning models tailored for image analysis and visual data interpretation. I have worked on well-known models such as YOLO, R-CNN, U-Net, DeepLab, and ViT.
• AI Integration: Implementing and integrating AI models into existing software and hardware systems, ensuring high performance and scalability.
• Data Analysis: Analyzing and processing large datasets of images and video feeds to identify patterns, trends, and insights.
• Data Handling: Experience in handling and processing large datasets, including image and video data, and familiarity with data augmentation techniques and synthetic data generation.
• Performance Optimization: Optimizing algorithms and models for real-time processing and ensuring they can handle large-scale data efficiently.
• Programming Skills: Proficient in programming languages such as Python, with experience in deep learning frameworks such as TensorFlow, PyTorch, and Keras.
• Tools & Libraries: Proficient with OpenCV, scikit-image, and other relevant libraries; experienced with version control systems such as Git.
₹25,000 INR in 7 days
Rating: 0.0

Patiala, India
Payment method verified
Member since Aug 27, 2025