Implement real-time phone speech recognition in a project with Asterisk PBX and Kaldi/Vosk

In my programming project I am building a system around an Asterisk VoIP server. My goal is to enable streaming speech recognition as soon as an inbound call occurs, i.e. I want automatic speech recognition to run from the very start of the conversation between the two parties. The ASR (automatic speech recognition) engine I have chosen is Kaldi, served by Vosk server ([login to view URL]). Since it needs to be integrated with Asterisk, I use an Asterisk-specific module ([login to view URL]) to carry out ASR operations without compatibility issues. So far, whenever anybody speaks during a call, it produces very clear text output. The problem I'm struggling with is how to start streaming ASR immediately during the conversation, i.e. from the moment the Dial() application in the Asterisk dialplan is executed.

That is the subject of this job: create a script (most likely using some Asterisk REST Interface components) which works as follows:

1) as soon as the Dial() application starts running, the real-time audio stream is processed by the ASR engine, which waits for input inside a Docker container (I deploy Kaldi as part of Vosk server, which is compatible with Asterisk; the out-of-the-box implementation is released on GitHub: [login to view URL]);

2) once the conversation begins and voice is detected, the audio stream is sent to the ASR engine powered by Vosk server (inside the Docker container);

3) while the audio keeps flowing during the ongoing conversation, the ASR engine generates transcribed outputs (files), which should be forwarded to an HTTP server that evaluates their content (don't worry about that part, it is beyond the scope of this job);

4) when the conversation ends, the last phrases are processed by the ASR engine and the final outputs are passed to the HTTP server mentioned above;

5) whenever an inbound call occurs, the same steps are carried out: audio capture, speech recognition inside the Docker container, text file sent to the HTTP server.
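To make steps 3) and 4) more concrete, here is a minimal Python sketch of how text coming back from the ASR engine could be separated into partial and final phrases and how only the final ones could be handed to the HTTP server. It rests on several assumptions that are not confirmed anywhere in this description: that Vosk server answers with JSON messages carrying either a `partial` key or a `text` key, that the HTTP server accepts a JSON POST, and that it lives at the hypothetical `TRANSCRIPT_URL`. The function names and the `call_id` field are illustrative only.

```python
import json
import urllib.request

# Hypothetical address of the HTTP server that evaluates transcripts.
TRANSCRIPT_URL = "http://127.0.0.1:8080/transcripts"


def classify_vosk_message(raw):
    """Classify one JSON message from the ASR server.

    Assumption: a message with a "text" key is a finished phrase and a
    message with a "partial" key is an interim hypothesis.
    Returns ("final", text), ("partial", text), or None for empty results.
    """
    msg = json.loads(raw)
    if "text" in msg:
        text = msg["text"].strip()
        return ("final", text) if text else None
    if "partial" in msg:
        text = msg["partial"].strip()
        return ("partial", text) if text else None
    return None


def build_transcript_request(call_id, text, url=TRANSCRIPT_URL):
    """Build (but do not send) a JSON POST carrying one final phrase."""
    body = json.dumps({"call_id": call_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def forward_finals(call_id, raw_messages, send=urllib.request.urlopen):
    """Send every non-empty final phrase; return how many were sent.

    `send` is injectable so the logic can be exercised without a server.
    """
    sent = 0
    for raw in raw_messages:
        result = classify_vosk_message(raw)
        if result and result[0] == "final":
            send(build_transcript_request(call_id, result[1]))
            sent += 1
    return sent
```

In a real deployment `raw_messages` would be the stream of replies read from the ASR connection during the call, and the last final phrase would arrive after the end of the audio has been signalled to the server.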

All of this must meet real-time requirements, so the data flow needs fast and seamless throughput before and after ASR processing.
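One factor that affects latency is how much audio is buffered before each send to the ASR engine. The sketch below shows the arithmetic under assumed telephony defaults (8 kHz, 16-bit mono linear PCM); the actual format depends on the codec and on how Asterisk is configured, and the 200 ms chunk length is only an illustrative value.

```python
SAMPLE_RATE_HZ = 8000   # assumption: narrowband telephony audio
BYTES_PER_SAMPLE = 2    # assumption: 16-bit signed linear PCM, mono


def chunk_size_bytes(chunk_ms, sample_rate=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Number of bytes that hold `chunk_ms` milliseconds of audio."""
    return sample_rate * bytes_per_sample * chunk_ms // 1000


def iter_chunks(pcm, chunk_ms=200):
    """Yield consecutive slices of raw PCM, each at most `chunk_ms` long."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Smaller chunks reduce the delay before the first partial result can appear, at the cost of more messages per second between the audio source and the ASR container.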

While searching the Internet for helpful material, I came across this Stack Overflow question: [login to view URL]

It describes the same goal, just in different words. However, I require an implementation of this design with Kaldi/Vosk rather than Google Speech. As for the development language, I leave some options open: Python, Java or JS are all acceptable.
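Since Python is among the acceptable languages, one possible shape for the Asterisk REST Interface side is sketched here. It only builds the request URLs for two ARI operations that could be used to tap call audio: creating a snoop channel on an existing channel, and creating an external media channel that sends audio to an outside host. Both would be issued as POST requests with ARI credentials. The base URL, the Stasis application name `asr-bridge` and the relay address are hypothetical, and whether snoop plus external media is the right approach for this setup is an assumption to be verified against the installed Asterisk version.

```python
from urllib.parse import quote, urlencode

ARI_BASE = "http://127.0.0.1:8088/ari"  # assumption: ARI on the default HTTP port
ARI_APP = "asr-bridge"                   # hypothetical Stasis application name


def snoop_url(channel_id, spy="both", app=ARI_APP, base=ARI_BASE):
    """URL for POST /channels/{channelId}/snoop (listen to both directions)."""
    query = urlencode({"spy": spy, "app": app})
    return f"{base}/channels/{quote(channel_id, safe='')}/snoop?{query}"


def external_media_url(external_host, audio_format="slin",
                       app=ARI_APP, base=ARI_BASE):
    """URL for POST /channels/externalMedia, which streams audio to
    `external_host` (host:port of a hypothetical relay in front of the ASR)."""
    query = urlencode({
        "app": app,
        "external_host": external_host,
        "format": audio_format,
    })
    return f"{base}/channels/externalMedia?{query}"
```

A script built this way would typically bridge the snoop channel with the external media channel, so that the relay receives the conversation audio and can pass it on to the ASR container.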

The job will be considered complete and worth full payment only if the program demonstrably performs all of the listed steps without implementation errors. It must also be compatible with all of the software products mentioned above.

Skills: Asterisk PBX, VoIP, Linux, Python, CentOS

About the employer:
( 0 reviews ) Samara, Russian Federation

Project ID: #29336057

1 freelancer is bidding an average of $140 for this job


Hello, I'm very interested in your job. I am a speech processing engineer with extensive R&D experience in LVSR (large vocabulary speech recognition) with Kaldi, Deep Speech 2, Deep Speech, Google API, IBM Watson, pocketsph...

$140 USD in 7 days
(2 reviews)