
Project 5 - Live Captioning App


In this blog post, I am going to show you how to create an app in Python that converts speech to text from a selected audio input device. To achieve this, I am going to use an open-source speech recognition toolkit called Vosk.

NOTE: Before you follow the steps below, it is recommended that you create a virtual environment. For instructions on creating one, head over to the venv docs.

To start off, we need a way to capture audio input. To do this, we are going to use the python-sounddevice library. To get started, follow the installation steps here.

Then import the library by adding the following at the top of your script.

import sounddevice as sd

Now we need to find the default sample rate of our input device. To do this, we are going to use the query_devices() function.

device_info = sd.query_devices(device=None, kind='input')
samplerate = int(device_info['default_samplerate'])

In this step, you can also set the device parameter to the id of the device of your choice instead of None, which selects the default input device. To list the available input devices and their ids, you can use the following function:

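def getInputDevices():
    """Return a dict mapping device id to name for every input-capable device."""
    devices = {}
    # The position of a device in query_devices() is its device id
    for index, device in enumerate(sd.query_devices()):
        if device['max_input_channels'] > 0:
            devices[index] = device['name']

    return devices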
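
If you would rather pick a device by name than by id, the same filtering idea can be sketched as below. Note that find_input_device is a hypothetical helper of mine, not part of sounddevice, and the sample list is made-up data that only mimics the shape of what sd.query_devices() returns.

```python
def find_input_device(devices, name_fragment):
    """Return the id of the first input device whose name contains name_fragment."""
    wanted = name_fragment.lower()
    for index, device in enumerate(devices):
        if device['max_input_channels'] > 0 and wanted in device['name'].lower():
            return index
    return None  # no matching input device


# Made-up sample data standing in for list(sd.query_devices())
sample_devices = [
    {'name': 'Built-in Output', 'max_input_channels': 0},
    {'name': 'Built-in Microphone', 'max_input_channels': 2},
    {'name': 'USB Headset', 'max_input_channels': 1},
]

device_id = find_input_device(sample_devices, 'headset')
```

With real hardware you would pass list(sd.query_devices()) instead of the sample list, and then use the returned id as the device argument of sd.query_devices() and sd.InputStream().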

Before we go any further, we need to load the model from our speech recognition toolkit. First, install the vosk Python library by following these instructions. Then download the Vosk model of your choice from this list, extract it, and place it in your project folder. Finally, import the library and load the model by typing:

import vosk
model = vosk.Model('path/to/model')

Now we can start detecting the speech using the vosk library:

import json
import queue
import sys

q = queue.Queue()  # holds raw audio blocks until the recognizer consumes them
result = ''  # complete result (more accurate but slower prediction)
partialResult = ''  # partial result (less accurate but faster prediction)


# Callback for the audio input stream
def callback(indata, frames, time, status):
    """This is called (from a separate thread) for each audio block."""
    if status:
        print(status, file=sys.stderr)
    q.put(bytes(indata))


# Detects text from speech input
def speechToText():
    global result, partialResult
    with sd.InputStream(samplerate=samplerate, blocksize=8000, device=None,
                        dtype='int16', channels=1, callback=callback):
        rec = vosk.KaldiRecognizer(model, samplerate)
        while True:
            data = q.get()
            if rec.AcceptWaveform(data):
                # Result() returns a JSON string such as {"text": "..."}
                result = json.loads(rec.Result()).get('text', '')
            else:
                # PartialResult() returns a JSON string such as {"partial": "..."}
                partialResult = json.loads(rec.PartialResult()).get('partial', '')

Now we can read the result and partialResult variables and use them for our needs.
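
Because the loop in speechToText() never returns, the captions have to be read from another thread. Here is a minimal sketch of one way to arrange that. CaptionState and recognize are hypothetical names of mine, and rec is assumed to be any object with the AcceptWaveform(), Result() and PartialResult() methods of vosk.KaldiRecognizer.

```python
import json
import queue
import threading


class CaptionState:
    """Thread-safe holder for the latest complete and partial captions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._result = ''
        self._partial = ''

    def update(self, result=None, partial=None):
        with self._lock:
            if result is not None:
                self._result = result
                self._partial = ''  # a finished phrase supersedes the partial
            if partial is not None:
                self._partial = partial

    def snapshot(self):
        """Return (result, partial) as one consistent pair."""
        with self._lock:
            return self._result, self._partial


def recognize(rec, audio_queue, state, stop_event):
    """Feed queued audio blocks to rec until stop_event is set."""
    while not stop_event.is_set():
        try:
            data = audio_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # no audio yet, check stop_event again
        if rec.AcceptWaveform(data):
            state.update(result=json.loads(rec.Result()).get('text', ''))
        else:
            state.update(partial=json.loads(rec.PartialResult()).get('partial', ''))
```

To use it, start recognize in a threading.Thread with daemon=True while the input stream is open, call state.snapshot() from the main thread whenever you want to refresh the display, and call stop_event.set() to end the loop. Clearing the partial caption when a full result arrives is my own design choice; drop that line if you want to keep both.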

So what can we do next?

  1. We can add a GUI which displays the text while it is being predicted.
  2. We can save the text for future use.
  3. We can use it to dictate notes by using the PyAutoGUI library to type the predicted captions as keyboard input.
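
As a rough sketch of the second idea, each complete result could be appended to a text file with a timestamp. save_caption and the default file name captions.txt are assumptions of mine and are not taken from my app.

```python
from datetime import datetime
from pathlib import Path


def save_caption(text, path='captions.txt'):
    """Append one finished caption to path, prefixed with a timestamp."""
    text = text.strip()
    if not text:
        return False  # skip empty results, for example when nothing was recognized
    stamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with Path(path).open('a', encoding='utf-8') as f:
        f.write(f'[{stamp}] {text}\n')
    return True
```

Calling save_caption(result) each time the recognizer produces a complete result would build up a simple transcript. Only complete results are worth saving, since partial results keep changing until the phrase is finished.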
You can find my version of this app at github/code-explorer/Captionator.
If you have any questions or suggestions, feel free to post them in the comments down below.
