Audio Segmentation for Audio Transcription

Audio Segmentation

Problem to Solve

Audio segmentation is an essential processing stage in most audio analysis applications. The goal is to split an uninterrupted audio signal into homogeneous segments, each containing a single type of sound that is acoustically distinct from the neighbouring parts of the audio file. An accurate segmentation process identifies the boundaries that partition the audio into these homogeneous regions.

Afterwards, we classify each homogeneous segment to determine its sound activity (Music/Speech…).

Figure 1: Audio Segmentation

Proposed Solution

Our solution segments the audio in two phases. The first phase is either unsupervised or semi-supervised; in both cases, it uses no prior knowledge of the audio content classes involved. The second phase relies on algorithms that do use some prior knowledge.

Feature extraction: several audio features from both the time and frequency domains are implemented. To be more efficient, we then select the most impactful features, which increases accuracy and reduces processing time.
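As an illustration of this step, the sketch below computes one time-domain pair (short-term energy and zero-crossing rate) and one frequency-domain feature (spectral centroid) per frame. It is a minimal example, not the feature set used by our system: the function names, the frame length of 1024 samples, the hop of 512 samples, and the choice of these three features are all assumptions made for illustration.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return np.stack(
        [signal[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )


def short_term_features(signal, sample_rate, frame_len=1024, hop_len=512):
    """Return an (n_frames, 3) array: energy, zero-crossing rate, centroid (Hz).

    Frame and hop lengths are illustrative defaults, not values from the article.
    """
    signal = np.asarray(signal, dtype=float)
    frames = frame_signal(signal, frame_len, hop_len)

    # Time domain: mean squared amplitude of each frame.
    energy = np.mean(frames ** 2, axis=1)

    # Time domain: fraction of adjacent sample pairs whose sign differs.
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Frequency domain: magnitude-weighted mean frequency of each frame.
    magnitude = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    centroid = (magnitude @ freqs) / (magnitude.sum(axis=1) + 1e-12)

    return np.column_stack([energy, zcr, centroid])
```

A pure tone gives a low zero-crossing rate and a centroid near its frequency, while broadband noise pushes both values up, which is the kind of contrast a feature selection stage can exploit.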

Semi-supervised audio segmentation: this step takes an uninterrupted audio signal as input, detects its silent areas, and returns the segment endpoints that correspond to individual audio events.
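To make the idea of silence-based endpoints concrete, here is a deliberately simplified sketch. It replaces the semi-supervised model with a plain energy threshold, so it should be read as a stand-in for the technique rather than our actual method. The function name, the threshold ratio, and the minimum duration are hypothetical parameters.

```python
def detect_active_segments(frame_energies, hop_seconds,
                           threshold_ratio=0.1, min_duration=0.2):
    """Return [(start_s, end_s), ...] for runs of non-silent frames.

    A frame counts as active when its energy exceeds
    threshold_ratio * max(frame_energies). Runs shorter than min_duration
    seconds are dropped. All parameter defaults are illustrative assumptions.
    """
    if len(frame_energies) == 0:
        return []

    threshold = threshold_ratio * max(frame_energies)
    segments = []
    run_start = None  # index of the first frame of the current active run

    for i, energy in enumerate(frame_energies):
        active = energy > threshold
        if active and run_start is None:
            run_start = i
        elif not active and run_start is not None:
            segments.append((run_start * hop_seconds, i * hop_seconds))
            run_start = None

    # Close a run that is still open at the end of the signal.
    if run_start is not None:
        segments.append((run_start * hop_seconds,
                         len(frame_energies) * hop_seconds))

    return [(s, e) for s, e in segments if e - s >= min_duration]
```

The returned endpoints mark the audio events; everything between them is treated as silence.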

Supervised audio segmentation: this step takes the homogeneous segments, splits each one into fixed-size segments, and classifies each fixed-size segment separately using a supervised model. Successive segments that share a common class label are merged in a post-processing stage.
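The split, classify, and merge logic of this step can be sketched as follows. The classifier is passed in as a callable because the article does not name the supervised model; `classify`, `fixed_windows`, and `classify_and_merge` are hypothetical names, and the window length is left to the caller.

```python
import math


def fixed_windows(start, end, window):
    """Yield (w_start, w_end) windows of length `window` covering [start, end).

    The last window is truncated at `end` if the span is not a multiple
    of the window length.
    """
    n_windows = math.ceil((end - start) / window)
    for i in range(n_windows):
        w_start = start + i * window
        yield (w_start, min(w_start + window, end))


def classify_and_merge(segment, window, classify):
    """Label each fixed-size window, then merge successive equal labels.

    `segment` is a (start, end) pair in seconds. `classify` is any callable
    taking (w_start, w_end) and returning a class label; it stands in for
    the supervised model, which is an assumption of this sketch.
    """
    merged = []
    for w_start, w_end in fixed_windows(segment[0], segment[1], window):
        label = classify(w_start, w_end)
        if merged and merged[-1][2] == label:
            # Same label as the previous window: extend that segment.
            merged[-1] = (merged[-1][0], w_end, label)
        else:
            merged.append((w_start, w_end, label))
    return merged
```

For example, with a one-second window and a toy rule that labels everything before the three-second mark as speech, a segment spanning 0 to 5.5 seconds collapses from six windows into two labelled segments.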

Figure 2: Audio Segmentation Pipeline

Technical Approach

After processing an audio file, our system returns a CSV file and a TextGrid file, each containing the homogeneous segments and their labels. The CSV file has the following structure:

Start time , End time , Label

Figure 3: CSV file
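A CSV file with the three columns listed above could be produced along these lines. This is a sketch using Python's standard `csv` module. Whether the real output includes a header row and how many decimals it keeps are not stated here, so the `header` flag and the three-decimal formatting are assumptions, as is the function name.

```python
import csv


def write_segments_csv(segments, file_obj, header=True):
    """Write (start, end, label) tuples as rows: start time, end time, label.

    `file_obj` is any writable text file object. The header row and the
    three-decimal time format are illustrative choices.
    """
    writer = csv.writer(file_obj)
    if header:
        writer.writerow(["Start time", "End time", "Label"])
    for start, end, label in segments:
        writer.writerow([f"{start:.3f}", f"{end:.3f}", label])
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("segments.csv", "w", newline="")`, where the file name is only a placeholder.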

Figure 4: TextGrid File

We can display the output in Praat using the TextGrid file.
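For readers who want to generate such a file themselves, the sketch below builds a single-tier TextGrid in Praat's long text format from a list of labelled segments. It is not the writer used by our system: the function name, the tier name, and the decision to fill gaps between segments with empty intervals (so that the tier covers the whole duration without holes) are assumptions.

```python
def segments_to_textgrid(segments, total_duration, tier_name="segments"):
    """Return the text of a one-tier TextGrid for (start, end, label) tuples.

    Gaps between segments, and any tail up to total_duration, are filled
    with empty-text intervals so the tier is contiguous.
    """
    intervals = []
    cursor = 0.0
    for start, end, label in sorted(segments):
        if start > cursor:
            intervals.append((cursor, start, ""))  # unlabelled gap
        intervals.append((start, end, label))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        text = label.replace('"', '""')  # double any quotes inside labels
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a `.TextGrid` extension and opening it in Praat together with the audio should show the labels aligned under the waveform.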

Conclusion and Recommendation

In many audio processing applications, audio segmentation plays a vital role as a preprocessing step. It has a significant impact on:

Speaker diarization.
Speech recognition.
Real-time applications of multimedia.
Human-computer interaction systems.

Audio segmentation faces several challenges, such as:

Two or more activities occur very close together in time.
Overlapping segments: speech and music playing at the same time.
The quality of the audio.