Segmentation of field recordings

Goal

We are exploring automatic segmentation and labeling of ethnomusicological field recordings. Field recordings are integral documents of folk music performances and typically contain interviews with performers intertwined with actual performances. As these are live recordings of amateur folk musicians, they may contain interruptions, false starts, environmental noises or other interfering factors.

Our goal is to design robust automatic algorithms that approximate manual segmentation of field recordings and classify segments into a set of predefined classes.
The tools developed are integrated into the Ethnomuse digital archive of Slovenian folk music and dances. The figure shows a visualization of a segmented field recording within the Ethnomuse application. Signal classes (speech, solo singing, choir singing, instrumental and bell chiming) are shown in different colors, and segment boundaries are shown as vertical lines. Users can manually adjust the detected boundaries, listen to the recording and annotate its contents.

The algorithm can also be used for more general speech/music segmentation and is robust, as demonstrated by its results in the MIREX 2015 Music/Speech Classification and Detection task.

Download

The algorithm is available for download; Matlab is required to run it. This is the version that was submitted to MIREX 2015.


Algorithm

We take a probabilistic approach to the segmentation and labeling of field recordings. First, short audio fragments are classified into one of the following categories: speech, solo singing, choir singing, instrumental or bell chiming performance. Then, a set of candidate segment boundaries is obtained by observing how the energy of the signal and its content change. Finally, the recording is segmented with a probabilistic model that maximizes the posterior probability of the segmentation, given the candidate boundaries with their probabilities and prior knowledge of the lengths of segments belonging to each category.
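
The final step can be illustrated with a small dynamic-programming sketch in Python. It is not the model from the paper or the Matlab release, only one plausible way to combine the three ingredients named above: per-fragment class probabilities, candidate boundaries with probabilities, and a length prior per class. The function name `segment`, the exponential length prior, the `mean_len` parameter and the independence assumptions between fragments and between boundaries are all hypothetical choices made for illustration.

```python
import math

def segment(frame_probs, candidates, mean_len, eps=1e-9):
    """Pick the subset of candidate boundaries with the highest log-score.

    frame_probs: list of dicts {label: probability}, one per short fragment.
    candidates:  list of (fragment_index, probability) candidate boundaries.
    mean_len:    dict {label: typical segment length in fragments}
                 (hypothetical parameter of the assumed length prior).
    Returns a list of (start, end, label) tuples with end exclusive.
    """
    n = len(frame_probs)
    cands = sorted((i, p) for i, p in candidates if 0 < i < n)
    pos = [0] + [i for i, _ in cands] + [n]
    # Log-probability of keeping or skipping each position; both ends are fixed.
    keep = [0.0] + [math.log(max(p, eps)) for _, p in cands] + [0.0]
    skip = [0.0] + [math.log(max(1.0 - p, eps)) for _, p in cands] + [0.0]

    def length_log_prior(label, length):
        # Assumed exponential prior over segment length (illustrative only).
        m = mean_len[label]
        return -math.log(m) - length / m

    def best_label(a, b):
        # Score fragments a..b-1 under each label and return the best one.
        scores = {}
        for label in mean_len:
            ll = sum(math.log(max(frame_probs[t][label], eps))
                     for t in range(a, b))
            scores[label] = ll + length_log_prior(label, b - a)
        label = max(scores, key=scores.get)
        return scores[label], label

    best = [-math.inf] * len(pos)
    back = [None] * len(pos)
    best[0] = 0.0
    for j in range(1, len(pos)):
        skipped = 0.0  # log-prob of skipping candidates strictly between i and j
        for i in range(j - 1, -1, -1):
            score, label = best_label(pos[i], pos[j])
            total = best[i] + score + skipped + keep[j]
            if total > best[j]:
                best[j], back[j] = total, (i, label)
            skipped += skip[i]

    # Follow the back-pointers from the end of the recording to its start.
    segments = []
    j = len(pos) - 1
    while j > 0:
        i, label = back[j]
        segments.append((pos[i], pos[j], label))
        j = i
    return segments[::-1]

# Toy input: five speech-like fragments followed by five singing-like ones.
probs = [{"speech": 0.9, "solo": 0.1}] * 5 + [{"speech": 0.1, "solo": 0.9}] * 5
result = segment(probs, [(5, 0.9), (2, 0.2)], {"speech": 5, "solo": 5})
print(result)  # [(0, 5, 'speech'), (5, 10, 'solo')]
```

On this toy input the strong candidate at fragment 5 is kept and the weak one at fragment 2 is skipped. Keeping the weak candidate would add its low boundary probability and one more per-segment prior penalty, without improving the fit to the fragment classifications.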

For more details, see:

  • M. Marolt, "Probabilistic segmentation and labeling of ethnomusicological field recordings," in ISMIR 2009: Proceedings of the 10th International Society for Music Information Retrieval Conference, October 26-30, 2009, Kobe, Japan, pp. 75-80.
    [Bibtex]
    @conference{7368532,
    author={Matija Marolt},
    year={2009},
    pages={75-80},
    title={Probabilistic segmentation and labeling of ethnomusicological field recordings},
    booktitle={ISMIR 2009 : proceedings of the 10th International Society for Music Information Retrieval Conference, October 26-30, 2009, Kobe, Japan},
    }

Visualization of a segmented field recording