This paper presents a novel method for transcribing folk music that exploits the specific characteristics of folk music to improve transcription accuracy. In contrast to most commercial music, folk music recordings may contain various inaccuracies, as they are usually performed by amateur musicians and recorded in the field. With standard transcription approaches, these inaccuracies lead to erroneous pitch estimates. On the other hand, the structure of folk music is simple, as songs are often composed of repeated melodic parts. Our approach uses these repetitions to make transcription more robust and more accurate. The proposed method fuses three sources of information: (1) frame-based multiple F0 estimates, (2) song structure, and (3) pitch drift estimates. It first selects a representative segment of the song and aligns all other segments to it, accounting for temporal as well as frequency deviations. Information from all segments is then summarized and used in a two-layer probabilistic model based on explicit-duration HMMs to segment the frame-based information into notes. The method is evaluated against state-of-the-art transcription methods, and we show that a significant improvement in accuracy can be achieved.
148 files from the Transcription and Segmentation database are available in the ZIP file above.
The dataset includes: