Analysis and detection of singing techniques
in repertoires of J-POP solo singers
Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
Paper on arXiv | Conference page
Abstract of the paper
In this paper, we focus on singing techniques within the scope of music information retrieval research. We investigate how singers use singing techniques in real-world recordings of famous solo singers of Japanese popular music (J-POP). First, we built a new dataset of singing techniques. The dataset consists of 168 commercial J-POP songs, and each song is annotated with various singing techniques, their timestamps, and the vocal pitch contour. We also present descriptive statistics of the singing techniques in the dataset to clarify which techniques appear and how often. We further explore the difficulty of automatically detecting singing techniques using previously proposed machine learning techniques. In the detection experiments, we also investigate the effectiveness of auxiliary information (i.e., pitch and the distribution of label durations), in addition to providing baselines. The best result achieves a macro-average F-measure of 40.4% on nine-way multi-class detection. We provide the dataset annotations and their details on this appendix website: https://yamathcy.github.io/ISMIR2022J-POP/
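As a rough illustration of the reported metric, here is a minimal sketch of how a macro-average F-measure can be computed from frame-level class predictions with scikit-learn. The technique names and arrays below are made up for illustration, and the paper's actual evaluation protocol may differ.

# Minimal sketch: macro-average F-measure over frame-level technique classes.
# NOTE: class names and predictions are illustrative only; this does not
# reproduce the paper's evaluation protocol or its reported 40.4%.
import numpy as np
from sklearn.metrics import f1_score

TECHNIQUES = ["vibrato", "falsetto", "scooping", "drop", "no_technique"]  # hypothetical subset

# Hypothetical frame-level ground-truth and predicted class indices (one per frame).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 4, 4, 0])
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 4, 0, 0])

# Macro averaging computes the F-measure per class and takes the unweighted mean,
# so rare techniques count as much as frequent ones.
per_class = f1_score(y_true, y_pred, average=None, labels=range(len(TECHNIQUES)))
for name, score in zip(TECHNIQUES, per_class):
    print(f"{name}: {score:.3f}")
macro_f = f1_score(y_true, y_pred, average="macro")
print(f"Macro-average F-measure: {macro_f:.3f}")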
Dataset “COSIAN”
Description
We built a new dataset named COSIAN (a COllection of SInging voice ANnotation) to conduct the analysis. COSIAN is an annotation collection of Japanese popular (J-POP) songs, focusing on the singing style and expression of famous solo singers.
It consists of 168 songs by 21 female and 21 male singers, with four songs per singer chosen to have contrasting moods.
What is the motivation?
Understanding the singing voice better
The basic concept of this work is to analyze singers' characteristics by clarifying how they render a song. A straightforward way to realize this is to annotate the presence of singing techniques, which are produced by fluctuating the pitch, timbre, and so on. However, no such dataset existed, so we decided to build one.
Metadata
The metadata consists of a song list containing information about each track.
Annotations
- Singing techniques: overlapping, strongly labeled annotations (i.e., technique categories and timestamps) of singing techniques (a minimal loading sketch is given at the end of this section).
(CAUTION) The audio files are not included in the dataset!
-> If you want the annotation files, access here and request permission. The annotations are for research purposes only.
The request should include the following; otherwise, it will be rejected.
- Name
- Affiliation
- Email Address
- Agree to the License
- Pitch (not publicly available): Since pitch is an essential component of singing technique analysis, we further annotated melodic pitch using Tony, followed by manual correction such as removing the unvoiced parts and reverberation tails.
Because of copyright issues, we do not provide raw audio tracks. Instead, we provide links to music streaming services for each song in COSIAN.
- Spotify links: Spotify links for each song in COSIAN.
- YouTube links: We also provide YouTube links as a YouTube playlist. Note that the playlist contains only official music videos, without alignment information.
- Amazon Music links (work in progress): We will also provide Amazon Music links so that you can purchase the CD recordings we actually used in the task.
- Apple Music links: In addition to Amazon Music, we also provide Apple Music links. When purchasing each track via Apple Music, please use the links in the "apple_music" column of the spreadsheet.
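Once access to the annotation files has been granted, they could be loaded with a minimal sketch like the one below. The file names and column layouts (onset, offset, technique for the technique labels; time, F0 in Hz for the pitch contour) are assumptions for illustration; please check the distributed files for the actual format.

# Minimal loading sketch. File names and column layouts are assumptions;
# the actual COSIAN distribution may use a different format.
import csv

def load_technique_annotations(path):
    """Read one annotation file as a list of (onset_sec, offset_sec, technique) tuples."""
    events = []
    with open(path, newline="", encoding="utf-8") as f:
        for onset, offset, technique in csv.reader(f):
            events.append((float(onset), float(offset), technique))
    return events

def load_pitch_contour(path):
    """Read a pitch contour as a list of (time_sec, f0_hz) pairs."""
    contour = []
    with open(path, newline="", encoding="utf-8") as f:
        for time, f0 in csv.reader(f):
            contour.append((float(time), float(f0)))
    return contour

# Hypothetical file names, shown only to illustrate the intended usage.
events = load_technique_annotations("song_001_techniques.csv")
contour = load_pitch_contour("song_001_pitch.csv")
print(f"{len(events)} technique events, {len(contour)} pitch frames")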
Annotation procedure
We used Sonic Visualiser to annotate the singing techniques, with the help of both sound playback and visualization of spectrograms and pitchgrams.
Annotated singing techniques
Overview
Examples of each singing technique
Data statistics
- Annotated duration
- Song release year
- Count and duration of singing techniques
- Distribution of duration of each singing technique
- Singer-wise count of singing techniques
Detected examples
These are examples automatically detected by the Focal-GT model. Note that the videos are samples of the audio clips; we actually used audio from the CD recordings for the task.
Good examples
#1: Sakura / Ikimono gakari (video clip 1:30-1:36; upper: ground-truth labels, lower: detected labels)
#2: Omoi ga Kasanaru Sono Mae ni / Ken Hirai (video clip 2:38-2:45; upper: ground-truth labels, lower: detected labels)
Bad examples
We found that a common mis-detection case arises from regions that are too short or that switch labels too frequently (a post-processing sketch is given after the examples below).
#1: Readymade / Ado (video clip 0:50-0:55; upper: ground-truth labels, lower: detected labels)
#2: Honey / L'Arc~en~Ciel (video clip 0:14-0:20; upper: ground-truth labels, lower: detected labels)
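One hedged way to mitigate such short, rapidly switching detections is to smooth the frame-level predictions before decoding them into events, for example with a median filter followed by a minimum-duration rule. The sketch below is not part of the paper's pipeline; the frame rate, background class, and thresholds are illustrative assumptions.

# Illustrative post-processing sketch (not the paper's method): median-filter the
# frame-wise predicted classes and drop detected segments shorter than a minimum duration.
import numpy as np
from scipy.ndimage import median_filter

FRAME_RATE = 50          # frames per second (assumed)
MIN_DURATION_SEC = 0.1   # discard detected segments shorter than this (assumed)

def smooth_predictions(frame_classes, kernel_frames=5):
    """Median-filter a 1-D array of per-frame class indices to suppress rapid switching."""
    return median_filter(frame_classes, size=kernel_frames, mode="nearest")

def frames_to_events(frame_classes, background_class=0):
    """Group consecutive identical frames into (onset_sec, offset_sec, class) events,
    dropping the background class and events shorter than MIN_DURATION_SEC."""
    events, start = [], 0
    for i in range(1, len(frame_classes) + 1):
        if i == len(frame_classes) or frame_classes[i] != frame_classes[start]:
            cls = int(frame_classes[start])
            onset, offset = start / FRAME_RATE, i / FRAME_RATE
            if cls != background_class and offset - onset >= MIN_DURATION_SEC:
                events.append((onset, offset, cls))
            start = i
    return events

# Toy example: noisy frame predictions with a spurious one-frame switch.
raw = np.array([0, 0, 2, 2, 1, 2, 2, 2, 0, 0, 0, 0])
print(frames_to_events(smooth_predictions(raw)))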
Contact
If you have any questions about the paper, please contact the first author, Yuya. We also accept issues on the GitHub repository.
License
COSIAN contains copyrighted material. We share COSIAN with researchers under the following conditions:
- COSIAN may only be used by the individual signing below and by members of the research group or organisation of this individual. This permission is not transferable.
- COSIAN may be used only for non-commercial research purposes.
- COSIAN (or data enabling its reproduction) may not be sold, leased, published, or distributed to any third party without written permission from the COSIAN administrator.
The University of Tsukuba and KAIST shall not be held liable for any errors in the content of COSIAN, nor for any damage arising from the use of COSIAN. The COSIAN administrator may update these conditions of use at any time.
Citation
Please cite the ISMIR 2022 paper:
@inproceedings{yamamoto2022analysis,
author = {Yamamoto, Yuya and Nam, Juhan and Terasawa, Hiroko},
title = {Analysis and Detection of Singing Techniques in Repertoires of J-POP solo singers},
booktitle = {Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR)},
year = {2022}
}