Our research interests include spoken language processing, natural
language processing, multimedia information retrieval, machine learning, and pattern recognition.
Our goal is to develop methods for analyzing, extracting, recognizing, indexing, and
retrieving information from audio data, with special emphasis on speech and music.

In the speech area, our research has focused mainly on speech recognition, speaker recognition,
speaker segmentation/clustering/diarization, and spoken document retrieval/summarization.
Recent achievements include a minimum-boundary-error-based discriminative acoustic model training
and decoding framework for automatic phone segmentation, a novel characterization of the alternative
hypothesis using kernel discriminant analysis for likelihood ratio-based speaker verification,
a new divide-and-conquer framework for fast speaker segmentation and diarization, and
a probabilistic generative framework for extractive spoken document summarization.
Ongoing research includes attribute-detection-based speech/language recognition,
language modeling for speech recognition/document classification/information retrieval,
voice conversion, and hidden Markov model-based speech synthesis.
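
The likelihood ratio test that underlies speaker verification can be sketched minimally. The snippet below is a generic illustration with toy diagonal-Gaussian target and background models, not the kernel-discriminant-analysis characterization of the alternative hypothesis developed in our work; all model parameters and the threshold are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    # log N(x; mean, diag(var)) for a diagonal-covariance Gaussian.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def verify(frames, target, background, threshold=0.0):
    # Average per-frame log-likelihood ratio between the claimed
    # speaker model (target) and the alternative model (background);
    # accept the claim if the average LLR exceeds the threshold.
    llr = sum(
        gaussian_logpdf(f, *target) - gaussian_logpdf(f, *background)
        for f in frames
    ) / len(frames)
    return llr, llr > threshold

# Toy 2-D models: target centered at (1, 1), background at (0, 0).
target = ([1.0, 1.0], [1.0, 1.0])
background = ([0.0, 0.0], [1.0, 1.0])
frames = [[0.9, 1.1], [1.2, 0.8]]  # feature vectors near the target
llr, accepted = verify(frames, target, background)
```

In practice the threshold is tuned on development data to trade off false acceptances against false rejections, and the background model is far richer than a single Gaussian.
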

In the music area, our research has focused mainly on vocal melody extraction,
query by singing/humming, solo vocal modeling, music tag annotation, and tag-based music information retrieval (MIR).
Recent achievements include a novel cost-sensitive multi-label (CSML) learning framework for automatic
music tagging, and a novel query scenario using multiple tags with multiple levels of preference (an MTML query)
together with a corresponding tag cloud-based query interface for MIR. We have participated in the MIREX audio tag
classification task since 2009 and achieved top performance. Our ongoing research includes continuous improvement of
our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation,
so as to facilitate the management and retrieval of large music databases. Our future research directions also include
real-time music tagging, singing voice synthesis, and automatic music structure analysis/summarization.
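
The MTML query scenario above can be pictured as ranking tracks by per-tag relevance scores weighted by the user's preference levels. This is an illustrative sketch only, not the published formulation; the track names, tag scores, and preference weights are all hypothetical.

```python
def mtml_rank(tracks, query):
    # tracks: {track_id: {tag: relevance score in [0, 1]}}
    # query:  {tag: preference weight} (higher = stronger preference)
    def score(tags):
        # Weighted sum of the track's scores for the queried tags;
        # missing tags contribute zero.
        return sum(w * tags.get(tag, 0.0) for tag, w in query.items())
    return sorted(tracks, key=lambda t: score(tracks[t]), reverse=True)

tracks = {
    "song_a": {"jazz": 0.9, "vocal": 0.2},
    "song_b": {"jazz": 0.4, "vocal": 0.8},
    "song_c": {"rock": 0.9},
}
# "Mostly jazz, some vocal" expressed as preference weights 3 and 1.
ranking = mtml_rank(tracks, {"jazz": 3.0, "vocal": 1.0})
```

A tag cloud-based interface maps naturally onto such weights: the size or emphasis a user gives each tag in the cloud becomes its preference level in the query.
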