Congratulations to Kuan-Yu Chen for receiving the Bronze Medal of the 6th Merry Electroacoustic Thesis Award (第六屆美律電聲論文獎銅質獎).

Our research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.

In the speech field, our research has focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements include a new maximum mutual information-based framework for GMM-based voice conversion, subspace-based spoken language identification, and i-vector-based language modeling for spoken document retrieval. Our ongoing research includes language modeling for speech recognition, document classification, and information retrieval; subspace-based speaker and spoken language recognition; discriminative training for GMM-based voice conversion; and expressive speech synthesis.

In the music field, our research has focused mainly on vocal melody extraction, automatic music tagging, music emotion recognition, and music search. Our recent achievements in this field include a novel cost-sensitive multi-label (CSML) learning framework for music tagging; a novel query scenario for music search, query by multiple tags with multiple levels of preference (denoted as an MTML query), with a corresponding tag cloud-based query interface; and an acoustic emotion Gaussians model for emotion-based music annotation and retrieval. Our extended work on acoustic visual emotion Gaussians modeling for automatic music video generation won the ACM Multimedia 2012 Grand Challenge First Prize. Our ongoing research includes continuous improvement of our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation, all aimed at facilitating the management and retrieval of large music databases. Future research directions also include singing voice synthesis, context-aware music retrieval/recommendation, and music structure analysis/summarization.