Congratulations to Hung-Yi Lo for receiving his PhD degree in Jan 2013.
Congratulations to Ju-Chiang Wang for receiving his PhD degree in Jan 2013.
Congratulations to Ju-Chiang Wang, Hsin-Min Wang, and their team for receiving the ACM Multimedia 2012 Grand Challenge First Prize.
Congratulations to Yu-Chin Shih for receiving the ACLCLP Excellent Master’s Thesis Award.

Our research interests include spoken language processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.

In the speech area, our research has focused mainly on speech recognition, speaker recognition, speaker segmentation/clustering/diarization, and spoken document retrieval/summarization. Recent achievements include a minimum-boundary-error-based discriminative acoustic model training and decoding framework for automatic phone segmentation, a novel characterization of the alternative hypothesis using kernel discriminant analysis for likelihood ratio-based speaker verification, a new divide-and-conquer framework for fast speaker segmentation and diarization, and a probabilistic generative framework for extractive spoken document summarization. Ongoing research includes attribute-detection-based speech/language recognition, language modeling for speech recognition/document classification/information retrieval, voice conversion, and hidden Markov model-based speech synthesis, among other topics.

In the music area, our research has focused mainly on vocal melody extraction, query by singing/humming, solo vocal modeling, music tag annotation, and tag-based music information retrieval (MIR). Recent achievements include a novel cost-sensitive multi-label (CSML) learning framework for automatic music tagging and a novel query scenario, query by multiple tags with multiple levels of preference (denoted as an MTML query), together with a corresponding tag cloud-based query interface for MIR. We have participated in the MIREX audio tag classification task since 2009 and achieved top performance. Our ongoing research includes continuous improvement of our own technologies and systems, audio feature analysis, semantic visualization of music tags, and vocal separation, so as to facilitate the management and retrieval of large music databases. Our future research directions also include real-time music tagging, singing voice synthesis, and automatic music structure analysis/summarization.