Congratulations to Qian-Bei Hong for receiving the 2023 Excellent PhD Dissertation Award of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Congratulations to Aleksandra Smolka for receiving the Best Paper Award at the 34th ROCLING Conference on Computational Linguistics and Speech Processing (ROCLING 2022).
Congratulations to Yao-Fei Cheng and Fan-Lin Wang for receiving the ISCA Travel Grant for Interspeech 2021.

Our research interests include speech processing, natural language processing, multimedia information retrieval, machine learning, and pattern recognition. Our research goal is to develop methods for analyzing, extracting, recognizing, indexing, and retrieving information from audio data, with special emphasis on speech and music.

In the speech field, our research has focused mainly on speaker recognition, spoken language recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements include locally linear embedding-based approaches for voice conversion and post-filtering, discriminative autoencoders for speech/speaker recognition, and novel paragraph embedding methods for spoken document retrieval/summarization. Our ongoing research includes audio-visual speaker recognition and speech enhancement, subspace neural networks for spoken language/dialect/accent recognition, many-to-one/non-parallel voice conversion, and neural network-based spoken document retrieval/summarization and question answering.

In the music field, our research has focused mainly on vocal melody extraction and automatic music video generation. Our recent achievements in this field include an acoustic-phonetic F0 modeling framework for vocal melody extraction and an emotion-oriented pseudo song prediction and matching framework for automatic music video generation. We have implemented a complete automatic music video generation system that edits a long user-generated video into a short, professional-looking video that matches the music. Our ongoing research includes continued improvement of our own technologies and systems, cover song identification, and automatic set list generation for concert videos, all aimed at facilitating the management and retrieval of large music databases. Future research directions include singing voice synthesis, speech-to-singing voice conversion, and music structure analysis/summarization.