Our research interests include speech processing, natural language processing, multimedia
information retrieval, machine learning, and pattern recognition. Our research goal is to
develop methods for analyzing, extracting, recognizing, indexing, and retrieving information
from audio data, with special emphasis on speech and music.
In the field of speech, our research has focused mainly on speaker recognition, spoken language
recognition, voice conversion, and spoken document retrieval/summarization. Our recent achievements
include locally linear embedding-based approaches to voice conversion and post-filtering, discriminative
autoencoders for speech/speaker recognition, and novel paragraph embedding methods for spoken document
retrieval/summarization. Our ongoing research includes audio-visual speaker recognition and speech enhancement,
subspace neural networks for spoken language/dialect/accent recognition, many-to-one and non-parallel voice conversion,
and neural network-based spoken document retrieval/summarization and question answering.
In the field of music, our research has focused mainly on vocal melody extraction and automatic
music video generation. Our recent achievements in this field include an acoustic-phonetic F0 modeling
framework for vocal melody extraction and an emotion-oriented pseudo-song prediction and matching framework
for automatic music video generation. We have implemented a complete automatic music video generation system
that edits a long user-generated video into a short, professional-looking video aligned with a given piece of music.
Our ongoing research includes continuous improvement of our own technologies and systems, cover song identification,
and automatic set-list generation for concert videos, so as to facilitate the management and retrieval of large
music databases. Future research directions also include singing voice synthesis, speech-to-singing voice conversion,
and music structure analysis/summarization.