• SoVideo - A Mandarin Chinese broadcast news retrieval system built on technologies such as large-vocabulary continuous speech recognition for Mandarin Chinese, automatic story segmentation, and information retrieval. The database currently comprises more than 400 hours of broadcast news, which automatic story segmentation has divided into 10,343 stories. (The online demonstration system is temporarily out of service. Please see the demonstration video.)

    demonstration video

  • AViTA - A TV news retrieval system developed based on automatic alignment of video and text.

  • SoMusic - A query-by-singing karaoke music retrieval system. The music database consists of 1071 songs. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to (i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively; and (ii) detect phrase onsets based on the Bayesian Information Criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass Dynamic Time Warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies.

    demonstration video
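The melody-matching step above can be sketched in code. The system uses multiple-pass DTW with multiple-level data abstraction and BIC-based phrase onset detection; the single-pass DTW below is a deliberately simplified illustration of the core comparison, with made-up pitch sequences (in MIDI semitones), an assumed window equal to the query length, and hypothetical onset indices standing in for BIC-detected phrase onsets.

```python
# Minimal sketch of melody matching with dynamic time warping (DTW).
# The real SoMusic system uses multiple-pass DTW with multiple-level
# data abstraction; this single-pass version only illustrates the idea.
# Pitch sequences are in semitones (MIDI note numbers); all values here
# are illustrative, not real data.

def dtw_distance(query, song):
    """Accumulated DTW cost between two pitch sequences."""
    n, m = len(query), len(song)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - song[j - 1])   # local pitch difference
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def best_onset(query, song, onsets):
    """Compare the query against the song starting at each candidate
    phrase onset (as detected, e.g., by BIC) and return the best one,
    so the query need not be matched against every song position."""
    scored = [(dtw_distance(query, song[s:s + len(query)]), s)
              for s in onsets]
    return min(scored)

# Illustrative data: a song's note sequence and a short sung query.
song = [60, 62, 64, 65, 67, 65, 64, 62, 60, 67, 69, 71, 72]
query = [67, 65, 64, 62]
dist, onset = best_onset(query, song, onsets=[0, 5, 9])
print(onset)    # -> 5 (the query matches best near this onset)
```

Restricting the comparison to detected phrase onsets, rather than every position in the song, is what makes the search over a 1071-song database tractable.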

  • SoTag Web - We develop a novel content-based query-by-tag music search system for an untagged music database. We design a new tag query interface that allows users to input multiple tags with multiple levels of preference (denoted as an MTML query) by colorizing desired tags in a web-based tag cloud interface. When a user clicks and holds the left mouse button (or presses and holds his/her finger on a touch screen) on a desired tag, the color of the tag changes cyclically according to a color map (from dark blue to bright red), which represents the level of preference (from 0 to 1). In this way, the user can easily compose and check a query of multiple tags with multiple levels of preference through the colored tags. To effect MTML content-based music retrieval, we introduce three methods, namely "autotag", "fold-in", and "ensemble". The content-based music search system is implemented on the MajorMiner dataset, which consists of 2,472 10-second music clips. The tag labels are withheld from the music database; the system automatically indexes the music pieces based only on the audio content.

    The Novel Tag Colorizing Query Interface
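The colorizing interaction described above can be sketched as follows. The exact color map and cycling speed used by the system are not specified, so the linear dark-blue-to-red interpolation, the `period` parameter, and the triangle-wave cycling below are all assumptions for illustration only.

```python
# Sketch: map a preference level in [0, 1] to a tag color, as in the
# SoTag Web tag cloud (dark blue = 0, bright red = 1).  The system's
# exact color map is not specified; linear RGB interpolation between
# two endpoints is assumed here purely for illustration.

DARK_BLUE = (0, 0, 139)     # preference 0.0
BRIGHT_RED = (255, 0, 0)    # preference 1.0

def preference_to_color(level):
    """Return an (R, G, B) tuple for a preference level in [0, 1]."""
    level = min(max(level, 0.0), 1.0)          # clamp to the valid range
    return tuple(round(a + (b - a) * level)
                 for a, b in zip(DARK_BLUE, BRIGHT_RED))

def cycle_preference(hold_time, period=2.0):
    """Holding the button cycles the level 0 -> 1 -> 0 -> ...;
    `period` (seconds per full cycle) is an assumed parameter."""
    phase = (hold_time % period) / period
    return 2 * phase if phase <= 0.5 else 2 * (1 - phase)

# An MTML query is then simply a mapping from tag to preference level:
query = {"jazz": 1.0, "piano": 0.5, "drums": 0.1}
colors = {tag: preference_to_color(p) for tag, p in query.items()}
print(colors["jazz"])    # -> (255, 0, 0)
```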

  • SoTag Windows - A content-based music search system that accepts a query containing multiple tags with multiple levels of preference (denoted as an MTML query) to search music from an untagged music database. We select a limited number of the most frequently used music tags to form the tag space and design an interface for users to input queries by operating scroll bars. To effect MTML content-based music retrieval, we introduce a tag-based music aspect model that jointly models the auditory features and tag labels of a song. Two indexing methods and their corresponding matching methods, namely pseudo song-based matching and tag affinity-based matching, are incorporated into the pre-learned tag-based music aspect model. The content-based music search system is implemented on the MajorMiner dataset, which consists of 2,472 10-second music clips and their associated human-labeled tags crawled from the MajorMiner website. The MTML query interface contains the 36 top tags used in the dataset. We randomly select 1,648 music clips with their tag labels for training the tag-based music aspect model and 824 clips without their tag labels for building the untagged music database for content-based retrieval.

    The Proposed MTML Query Interface

    The Flowchart of the SoTag System
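The tag affinity-based matching step can be sketched as below. This is only an illustrative stand-in: it assumes each untagged clip has already been auto-indexed with a tag-affinity vector (the aspect model that would produce these affinities is not shown), uses a toy three-tag space instead of the system's 36 tags, and substitutes cosine similarity for the system's actual matching method.

```python
# Sketch of tag affinity-based matching for an MTML query.  Each clip
# is assumed to carry a pre-computed tag-affinity vector over the tag
# space; the tag-based music aspect model that yields these affinities
# is not shown, and cosine similarity is an assumed stand-in for the
# system's matcher.  All data below are illustrative.
import math

TAGS = ["rock", "guitar", "female vocal"]   # toy tag space (real: 36 tags)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, index):
    """Rank clips by similarity between the MTML query vector and each
    clip's tag-affinity vector."""
    qvec = [query.get(t, 0.0) for t in TAGS]
    scored = [(cosine(qvec, vec), clip) for clip, vec in index.items()]
    return [clip for _, clip in sorted(scored, reverse=True)]

# Toy index: clip -> tag-affinity vector (values are made up).
index = {
    "clip_a": [0.9, 0.8, 0.1],
    "clip_b": [0.1, 0.2, 0.9],
    "clip_c": [0.5, 0.5, 0.5],
}
# MTML query: "rock" strongly preferred, "guitar" moderately.
print(rank({"rock": 1.0, "guitar": 0.6}, index))
# -> ['clip_a', 'clip_c', 'clip_b']
```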