A wedding is one of the most important ceremonies in our lives. It symbolizes the birth and creation of a new family. In this paper, we present a system for automatically segmenting a wedding ceremony video into a sequence of recognized wedding events, e.g., the couple's wedding kiss. Our goal is to develop an automatic tool that lets users efficiently organize, search, and retrieve their treasured wedding memories. Furthermore, the event descriptions could benefit and complement current research in semantic video understanding. Technically, three kinds of event features, i.e., the speech/music discriminator, flashlight detector, and bride indicator, are exploited to build statistical models for each wedding event. Events are then recognized by a hidden Markov model, which takes into account both how well the observed features fit each event model and the temporal plausibility of the event ordering to improve segmentation accuracy. We conducted experiments on a rich set of wedding videos, and the results demonstrate the effectiveness of our approach.
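The role of the hidden Markov model in the abstract above can be illustrated with a minimal Viterbi decoding sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three event labels, the left-to-right transition matrix, and the per-segment likelihoods (which stand in for whatever the speech/music, flashlight, and bride features would produce).

```python
import math

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state sequence.

    log_emit[t][s]  : log-likelihood of the observation at segment t under state s
    log_trans[p][s] : log-probability of moving from state p to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at segment t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical event labels (not from the paper).
events = ["entrance", "vows", "kiss"]

# Hypothetical left-to-right ordering: stay in an event or advance to the next.
trans = [[0.8, 0.2, 0.0],
         [0.0, 0.8, 0.2],
         [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]

# Hypothetical per-segment likelihoods; segment 2 looks most like "kiss"
# when judged on its own, which would break the event ordering.
emit = [[0.70, 0.20, 0.10],
        [0.60, 0.30, 0.10],
        [0.30, 0.20, 0.50],
        [0.10, 0.70, 0.20],
        [0.05, 0.05, 0.90]]

log_emit = [[log(p) for p in row] for row in emit]
log_trans = [[log(p) for p in row] for row in trans]
log_init = [log(p) for p in init]

# Labeling each segment independently ignores the ordering constraint.
framewise = [max(range(len(events)), key=lambda s: row[s]) for row in emit]
# Viterbi decoding balances feature fit against a plausible event order.
path = viterbi(log_emit, log_trans, log_init)

print([events[s] for s in framewise])
print([events[s] for s in path])
```

In this toy setup, labeling each segment independently places a "kiss" before the "vows", whereas the decoded path respects the assumed event order. This is the kind of correction the temporal model is meant to provide.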
Rushes are the raw materials used to produce a video. They often contain redundant and repetitive content. Rushes summarization aims to provide a quick overview of a rushes video. As part of TRECVID 2007, NIST initiated a rushes summarization task. This paper reports on the design of the NTU rushes summarization system for this task. Our system consists of three components: shot segmentation, redundant shot detection, and summary creation. To cope with the large volume of rushes, we focus on efficient but effective feature representations (local color histograms and compressed-domain motion vectors) and summarization methods. In addition, we propose a novel approach to detecting clapper shots, which are not only relevant to concise summaries but also essential for indexing the numerous camera takes in the rushes. While remaining efficient in practice, requiring only 40% of the video duration for computation, the proposed system achieved satisfactory results in the TRECVID 2007 rushes summarization task.
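As a rough illustration of how local color histograms can support shot segmentation, the sketch below compares block-wise histograms of consecutive frames and declares a cut when they differ strongly. The grid size, the L1 distance, the threshold, and the toy frames are all assumptions chosen for illustration. The abstract does not specify how the NTU system uses these features.

```python
def local_histograms(frame, grid=2, bins=4):
    """Split a frame into grid x grid blocks and return one normalized
    histogram per block.

    `frame` is a 2-D list of quantized color indices in [0, bins) and is
    assumed to be at least `grid` pixels wide and high.
    """
    height, width = len(frame), len(frame[0])
    hists = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * bins
            rows = range(by * height // grid, (by + 1) * height // grid)
            cols = range(bx * width // grid, (bx + 1) * width // grid)
            for y in rows:
                for x in cols:
                    hist[frame[y][x]] += 1
            total = sum(hist)
            hists.append([count / total for count in hist])
    return hists

def frame_distance(hists_a, hists_b):
    """Mean L1 distance between corresponding block histograms (0 to 2)."""
    per_block = [sum(abs(a - b) for a, b in zip(ha, hb))
                 for ha, hb in zip(hists_a, hists_b)]
    return sum(per_block) / len(per_block)

def detect_cuts(frames, threshold=1.0):
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    feats = [local_histograms(f) for f in frames]
    return [i for i in range(1, len(feats))
            if frame_distance(feats[i - 1], feats[i]) > threshold]

def uniform(value, size=4):
    """Toy frame filled with a single quantized color."""
    return [[value] * size for _ in range(size)]

# Two toy "shots": frames 0-1 are dark, frames 2-3 are bright.
noisy = uniform(0)
noisy[0][0] = 1  # a small change within the first shot should not trigger a cut
frames = [uniform(0), noisy, uniform(3), uniform(3)]

cuts = detect_cuts(frames)
print(cuts)
```

Working on a coarse grid of small histograms keeps the per-frame cost low, in keeping with the efficiency emphasis of the abstract, while still retaining some spatial layout that a single global histogram would lose.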
This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the likelihood that a major punctuation mark (MPM) can be inserted at a word boundary. The second feature is the quotation confidence (QC), which measures the likelihood that a word string is quoted as a meaningful or emphasized unit. The proposed PC and QC features are influenced by the properties of automatic Chinese punctuation generation and the linguistic characteristics of the Chinese punctuation system. Because MPMs are highly correlated with prosodic-acoustic features and quoted word strings serve crucial roles in human language understanding, the two features could potentially provide useful information for prosody generation. This idea was realized by employing conditional random field-based models to predict MPMs, quoted word string locations, and their associated confidences (that is, PC and QC) for each word boundary. The predicted punctuation marks and their confidences were then combined with traditional linguistic features to predict prosodic-acoustic features, using multilayer perceptrons, for speech synthesis. Both objective and subjective tests demonstrated that the prosody generated with the proposed linguistic features was superior to that generated without them. Therefore, the proposed PC and QC are identified as promising features for Mandarin prosody generation.