The objective of this work is to provide a simple yet efficient tool to detect human faces in video sequences. This information can be very useful for many applications such as video indexing and video browsing. In particular, the paper focuses on the significant improvements made to our face detection algorithm presented by Albiol, Bouman and Delp (see IEEE Int. Conference on Image Processing, Kobe, Japan, 1999). Specifically, a novel approach to retrieve skin-like homogeneous regions is presented, which is later used to retrieve face images. Good results have been obtained for a large variety of video sequences.
In this paper, we describe a framework for analyzing programs belonging to different TV program genres using Hidden Markov Models and pseudo-semantic features derived from video shots. Clustering using Gaussian mixture models is used to determine the order of the modes. Results for initial genre classification experiments using two simple features derived from video shots are given.
With the increased need for data sharing among multiple organizations such as government organizations, financial corporations, medical hospitals and academic institutions, it is critical to ensure data integrity so that effective decisions can be made based on these data. An important component of any solution for assessing data integrity is represented by techniques and tools to evaluate the trustworthiness of data provenance. However, few efforts have been devoted to investigating approaches for assessing how trusted the data are, based in turn on an assessment of the data sources and intermediaries. To bridge this gap, we propose a data provenance trust model. Our trust model takes into account various factors that may affect the trustworthiness and, based on these factors, assigns trust scores to both data and data providers. Such trust scores represent key information based on which data users may decide whether to use the data and for which purposes.
The objective of this work is the integration and optimization of an automatic face detection and recognition system for video indexing applications. The system is composed of a previously presented face detection stage, which provides good results while maintaining a low computational cost (see Albiol, A. et al., Proc. IEEE Int. Conf. on Image Proc., vol.2, p.239-42, 2000). The recognition stage is based on the principal component analysis (PCA) approach, which has been modified to cope with the video indexing application. After the integration of the two stages, several improvements are proposed which increase the face detection and recognition rate and the overall performance of the system. Good results have been obtained using the MPEG-7 video content set used in the MPEG-7 evaluation group.
We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that seeks anchors, where the person on the scene is likely speaking. The system has been extensively tested on several news sequences with very good recognition rates.
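The abstract above does not state how the audio and video confidence values are combined. As a minimal sketch, assuming both confidences lie in [0, 1] and are merged by a weighted sum followed by a threshold, the idea could look as follows. The function names, the weight and the threshold are hypothetical and are not taken from the paper.

```python
def fuse_confidences(audio_conf, video_conf, audio_weight=0.5):
    """Weighted sum of a speaker-recognition and a face-recognition confidence.

    Assumption: both inputs are in [0, 1]; the fusion rule is illustrative only.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return audio_weight * audio_conf + (1.0 - audio_weight) * video_conf


def find_person_shots(shots, threshold=0.6, audio_weight=0.5):
    """Return ids of shots whose fused confidence reaches the threshold.

    shots: iterable of (shot_id, audio_conf, video_conf) tuples (hypothetical layout).
    """
    return [
        shot_id
        for shot_id, audio_conf, video_conf in shots
        if fuse_confidences(audio_conf, video_conf, audio_weight) >= threshold
    ]


if __name__ == "__main__":
    candidate_shots = [("s1", 0.9, 0.8), ("s2", 0.2, 0.9), ("s3", 0.1, 0.3)]
    print(find_person_shots(candidate_shots))
```

With equal weights, a shot is kept only when the speaker and face evidence agree reasonably well; shifting `audio_weight` toward 0 or 1 lets one modality dominate.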
Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments and second, ranking the resulting segments. Our proposed algorithm scores segments based on word frequency analysis of speech transcripts. Then a summary is generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the summaries generated. Comparisons are made between our proposed algorithm and a random segment selection scheme based on statistical analysis of the user study results. Finally, we discuss various issues that arise in the evaluation of automatically generated video summaries.
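The selection step described above (pick segments by score-to-duration ratio, then concatenate them) can be sketched as follows. The scoring function here is an assumption: each segment is scored by summing the whole-transcript frequency of its words, which is only one possible form of word frequency analysis. The greedy fill under a time budget and all names are likewise hypothetical.

```python
import re
from collections import Counter


def score_segments(segments):
    """Score each segment by the summed corpus frequency of its words.

    segments: list of (transcript_text, duration_seconds) tuples.
    Assumption: this simple scoring stands in for the paper's analysis.
    """
    tokenized = [re.findall(r"[a-z']+", text.lower()) for text, _ in segments]
    freq = Counter(word for words in tokenized for word in words)
    return [sum(freq[word] for word in words) for words in tokenized]


def summarize(segments, budget_seconds):
    """Greedily pick segments with the highest score-to-duration ratio.

    Durations must be positive. Returns the chosen segment indices in
    temporal order, so the summary can be formed by concatenation.
    """
    scores = score_segments(segments)
    by_ratio = sorted(
        range(len(segments)),
        key=lambda i: scores[i] / segments[i][1],
        reverse=True,
    )
    chosen, used = [], 0.0
    for i in by_ratio:
        duration = segments[i][1]
        if used + duration <= budget_seconds:
            chosen.append(i)
            used += duration
    return sorted(chosen)


if __name__ == "__main__":
    demo = [
        ("election results election vote", 10.0),
        ("weather sunny", 10.0),
        ("election vote count", 5.0),
    ]
    print(summarize(demo, budget_seconds=15.0))
```

Dividing by duration favors short, dense segments, so a long segment must carry proportionally more high-frequency words to be selected.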
We address the problem of detecting shots of subjects that are interviewed in news sequences. This is useful since these kinds of scenes usually contain important and reusable information that can be used for other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences which allowed a fast search of news stories (see Albiol, A. et al., 3rd Int. Conf. on Audio and Video-based Biometric Person Authentication, p.366-71, 2001). We now present a new shot descriptor technique which improves the previous search results by using a simple, yet efficient, algorithm, based on the information contained in consecutive frames. Results are provided which prove the validity of the approach.
In this paper, we focus on the leaky prediction based scalable coding (LPSC) structure and present a general framework for LPSC. We demonstrate the similarity between LPSC and a motion compensation based multiple description coding scheme. We show that since the information contained in the enhancement layer in LPSC is actually a mismatch between two descriptions for each frame, it cannot be guaranteed that the enhancement layer always achieves superior reconstruction quality beyond that achieved by the base layer. We derive three reconstructions for each frame under the LPSC framework, and propose a maximum-likelihood (ML) estimation scheme for LPSC video reconstruction at the decoder. This generally achieves better decoded video quality than both the enhancement layer and the base layer.
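The abstract above does not give the form of the ML estimator. As an illustration of why combining several reconstructions can beat each one individually, the sketch below assumes a simplified model that is not taken from the paper: each reconstruction of a pixel equals the true value plus independent zero-mean Gaussian noise with a known variance. Under that assumption the ML estimate is the inverse-variance weighted average. The function name and the example variances are hypothetical.

```python
def ml_combine(estimates, variances):
    """Inverse-variance weighted average of several noisy estimates.

    Assumption: independent zero-mean Gaussian errors with known, positive
    variances. This is a textbook model, not the paper's actual estimator.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need equally many estimates and variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, estimates)) / total


if __name__ == "__main__":
    # Three hypothetical reconstructions of one pixel value.
    print(ml_combine([100.0, 110.0, 120.0], [1.0, 1.0, 1.0]))
```

In this Gaussian model the combined estimate has variance 1 / sum(1 / variance_i), which is never larger than the smallest input variance, so the fused value is at least as reliable as the best single reconstruction.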