“…For internal summarization, audio information is often used together with image features [14-15, 20, 26, 32, 34-35, 40-42]; among the image features, camera motion [8, 12, 16] and object motion [1, 7, 13, 44, 47] are frequently employed to model the significance of frames for summarization. Some work that uses overlaid text information to aid video summarization has also been reported [10].…”