Masaki Naito scite author profile

Masaki Naito

17 Publications

30 Citation Statements Received

23 Citation Statements Given


Affiliations

KDDI Research (Japan), Advanced Telecommunications Research Institute International, KDDI (Japan)

Publications


SVM-Based Shot Boundary Detection with a Novel Feature

Matsumoto

Naito

Hoashi

et al. 2006


This paper describes our new algorithm for shot boundary detection and its evaluation. We adopt a 2-stage data fusion approach with SVM technique to decide whether a boundary exists or not within a given video sequence. This approach is useful to avoid huge feature space problems, even when we adopt many promising features extracted from a video sequence. We also introduce a novel feature to improve detection. The feature consists of two kinds of values extracted from a local frame sequence. One is the image difference between the target frame and that synthesized from the neighbors. The other is the difference between neighbors. This feature can be extracted quickly with a least-square technique. Evaluation of our algorithm is conducted with the TRECVID evaluation framework. Our system obtained a high performance at a shot boundary detection task in TRECVID2005.


Robust speech detection method for telephone speech recognition system

Kuroiwa

Naito

Yamamoto

et al. 1999

Speech Communication


Model‐based speaker normalization methods for speech recognition

Naito

Deng

Sagisaka

2003

Electron Comm Jpn Pt II


SUMMARY: A speaker normalization method using a speech generation model is proposed in order to achieve high-performance speaker adaptation with a small amount of adaptation data. The speaker- and phoneme-dependent vocal tract area function is approximated by the corresponding area function produced by the articulatory model of a standard speaker, combined with phoneme-independent feature quantities of the vocal-tract shape of the normalized target speaker as estimated from the formant frequencies of two vowels. The frequency warping functions are determined from the formant frequencies of speech calculated from the vocal-tract area functions thus obtained, and normalization of the uttered speech is performed by stretching the speech spectrum in the frequency-axis direction. Continuous phoneme recognition experiments using phoneme connection rules show that the recognition error using a gender-dependent model is reduced by about 30% in the proposed method and that recognition performance superior to that of vocal-tract length normalization is obtained. The recognition performance of the proposed method is also equivalent to that of speaker adaptation by moving vector field smoothing (VFS) using 10 phonetically balanced sentences, showing that high-performance speaker adaptation using a small amount of adaptation data can be achieved by the proposed method.


Video Story Segmentation and Its Application to Personal Video Recorders

Hoashi

Sugano

Naito

et al. 2005


Speaker clustering for speech recognition using vocal tract parameters

Naito

Deng

Sagisaka

2002

Speech Communication


Camera Motion Detection using Video Mosaicing

Naito

Matsumoto

Hoashi

et al. 2006


In this paper, camera motion detection methods using a background image generated by video mosaicing based on the correlation between feature points on a frame pair are described. In this method, a telop (video caption) removal method, iterative foreground and background image separation method and appropriate frame pair selection from consecutive frames are introduced to generate background images accurately. Parameters indicating the location of each frame on the background image are retrieved and used to detect the camera motion. Except for the simple threshold-based method, a method using Hidden Markov models (HMMs) is introduced to detect variable length camera motion based on the maximum likelihood criterion. The effectiveness of the proposed method is evaluated by using a TRECVID 2005 low-level feature extraction task [1].


A comparative study on acoustic and linguistic characteristics using speech from human-to-human and human-to-machine conversations

Takezawa

Sugaya

Naito

et al. 2000


Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords

Yamada

Kato

Naito

et al. 2005



