“…Then, patches with a large probability are considered to have a small amount of information and are excluded from the input. IPS is simple and effective in that it can dynamically reduce inference cost depending on the input without requiring any policy model and complicated training , unlike the previous works [ 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 ]. Modeling MPEG4-encoded video has long been conducted for network traffic prediction [ 41 , 42 , 43 , 44 ], and recently, neural-network-based video recognition models directly taking MPEG4-encoded video as input were proposed [ 5 , 45 ].…”