Abstract-This paper discusses an information theoretic approach of designing sparse kernel adaptive filters. To determine useful data to be learned and remove redundant ones, a subjective information measure called surprise is introduced. Surprise captures the amount of information a datum contains which is transferable to a learning system. Based on this concept, we propose a systematic sparsification scheme, which can drastically reduce the time and space complexity without harming the performance of kernel adaptive filters. Nonlinear regression, short term chaotic time-series prediction, and long term time-series forecasting examples are presented.