Abstract. The Carnegie–Ames–Stanford Approach (CASA) model is widely used to estimate vegetation net primary productivity (NPP) at regional scales. However, CASA is still driven by multisource data, e.g. satellite remote sensing (RS) data and ground observations, the latter of which are time-consuming to obtain. RS data can conveniently provide real-time regional information and may replace ground observation data to drive the CASA model. In this study, we attempted to improve the CASA model using Moderate Resolution Imaging Spectroradiometer (MODIS) RS products, the GlobeLand30 RS product, and digital elevation model data derived from radar RS. We applied the improved model to simulate the NPP of alpine grasslands in the Qinghai Lake basin, located in the northeastern Qinghai–Tibetan Plateau, China. The accuracy of the RS-data-driven CASA (mean absolute percent error (MAPE) of 22.14 %, root mean square error (RMSE) of 26.36 g C m−2 per month) was higher than that of the multisource-data-driven CASA (MAPE of 44.80 %, RMSE of 57.43 g C m−2 per month). The NPP simulated by the RS-data-driven CASA for July 2020 averages 108.01 ± 26.31 g C m−2 per month, which is similar to published results and comparable with the measured NPP. These results indicate that simulating alpine grassland NPP with satellite RS data rather than ground observations is feasible. This work may provide a workable reference for rapid simulation of grassland NPP to meet the requirements of carbon stock accounting and other applications.
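The light-use-efficiency core of CASA and the two accuracy metrics reported above can be sketched as follows. This is a minimal sketch assuming the commonly cited CASA form, NPP = APAR × ε, with APAR = SOL × FPAR × 0.5 and ε = Tε1 × Tε2 × Wε × εmax; the abstract does not say how each factor is derived from the MODIS, GlobeLand30, and elevation data, so here the inputs are plain numbers and the function names (`casa_npp`, `mape`, `rmse`) are hypothetical.

```python
import math


def casa_npp(sol, fpar, t_eps1, t_eps2, w_eps, eps_max):
    """Monthly NPP (g C m-2 per month), assuming the standard CASA light-use-efficiency form.

    sol            : total solar radiation for the month (MJ m-2 per month)
    fpar           : fraction of PAR absorbed by vegetation (0-1)
    t_eps1, t_eps2 : temperature stress scalars (0-1)
    w_eps          : water stress scalar (0-1)
    eps_max        : maximum light use efficiency (g C MJ-1), vegetation-type specific
    """
    # Assumption: PAR is taken as half of total solar radiation.
    apar = sol * fpar * 0.5
    # Actual light use efficiency = maximum efficiency scaled by the stress factors.
    epsilon = t_eps1 * t_eps2 * w_eps * eps_max
    return apar * epsilon


def mape(observed, simulated):
    """Mean absolute percent error (%) of simulated values against nonzero observations."""
    pairs = list(zip(observed, simulated))
    return 100.0 * sum(abs(s - o) / abs(o) for o, s in pairs) / len(pairs)


def rmse(observed, simulated):
    """Root mean square error, in the same units as the inputs."""
    pairs = list(zip(observed, simulated))
    return math.sqrt(sum((s - o) ** 2 for o, s in pairs) / len(pairs))
```

For example, with hypothetical observed values `[100.0, 200.0]` and simulated values `[110.0, 180.0]` (not data from the study), both points are off by 10 %, so `mape` gives 10 %, while `rmse` weights the larger absolute miss more heavily and gives √250 ≈ 15.8 in the input units.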
Taking full advantage of information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation because they neglect interactions between objects, and they lack sufficient training for content-related words because of the long-tailed problem. In this paper, we propose a complete video captioning system comprising both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates semantically similar word proposals that extend the ground-truth words used for training, which addresses the long-tailed problem. Experimental evaluations on three benchmarks (MSVD, MSR-VTT, and VATEX) show that the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.
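The teacher-recommended learning idea can be illustrated, for a single decoding step, as a distillation-style loss. This is a minimal sketch under assumptions not stated in the abstract: that the ELM's next-word distribution over the same vocabulary is truncated to its top-`k` words and renormalized, that the caption model is pulled toward it with a KL-divergence term added to the usual cross-entropy on the ground-truth word, and that `lam` and `k` are hypothetical hyperparameters. It is not the authors' exact implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_renormalize(probs, k):
    """Keep the k most probable words (the ELM's 'recommendations') and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = [0.0] * len(probs)
    for i in keep:
        kept[i] = probs[i]
    total = sum(kept)
    return [p / total for p in kept]


def trl_loss(caption_logits, target_id, elm_probs, lam=1.0, k=3):
    """Hard-target cross-entropy plus lam * KL(ELM soft targets || caption model).

    caption_logits : caption model's logits over the vocabulary at this step
    target_id      : index of the ground-truth word
    elm_probs      : hypothetical ELM next-word probabilities over the same vocabulary
    """
    q = softmax(caption_logits)
    # Standard cross-entropy on the single ground-truth word.
    ce = -math.log(q[target_id])
    # Soft targets: semantically plausible words proposed by the ELM.
    p = top_k_renormalize(elm_probs, k)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return ce + lam * kl
```

Setting `lam=0.0` recovers plain cross-entropy training; a positive `lam` also rewards probability mass on words the ELM considers plausible, which is one way rarely seen content words can receive training signal beyond the single ground-truth token.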