Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
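As an illustration of how rankings of model outputs can become a training signal, the sketch below computes a pairwise ranking loss on two scalar scores. The abstract does not state which loss is used, so the negative log-sigmoid (Bradley-Terry style) form, the function name, and the scalar "reward" inputs are all assumptions made for illustration.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Hypothetical helper: -log(sigmoid(margin)) for one ranked pair.

    The loss is small when the preferred output scores well above the
    rejected one, and large when the ordering is violated.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy scores (made up): the labeler preferred output A over output B.
loss_agree = pairwise_ranking_loss(2.0, 0.0)     # scores agree with the ranking
loss_disagree = pairwise_ranking_loss(0.0, 2.0)  # scores contradict the ranking
```

A ranking of more than two outputs can be broken into all of its ordered pairs, with this loss averaged over them.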
Temperature and precipitation time series from 1961 to 2005 at 65 meteorological stations were used to reveal the spatial and temporal trends of climate change in Xinjiang, China. Annual and seasonal mean air temperature and total precipitation were analyzed using the Mann-Kendall (MK) test, inverse distance weighted (IDW) interpolation, and rescaled range (R/S) analysis. The results indicate that: (1) both temperature and precipitation increased over the past 45 years, but the increase in temperature is more pronounced than that in precipitation; (2) temperature increased faster at higher latitudes and higher elevations, with latitude having the greater influence. Northern Xinjiang warmed faster than southern Xinjiang, especially in summer; (3) the increase in precipitation occurred mainly in winter in northern Xinjiang and in summer in southern Xinjiang. Ili, which receives the most precipitation in Xinjiang, shows only a weak increase in precipitation; (4) although both temperature and precipitation increased in general, the size of the increase differs across regions within Xinjiang; (5) Hurst index (H) analysis indicates that climate change will continue along the current trends.
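Two of the methods named above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the Mann-Kendall function omits the tie correction and uses the normal approximation, the IDW function assumes planar coordinates and a distance power of 2, and all function names and sample values are hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for a monotonic-trend test.

    Assumes no tied values (tie correction omitted) and a series long
    enough for the normal approximation to be reasonable.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from station values."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Made-up annual mean temperatures with a steady rise.
s_stat, z_stat, p_value = mann_kendall([7.1, 7.3, 7.2, 7.6, 7.8, 7.9, 8.3, 8.4])
# Made-up station trends interpolated to a point midway between two stations.
midpoint_estimate = idw([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], (1.0, 0.0))
```

A positive Z with a small p-value indicates a significant increasing trend at a station. IDW then spreads station-level values across the region, giving nearer stations more weight.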
We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
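Rejection sampling against a reward model can be read as a best-of-n selection: draw several candidate answers, score each, and keep the highest-scoring one. The sketch below assumes that reading. The function name, the candidate strings, and the toy scoring function are hypothetical stand-ins, not the paper's model or reward model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward score.

    `reward_fn` stands in for a trained reward model; here it can be any
    callable mapping an answer to a scalar.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return max(candidates, key=reward_fn)


def toy_reward(answer: str) -> float:
    """Toy scorer (an assumption for illustration): rewards cited answers."""
    return float(answer.count("[")) + 0.01 * len(answer)


sampled_answers = [
    "Short answer.",
    "Longer answer with one reference [1].",
    "Answer with two references [1][2].",
]
chosen = best_of_n(sampled_answers, toy_reward)
```

Selection of this kind needs no further training of the policy. Its cost is paid at inference time, since each question requires n samples and n reward evaluations.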
The Palmer Drought Severity Index (PDSI) can lead to controversial results when assessing how droughts respond to global warming. Here we assess recent changes in droughts over China (1961–2013) using the PDSI with two different estimates of potential evapotranspiration, i.e., the Thornthwaite (PDSI_th) and Penman‐Monteith (PDSI_pm) approaches. We found that droughts have become more severe in the PDSI_th estimate but have slightly lessened in the PDSI_pm estimate. To quantify and interpret the different responses in the PDSI_th and PDSI_pm, we designed numerical experiments and found that the drying trend of the PDSI_th in response to warming alone is 3.4 times higher than that of the PDSI_pm, and that the latter was further compensated by decreases in wind speed and solar radiation, resulting in slight wetting in the PDSI_pm. Interestingly, we found that the interbasin difference between the PDSI_th and PDSI_pm responses to warming alone tends to be larger in warmer basins, depending exponentially on mean temperature.
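The temperature sensitivity of the Thornthwaite approach can be seen from its formula, which depends on temperature only. The sketch below is a simplified version under stated assumptions: it omits the day-length and month-length correction (treating every month as 30 days of 12 hours), ignores the special treatment of very hot months, and uses made-up temperatures. It does not compute the PDSI itself or reproduce the paper's experiments.

```python
def thornthwaite_pet(monthly_temps_c):
    """Uncorrected Thornthwaite potential evapotranspiration, mm per month.

    Input: 12 monthly mean air temperatures in degrees Celsius.
    Months at or below 0 degrees C contribute no heat and get zero PET.
    """
    if len(monthly_temps_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    # Annual heat index from the months with positive temperature.
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temps_c if t > 0)
    if heat_index == 0:
        return [0.0] * 12
    exponent = (
        6.75e-7 * heat_index ** 3
        - 7.71e-5 * heat_index ** 2
        + 1.792e-2 * heat_index
        + 0.49239
    )
    return [
        16.0 * (10.0 * t / heat_index) ** exponent if t > 0 else 0.0
        for t in monthly_temps_c
    ]


# Made-up climates: a baseline year and the same year warmed by 2 degrees C.
baseline_pet = thornthwaite_pet([20.0] * 12)
warmed_pet = thornthwaite_pet([22.0] * 12)
annual_increase_mm = sum(warmed_pet) - sum(baseline_pet)
```

Because temperature is the only input, any warming raises this estimate directly. The Penman-Monteith approach also accounts for wind speed, radiation, and humidity, so changes in those variables can offset part of the warming effect.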