The 1.69 ms spin period of PSR J1227−4853 was recently discovered in radio observations of the low-mass X-ray binary XSS J12270−4859, following the announcement of a possible transition to a rotation-powered millisecond pulsar state, inferred from decreases in optical, X-ray, and gamma-ray flux from the source. We report the detection of significant (5σ) gamma-ray pulsations after the transition, at the known spin period, using ∼1 year of data from the Large Area Telescope on board the Fermi Gamma-ray Space Telescope. The gamma-ray light curve of PSR J1227−4853 can be fit by one broad peak, which occurs at nearly the same phase as the main peak in the 1.4 GHz radio profile. The partial alignment of light-curve peaks in different wavebands suggests that at least some of the radio emission may originate at high altitude in the pulsar magnetosphere, in extended regions co-located with the gamma-ray emission site. We folded the LAT data at the orbital period, both pre- and post-transition, but find no evidence for significant orbital modulation.
We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.
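The P(True) procedure described above can be made concrete with a small sketch: format a self-evaluation prompt that shows the question, optionally the model's own brainstormed samples, and one proposed answer, then read P(True) off a two-way softmax over the logits of the "True" and "False" continuations. The exact prompt wording and the helper names here are illustrative assumptions, not the paper's verbatim template or code.

```python
import math

def build_ptrue_prompt(question, proposed_answer, brainstormed=None):
    """Format a P(True) self-evaluation prompt.

    Layout follows the approach described in the abstract: show the
    question, optionally the model's own brainstormed samples (which
    improves self-evaluation), then one proposed answer, and ask for a
    True/False judgment. The wording is an assumption for illustration.
    """
    lines = [f"Question: {question}"]
    if brainstormed:
        lines.append("Here are some brainstormed ideas:")
        lines.extend(f"- {s}" for s in brainstormed)
    lines.append(f"Proposed Answer: {proposed_answer}")
    lines.append("Is the proposed answer:\n (A) True\n (B) False")
    lines.append("The proposed answer is:")
    return "\n".join(lines)

def p_true_from_logits(logit_true, logit_false):
    """P(True) as a two-way softmax over the logits the model assigns
    to the 'True' and 'False' continuation tokens."""
    m = max(logit_true, logit_false)  # subtract max for numerical stability
    exp_t = math.exp(logit_true - m)
    exp_f = math.exp(logit_false - m)
    return exp_t / (exp_t + exp_f)
```

In use, one would send `build_ptrue_prompt(...)` to the model, extract the logits of the two answer tokens, and pass them to `p_true_from_logits`; calibration is then assessed by comparing these probabilities with empirical accuracy.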
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.