for their many contributions to the LIFE-M project. NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.
This paper reviews the literature in historical record linkage in the United States and examines the performance of widely used record-linking algorithms and common variations in their assumptions. We use two high-quality, hand-linked data sets and one synthetic ground truth to examine the direct effects of linking algorithms on data quality. We find that (i) no algorithm (including hand linking) consistently produces representative samples; (ii) 15 to 37 percent of links chosen by widely used algorithms are classified as errors by trained human reviewers; and (iii) false links are systematically related to baseline sample characteristics, showing that some algorithms may introduce systematic measurement error into analyses. A case study shows that the combined effects of (i)–(iii) attenuate estimates of the intergenerational income elasticity by up to 29 percent, and common variations in algorithm assumptions result in greater attenuation. As current practice moves to automate linking and increase link rates, these results highlight the important potential consequences of linking errors on inferences with linked data. We conclude with constructive suggestions for reducing linking errors and directions for future research. (JEL C45, C81, J62, N31, N32)
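The attenuation mechanism described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes false links attach a father to a randomly chosen son, whereas the abstract reports that false links are systematically related to sample characteristics. The elasticity of 0.4, the error rates, and all function names are hypothetical values chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (single regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate_slope(beta, false_link_rate, n=20000, seed=0):
    """Estimate the father-son slope when a share of links is wrong.

    Assumption: a false link pairs the father with a uniformly random son,
    which is simpler than the systematic errors the abstract describes.
    """
    rng = random.Random(seed)
    fathers = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sons = [beta * f + rng.gauss(0.0, 1.0) for f in fathers]
    linked = []
    for i in range(n):
        if rng.random() < false_link_rate:
            linked.append(sons[rng.randrange(n)])  # wrong person linked
        else:
            linked.append(sons[i])  # correct link
    return ols_slope(fathers, linked)


if __name__ == "__main__":
    for rate in (0.0, 0.15, 0.37):
        print(rate, round(simulate_slope(0.4, rate), 3))
```

Under this random-mismatch assumption, the estimated slope shrinks roughly in proportion to the share of correct links, so a false-link rate of p scales the estimate by about (1 - p).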
Background: Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes. Objectives: We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations. Methods: First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types. Results: Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent. Discussion: Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics.
The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.
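The general idea of labeling notes with an expanded synonym vocabulary and scoring the labels against manual annotation can be sketched as follows. This is an illustrative sketch only: it does not use or reproduce NimbleMiner, the synonym lists are hypothetical examples rather than the study's vocabularies, and simple phrase matching ignores negation (for example, "denies fatigue" would still be labeled).

```python
import re

# Hypothetical synonym lists, including an abbreviation and a misspelling.
VOCAB = {
    "fatigue": ["fatigue", "fatigued", "tired", "exhausted", "fatique"],
    "constipation": ["constipation", "constipated", "no bm"],
}


def compile_vocab(vocab):
    """Build one case-insensitive whole-phrase pattern per symptom concept."""
    patterns = {}
    for concept, terms in vocab.items():
        # Longest terms first so multiword expressions are preferred.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def label_note(text, patterns):
    """Return the set of symptom concepts mentioned in a note."""
    return {concept for concept, pattern in patterns.items() if pattern.search(text)}


def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over per-note label sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    patterns = compile_vocab(VOCAB)
    notes = ["Pt reports feeling TIRED, no BM x3 days", "Sleeping well, no complaints"]
    gold = [{"fatigue", "constipation"}, set()]
    predicted = [label_note(note, patterns) for note in notes]
    print(precision_recall_f1(predicted, gold))
```

Expanding the lists in `VOCAB` is where a vocabulary-building step would contribute; the matching and scoring code stays the same as the lists grow.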
The MINiature Exoplanet Radial Velocity Array (MINERVA) is a dedicated observatory of four 0.7 m robotic telescopes fiber-fed to a KiwiSpec spectrograph. The MINERVA mission is to discover super-Earths in the habitable zones of nearby stars. This can be accomplished with MINERVA's unique combination of high precision and high cadence over long time periods. In this work, we detail changes to the MINERVA facility that have occurred since our previous paper. We then describe MINERVA's robotic control software, the process by which we perform 1D spectral extraction, and our forward modeling Doppler pipeline. In the process of improving our forward modeling procedure, we found that our spectrograph's intrinsic instrumental profile is stable for at least nine months. Because of that, we characterized our instrumental profile with a time-independent, cubic spline function based on the profile in the cross-dispersion direction, with which we achieved a radial velocity precision similar to using a conventional "sum-of-Gaussians" instrumental profile: 1.8 m s⁻¹ over 1.5 months on the RV standard star HD 122064. Therefore, we conclude that the instrumental profile need not be perfectly accurate as long as it is stable. In addition, we observed 51 Peg and our results are consistent with the literature, confirming our spectrograph and Doppler pipeline are producing accurate and precise radial velocities.
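To give a sense of scale for the quoted precision, the non-relativistic Doppler relation v = c Δλ / λ can be used to convert a radial velocity into a wavelength shift. This is a back-of-the-envelope sketch, not the MINERVA pipeline; the 5500 Å reference wavelength and the function names are assumptions chosen for illustration.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wavelength_shift(rest_wavelength, velocity_m_s):
    """Non-relativistic Doppler shift, in the same units as rest_wavelength."""
    return rest_wavelength * velocity_m_s / C_M_PER_S


def radial_velocity(rest_wavelength, observed_wavelength):
    """Invert the relation: velocity in m/s from rest and observed wavelengths."""
    return C_M_PER_S * (observed_wavelength - rest_wavelength) / rest_wavelength


if __name__ == "__main__":
    rest = 5500.0  # angstroms; hypothetical reference wavelength
    shift = wavelength_shift(rest, 1.8)
    print(f"shift for 1.8 m/s at {rest} A: {shift:.3e} A")
    print(f"fractional shift: {shift / rest:.3e}")
```

A velocity of 1.8 m s⁻¹ corresponds to a fractional wavelength shift of a few parts in a billion, which is why a stable instrumental profile matters for measurements at this level.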