2014
DOI: 10.1186/1751-0473-9-6

Identifying large sets of unrelated individuals and unrelated markers

Abstract: Background: Genetic analyses in large sample populations are important for better understanding the variation between populations, for designing conservation programs, and for detecting rare mutations that may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated, which may not be the case in large samples, leading to erroneous conclusions. In order to retain as much data as possible while min…

Cited by 22 publications (18 citation statements)
References 20 publications

“…Principal components and ancestry were estimated by projecting all genotyped samples into the space of the principal components of the Human Genome Diversity Project reference panel using PLINK (938 unrelated individuals) [21, 22]. Pairwise kinship was assessed with the software KING [23], and the software fastindep was used to reduce the data to a maximal subset that contained no pairs of individuals with 3rd- or closer degree relationship [24]. We also removed patients not of recent European descent from the analysis, resulting in a final sample of 30,702 unrelated subjects.…”
Section: Methods (mentioning)
confidence: 99%
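
The reduction step described in this statement can be sketched as a greedy search for an independent set in a relatedness graph. This is a minimal illustration, not fastindep's actual algorithm or interface: the function name, the input layout (a list of `(id_a, id_b, kinship)` tuples), and the minimum-degree selection rule are assumptions, and the cutoff of 0.0442 is taken to be the lower bound for 3rd-degree relatives in KING's documented kinship ranges.

```python
from collections import defaultdict

# Assumption: lower kinship bound for 3rd-degree relatives in KING's inference ranges.
THIRD_DEGREE_KINSHIP = 0.0442


def greedy_unrelated(samples, kinship_pairs, threshold=THIRD_DEGREE_KINSHIP):
    """Return a set of samples in which no pair has kinship >= threshold.

    Hypothetical helper: a minimum-degree greedy heuristic, which yields a
    maximal (not necessarily maximum) independent set.
    """
    # Build the relatedness graph: an edge joins every pair at or above the cutoff.
    neighbors = defaultdict(set)
    for a, b, kinship in kinship_pairs:
        if kinship >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    remaining = set(samples)
    kept = set()
    while remaining:
        # Keep the sample with the fewest relatives still in play
        # (ties broken alphabetically so the result is deterministic).
        pick = min(sorted(remaining), key=lambda s: len(neighbors[s] & remaining))
        kept.add(pick)
        # Drop the kept sample and everyone related to it.
        remaining -= neighbors[pick] | {pick}
    return kept


samples = ["A", "B", "C", "D", "E"]
pairs = [("A", "B", 0.25), ("A", "C", 0.10), ("A", "D", 0.05), ("B", "C", 0.01)]
print(sorted(greedy_unrelated(samples, pairs)))  # ['B', 'C', 'D', 'E']
```

In this toy input, sample A is related to three others, so dropping it alone keeps four of the five samples. A greedy pass of this kind does not guarantee the largest possible set.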
“…The UK Biobank phenome was used as a replication dataset and was based on ICD9 and ICD10 code data of 408,961 White British [15], genotyped individuals that were aggregated to PheWAS traits in a similar fashion (as described elsewhere [9]). To remove related individuals and to retain larger sample sizes, we first selected a maximal set of unrelated cases for each phenotype (defined as no pairwise relationship of 3rd degree or closer [24, 29]) before selecting a maximal set of unrelated controls unrelated to these cases. Similar to MGI, we matched up to 10 controls to each case using the R package “MatchIt” [28].…”
Section: Methods (mentioning)
confidence: 99%
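
The cases-first ordering in this statement can be sketched in two stages: pick unrelated cases, exclude every control related to a kept case, then pick unrelated controls from what is left. The function names and the input layout (a list of already-thresholded related pairs) are hypothetical, and the greedy rule is an assumed stand-in for whatever selection tool the citing authors used.

```python
def build_neighbors(related_pairs):
    """Map each individual to the set of individuals it is related to."""
    neighbors = {}
    for a, b in related_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def greedy_independent(candidates, neighbors):
    """Minimum-degree greedy selection restricted to `candidates`."""
    remaining = set(candidates)
    kept = set()
    while remaining:
        pick = min(
            sorted(remaining),
            key=lambda s: len(neighbors.get(s, set()) & remaining),
        )
        kept.add(pick)
        remaining -= neighbors.get(pick, set()) | {pick}
    return kept


def select_cases_then_controls(cases, controls, related_pairs):
    """Stage 1: unrelated cases. Stage 2: unrelated controls unrelated to those cases."""
    neighbors = build_neighbors(related_pairs)
    kept_cases = greedy_independent(cases, neighbors)
    # Any control related to a kept case is excluded before stage 2.
    blocked = set().union(*(neighbors.get(c, set()) for c in kept_cases))
    eligible = [c for c in controls if c not in blocked]
    kept_controls = greedy_independent(eligible, neighbors)
    return kept_cases, kept_controls


cases = ["c1", "c2", "c3"]
controls = ["k1", "k2", "k3", "k4"]
related = [("c1", "c2"), ("c1", "k1"), ("k2", "k3"), ("c3", "k4")]
kept_cases, kept_controls = select_cases_then_controls(cases, controls, related)
print(sorted(kept_cases), sorted(kept_controls))  # ['c1', 'c3'] ['k2']
```

Selecting cases first favors the scarcer group: a control is dropped whenever it conflicts with a kept case, never the other way around.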
“…We estimated r among individuals within each of the clusters identified at the uppermost hierarchical level using the TrioML estimator (Wang ) with 100 control samples and 1,000 bootstraps within COANCESTRY version 1.0.1.7 (Wang ). To retain the maximum number of individuals within each cluster, we identified an optimal set from a greedy heuristic in FastIndep (Abraham and Diaz ).…”
Section: Methods (mentioning)
confidence: 99%
“…Similarly, heuristic approaches have been suggested to identify a maximal independent set of uncorrelated phenotypes among pairwise correlations between pairs of phenotypes. A popular method for identifying phenotypes is to aggregate ICD codes into a set of phenotype codes called “phecodes.” For example, using 1578 phecodes in MGI, we identified a maximal set of 981 phenotypes with no pairwise Pearson correlation above 0.1.…”
Section: Statistical Issues Related To Biobank Research (mentioning)
confidence: 99%
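
The same idea carries over from individuals to phenotypes by treating each pair whose correlation exceeds the cutoff as an edge. The sketch below is illustrative only: the function names and the toy indicator vectors are invented, the use of the absolute value of Pearson's r is an assumption (the statement says only "above 0.1"), and every column is assumed to be non-constant so that the correlation is defined.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def uncorrelated_subset(columns, max_abs_r=0.1):
    """Greedily keep phenotypes so that no kept pair has |r| > max_abs_r."""
    names = sorted(columns)
    neighbors = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if abs(pearson(columns[a], columns[b])) > max_abs_r:
            neighbors[a].add(b)
            neighbors[b].add(a)

    remaining = set(names)
    kept = set()
    while remaining:
        # Keep the phenotype with the fewest correlated partners still in play.
        pick = min(sorted(remaining), key=lambda s: len(neighbors[s] & remaining))
        kept.add(pick)
        remaining -= neighbors[pick] | {pick}
    return kept


# Toy case/control indicators for three hypothetical phenotypes.
phenotypes = {
    "p1": [1, 0, 1, 0, 1, 0, 1, 0],
    "p2": [1, 0, 1, 0, 1, 0, 0, 1],  # r = 0.5 with p1
    "p3": [1, 1, 0, 0, 1, 1, 0, 0],  # r = 0 with p1 and p2
}
print(sorted(uncorrelated_subset(phenotypes)))  # ['p1', 'p3']
```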