Self-organizing maps (SOMs) are popular tools for grouping and visualizing data in many areas of science. This paper describes recent changes in the package kohonen, which implements several different forms of SOMs. These changes are primarily focused on making the package more usable for large data sets. Memory consumption has decreased dramatically, among other things by replacing the old interface to the underlying compiled code with a new one relying on Rcpp. The batch SOM algorithm for training has been added in both sequential and parallel forms. A final important extension of the package's repertoire is the possibility to define and use data-dependent distance functions, which is extremely useful in cases where standard distances like the Euclidean distance are not appropriate. Several examples of possible applications are presented.
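The batch training scheme and the pluggable distance function mentioned above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the kohonen implementation: the function names (`batch_som`, `euclidean`, `manhattan`), the deterministic initialization, and the linearly shrinking Gaussian neighbourhood are all assumptions made for illustration. The distance function is used only for finding the best matching unit; the codebook update remains a neighbourhood-weighted mean.

```python
import math


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """An alternative distance, to show that the metric is pluggable."""
    return sum(abs(x - y) for x, y in zip(a, b))


def batch_som(data, rows, cols, dist=euclidean, epochs=20, radius=1.5):
    """Train a small rectangular SOM with batch updates (illustrative only)."""
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    n_units, dim = len(grid), len(data[0])
    # Deterministic initialization: evenly spaced samples from the data.
    codes = [list(data[(i * len(data)) // n_units]) for i in range(n_units)]
    for epoch in range(epochs):
        # Neighbourhood width shrinks linearly but never below 0.3.
        sigma = max(radius * (1 - epoch / epochs), 0.3)
        # Step 1: assign every sample to its best matching unit (BMU).
        bmus = [min(range(n_units), key=lambda k: dist(x, codes[k])) for x in data]
        # Step 2: replace each codebook vector by a neighbourhood-weighted mean
        # of all samples, computed from the assignments of step 1.
        new_codes = []
        for gr, gc in grid:
            num, den = [0.0] * dim, 0.0
            for x, b in zip(data, bmus):
                d2 = (gr - grid[b][0]) ** 2 + (gc - grid[b][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                den += h
                for j in range(dim):
                    num[j] += h * x[j]
            new_codes.append([v / den for v in num])
        codes = new_codes
    return grid, codes
```

Calling `batch_som(data, 1, 2, dist=manhattan)` swaps the metric without touching the training loop, which is the design idea behind user-defined distance functions. Because all assignments are fixed before any codebook vector changes, step 1 can in principle be split across workers, which is what makes a parallel batch variant natural.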
High-quality genotypic data is a requirement for many genetic analyses. For any crop, errors in genotype calls, phasing of markers, linkage maps, pedigree records, and unnoticed variation in ploidy levels can lead to spurious marker-locus-trait associations and incorrect origin assignment of alleles to individuals. High-throughput genotyping requires automated scoring, as manual inspection of thousands of scored loci is too time-consuming. However, automated SNP scoring can result in errors that should be corrected to ensure recorded genotypic data are accurate and thereby ensure confidence in downstream genetic analyses. To enable quick identification of errors in a large genotypic data set, we have developed a comprehensive workflow. This multiple-step workflow is based on inheritance principles and on removal of markers and individuals that do not follow these principles, as demonstrated here for apple, peach, and sweet cherry. Genotypic data was obtained on pedigreed germplasm using 6-9K SNP arrays for each crop and a subset of well-performing SNPs was created using ASSIsT. Use of correct (and corrected) pedigree records readily identified violations of simple inheritance principles in the genotypic data, streamlined with FlexQTL software. Retained SNPs were grouped into haploblocks to increase the information content of single alleles and reduce computational power needed in downstream genetic analyses. Haploblock borders were defined by recombination locations detected in ancestral generations of cultivars and selections. Another round of inheritance-checking was conducted, for haploblock alleles (i.e., haplotypes). High-quality genotypic data sets were created using this workflow for pedigreed collections representing the U.S. breeding germplasm of apple, peach, and sweet cherry evaluated within the RosBREED project. These data sets contain 3855, 4005, and 1617 SNPs spread over 932, 103, and 196 haploblocks in apple, peach, and sweet cherry, respectively. 
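The inheritance-based filtering described above can be sketched as a simple Mendelian consistency check on parent-offspring trios. This is an illustration only and not the ASSIsT or FlexQTL procedure: the function names, the two-character genotype encoding (for example "AB"), and the data layout are hypothetical.

```python
def is_consistent(child, parent1, parent2):
    """True if the child can have received one allele from each parent.

    Genotypes are two-character strings such as "AA" or "AB"; None marks a
    missing call, which cannot be shown to violate inheritance.
    """
    if child is None or parent1 is None or parent2 is None:
        return True
    a, b = child
    return (a in parent1 and b in parent2) or (a in parent2 and b in parent1)


def marker_error_rates(genotypes, pedigree):
    """Fraction of fully genotyped trios violating inheritance, per marker.

    genotypes: {marker: {individual: genotype or None}}   (hypothetical layout)
    pedigree:  {child: (parent1, parent2)}
    """
    rates = {}
    for marker, calls in genotypes.items():
        informative = errors = 0
        for child, (p1, p2) in pedigree.items():
            trio = (calls.get(child), calls.get(p1), calls.get(p2))
            if None in trio:
                continue  # skip trios with missing data
            informative += 1
            if not is_consistent(*trio):
                errors += 1
        rates[marker] = errors / informative if informative else 0.0
    return rates
```

Markers with a high error rate would be candidates for removal, while individuals that fail across many markers point to a pedigree record that needs correcting; the thresholds for either decision are not given in the source.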
The highly curated phased SNP and haplotype data sets, as well as the raw iScan data, of germplasm in the apple, peach, and sweet cherry Crop Reference Sets are available through the Genome Database for Rosaceae.
Quantitative trait loci (QTL) mapping approaches rely on the correct ordering of molecular markers along the chromosomes, which can be obtained from genetic linkage maps or a reference genome sequence. For apple (Malus domestica Borkh), the genome sequences v1 and v2 could not meet this need; therefore, a novel approach was devised to develop a dense genetic linkage map, providing the most reliable marker-loci order for the highest possible number of markers. The approach was based on four strategies: (i) the use of multiple full-sib families, (ii) the reduction of missing information through the use of HaploBlocks and alternative calling procedures for single-nucleotide polymorphism (SNP) markers, (iii) the construction of a single backcross-type data set including all families, and (iv) a two-step map generation procedure based on the sequential inclusion of markers. The map comprises 15 417 SNP markers, clustered in 3 K HaploBlock markers spanning 1 267 cM, with an average distance between adjacent markers of 0.37 cM and a maximum distance of 3.29 cM. Moreover, chromosome 5 was oriented according to its homoeologous chromosome 10. This map was useful to improve the apple genome sequence, design the Axiom Apple 480 K SNP array and perform multifamily-based QTL studies. Its collinearity with the genome sequences v1 and v3 is reported. To our knowledge, this is the shortest published SNP map in apple, while including the largest number of markers, families and individuals. This result validates our methodology, proving its value for the construction of integrated linkage maps for any outbreeding species.
In the context of the second framework partnership agreement between the National Institute for Public Health and the Environment of the Netherlands (RIVM) and the European Food Safety Authority (EFSA), acute cumulative dietary exposure assessments were performed for two cumulative assessment groups (CAGs) of pesticides that affect the nervous system: pesticides associated with brain and/or erythrocyte AChE inhibition (CAG-NAN, 47 pesticides) and pesticides associated with functional alterations of the motor division (CAG-NAM, 100 pesticides). The exposure assessments used pesticide monitoring data collected by Member States under their official monitoring programmes in 2014, 2015 and 2016 and individual food consumption data from ten populations of consumers from different countries and from different age groups. Exposure estimates were obtained for each group of pesticides by means of a 2-dimensional Monte Carlo simulation, which was implemented in the Monte Carlo Risk Assessment (MCRA) software. The scope of the assessment and the parameters to be used for cumulative exposure assessment were discussed and agreed by the Standing Committee on Plants, Animals, Food and Feed (SC PAFF). Based on those discussions, a very conservative tier I modelling approach and a refined, but still conservative tier II modelling approach were used. In these assessments, common risk assessment practice was followed and the cumulative exposure was calculated as total margin of exposure (MOET) at the 50th, 90th, 95th, 99th and 99.9th percentiles of the exposure distribution. Five sensitivity analyses aiming to address major uncertainties were performed. The exposure estimates obtained in this report are used in EFSA's scientific report on the cumulative dietary risk characterisation of pesticides that have acute effects on the nervous system.
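How a total margin of exposure is read off at percentiles of an exposure distribution can be shown with a small sketch. It assumes the usual reciprocal-sum definition, in which the margin of exposure of each substance is its toxicological reference point divided by its exposure and MOET is the reciprocal of the summed reciprocals. It is not the MCRA implementation and does not reproduce the 2-dimensional Monte Carlo design; the function names, the nearest-rank percentile rule, and all numbers in the usage note are hypothetical.

```python
import math


def moet(exposures, reference_points):
    """Total margin of exposure for one simulated individual-day.

    exposures and reference_points are dicts keyed by substance, in the same
    units (for example mg/kg bw per day). MOE_i = reference_point_i / exposure_i
    and MOET = 1 / sum(1 / MOE_i).
    """
    risk_index = sum(exposures[s] / reference_points[s] for s in exposures)
    return math.inf if risk_index == 0 else 1.0 / risk_index


def moet_at_percentiles(draws, reference_points, percentiles):
    """MOET at given percentiles of the cumulative exposure distribution.

    draws is a list of per-substance exposure dicts, one per simulated
    individual-day. Percentiles use the nearest-rank rule (an assumption).
    """
    # Sorting the normalized exposure means high percentiles give low MOETs.
    risk = sorted(sum(d[s] / reference_points[s] for s in d) for d in draws)
    result = {}
    for p in percentiles:
        rank = max(1, math.ceil(p / 100 * len(risk)))
        value = risk[rank - 1]
        result[p] = math.inf if value == 0 else 1.0 / value
    return result
```

For example, with hypothetical reference points of 1.0 and 4.0 and exposures of 0.01 and 0.02, the individual margins are 100 and 200 and the combined MOET is about 66.7, lower than either margin alone.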
In the study of large outbred pedigrees with many founders, individual bi-allelic markers, such as SNP markers, carry little information. After phasing the marker genotypes, multi-allelic loci consisting of groups of closely linked markers can be identified, which are called “haploblocks”. Here, we describe PediHaplotyper, an R package capable of assigning consistent alleles to such haploblocks, allowing for missing and incorrect SNP data. These haploblock genotypes are much easier to interpret by the human investigator than the original SNP data and also allow more efficient QTL analyses that require less memory and computation time. Electronic supplementary material: The online version of this article (doi:10.1007/s11032-016-0539-y) contains supplementary material, which is available to authorized users.
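The idea of turning phased SNP haplotypes within a haploblock into multi-allelic alleles can be sketched as follows. This is a deliberately simplified illustration and not the PediHaplotyper algorithm: it gives each distinct complete haplotype an integer allele ID and leaves haplotypes with missing calls unassigned, whereas the package is described as also resolving missing and incorrect SNP data. The function name, the string encoding, and the `-` missing symbol are assumptions.

```python
def assign_haploblock_alleles(haplotypes, missing="-"):
    """Map each phased haplotype within one haploblock to an integer allele.

    haplotypes: {haplotype_name: string of SNP alleles, e.g. "ABBA"}
    Returns (calls, allele_ids), where calls maps each haplotype name to an
    allele ID (or None if the haplotype contains a missing SNP call) and
    allele_ids maps each distinct complete haplotype to its ID.
    """
    allele_ids = {}
    calls = {}
    for name in sorted(haplotypes):  # sorted for a reproducible numbering
        hap = haplotypes[name]
        if missing in hap:
            calls[name] = None  # left unresolved in this sketch
            continue
        if hap not in allele_ids:
            allele_ids[hap] = len(allele_ids) + 1
        calls[name] = allele_ids[hap]
    return calls, allele_ids
```

A block of four bi-allelic SNPs can carry up to 16 distinct haplotypes, so a single haploblock allele distinguishes founders far better than any one of its SNPs, which is the gain in information content the abstract refers to.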
In the context of the second framework partnership agreement between the National Institute for Public Health and the Environment of the Netherlands (RIVM) and the European Food Safety Authority (EFSA), chronic cumulative exposure assessments were performed for two cumulative assessment groups (CAGs) of pesticides that affect the thyroid: pesticides associated with hypertrophy, hyperplasia and neoplasia of C-cells (TCP, 18 active substances) and pesticides associated with hypothyroidism (TCF, 124 active substances). The exposure assessments used monitoring data collected by Member States under their official pesticide monitoring programmes in 2014, 2015 and 2016 and individual food consumption data from ten populations of consumers from different countries and from different age groups. Exposure estimates were obtained for each group of pesticides by means of a 2-dimensional Monte Carlo simulation, which was implemented in the Monte Carlo Risk Assessment (MCRA) software. The scope of the assessment and the parameters to be used for cumulative exposure assessment were discussed and agreed by the Standing Committee on Plants, Animals, Food and Feed (SC PAFF). Based on those discussions, a very conservative tier I modelling approach and a refined, but still conservative tier II modelling approach were used. In these assessments, common risk assessment practice was followed and the cumulative exposure was calculated as the total margin of exposure (MOET) at the 50th, 90th, 95th, 99th and 99.9th percentiles of the exposure distribution. Four sensitivity analyses were performed to better understand the uncertainties, such as the replacement of non-detects in the monitoring data and the availability of processing factors. The exposure estimates obtained in this report are used in EFSA's scientific report on the cumulative dietary risk characterisation of pesticides that have chronic effects on the thyroid.