Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies.
With the development and application of high-throughput sequencing technology in the life and health sciences, massive multi-omics data pose the problem of efficient management and utilization. Database development and biocuration are prerequisites for the reuse of these big data. Here, relying on the China National GeneBank (CNGB), we present the CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and its further analyzed results, which are currently organized into six objects: Project, Sample, Experiment, Run, Assembly and Variation. Moreover, CNSA has created a correlation model of living samples, sample information and analytical data for some projects. Both living samples and analytical data are directly correlated with the sample information. From any one of the three, the information or data of the other two can be obtained, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for storing, managing and sharing omics data. We will continue to improve the data standards and provide free access to open-data resources for worldwide scientific communities to support academic research and the bio-industry.
Database URL: https://db.cngb.org/cnsa/.
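The correlation model described above, in which sample information is the hub that living samples and analytical data both link to, can be illustrated with a minimal sketch. This is not CNSA's schema or API: the class `TraceIndex`, its method names and the identifiers used below are all hypothetical and chosen only to show how a lookup from any one record type can reach the other two.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """Sample information: the hub that the other two record types link to."""
    sample_id: str
    living_ids: list[str] = field(default_factory=list)  # linked living samples
    data_ids: list[str] = field(default_factory=list)    # linked analytical data


class TraceIndex:
    """Hypothetical index; not CNSA's actual data model."""

    def __init__(self) -> None:
        self._samples: dict[str, SampleRecord] = {}
        self._living_to_sample: dict[str, str] = {}
        self._data_to_sample: dict[str, str] = {}

    def add_sample(self, sample_id: str) -> None:
        self._samples[sample_id] = SampleRecord(sample_id)

    def link_living(self, living_id: str, sample_id: str) -> None:
        # A living sample is correlated directly with the sample information.
        self._samples[sample_id].living_ids.append(living_id)
        self._living_to_sample[living_id] = sample_id

    def link_data(self, data_id: str, sample_id: str) -> None:
        # Analytical data is correlated directly with the sample information.
        self._samples[sample_id].data_ids.append(data_id)
        self._data_to_sample[data_id] = sample_id

    def trace_from_sample(self, sample_id: str) -> tuple[list[str], list[str]]:
        rec = self._samples[sample_id]
        return list(rec.living_ids), list(rec.data_ids)

    def trace_from_living(self, living_id: str) -> tuple[str, list[str]]:
        rec = self._samples[self._living_to_sample[living_id]]
        return rec.sample_id, list(rec.data_ids)

    def trace_from_data(self, data_id: str) -> tuple[str, list[str]]:
        rec = self._samples[self._data_to_sample[data_id]]
        return rec.sample_id, list(rec.living_ids)
```

Because both link types pass through the sample record, two small reverse maps are enough to make every direction of the trace a constant-time lookup in this sketch.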
Accurate and up-to-date data on the frequency of haemoglobinopathies among the populations of Guangxi Zhuang Autonomous Region, where haemoglobinopathies are most endemic in China, are required. In our study, a total of 5789 samples obtained from members of the Han, Zhuang, and Yao ethnic groups in six geographical areas of Guangxi Province were analysed systematically in terms of both haematological and molecular parameters. The results showed that the total heterozygote frequency of thalassaemias and other haemoglobinopathies was 24.51%, of which 17.55% was due to alpha-thalassaemia, 6.43% to beta-thalassaemia, 0.38% to structural haemoglobin variants, and 0.16% to delta-thalassaemia. The mutational spectrum among the local population for each type of disorder was described, including the first report on the true prevalence of three silent alpha-thalassaemia defects, -alpha(3.7)/ (4.78%), -alpha(4.2)/ (1.61%) and Hb Westmead (alpha(WS)alpha/) (1.57%), and of delta-thalassaemia resulting from five novel and two rare mutations never before identified in Chinese individuals. Comparison of the frequencies of alpha-globin mutations among the ethnic groups showed statistically significant differences between the Han (15.71%) and Zhuang (20.12%), and between the Han (15.71%) and Yao (20.84%) ethnic groups. In addition, we have performed the first extensive study of haematological parameters of the Hb Westmead mutation using a group of Chinese subjects with compound heterozygosity for this variant and an alpha-thalassaemia deletion. The knowledge gained in this study will enable us to estimate the health burden in this high-risk population and to elucidate the various genetic alterations that underlie haemoglobinopathies.
Key Points
The prevalence of KLF1 mutations is significantly higher in a thalassemia endemic region of China than in a nonendemic region. KLF1 mutations ameliorate the clinical and hematologic features of β-thalassemia.
Hemoglobinopathies are among the most common autosomal-recessive disorders worldwide. A comprehensive next-generation sequencing (NGS) test would greatly facilitate screening and diagnosis of these disorders. An NGS panel targeting the coding regions of hemoglobin genes and four modifier genes was designed. We validated the assay by using 2522 subjects affected with hemoglobinopathies and applied it to carrier testing in a cohort of 10,111 couples who were also screened through traditional methods. In the clinical genotyping analysis of 1182 β-thalassemia subjects, we identified a group of additional variants that can be used for accurate diagnosis. In the molecular screening analysis of the 10,111 couples, we detected 4180 individuals in total who carried 4840 mutant alleles, and identified 186 couples at risk of having affected offspring. Of the pathogenic or likely pathogenic variants identified by our NGS assay, 12.1% were undetectable by traditional methods. Compared with the traditional methods, our assay identified an additional 35 at-risk couples. We describe a comprehensive NGS-based test that offers advantages over the traditional screening/molecular testing methods. To our knowledge, this is among the first large-scale population study to systematically evaluate the application of an NGS technique in carrier screening and molecular diagnosis of hemoglobinopathies.
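As a rough reading aid, the screening counts quoted above can be turned into rates with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the one assumption (not stated in the source) is that both partners of every couple were tested, so the number of individuals screened is taken as twice the number of couples. The derived rates are illustrative and are not reported by the authors.

```python
# Counts taken from the abstract.
couples_screened = 10_111
carriers = 4_180          # individuals carrying at least one mutant allele
mutant_alleles = 4_840    # total mutant alleles among those carriers
at_risk_couples = 186     # couples at risk of having affected offspring

# Assumption: both partners of each couple were tested.
individuals_screened = 2 * couples_screened

carrier_rate = carriers / individuals_screened
alleles_per_carrier = mutant_alleles / carriers
at_risk_rate = at_risk_couples / couples_screened

print(f"carrier rate:        {carrier_rate:.2%}")
print(f"alleles per carrier: {alleles_per_carrier:.2f}")
print(f"at-risk couple rate: {at_risk_rate:.2%}")
```

Under that assumption, roughly one in five screened individuals is a carrier and just under 2% of couples are at risk.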