Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as the location of a species or viral host, is as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the species over time within a geographical area, the impact of geography on its evolutionary history, or the expected population of the species within the area. NCBI databases, specifically GenBank, provide an abundance of molecular sequence data for phylogeography. However, geographical data is inconsistently represented and sparse across GenBank entries. This can impede analysis and, in situations where the geographical information is inferred, potentially lead to erroneous results. In this paper, we describe the current state of geographical data in GenBank and illustrate how automated processing techniques, such as named entity recognition, can enhance the geographical data available for phylogeographic studies.
Background: Influenza A H5N1 has killed millions of birds and raises serious public health concern because of its potential to spread to humans and cause a global pandemic. While the early focus was on Asia, recent evidence suggests that Egypt is a new epicenter for the disease. This includes characterization of a variant clade, 2.2.1.1, which has been found almost exclusively in Egypt. Methods: We analyzed 226 HA and 92 NA sequences, with an emphasis on the H5N1 2.2.1.1 strains in Egypt, using a Bayesian discrete phylogeography approach. This allowed modeling of virus dispersion between Egyptian governorates, including the most likely origin. Results: Phylogeography models of hemagglutinin (HA) and neuraminidase (NA) suggest Ash Sharqiyah as the origin of virus spread; however, the support is weak, based on Kullback–Leibler values of 0.09 for HA and 0.01 for NA. Association Index (AI) values and Parsimony Scores (PS) were significant (p-value < 0.05), indicating that dispersion of H5N1 in Egypt was geographically structured. In addition, the Ash Sharqiyah to Al Gharbiyah and Al Fayyum to Al Qalyubiyah routes had the strongest statistical support. Conclusion: We found that the majority of routes with strong statistical support were in the heavily populated Delta region. In particular, the Al Qalyubiyah governorate appears to be a common location for virus transition, as it accounted for a large portion of branches in both trees. However, there remains uncertainty about virus dispersion to and from this location, and more research is needed to examine it. Phylogeography can highlight the drivers of H5N1 emergence and spread. This knowledge can be used to target public health efforts to reduce morbidity and mortality. For Egypt, future work should focus on using data about vaccination and live bird markets in phylogeography models to study their impact on H5N1 diffusion within the country.
The field of phylogeography has received considerable attention for its application to molecular evolution and the geographic migration of species. More recent work has included infectious diseases, especially zoonotic RNA viruses such as influenza and rabies. Phylogeography of viruses has the potential to advance surveillance at agencies such as public health departments, agriculture departments, and wildlife agencies. However, little is known about how these agencies could use phylogeography for applied surveillance and the integration of animal and human sequence data. Here, we highlight its potential to support ‘translational public health’ that could bring sequence data to the forefront of surveillance. We focus on swine influenza H3N2 because of its recent link to a variant form in humans. We discuss the implications for applied surveillance and the need for an integrated biomedical informatics approach for adoption at animal and public health agencies.