Cancer classification is a topic of major interest in medicine because it enables accurate and efficient diagnosis and contributes to successful treatment outcomes. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to build a molecular classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastroesophageal, kidney, liver, lung, ovarian, pancreatic, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative study of different algorithms on it can be found in the literature. Meanwhile, advances in both hardware and software have substantially improved the precision of ML-based solutions such as Deep Learning (DL). In this study, we compare the most widely used classical ML and DL algorithms for classifying the tumors described in the 11_tumor database. Using k-fold cross-validation, we obtained tumor identification accuracies between 90.6% (Logistic Regression) and 94.43% (Convolutional Neural Networks). We also show that a tuning process does not always significantly improve the algorithms' accuracy. Our results demonstrate an efficient and accurate classification approach based on gene expression (microarray data) and ML/DL algorithms, which facilitates tumor type prediction in a multi-cancer-type scenario.
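To make the k-fold cross-validation protocol concrete, here is a minimal stdlib-only sketch. It does not reproduce the study's Logistic Regression or CNN models: a nearest-centroid classifier is swapped in as a simple stand-in, folds are not stratified by tumor class, and all function names (`k_fold_indices`, `cross_val_accuracy`, etc.) and the toy data are hypothetical. In practice, scikit-learn's `KFold`/`StratifiedKFold` and `cross_val_score` cover the same procedure.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean expression profile of each tumor class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds: train on k-1 folds, test on the remaining one."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


# Toy "expression profiles" (two genes, two classes); values are invented for illustration.
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [0.0, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2], [5.0, 4.8]]
y = ["A"] * 5 + ["B"] * 5
print(cross_val_accuracy(X, y, k=5))  # 1.0 on this linearly separable toy data
```

Every sample is held out exactly once, so the reported figure is an average over k independent train/test splits rather than a single split.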
Long terminal repeat (LTR) retrotransposons (LTR-RTs) are mobile elements that constitute the major fraction of most plant genomes. Identifying and annotating these elements with bioinformatics approaches is a major challenge in the era of massive plant genome sequencing. Besides their involvement in genome size variation, LTR retrotransposons are associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among other effects. Several sequence databases of plant LTR retrotransposons are available, some with public access, such as PGSB and RepetDB, and others with restricted access, such as Repbase. Although these databases are useful for identifying LTR-RTs in new genomes by similarity, their elements are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset of 130,439 elements from 195 plant genomes (belonging to 108 plant species), classified to the lineage level. This dataset was used to train two deep neural networks (one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification, we obtain F1-score, precision, and recall values of up to 98%.
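Because lineage-level classification is a multi-class problem, precision, recall, and F1 are computed per lineage and then averaged. The sketch below shows one common way to do this (unweighted macro averaging); the abstract does not state which averaging was used, so that choice is an assumption. The lineage names are example labels and the predictions are invented toy values.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # an element was wrongly assigned to lineage p
            fn[t] += 1  # an element of lineage t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all labels."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels and predictions, invented for illustration only.
y_true = ["Ale", "Ale", "Tork", "Tork", "Athila"]
y_pred = ["Ale", "Tork", "Tork", "Tork", "Athila"]
scores = per_class_scores(y_true, y_pred)
print(scores["Ale"])  # (1.0, 0.5, 0.6666666666666666)
print(macro_average(scores))
```

Macro averaging gives every lineage equal weight, so small lineages influence the score as much as large ones; a weighted average would instead favor the most abundant lineages.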
Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. Their insertion polymorphisms may cause beneficial mutations in eukaryotes, such as the creation of new gene functions, or deleterious ones, such as different types of cancer in humans. A particular type of TE, called LTR retrotransposons, comprises almost 8% of the human genome. Among LTR retrotransposons, human endogenous retroviruses (HERVs) bear structural and functional similarities to retroviruses. Several tools can detect transposon insertion polymorphisms (TIPs), but they fail to efficiently analyze large genomes or large datasets. Here, we developed a computational tool, named TIP_finder, that detects mobile element insertions in very large genomes by inferring them from discordant read pairs, using high-performance computing (HPC) and parallel programming. TIP_finder takes as input (i) short paired-end reads, such as those produced by Illumina sequencing, (ii) a chromosome-level reference genome sequence, and (iii) a database of consensus TE sequences. The proposed HPC strategy adds scalability and makes it practical to analyze huge genomic datasets in a reasonable running time. TIP_finder accelerates TIP detection by up to 55 times on breast cancer datasets and 46 times on cancer-free datasets compared to the fastest available algorithms. TIP_finder applies a validated strategy to find TIPs, accelerates the process through HPC, and addresses runtime issues in large-scale analyses in the post-genomic era. TIP_finder version 1.0 is available at https://github.com/simonorozcoarias/TIP_finder.
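The core idea of discordant read pair analysis can be sketched briefly: a pair in which one mate aligns to the reference genome and the other aligns to a TE consensus sequence suggests a TE insertion near the anchored mate, and several such anchors clustered in one region support a candidate TIP. The sketch below is not TIP_finder's implementation. The `ReadPair` record, the consensus name `HERVK`, and the `window`/`min_support` thresholds are hypothetical; a real pipeline would parse aligner output instead.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified alignment record for one read pair."""
    mate1_target: str  # chromosome name or TE consensus name
    mate1_pos: int
    mate2_target: str
    mate2_pos: int


def te_anchors(pairs, te_names):
    """Yield (chromosome, anchor position, TE name) for pairs with exactly one mate on a TE."""
    for p in pairs:
        m1_te = p.mate1_target in te_names
        m2_te = p.mate2_target in te_names
        if m1_te and not m2_te:
            yield p.mate2_target, p.mate2_pos, p.mate1_target
        elif m2_te and not m1_te:
            yield p.mate1_target, p.mate1_pos, p.mate2_target


def call_tips(pairs, te_names, window=500, min_support=3):
    """Cluster anchor positions per (chromosome, TE) and report well-supported candidates."""
    grouped = defaultdict(list)
    for chrom, pos, te in te_anchors(pairs, te_names):
        grouped[(chrom, te)].append(pos)
    calls = []
    for (chrom, te), positions in sorted(grouped.items()):
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:] + [None]:  # None is a sentinel that flushes the last cluster
            if pos is not None and pos - cluster[-1] <= window:
                cluster.append(pos)
                continue
            if len(cluster) >= min_support:
                calls.append((chrom, cluster[0], cluster[-1], te, len(cluster)))
            if pos is not None:
                cluster = [pos]
    return calls


pairs = [
    ReadPair("chr1", 10_000, "HERVK", 150),
    ReadPair("chr1", 10_120, "HERVK", 300),
    ReadPair("HERVK", 90, "chr1", 10_240),
    ReadPair("chr1", 50_000, "chr1", 50_300),  # both mates on the genome: ignored
    ReadPair("chr2", 7_000, "HERVK", 100),     # single pair: below min_support
]
print(call_tips(pairs, {"HERVK"}))  # [('chr1', 10000, 10240, 'HERVK', 3)]
```

Each (chromosome, TE) group is processed independently, which makes it a natural unit for distributing work across processes or cluster nodes; this is one plausible way to parallelize the step, not a description of TIP_finder's HPC strategy.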
This article aims to model the supply, demand, and stock variables of the Colombian green coffee market, traded as a commodity. From these variables, a dynamical systems model is formulated using ordinary differential equations, which make it possible to observe the behavior of this product's market. The analysis takes into account several parameters, such as export and import rates, estimated from historical coffee market data reported by official entities. The document presents the formulation of the model, followed by the calculation of the equilibrium points and their stability. Finally, a simulation of the state variables is presented.
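The abstract does not give the model's equations, so the following is a hypothetical three-variable ODE system meant only to illustrate the workflow it describes: formulate the equations, compute the equilibrium, check its stability, and simulate the state variables. The assumed structure (supply and demand relaxing toward target levels, stock fed by supply plus a constant import flow and drained by demand and a proportional export rate) and all parameter values are illustrative, not estimates from coffee market data.

```python
def rhs(state, p):
    """Time derivatives of a hypothetical supply (O), demand (D), stock (S) model."""
    O, D, S = state
    dO = p["r_o"] * (p["O_target"] - O)  # supply relaxes toward a target level
    dD = p["r_d"] * (p["D_target"] - D)  # demand relaxes toward a target level
    # Stock gains supply and a constant import flow, and loses demand and a fraction exported per unit time.
    dS = O + p["imports"] - D - p["export_rate"] * S
    return (dO, dD, dS)


def equilibrium(p):
    """Set all derivatives to zero and solve for (O*, D*, S*)."""
    O_eq, D_eq = p["O_target"], p["D_target"]
    return (O_eq, D_eq, (O_eq + p["imports"] - D_eq) / p["export_rate"])


def is_stable(p):
    """The Jacobian is lower triangular, so its eigenvalues are -r_o, -r_d, -export_rate."""
    return all(p[key] > 0 for key in ("r_o", "r_d", "export_rate"))


def rk4(state, p, dt, steps):
    """Integrate the system with the classical fourth-order Runge-Kutta method."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    for _ in range(steps):
        k1 = rhs(state, p)
        k2 = rhs(shift(state, k1, dt / 2), p)
        k3 = rhs(shift(state, k2, dt / 2), p)
        k4 = rhs(shift(state, k3, dt), p)
        state = tuple(
            s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


# Illustrative parameter values, not estimates from coffee market data.
params = {"r_o": 0.5, "r_d": 0.3, "O_target": 12.0, "D_target": 10.0,
          "imports": 1.0, "export_rate": 0.2}
print(equilibrium(params), is_stable(params))
print(rk4((8.0, 9.0, 5.0), params, dt=0.1, steps=2000))
```

With all three rates positive, every eigenvalue of the Jacobian is negative, so trajectories from any starting point converge to the equilibrium, which the simulation confirms numerically.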
In Colombia, the agricultural sector has difficulty integrating technologies because of the challenging topography characteristic of the Colombian mountain ranges that cross the country's main agricultural departments, and because of limited human capacity. The citrus harvesting process has traditionally been done by hand, employing thousands of people who do not achieve significant yields; this increases production costs and affects harvest indicators measured in terms of quality and productivity. This study aims to determine the impact of field slope conditions on the quality indicators and on the effectiveness, efficiency, and loss indicators used to evaluate the productivity of the orange harvesting process, through a case study in the Department of Caldas, Colombia, in order to identify opportunities for process improvement. Field data were collected on orange-producing farms whose land slopes were classified into four categories. Statistically significant associations were identified between the effectiveness, efficiency, and loss indicators and the field slope conditions. In addition, some of these indicators showed inverse relationships with the slope gradient. In contrast, fruit quality was not affected by the slope of the land.
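An "inverse relationship" between an ordinal slope category and a harvest indicator can be quantified with a rank correlation. The abstract does not name the statistical test used, so Spearman's rank correlation is an assumption here, chosen because slope is recorded in ordered categories with many ties. The indicator values below are hypothetical, and a significance test (p-value) would still be needed to claim a statistically significant association.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j share their mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: slope category (1 = flattest, 4 = steepest) and an efficiency indicator.
slope = [1, 1, 2, 2, 3, 3, 4, 4]
efficiency = [60, 58, 52, 50, 45, 44, 38, 35]
print(spearman_rho(slope, efficiency))  # negative: efficiency falls as slope increases
```

A rho close to -1 indicates a strong monotone decrease; a value near 0, as would be expected for fruit quality in this study, indicates no monotone relationship.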
The objective of this article is to carry out a systematic literature review, using databases such as Web of Science and Scopus, on the determinants of economic growth from a Kaldorian virtuous-cycle approach, focusing on the founding countries of the Pacific Alliance and the Association of Southeast Asian Nations. Two knowledge gaps have been identified. The first is the lack of studies that comparatively analyze the Kaldorian determinants of virtuous economic growth; the second is the scarcity of publications on identifying convergence/divergence between countries and trade blocs, which no other work has yet studied. As a complementary section, an exploratory cluster analysis is performed, which revealed that the clusters formed from 1990 to 2018 have persisted.