Ressom, Habtom, Robert Reynolds, and Rency S. Varghese. Increasing the efficiency of fuzzy logic-based gene expression data analysis. Physiol Genomics 13: 107-117, 2003. First published February 20, 2003; doi:10.1152/physiolgenomics.00097.2002.

DNA microarray technology can accommodate a multifaceted analysis of the expression of genes in an organism. The wealth of spatiotemporal data generated by this technology allows researchers to potentially reverse engineer a particular genetic network. "Fuzzy logic" has been proposed as a method to analyze the relationships between genes and help decipher a genetic network. This method can identify interacting genes that fit a known "fuzzy" model of gene interaction by testing all combinations of gene expression profiles. This paper introduces improvements made over previous fuzzy gene regulatory models in terms of computation time and robustness to noise. Improvement in computation time is achieved by using cluster analysis as a preprocessing step to reduce the total number of gene combinations analyzed. This approach speeds up the algorithm by 50% with minimal effect on the results. The model's sensitivity to noise is reduced by implementing appropriate methods of "fuzzy rule aggregation" and "conjunction" that produce reliable results in the face of minor changes in model input.

microarray; clustering; gene regulatory model

THROUGH THE USE OF DNA MICROARRAY technology, time-series data for gene expression can easily be prepared on a genome-wide scale, allowing the transcription levels of many genes to be measured simultaneously. With such data, one can attempt to reverse engineer a network of gene interaction. The benefits of characterizing gene interaction are many: for example, the effects of drugs on a regulatory pathway can be characterized, tumor development in cells can be tracked, etc.
However, many problems impede the understanding of gene interaction, such as identifying synchronization, working with low expression levels, and the existence of splice variants; two of these are considered here. First, we consider the significant computation time required by the large volume of data: the complexity of a regulatory network model increases with the number of genes used in the model, and as a result the model may take a long time to examine. Second, we discuss the fact that any attempt to model or analyze DNA microarray data is likely to be affected by measurement error.

Several methods have been proposed to develop maps of gene interaction, including linear equations (6, 31), differential equations (2), Boolean networks (15, 21), and fuzzy cognitive maps (7). Woolf and Wang (33) introduced an approach based on fuzzy rules of known activator/repressor relationships of gene interaction. Using a normalized subset of Saccharomyces cerevisiae data (3), they applied every possible combination of activators and repressors for each gene. The output from the model was compared with the expression levels of the genes. Gene combination...
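As a minimal sketch of the activator/repressor idea described above (not the authors' implementation), expression values in [0, 1] can be fuzzified into LOW/MEDIUM/HIGH membership degrees, and a rule such as "IF activator is HIGH AND repressor is LOW THEN target is HIGH" can be scored against an observed target profile. The membership functions and the min-conjunction below are common textbook choices, assumed here for illustration only:

```python
def fuzzify(x):
    """Triangular membership degrees for LOW, MEDIUM, HIGH on [0, 1]."""
    low = max(0.0, 1.0 - 2.0 * x)
    high = max(0.0, 2.0 * x - 1.0)
    medium = 1.0 - low - high  # degrees always sum to 1
    return {"LOW": low, "MEDIUM": medium, "HIGH": high}

def rule_score(activator, repressor, target):
    """Score three expression time series against the rule
    "IF activator is HIGH AND repressor is LOW THEN target is HIGH".

    Conjunction uses min (a common t-norm); the rule's predicted
    HIGH-membership of the target is compared with its observed
    HIGH-membership, and the mean absolute error is returned
    (lower = better fit of this gene combination to the model).
    """
    errors = []
    for a, r, t in zip(activator, repressor, target):
        predicted_high = min(fuzzify(a)["HIGH"], fuzzify(r)["LOW"])
        observed_high = fuzzify(t)["HIGH"]
        errors.append(abs(predicted_high - observed_high))
    return sum(errors) / len(errors)

# A combination where the activator is fully on and the repressor fully off
# while the target is fully on fits the rule perfectly (score 0.0):
perfect_fit = rule_score([1.0], [0.0], [1.0])
```

Exhaustively evaluating `rule_score` over all (activator, repressor, target) triples is what makes the search expensive; clustering the profiles first, as the paper proposes, shrinks the candidate set before this scoring step.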
I investigate Russian second-language (L2) readability assessment using a machine-learning approach with a range of lexical, morphological, syntactic, and discourse features. Tested on a new collection of Russian L2 readability corpora, the model achieves an F-score of 0.671 and an adjacent accuracy of 0.919 on a six-level classification task. Information-gain and feature-subset evaluation show that morphological features are collectively the most informative. Learning curves for binary classifiers reveal that less training data is needed to distinguish between beginning reading levels than between intermediate reading levels.
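Adjacent accuracy, the second metric reported above, counts a prediction as correct if it lands within one level of the gold label on the ordinal readability scale. A short illustration (labels below are invented for the example):

```python
def adjacent_accuracy(gold, predicted, window=1):
    """Fraction of predictions within `window` levels of the gold label.

    On an ordinal scale (e.g. readability levels 1-6), predicting level 3
    for a level-4 text still counts as a hit with the default window of 1.
    """
    assert len(gold) == len(predicted)
    hits = sum(1 for g, p in zip(gold, predicted) if abs(g - p) <= window)
    return hits / len(gold)

gold = [1, 2, 3, 4, 5, 6]
pred = [1, 3, 3, 2, 6, 6]  # three exact, two off by one, one off by two
score = adjacent_accuracy(gold, pred)  # 5 of 6 within one level
```

This explains why adjacent accuracy (0.919) can be much higher than the exact-match F-score (0.671): near-misses between neighboring levels are forgiven.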
In 1985, Zwicky argued that 'particle' is a pretheoretical notion that should be eliminated from linguistic analysis. We propose a reclassification of Russian particles that implements Zwicky's directive. Russian particles lack a coherent conceptual basis as a category, and many are ambiguous with respect to part of speech. Our corpus analysis of Russian particles addresses theoretical questions about the cognitive status of parts of speech and practical concerns about how particles should be represented in computational models. We focus on nine high-frequency words commonly classed as particles: ešče, tak, ved', slovno, daže, že, li, da, net. We show that the current tagging of particles in the manually disambiguated Morphological Standard of the Russian National Corpus is not entirely consistent, and that this can create challenges for training a part-of-speech tagger. We offer an alternative tagging scheme that eliminates the category of 'particle' altogether. We show that our enriched scheme makes it possible for a part-of-speech tagger to achieve more useful results. Our analysis of particles provides a detailed account of various sub-uses that correspond to different parts of speech, their relationships, and their relative distribution. In this sense, our study also contributes to the study of words that exhibit part-of-speech ambiguities.
In this paper, we apply usage-based linguistic analysis to systematize the inventory of orthographic errors observed in the writing of non-native users of Russian. The data comes from a longitudinal corpus (560K tokens) of non-native academic writing. Traditional spellcheckers mark errors and suggest corrections, but do not attempt to model why errors are made. Our approach makes it possible to recognize not only the errors themselves, but also the conceptual causes of these errors, which lie in misunderstandings of Russian phonotactics and morphophonology and the way they are represented by orthographic conventions. With this linguistically-based system in place, we can propose targeted grammar explanations that improve users’ command of Russian morphophonology rather than merely correcting errors. Based on errors attested in the non-native academic writing corpus, we introduce a taxonomy of errors, organized by pedagogical domains. Then, on the basis of this taxonomy, we create a set of mal-rules to expand an existing finite-state analyzer of Russian. The resulting morphological analyzer tags wordforms that fit our taxonomy with specific error tags. For each error tag, we also develop an accompanying grammar explanation to help users understand why and how to correct the diagnosed errors. Using our augmented analyzer, we build a webapp to allow users to type or paste a text and receive detailed feedback and correction on common Russian morphophonological and orthographic errors.
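A toy illustration (not the authors' finite-state analyzer) of the mal-rule idea: each rule pairs a pattern for a predictable non-native error with an error tag and an accompanying grammar explanation. The single rule below encodes the real Russian spelling convention that ы is not written after the velars к, г, х; the tag name and explanation wording are invented for this sketch:

```python
import re

# Each mal-rule: a pattern matching a predictable non-native spelling error,
# an error tag for the analyzer's output, and a pedagogical explanation.
MAL_RULES = [
    {
        "tag": "Err/VelarY",  # hypothetical error-tag name
        "pattern": re.compile(r"[кгх]ы", re.IGNORECASE),
        "explanation": "After к, г, х write и, not ы (e.g. книги, not *книгы).",
    },
]

def tag_errors(wordform):
    """Return the error tags of all mal-rules whose patterns match."""
    return [rule["tag"] for rule in MAL_RULES if rule["pattern"].search(wordform)]

def explain(tag):
    """Look up the grammar explanation attached to an error tag."""
    for rule in MAL_RULES:
        if rule["tag"] == tag:
            return rule["explanation"]
    return None
```

In the paper's actual system the mal-rules are compiled into the finite-state morphological analyzer itself, so that error-tagged analyses fall out of ordinary lookup; the dictionary-plus-regex form above is only a simplified stand-in for that mechanism.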