Elena Filatova scite author profile

A formal model for information selection in multi-sentence text extraction

Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units with an associated mapping between these two dimensions. This representation is then used to describe the task of selecting textual units for a summary or answer as a formal optimization task. We provide approximation algorithms and empirically validate the performance of the proposed model when used with two very different sets of features, words and atomic events.

Assigning time-stamps to event-clauses

Filatova

Hovy

2001

Automatic creation of domain templates

Filatova

Hatzivassiloglou

McKeown

2006

Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance the type of the information that should be captured by these patterns. In this work we propose a novel methodology for corpus analysis based on cross-examination of several document collections representing different instances of the same domain. We show that this methodology can be used for automatic domain template creation. As the problem of automatic domain template creation is rather new, there is no well-defined procedure for the evaluation of the domain template quality. Thus, we propose a methodology for identifying what information should be present in the template. Using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates.

Directions for exploiting asymmetries in multilingual Wikipedia

Filatova

2009

Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and information choice. Keeping these peculiarities in mind is necessary while using multilingual Wikipedia as a corpus for training and testing NLP applications. In this paper we present preliminary results on quantifying Wikipedia multilinguality. Our results support the observation about the substantial variation in descriptions of Wikipedia entries created in different languages. However, we believe that asymmetries in multilingual Wikipedia do not make Wikipedia an undesirable corpus for training NLP applications. On the contrary, we outline research directions that can utilize multilingual Wikipedia asymmetries to bridge the communication gaps in multilingual societies.

Marking atomic events in sets of related texts

Filatova

Hatzivassiloglou

2004

The notion of an event has been widely used in the computational linguistics literature as well as in information retrieval and various NLP applications, although with significant variance in what exactly an event is. We describe an empirical study aimed at developing an operational definition of an event at the atomic (sentence or predicate) level, and use our observations to create a system for detecting and prioritizing the atomic events described in a collection of documents. We report results from testing our system on several sets of related texts, including human assessments of the system's output and a comparison with information extraction techniques.

Occupation inference through detection and classification of biographical activities

Filatova

Prager

2012

Data & Knowledge Engineering

Prevalence, viral load and type diversity of high-risk HPV in patients with inflammatory and tumor diseases

Зыкова

Неродо

Bogomolova

et al. 2018

Med. vestn. Ûga Ross.

Rostov Research Institute of Oncology, Rostov-on-Don, Russia. Objective: to analyze the prevalence and type distribution of high-oncogenic-risk human papillomavirus (HPV) by sex, age, and presence of oncological pathology. Materials and methods: 424 patients of the clinical diagnostic department of the Rostov Research Institute of Oncology (Ministry of Health of the Russian Federation) were examined. Vaginal and cervical canal smears from women and urethral smears and/or urine from men were tested. HPV DNA was detected by real-time PCR. Results: the proportion of HPV-positive patients was 34.4% among women and 39.9% among men. In older age groups the share of HPV-positive patients decreased among women and increased among men. Before age 25 and after age 45, papillomavirus infection (PVI) was recorded more often in women; at ages 26-45, more often in men. Infection with several HPV types at once was recorded more often in young patients. HPV type 16 was the most common in both women and men. The subsequent ranks were distributed as follows: in women, types 31, 52, 18, and 56; in men, types 52, 56, 45, and 18. Type 51 was detected only in women. PVI was recorded 1.9 times more often among patients with tumor processes than among those with inflammatory processes. In women with tumor processes a high viral load predominated; in inflammatory diseases, loads of differing clinical significance occurred equally often. Concurrent infection with HPV and STI pathogens accounted for 70.6% of all STI-positive women with tumor diseases and 41.5% of those with inflammatory diseases. In men these figures were 66.7% and 38.1%, respectively. Conclusion: the study established differences in PVI prevalence by sex, age, and presence of oncological pathology. Keywords: human papillomavirus, real-time PCR, HPV prevalence, genotype, viral load.

Demographics and Dynamics of Mechanical Turk Workers
