Abstract: In this paper, a method of event ordering based on the resolution of temporal information is presented. The method consists of two main steps. First, temporal expressions that can be transformed into a date are recognized and resolved, so that these dates establish an order between the events that contain them. Second, temporal signals such as "after", which cannot be transformed into a concrete date but relate two events chronologically, are detected. This event orderin…
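The two steps above can be pictured as building a small constraint graph over events. The following is a minimal Python sketch under our own assumptions, not the authors' implementation. The event ids, the `(event, signal, reference)` triple format, the restriction to the signals "after" and "before", and the function name `order_events` are all hypothetical, and temporal expressions are assumed to be already resolved to dates. Dated events are ordered by their dates, signals add pairwise constraints, and a topological sort returns one order consistent with both.

```python
from datetime import date
from graphlib import TopologicalSorter  # Python 3.9+


def order_events(resolved_dates, signal_links):
    """Return a chronological order of events (illustrative sketch only).

    resolved_dates: event id -> date obtained by resolving a temporal expression.
    signal_links: hypothetical (event, signal, reference_event) triples,
                  e.g. ("e3", "after", "e1").
    """
    # predecessors[x] holds every event that must come before x.
    predecessors = {event: set() for event in resolved_dates}

    # Step 1: dated events are ordered by their resolved dates.
    # Events with equal dates are left unordered relative to each other.
    for a, date_a in resolved_dates.items():
        for b, date_b in resolved_dates.items():
            if date_a < date_b:
                predecessors[b].add(a)

    # Step 2: temporal signals add relative constraints between two events.
    for event, signal, reference in signal_links:
        predecessors.setdefault(event, set())
        predecessors.setdefault(reference, set())
        if signal == "after":       # event happens after reference
            predecessors[event].add(reference)
        elif signal == "before":    # event happens before reference
            predecessors[reference].add(event)

    # One ordering consistent with all constraints; raises graphlib.CycleError
    # if the date-based and signal-based constraints contradict each other.
    return list(TopologicalSorter(predecessors).static_order())


dates = {"e1": date(2006, 3, 1), "e2": date(2006, 2, 15)}
links = [("e3", "after", "e1"), ("e4", "before", "e2")]
print(order_events(dates, links))  # ['e4', 'e2', 'e1', 'e3']
```

Here e3 has no date of its own but is still placed after e1 through the "after" signal, which is the situation the second step is meant to cover.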
“…A good amount of successful research has been accomplished in temporal expression annotation of several languages, for example English [Mani et al 2001], Italian [Negri and Marseglia 2004], Spanish [Saquete et al 2006], German [Strötgen and Gertz 2013] and Chinese [Hacioglu et al 2005]. Starting from the work on TIMEX at the Message Understanding Conference [MUC-7 1998] through recent approaches like Time Aware Information Access (TAIA) [Shokouhi 2012], several temporal annotation systems have been developed and deployed.…”
Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, different approaches for the identification and classification of temporal expressions in Hindi are developed and analyzed. First, a rule-based approach is developed, which takes plain text as input and, based on a set of hand-crafted rules, produces tagged output with the identified temporal expressions. This approach achieves a strict F1-measure of 0.83. In a second approach, a CRF-based classifier is trained on human-tagged data and then evaluated on a test dataset. The trained classifier identifies time expressions in plain text and further classifies them into various classes. This approach achieves a strict F1-measure of 0.78. Next, the CRF is replaced by an SVM-based classifier, and the same experiment is performed with the same features. This approach is comparable to the CRF, with a strict F1-measure of 0.77. Using information from the rule-based system as an additional feature improves performance to 0.86 and 0.84 for the CRF and SVM, respectively. With three comparable systems performing the extraction task, the next step is to merge them to take advantage of their individual strengths. In the first merge experiment, rule-based tagged data is fed to the CRF and SVM classifiers as additional training data; evaluation results show an increase in the F1-measure of the CRF from 0.78 to 0.80. Second, a voting-based approach is implemented, which chooses the best class for each token from the outputs of the three approaches. This approach gives the best performance for this task, with a strict F1-measure of 0.88. In the process, a reusable gold-standard dataset for temporal tagging in Hindi is also developed: the ILTIMEX2012 corpus, which consists of 300 manually tagged Hindi news documents.
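The voting step, which picks one class per token from the outputs of the three systems, can be sketched as a per-token majority vote. This is an assumption about how "best class" is chosen, not the paper's published rule. The BIO-style labels, the function name `vote`, and the tie-breaking choice are hypothetical; on a three-way disagreement the sketch defers to the rule-based system, since it had the highest standalone strict F1 (0.83) among the three.

```python
from collections import Counter


def vote(rule_tags, crf_tags, svm_tags, fallback="rule"):
    """Merge three per-token tag sequences by majority vote (illustrative sketch).

    With three voters, any label chosen by at least two systems wins outright.
    On a three-way disagreement, the hypothetical `fallback` system decides.
    """
    outputs = {"rule": rule_tags, "crf": crf_tags, "svm": svm_tags}
    if len({len(tags) for tags in outputs.values()}) != 1:
        raise ValueError("all systems must tag the same tokens")

    merged = []
    for i in range(len(rule_tags)):
        labels = {name: tags[i] for name, tags in outputs.items()}
        label, count = Counter(labels.values()).most_common(1)[0]
        if count == 1:  # all three systems disagree on this token
            label = labels[fallback]
        merged.append(label)
    return merged


rule = ["O", "B-DATE", "I-DATE", "B-SET"]
crf = ["O", "B-DATE", "O", "B-DATE"]
svm = ["O", "B-TIME", "I-DATE", "O"]
print(vote(rule, crf, svm))  # ['O', 'B-DATE', 'I-DATE', 'B-SET']
```

Voting token by token can occasionally yield an `I-` label with no preceding `B-`; a real merger would likely need a repair pass over the merged sequence.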
“…This evaluation has been repeated since then as part of the Automatic Content Extraction (ACE) program. The most important automatic annotation systems using TIDES as output are: ATEL (Hacioglu et al 2005), a system developed by the Center for Computational Language and Education Research at the University of Colorado, implementing a machine learning approach for identification in English and Chinese; Chronos (Negri 2007), a knowledge-based system developed by Fondazione Bruno Kessler (FBK-irst), which is able to recognize and normalize temporal expressions in […] developed by the MITRE Corporation, which combines hand-coded patterns with machine learning rules to tag documents; DANTE (Mazur and Dale 2007), developed at the Center for Language Technology at Macquarie University, a system which performs recognition and normalization of temporal expressions in English, where the interface between various components is based on representing the local semantics of temporal expressions; TimexTag (Ahn et al 2005; Ahn 2006), developed at the University of Amsterdam, a system applying data-driven methods for recognition and normalization tasks; and finally, TERSEO (Saquete et al 2006), a system developed at the University of Alicante, which is a knowledge-based system for Spanish that has been automatically extended to other languages, such as English, Italian and Catalan.…”
Section: Previous Work (mentioning)
confidence: 99%
“…For this task, any system which recognizes and normalizes temporal expressions could be used. In our approach, the TERSEO system is used because it recognizes and normalizes [temporal expressions] (Saquete et al 2006). After computing, if the result is not a complete ISO-format date, then temporalFunction="true" must be added to the TIMEX3 tag.…”
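The rule quoted above (an incomplete ISO date gets `temporalFunction="true"`) can be sketched as a small tag builder. This is an illustration, not TERSEO's code. The function name `timex3_tag` is hypothetical, and so is our reading of "complete ISO-format date" as `YYYY-MM-DD`, optionally followed by a clock time. The normalized value is assumed to be computed already.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a "complete ISO-format date" means YYYY-MM-DD, optionally
# followed by a time of day such as T14:30 or T14:30:00.
COMPLETE_ISO = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?")


def timex3_tag(tid, text, value, timex_type="DATE"):
    """Wrap a normalized expression in a TIMEX3 tag (illustrative sketch)."""
    attrs = f"tid={quoteattr(tid)} type={quoteattr(timex_type)} value={quoteattr(value)}"
    # Rule from the quoted passage: an incomplete result gets temporalFunction="true".
    if not COMPLETE_ISO.fullmatch(value):
        attrs += ' temporalFunction="true"'
    return f"<TIMEX3 {attrs}>{escape(text)}</TIMEX3>"


print(timex3_tag("t1", "March 1, 2006", "2006-03-01"))
print(timex3_tag("t2", "next month", "2006-04"))
```

The first call yields a plain TIMEX3 tag. The second value stops at month granularity, so the sketch adds `temporalFunction="true"`.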
Section: The Meeting Has Been (mentioning)
confidence: 99%
“…Some systems use rule-based approaches (Saquete et al 2006), whereas others use machine learning approaches (Saquete et al 2008; TempEx 2008; Gerber et al 2002). Choosing the appropriate approach depends on the available resources and the requirements of the systems being developed.…”
Until recently, most systems performing temporal extraction and reasoning from text have focused on recognizing and normalizing temporal expressions alone, for which the TIDES annotation scheme has been adopted. Temporal awareness of a text, however, involves identifying not only the temporal expressions but also the events that these expressions anchor, as well as other events that must be ordered relative to them. Because of these broader concerns, TimeML has been developed as an annotation specification that encompasses not only temporal expressions but all temporally relevant aspects of a text. The two annotation schemes, however, are not interchangeable, which results in incompatible corpora and extraction algorithms for each standard. In this paper, we describe an automatic migration process from the TIMEX2 tags of TIDES to the TIMEX3 tags of TimeML. This transformation procedure has been implemented and evaluated on two different corpora, obtaining an overall F-measure of 93.3% and 89.2%, respectively.
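To illustrate what a TIMEX2-to-TIMEX3 migration involves at the attribute level, here is a minimal Python sketch. It is not the procedure evaluated in the paper. It copies `VAL` to `value` and `MOD` to `mod` unchanged, and it guesses the TIMEX3 `type` from `SET` and the shape of the value. These mapping heuristics, the function names, and the `tid` numbering are our assumptions. A full migration would also need to handle nested TIMEX2 tags, anchors, and `temporalFunction`, which this sketch only flattens or ignores.

```python
import re
import xml.etree.ElementTree as ET


def infer_timex3_type(val, is_set):
    """Guess the TIMEX3 type from TIMEX2 attributes (heuristic assumption)."""
    if is_set:
        return "SET"
    if val.endswith("_REF"):          # PRESENT_REF, PAST_REF, FUTURE_REF
        return "DATE"
    if val.startswith("P"):           # ISO 8601 durations such as P3D, PT2H
        return "DURATION"
    if re.search(r"T(\d|MO|MI|AF|EV|NI|DT)", val):  # clock time or part of day
        return "TIME"
    return "DATE"


def timex2_to_timex3(timex2_xml, tid):
    """Convert one TIMEX2 element to a TIMEX3 string (illustrative sketch)."""
    src = ET.fromstring(timex2_xml)
    val = src.get("VAL", "")
    dst = ET.Element("TIMEX3", {"tid": tid})
    dst.set("type", infer_timex3_type(val, src.get("SET") == "YES"))
    if val:
        dst.set("value", val)                 # VAL -> value
    if src.get("MOD"):
        dst.set("mod", src.get("MOD"))        # MOD -> mod, copied unchanged
    # Flatten any nested TIMEX2 children into plain text.
    dst.text = "".join(src.itertext())
    return ET.tostring(dst, encoding="unicode")


print(timex2_to_timex3('<TIMEX2 VAL="2006-03-01">March 1, 2006</TIMEX2>', "t1"))
print(timex2_to_timex3('<TIMEX2 VAL="P2Y" MOD="APPROX">about two years</TIMEX2>', "t2"))
```

The `_REF` check runs before the duration check because values such as `PRESENT_REF` and `PAST_REF` also start with "P".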
“…We took advantage of the architecture of an existing rule-based system developed for Spanish (TERSEO, see [37]), where the recognition model is language-dependent but the normalizing procedure is completely language independent. In this way, the approach is capable of learning the recognition model automatically, adjusting the set of normalization rules using different available resources.…”
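The split described above, with language-dependent recognition and language-independent normalization, can be sketched by letting each language map surface patterns to shared, language-neutral keys that a single normalizer resolves. This is a toy illustration, not TERSEO. The keys `YESTERDAY` and `TODAY`, the dictionaries, and the function `tag` are hypothetical. The hand-written recognition patterns stand in for the model that the quoted approach learns automatically.

```python
import re
from datetime import date, timedelta

# Language-dependent part: surface patterns mapped to language-neutral keys.
# Hand-written here; it stands in for the automatically learned model.
RECOGNITION = {
    "es": {r"\bayer\b": "YESTERDAY", r"\bhoy\b": "TODAY"},
    "en": {r"\byesterday\b": "YESTERDAY", r"\btoday\b": "TODAY"},
}

# Language-independent part: one shared normalization rule per key.
NORMALIZATION = {
    "YESTERDAY": lambda anchor: anchor - timedelta(days=1),
    "TODAY": lambda anchor: anchor,
}


def tag(text, lang, anchor):
    """Return (offset, surface form, ISO value) triples sorted by offset."""
    found = []
    for pattern, key in RECOGNITION[lang].items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            value = NORMALIZATION[key](anchor).isoformat()
            found.append((m.start(), m.group(0), value))
    return sorted(found)


anchor = date(2006, 3, 1)
print(tag("Ayer llovió; hoy no.", "es", anchor))
# [(0, 'Ayer', '2006-02-28'), (13, 'hoy', '2006-03-01')]
print(tag("Today and yesterday", "en", anchor))
# [(0, 'Today', '2006-03-01'), (10, 'yesterday', '2006-02-28')]
```

In this design, supporting another language only requires a new entry in `RECOGNITION`. `NORMALIZATION` stays untouched, which mirrors the reuse of the normalizing procedure described in the passage.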