We recast dependency parsing as a sequence labeling problem, exploring several encodings of dependency trees as labels. While dependency parsing by means of sequence labeling had been attempted in existing work, results suggested that the technique was impractical. We show instead that with a conventional BIL-STM-based model it is possible to obtain fast and accurate parsers. These parsers are conceptually simple, not needing traditional parsing algorithms or auxiliary structures. However, experiments on the PTB and a sample of UD treebanks show that they provide a good speed-accuracy tradeoff, with results competitive with more complex approaches.
The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language.Keywords: human language, dependency length, syntactic dependencies, Nontechnical, jargon-free summary: Syntactic relations between words (e.g., the one that links a verb with its subject) exhibit a strong tendency to not cross when drawn as arrows above the sentence. Traditionally, this has been assumed to result from an independent principle of syntax that reduces crossings practically to zero. An alternative view is that the trend arises naturally from the preference in human languages for word orders that keep related words close together. Our statistical analysis discards the traditional view in the majority of languages considered. The alternative approach can lead to a simpler theory of language.26 pages, 2 figures and 3 tables.
The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Here we quantify the amount of crossings of real sentences and compare it to the predictions of a series of baselines. We conclude that crossings are really scarce in real sentences. Their scarcity is unexpected by the hubiness of the trees. Indeed, real sentences are close to linear trees, where the potential number of crossings is maximized.
In recent years, we have witnessed a rise in fake news, i.e., provably false pieces of information created with the intention of deception. The dissemination of this type of news poses a serious threat to cohesion and social well-being, since it fosters political polarization and the distrust of people with respect to their leaders. The huge amount of news that is disseminated through social media makes manual verification unfeasible, which has promoted the design and implementation of automatic systems for fake news detection. The creators of fake news use various stylistic tricks to promote the success of their creations, with one of them being to excite the sentiments of the recipients. This has led to sentiment analysis, the part of text analytics in charge of determining the polarity and strength of sentiments expressed in a text, to be used in fake news detection approaches, either as a basis of the system or as a complementary element. In this article, we study the different uses of sentiment analysis in the detection of fake news, with a discussion of the most relevant elements and shortcomings, and the requirements that should be met in the near future, such as multilingualism, explainability, mitigation of biases, or treatment of multimedia elements.
We describe an opinion mining system which classifies the polarity of Spanish texts. We propose an NLP approach that undertakes pre-processing, tokenisation and POS tagging of texts to then obtain the syntactic structure of sentences by means of a dependency parser. This structure is then used to address three of the most significant linguistic constructions for the purpose in question: intensification, subordinate adversative clauses and negation. We also propose a semi-automatic domain adaptation method to improve the accuracy of our system in specific application domains, by enriching semantic dictionaries using machine learning methods in order to adapt the semantic orientation of their words to a particular field. Experimental results are promising in both general and specific domains.
We introduce a method to reduce constituent parsing to sequence labeling. For each word w t , it generates a label that encodes: (1) the number of ancestors in the tree that the words w t and w t+1 have in common, and (2) the nonterminal symbol at the lowest common ancestor. We first prove that the proposed encoding function is injective for any tree without unary branches. In practice, the approach is made extensible to all constituency trees by collapsing unary branches. We then use the PTB and CTB treebanks as testbeds and propose a set of fast baselines. We achieve 90.7% F-score on the PTB test set, outperforming the Vinyals et al.(2015) sequence-to-sequence parser. In addition, sacrificing some accuracy, our approach achieves the fastest constituent parsing speeds reported to date on PTB by a wide margin. 1Contribution We propose a method to transform constituent parsing into sequence labeling.
We introduce dependency parsing schemata, a formal framework based on Sikkel's parsing schemata for constituency parsers, which can be used to describe, analyze, and compare dependency parsing algorithms. We use this framework to describe several well-known projective and non-projective dependency parsers, build correctness proofs, and establish formal relationships between them. We then use the framework to define new polynomial-time parsing algorithms for various mildly non-projective dependency formalisms, including well-nested structures with their gap degree bounded by a constant k in time O(n5+2k), and a new class that includes all gap degree k structures present in several natural language treebanks (which we call mildly ill-nested structures for gap degree k) in time O(n4+3k). Finally, we illustrate how the parsing schema framework can be applied to Link Grammar, a dependency-related formalism.
We present a novel unsupervised approach for multilingual sentiment analysis driven by compositional syntax-based rules. On the one hand, we exploit some of the main advantages of unsupervised algorithms: (1) the interpretability of their output, in contrast with most supervised models, which behave as a black box and (2) their robustness across different corpora and domains. On the other hand, by introducing the concept of compositional operations and exploiting syntactic information in the form of universal dependencies, we tackle one of their main drawbacks: their rigidity on data that are structured differently depending on the language concerned. Experiments show an improvement both over existing unsupervised methods, and over state-of-the-art supervised models when evaluating outside their corpus of origin. Experiments also show how the same compositional operations can be shared across languages. The system is available at http://www.grupolys. org/software/UUUSA/.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.