Abstract: Grammar + Guide (the title and subtitle make the authors' purpose clear), intended for learners/users at an advanced level. The authors, well known in the fields of Applied Linguistics and English Language Teaching, work at the School of English Studies, University of Nottingham. This extensive volume (nearly 1,000 pages), the product of 7 years of research and writing, relied on a Reference Panel of 11 anglicists from 10 countries, which gives the book an almost international profile (Africa and America…
“…Clauses, considered core units of grammar, center around a verb phrase that largely determines what else must or may occur [26]. Clauses can be categorized by the inner verb type:…”
Most natural-language-processing (NLP) tasks, such as semantic parsing, syntactic parsing, machine translation, and text summarization, suffer performance degradation when encountering long, complex sentences. Previous work addressed the issue with the intuition of decomposing complex sentences into simple ones and linking them, e.g., rhetorical-structure-theory (RST)-style discourse parsing, split-and-rephrase (SPRP), text simplification (TS), and simple sentence decomposition (SSD). However, these approaches are not applicable to semantic parsing tasks such as abstract meaning representation (AMR) parsing and semantic dependency parsing, because their segments are misaligned with semantic relations and they cannot preserve the original semantics. Following the same intuition while avoiding these deficiencies, we propose a novel framework, hierarchical clause annotation (HCA), for capturing the clausal structure of complex sentences, grounded in linguistic research on clause hierarchy. With the HCA framework, we annotated a large HCA corpus to explore the potential of integrating HCA structural features into semantic parsing of complex sentences. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide neural baseline models for producing additional silver annotations. Evaluated on our manually annotated HCA dataset, the proposed models achieve a 91.3% F1-score for clause segmentation and an 88.5% Parseval score for clause parsing. Since the same model architectures were employed, the performance differences between the clause-level and discourse-level segmentation and parsing subtasks reflect differences between our HCA corpus and the compared discourse corpora: our sentences contain more segment units and fewer interrelations than those in the compared corpora.
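As a rough illustration of the kind of structure the HCA framework annotates, the Python sketch below defines a minimal clause-tree node and a span-level F1 score for clause segmentation; the class ClauseNode, its fields, and the scoring function are illustrative assumptions of ours, not the authors' data model or evaluation code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ClauseNode:
    """One clause in a hierarchical clause annotation (HCA) tree (hypothetical layout)."""
    span: Tuple[int, int]                      # token offsets of the clause (start, end)
    relation: str = "root"                     # e.g., a coordination or subordination label
    children: List["ClauseNode"] = field(default_factory=list)

def segmentation_f1(gold: List[Tuple[int, int]], pred: List[Tuple[int, int]]) -> float:
    """Span-level F1 over clause segments, a simplified proxy for the reported metric."""
    gold_set, pred_set = set(gold), set(pred)
    if not gold_set or not pred_set:
        return 0.0
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_set), tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Example: two gold clauses; a predictor that merges them scores 0, a perfect one scores 1.
print(segmentation_f1([(0, 5), (5, 12)], [(0, 12)]))          # 0.0
print(segmentation_f1([(0, 5), (5, 12)], [(0, 5), (5, 12)]))  # 1.0
```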
“…Spoken grammar is defined as a linguistic feature based on the corpora of English language utterance which is distinct from written language (Leech, 2000). The grammar of speech is discussed under two camps: the first refers to works of Carter and McCarthy (2006), reflecting the difference between written and spoken genre, while the second camp represents Biber et al.'s (1999) comparative analysis of spoken and written genre realizing the communicative objective in the context of discourse (Leech, 2000). Carter and McCarthy's (2006) corpus-based analysis of 700 million words included 5 million words of transcribed conversations within different settings, namely private homes, shops, offices, public places, and educational institutions.…”
Section: Spoken Grammar (mentioning)
confidence: 99%
“…The grammar of speech is discussed under two camps: the first refers to works of Carter and McCarthy (2006), reflecting the difference between written and spoken genre, while the second camp represents Biber et al.'s (1999) comparative analysis of spoken and written genre realizing the communicative objective in the context of discourse (Leech, 2000). Carter and McCarthy's (2006) corpus-based analysis of 700 million words included 5 million words of transcribed conversations within different settings, namely private homes, shops, offices, public places, and educational institutions. The authors explained the four characteristics of spoken grammar beginning with the first: how an utterance or spoken feature is reflected in discourse (such as deixis, ellipsis, headers, tails, question tags, vagueness, and approximations).…”
Section: Spoken Grammar (mentioning)
confidence: 99%
“…Comparatively, the written form has real time response delays, is often planned, and is more explicit without having to rely on expressions, intonations, and gestures to communicate. Various researchers have identified key linguistic features and explained the characteristics of spoken text to include orthographic transcription, real time, shared context, interactivity and style (Biber et al., 1999; Biber & Conrad, 2009; Camiciottoli, 2007; Carter & McCarthy, 2006; Moreau, 2018).…”
Section: Characteristic and Aspects Of Spoken Grammar (mentioning)
confidence: 99%
“…This research analyzes 92 startup pitches that reflect the aspects of spoken grammar which are more frequent in spoken corpora than in written registers, including linguistic features such as discourse markers, dysfluencies, numeral phrases, pronouns, reduced forms, parallelism and repetitions, rhetorical questions, modality, vagueness and vocatives. The spoken data are transcribed, recorded, and analyzed with frequency-based corpus linguistics tools (Leech, 2000). The key linguistic features of spoken data are analyzed and explained to reflect the characteristics of the corpus, including orthographic transcription, real time, shared context, interactivity and style (Biber et al., 1999; Biber & Conrad, 2009; Carter & McCarthy, 2006; Camiciottoli, 2007; Moreau, 2018).…”
The aim of this study is to analyze spoken linguistic features of three-minute startup pitches. The linguistic features analyzed included discourse markers, dysfluency, modality, numeral phrases, pronouns, reduced forms, repetitions, rhetorical questions, vague expressions, and vocatives. The corpus comprises 92 startup pitches delivered in real time at a pitching competition held as part of an international technology conference. The pitches were transcribed, and linguistic features were identified with the aid of concordance software. Results from the analysis of linguistic features show that startup pitches contain aspects typically found in spoken genres, reflecting orthographic transcription, real time, shared context, interactivity and style.
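To make the frequency-based, concordance-style analysis concrete, here is a minimal Python sketch that counts candidate discourse markers in a transcribed pitch; the marker list and the sample transcript are illustrative placeholders, not items from the study's corpus or its concordance software.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive marker list; the study's feature inventory is much richer.
DISCOURSE_MARKERS = ["so", "well", "you know", "i mean", "okay", "right"]

def marker_frequencies(transcript: str) -> Counter:
    """Count occurrences of each candidate discourse marker in a lower-cased transcript."""
    text = transcript.lower()
    counts = Counter()
    for marker in DISCOURSE_MARKERS:
        counts[marker] = len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
    return counts

sample = "So, well, our product is, you know, the first of its kind. Right?"
print(marker_frequencies(sample).most_common(3))
```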
Most natural language processing (NLP) tasks treat an input sentence as a sequence of token-level embeddings and features, despite its clausal structure. Taking Abstract Meaning Representation (AMR) parsing as an example, recent parsers are empowered by Transformers and pre-trained language models, but the long-distance dependencies (LDDs) introduced by long sequences remain an open problem. We argue that LDDs are not merely a matter of sequence length but are essentially related to the internal clause hierarchy. Typically, non-verb words in a clause cannot depend on words outside it, and verbs from different but related clauses have much longer dependencies than words within the same clause. With this intuition, we introduce a type of clausal feature, hierarchical clause annotation (HCA), into AMR parsing and propose two HCA-based approaches, HCA-based self-attention (HCA-SA) and HCA-based curriculum learning (HCA-CL), which integrate the HCA trees of complex sentences to address LDDs. We conduct extensive experiments on two in-distribution (ID) AMR datasets (AMR 2.0 and AMR 3.0) and three out-of-distribution (OOD) ones (TLP, New3, and Bio). Experimental results show that our HCA-based approaches achieve significant and explainable improvements over the baseline model and outperform the state-of-the-art (SOTA) model on sentences with complex clausal structures, which introduce most LDD cases.
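The abstract does not spell out how HCA trees are injected into self-attention, but one natural reading of HCA-SA is a clause-aware attention mask in which non-verb tokens attend only within their own clause while verbs may attend across clauses; the sketch below builds such a mask and is an assumption-laden illustration, not the paper's implementation.

```python
import torch

def clause_attention_mask(num_tokens, clause_spans, verb_positions):
    """Boolean mask (True = may attend): non-verb tokens see only their own clause,
    while verb tokens attend (and are attended to) globally, one possible reading of HCA-SA."""
    mask = torch.zeros(num_tokens, num_tokens, dtype=torch.bool)
    for start, end in clause_spans:        # spans are (start, end), end exclusive
        mask[start:end, start:end] = True  # within-clause attention
    for v in verb_positions:
        mask[v, :] = True                  # verbs can attend everywhere
        mask[:, v] = True                  # and every token can attend to verbs
    return mask

# A 9-token sentence with two clauses and verbs at positions 2 and 6 (illustrative indices).
print(clause_attention_mask(9, [(0, 4), (4, 9)], [2, 6]).int())
```

Such a mask would typically enter a Transformer encoder as a large negative bias on the disallowed positions of the attention logits.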