2008
DOI: 10.1075/ijcl.13.4.06ray

From key words to key semantic domains

Abstract: This paper reports the extension of the key words method for the comparison of corpora. Using automatic tagging software that assigns part-of-speech and semantic field (domain) tags, a method is described which permits the extraction of key domains by applying the keyness calculation to tag frequency lists. The combination of the key words and key domains methods is shown to allow macroscopic analysis (the study of the characteristics of whole texts or varieties of language) to inform the microscopic level (fo…
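As a rough illustration of applying a keyness calculation to tag frequency lists, the sketch below compares two tag frequency lists and keeps the tags whose score passes a cut-off. The abstract does not name the statistic, so the two-term log-likelihood (G2) form used here is an assumption, as are the function names, the cut-off of 6.63 (the chi-square critical value for p < 0.01 at one degree of freedom), and the toy tag counts.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood for an item seen a times in a corpus of
    c tokens and b times in a corpus of d tokens (assumed statistic)."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_items(study, reference, threshold=6.63):
    """Return (item, score, direction) for items at or above the threshold,
    sorted by descending score. '+' marks relative overuse in the study
    corpus, '-' relative underuse."""
    total_s = sum(study.values())
    total_r = sum(reference.values())
    results = []
    for item in set(study) | set(reference):
        a, b = study.get(item, 0), reference.get(item, 0)
        score = log_likelihood(a, b, total_s, total_r)
        if score >= threshold:
            direction = "+" if a / total_s > b / total_r else "-"
            results.append((item, score, direction))
    return sorted(results, key=lambda r: -r[1])


# Hypothetical tag frequency lists; the tag labels are illustrative only.
study_tags = Counter({"F1": 30, "Z5": 70})
reference_tags = Counter({"F1": 5, "Z5": 95})
key_domains = key_items(study_tags, reference_tags)
```

The same function works unchanged on word frequency lists, which is the sense in which the key domains method extends the key words method: only the unit being counted changes.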

citations
Cited by 517 publications
(350 citation statements)
references
References 29 publications
“…For instance, thank you very much is a recurrent spoken-form bundle in daily conversation to express one's utmost gratitude to the addressed person, while as can be seen is a common written-form bundle in academic prose to make readers aware of the research results shown in tables or figures. There are a number of parallel terms for denoting similar notions of lexical bundles in pertinent literature, such as clusters (Hyland, 2008a, b), n-grams (Stubbs, 2007), lexical phrases (Li & Schmitt, 2009), prefabricated patterns (Granger, 1998), formulaic sequences (Simpson-Vlach & Ellis, 2010), sentence stems (Pawley & Syder, 1983), extended collocations (Cortes, 2004), and multi-word expressions (Rayson, 2008).…”
Section: Definition and Characteristics Of Lexical Bundles (mentioning)
confidence: 99%
“…Corpus linguistics combined with NLP can be used to infer properties of a document by comparison to a large corpus of text whose properties are known a priori; this has also found application in RE [27], [30], particularly for abstraction identification [12]. Here, shallow semantics can be inferred from lexical form and context to (e.g.)…”
Section: Related Work (mentioning)
confidence: 99%
“…In both modes, we performed a manual analysis and a supervised semi-automatic analysis using Wmatrix [27].…”
Section: B Thematic Analysis (mentioning)
confidence: 99%
“…Currently, the lexicon contains nearly 37,000 words and the idiom list contains over 16,000 multi-word units. An idiom list enables the corpus tool to identify any idiomatic expressions, usually non-decompositional sequences, and to assign a special set of tags to the words in that particular idiomatic phrase to denote a part-of-speech relation above the level of the word (Rayson, 2008). The semantic tagset, loosely based on the Longman Lexicon of Contemporary English, has a multitier structure with 21 major semantic fields and more than 232 subdivisions.…”
Section: Corpus Tool Used In This Study: Wmatrix (mentioning)
confidence: 99%
“…One way to address the above problems is to conduct key part-of-speech and key semantic analyses which give rise to analytical categories that (1) are fewer than keywords, thus reducing the number of categories a researcher needs to take into account, and (2) group lower frequency words which might not appear as keywords individually and could thus be overlooked (Rayson, 2008).…”
Section: Introduction (mentioning)
confidence: 99%
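
The grouping effect described in the last statement above, where lower frequency words that would not stand out individually are pooled under one semantic category, can be sketched as follows. This is a toy illustration only: the lexicon, the domain labels, and the function name are hypothetical, and a plain dictionary lookup stands in for real semantic tagging, which also has to handle ambiguity and multi-word units.

```python
from collections import Counter

# Hypothetical word-to-domain lexicon; labels are illustrative only.
TOY_LEXICON = {
    "bread": "FOOD",
    "cheese": "FOOD",
    "soup": "FOOD",
    "train": "TRANSPORT",
    "bus": "TRANSPORT",
}


def domain_frequencies(tokens, lexicon, unknown="UNMATCHED"):
    """Count tokens per domain, sending words missing from the lexicon
    to a catch-all category."""
    return Counter(lexicon.get(token.lower(), unknown) for token in tokens)


tokens = "Bread cheese soup train the".split()
word_freq = Counter(token.lower() for token in tokens)
domain_freq = domain_frequencies(tokens, TOY_LEXICON)
```

Here every word occurs once, so no single word is frequent, yet the FOOD category accumulates three occurrences. A frequency list built this way has far fewer entries than the word list, which is the reduction in analytical categories the statement refers to.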