An efficient algorithm for building a distributional thesaurus (and other Sketch Engine developments)

Rychlý, Pavel; Kilgarriff, Adam

doi:10.3115/1557769.1557783

Cited by 24 publications (19 citation statements)

References 12 publications (8 reference statements)


“…Finally, we also can find some work measuring both the complexity and computational efficiency of the algorithm implemented to make pairwise comparisons [9,20]. As the accuracy of any extraction system does not depend on the chosen algorithm, we will not compare systems with regard to this specific parameter.…”

Section: Related Work (mentioning; confidence: 99%)

“…So, there is no reason to check them. Following [9,20], we implemented an algorithm that only compares word pairs sharing at least one context. As the list of words sharing a context is small (in general, less than 1000), the quadratic complexity of the entire algorithm turns out to be manageable.…”

Section: Algorithm (mentioning; confidence: 99%)
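The strategy described in this citation statement compares only word pairs that share at least one context. It can be sketched with an inverted index from contexts to words. This is a minimal illustration under stated assumptions, not the Sketch Engine implementation. The input format (word → context → weight), the min-of-weights scoring, and the `max_context_size` cut-off are all hypothetical choices made for the example. The cost is quadratic only in the length of each context's word list, which the quoted text notes is usually small.

```python
from collections import defaultdict
from itertools import combinations


def shared_context_scores(word_contexts, max_context_size=None):
    """Score word pairs, visiting only pairs that share at least one context.

    word_contexts: dict mapping word -> dict mapping context -> weight
    (an assumed input format). Pairs with no shared context are never
    generated, so they are never compared.
    """
    # Invert the word-by-context matrix: context -> [(word, weight), ...]
    index = defaultdict(list)
    for word, contexts in word_contexts.items():
        for ctx, weight in contexts.items():
            index[ctx].append((word, weight))

    scores = defaultdict(float)
    for ctx, entries in index.items():
        # Hypothetical cut-off: skip contexts shared by too many words
        if max_context_size is not None and len(entries) > max_context_size:
            continue
        # Quadratic only in the number of words sharing this one context;
        # sorting makes each pair key canonical (w1 < w2).
        for (w1, a), (w2, b) in combinations(sorted(entries), 2):
            scores[(w1, w2)] += min(a, b)  # assumed overlap score
    return dict(scores)


# Hypothetical toy data: dependency-style contexts with association weights
data = {
    "beer": {"obj_of:drink": 3.0, "mod:cold": 1.0},
    "wine": {"obj_of:drink": 2.0, "mod:red": 1.5},
    "car": {"obj_of:drive": 4.0, "mod:red": 1.0},
}
print(shared_context_scores(data))
# {('beer', 'wine'): 2.0, ('car', 'wine'): 1.0}
```

Note that the pair ("beer", "car") is never scored at all, because the two words share no context.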

“…Some are interested in testing whether changes in the word space model can improve the results [13]. And, there is also some work comparing the computational efficiency of the underlying algorithm [20].…”

Section: Introduction (mentioning; confidence: 99%)


Comparing Different Properties Involved in Word Similarity Extraction

Gamallo

2009

Progress in Artificial Intelligence


Abstract. In this paper, we will analyze the behavior of several parameters, namely type of contexts, similarity measures, and word space models, in the task of word similarity extraction from large corpora. The main objective of the paper will be to describe experiments comparing different extraction systems based on all possible combinations of these parameters. Special attention will be paid to the comparison between syntax-based contexts and windowing techniques, binary similarity metrics and more elaborate coefficients, as well as baseline word space models and Singular Value Decomposition strategies. The evaluation leads us to conclude that the combination of syntax-based contexts, binary similarity metrics, and a baseline word space model makes the extraction much more precise than other combinations with more elaborate metrics and complex models.
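The abstract contrasts binary similarity metrics with more elaborate coefficients but does not say which binary metrics were used. As an illustration only, the sketch below implements two standard binary coefficients, Dice and Jaccard, over sets of syntax-based contexts. They use only whether a context is present, never its frequency. The context labels are hypothetical.

```python
def dice(a: set, b: set) -> float:
    """Binary Dice coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Binary Jaccard coefficient: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical syntax-based contexts (relation:head); presence only
beer = {"dobj_of:drink", "amod:cold", "nn:bottle"}
wine = {"dobj_of:drink", "amod:red", "nn:bottle"}
print(dice(beer, wine))     # 0.6666666666666666 (2*2 / (3+3))
print(jaccard(beer, wine))  # 0.5 (2 shared / 4 distinct)
```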


“…The Google Book syntactic n-grams dataset provides dependency fragment counts by the years. However, instead of using the plain syntactic n-grams, we use a far richer representation of the data in the form of a distributional thesaurus (Lin, 1997; Rychlý and Kilgarriff, 2007). In specific, we prepare a distributional thesaurus (DT) for each of the time periods separately and subsequently construct the required networks.…”

Section: Datasets and Graph Construction (mentioning; confidence: 99%)

That's sick dude!: Automatic identification of word sense change across different timescales

Mitra, Riedl, et al.

2014

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

105


In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional thesauri based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare these sense clusters of two different time points to find if (i) there is birth of a new sense or (ii) if an older sense has got split into more than one sense or (iii) if a newer sense has been formed from the joining of older senses or (iv) if a particular sense has died. We conduct a thorough evaluation of the proposed methodology both manually as well as through comparison with WordNet. Manual evaluation indicates that the algorithm could correctly identify 60.4% birth cases from a set of 48 randomly picked samples and 57% split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% cases the birth of a novel sense is attested by WordNet, while in 46% cases and 43% cases split and join are respectively confirmed by WordNet. Our approach can be applied for lexicography, as well as for applications like word sense disambiguation or semantic search.
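The four change types in this abstract (birth, split, join, death) can be illustrated by matching sense clusters across two time points. The sketch below is a simplified illustration, not the authors' procedure. The abstract does not state the matching criterion, so here two clusters are assumed to be related when they share at least `min_overlap` words. The function name and the example clusters are hypothetical. Birth and join indices refer to clusters at the later time point, while death and split indices refer to the earlier one.

```python
def compare_sense_clusters(old, new, min_overlap=1):
    """Label sense changes between two time points (simplified sketch).

    old, new: lists of sets of words, the sense clusters of one target
    word at the earlier and later time point. Clusters are treated as
    related if they share at least `min_overlap` words (an assumption).
    Returns (label, cluster_index) tuples.
    """
    def related(a, b):
        return len(a & b) >= min_overlap

    changes = []
    # Later clusters: no earlier source -> birth; several sources -> join
    for j, n in enumerate(new):
        sources = [i for i, o in enumerate(old) if related(o, n)]
        if not sources:
            changes.append(("birth", j))
        elif len(sources) > 1:
            changes.append(("join", j))
    # Earlier clusters: no later target -> death; several targets -> split
    for i, o in enumerate(old):
        targets = [j for j, n in enumerate(new) if related(o, n)]
        if not targets:
            changes.append(("death", i))
        elif len(targets) > 1:
            changes.append(("split", i))
    return changes


# Hypothetical clusters for "sick" at two time points
old_sick = [{"ill", "unwell", "ailing"}]
new_sick = [{"ill", "unwell"}, {"cool", "awesome", "great"}]
print(compare_sense_clusters(old_sick, new_sick))  # [('birth', 1)]
```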


“…In this case, the search terms which were used were potential candidates for discussing mock politeness, which had been identified by using terms discussed in the relevant literature and potential synonyms (as retrieved through the Sketch Engine distributional thesaurus (Rychly and Kilgarriff, 2007).⁹ Using this method of compilation a corpus of approximately 61 million tokens was created.…”

Section: Compilation and Annotation Of The Corpora (mentioning; confidence: 99%)

Beyond sarcasm: The metalanguage and structures of mock politeness

Taylor

2015

Journal of Pragmatics


Taylor, Charlotte (2015) Beyond sarcasm: the metalanguage and structures of mock politeness. Journal of Pragmatics, 87. pp. 127-141. ISSN 0378-2166. Available from Sussex Research Online: http://sro.sussex.ac.uk/56624/

Abstract: This paper aims to cast light on the somewhat neglected area of mock politeness. The principal objectives are to describe the ways that mock politeness is talked about and performed. In order to investigate such usage, I analyse data from informal, naturally occurring conversations in a UK-based online forum. The paper introduces a range of metalinguistic expressions which are used to refer to mock polite behaviours in lay interactions and describes the different structures of mock polite behaviours.
The analysis shows that both metalanguage and structure are more diverse than anticipated by previous research and, as a result, the paper argues against equating mock politeness with sarcasm and calls for further research into mock politeness as an important strategy of impoliteness.
