This paper introduces the concept of temporal word analogies: pairs of words which occupy the same semantic space at different points in time. One well-known property of word embeddings is that they are able to effectively model traditional word analogies ("word w1 is to word w2 as word w3 is to word w4") through vector addition. Here, I show that temporal word analogies ("word w1 at time tα is like word w2 at time tβ") can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space. When applied to a diachronic corpus of news articles, this method is able to identify temporal word analogies such as "Ronald Reagan in 1987 is like Bill Clinton in 1997" or "Walkman in 1987 is like iPod in 2007".

Background

The meanings of utterances change over time, due both to changes within the linguistic system and to changes in the state of the world. For example, the meaning of the word awful has changed over the past few centuries from something like "awe-inspiring" to something more like "very bad", due to a process of semantic drift. On the other hand, the phrase president of the United States has meant different things at different points in time because different people have occupied that same position at different times. These are very different types of changes, and the latter may not even be considered a linguistic phenomenon, but both types of change are relevant to the concept of temporal word analogies.

I define a temporal word analogy (TWA) as a pair of words which occupy a similar semantic space at different points in time. For example, assuming that there is a semantic space associated with "President of the USA", this space was occupied by Ronald Reagan in the 1980s and by Bill Clinton in the 1990s. So a temporal analogy holds: "Ronald Reagan in 1987 is like Bill Clinton in 1997".

Distributional semantics methods, particularly vector-space models of word meaning, have been employed to study both semantic change and word analogies, and as such are well suited to the task of identifying TWAs. The principle behind these models, that the meaning of a word can be captured by looking at the contexts in which it appears (i.e., other words), is not a recent idea, and is generally attributed to Harris (1954) or Firth (1957). The modern era of applying this principle algorithmically began with latent semantic analysis (LSA) (Landauer and Dumais, 1997), and the recent explosion in popularity of word embeddings is largely due to the very effective word2vec neural network approach to computing word embeddings (Mikolov et al., 2013a). In these types of vector space models (VSMs), the meaning of a word is represented as a multi-dimensional vector, and semantically related words tend to have vectors that relate to one another in regular ways (e.g., by occupying nearby points in the vector space). One factor in word embeddings' recent popularity is their eye-catching ability to model word analogies through simple vector arithmetic.
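The core operation behind such temporal analogies can be illustrated with a short sketch. The snippet below is not the paper's implementation; it assumes two independently trained embedding spaces for two time periods (hypothetical dicts emb_1987 and emb_1997 mapping words to same-dimensional numpy vectors) and uses an orthogonal Procrustes alignment, one common choice for transforming separate embedding spaces into a common vector space, before querying nearest neighbours across time.

```python
import numpy as np

def align_spaces(emb_a, emb_b):
    """Learn an orthogonal map that rotates space A onto space B,
    fit on the vocabulary shared by both time periods (Procrustes via SVD)."""
    shared = sorted(set(emb_a) & set(emb_b))
    A = np.vstack([emb_a[w] for w in shared])
    B = np.vstack([emb_b[w] for w in shared])
    u, _, vt = np.linalg.svd(A.T @ B)
    return u @ vt  # d x d orthogonal matrix W, so that A @ W ~ B

def temporal_analogy(word, emb_a, emb_b, mapping, topn=5):
    """Map `word` from period t_alpha into period t_beta's space and return
    its nearest neighbours there by cosine similarity."""
    query = emb_a[word] @ mapping
    query = query / np.linalg.norm(query)
    sims = {w: float(query @ (v / np.linalg.norm(v))) for w, v in emb_b.items()}
    return sorted(sims, key=sims.get, reverse=True)[:topn]

# Hypothetical usage, assuming embeddings trained on 1987 and 1997 news text:
# W = align_spaces(emb_1987, emb_1997)
# temporal_analogy("reagan", emb_1987, emb_1997, W)  # might return e.g. ["clinton", ...]
```

Restricting the transformation to an orthogonal matrix is one way to preserve distances and cosine similarities within each space while still bringing the two spaces into a shared coordinate system; other linear mappings could be substituted in the same sketch.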
We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by time-specific language use.
Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology, and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shift detection. We start by discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges facing this emerging subfield of NLP, as well as its prospects and possible applications.
Recently, there has been an explosion of interest in the use of textual sources (e.g., market reports, news articles, company reports) to predict changes in stock and commodity markets. Most of this research is on sentiment analysis, but some of it has tried to use the news itself to predict market movements. In this paper, we use 10 years of news articles, drawn from a weekly agricultural trade newspaper, to predict price changes in a commodity market for beef. Two experiments explore the different ways in which news reports affect the market via (i) major market-impacting events (e.g., rare natural disasters or food scandals) or (ii) minor market-impacting events (e.g., mundane reports about inflation, oil prices, etc.). We find that different techniques need to be used to uncover major events (e.g., LLRs) as opposed to minor events (e.g., classifiers), and show that no single unified predictive model appears to be able to do both.